vreau-digital/services/seap-scraper/scripts/import-seap-historical.sh
Claude VM a6c03a091e initial: split from gov-agreg — vreau.digital standalone platform
Moved from gov-agreg/src/pages/achizitii/* to root (drop prefix).
- 22 pages migrated, 127 files total
- All internal links: /achizitii/X → /X (176 occurrences fixed)
- AchizitiiLayout subnav rewritten: /X paths, top-right link to vreaudigital.ro hub
- BaseLayout new (vreau.digital branding, OG tags, site URL)
- astro.config.mjs: site https://vreau.digital, server output (was static)
- docker-compose: port 5096 (vreaudigital is 5095), container vreau-digital
- deploy.sh: paths /opt/vreau-digital, log /var/log/vreau-digital-deploy.log

Backend shared with gov-agreg:
- PostgreSQL satra (same schemas: seap, firms, anaf, anre, ...)
- Photon, Martin tiles
- Infisical /vreaudigital path (DATABASE_URL etc. shared)

build: PASS (npx astro check 0 errors, npm run build 5s vite + 10s server)
2026-05-13 00:10:32 +03:00


#!/bin/bash
# SEAP historical CSV importer wrapper.
# Downloads a yearly+quarterly resource from data.gov.ro CKAN and imports
# it into seap.announcements via the Python normalizer + psql COPY.
#
# Usage:
#   ./import-seap-historical.sh URL TYPE SOURCE [DELETE_FIRST]
#     URL:          full data.gov.ro CKAN download URL
#     TYPE:         'contract' | 'da' | 'initiere' | 'atribuire_fara' | 'modificare'
#     SOURCE:       tag, e.g. 'datagov_2024_t1_contracte'
#     DELETE_FIRST: 'yes' to wipe rows tagged with this source before insert
#
# Example:
#   bash import-seap-historical.sh \
#     'https://data.gov.ro/dataset/ed.../resource/24a.../download/...t-i-2024.csv' \
#     contract datagov_2024_t1_contracte yes
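set -euo pipefail
# Print a usage hint instead of failing with an "unbound variable" error under set -u.
[ "$#" -ge 3 ] || { echo "usage: $0 URL TYPE SOURCE [DELETE_FIRST]" >&2; exit 2; }
URL="$1"
TYPE="$2"
SOURCE="$3"
DELETE_FIRST="${4:-no}"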
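# mktemp -d creates a private, unpredictable directory instead of a guessable /tmp/name-PID.
WORK="$(mktemp -d /tmp/seap-historical-XXXXXX)"
# Single quotes defer expansion until the trap runs; the inner quotes protect the path.
trap 'rm -rf "$WORK"' EXIT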
CSV="$WORK/data.csv"
TSV="$WORK/data.tsv"
echo "[import] downloading: $URL"
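# -f fails on HTTP errors instead of saving the error page; -S reports errors despite -s; -k still skips TLS verification.
curl -fsSk --max-time 600 -L "$URL" -o "$CSV"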
echo "[import] downloaded: $(stat -c %s "$CSV") bytes"
echo "[import] normalizing CSV → TSV..."
python3 "$(dirname "$0")/import-seap-historical.py" "$CSV" "$TSV" "$TYPE" "$SOURCE"
# Stage on the DB host
echo "[import] copying TSV to satra..."
scp -q "$TSV" "satra:/tmp/seap-historical.tsv"
DELETE_SQL=""
if [ "$DELETE_FIRST" = "yes" ]; then
DELETE_SQL="DELETE FROM seap.announcements WHERE source = '$SOURCE';"
fi
echo "[import] running insert on satra..."
ssh satra "/tmp/baseline.sh <<SQL
$DELETE_SQL
CREATE TEMP TABLE _stage_seap_hist (
type text, ref_number text, authority_name text, authority_cui text,
cpv_code text, cpv_name text, contract_type text, publication_date text,
contract_date text, awarded_value text, supplier_name text, supplier_cui text,
procedure_type text, legislation text, source text
);
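-- HEADER combined with FORMAT text needs PostgreSQL 15 or later (earlier versions accept HEADER only for CSV).
\\COPY _stage_seap_hist FROM '/tmp/seap-historical.tsv' WITH (FORMAT text, DELIMITER E'\\t', HEADER true);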
INSERT INTO seap.announcements (
type, ref_number, authority_name, authority_cui, cpv_code, cpv_name,
contract_type, publication_date, contract_date, awarded_value,
supplier_name, supplier_cui, procedure_type, legislation, source
)
SELECT type, ref_number, authority_name, authority_cui, cpv_code, cpv_name,
contract_type,
NULLIF(publication_date, '')::timestamptz,
NULLIF(contract_date, '')::date,
NULLIF(awarded_value, '')::numeric,
supplier_name, supplier_cui, procedure_type, legislation, source
FROM _stage_seap_hist
ON CONFLICT (type, ref_number) DO NOTHING;
SELECT '$SOURCE' AS source, COUNT(*) AS rows,
MIN(publication_date)::date AS oldest,
MAX(publication_date)::date AS newest,
SUM(awarded_value)::bigint AS total_lei
FROM seap.announcements WHERE source = '$SOURCE';
SQL"
ssh satra "rm -f /tmp/seap-historical.tsv"
echo "[import] done."
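
The script places SOURCE directly inside SQL string literals (the DELETE and the final summary query), so a tag containing a single quote would change the statement that runs on satra. A minimal sketch of validating the arguments before use is shown here. `valid_type` and `valid_source` are hypothetical helper names, not part of the original script, and the allowed character set for source tags is an assumption based on the example tag in the usage comment.

```bash
#!/bin/bash
# Hypothetical helpers, not part of the original script.

# Accept only the TYPE values listed in the usage comment.
valid_type() {
  case "$1" in
    contract|da|initiere|atribuire_fara|modificare) return 0 ;;
    *) return 1 ;;
  esac
}

# Assumption: source tags contain only lowercase letters, digits and
# underscores, so a tag can never terminate the SQL string literal
# it is interpolated into.
valid_source() {
  [[ "$1" =~ ^[a-z0-9_]+$ ]]
}
```

One way to use these would be to call them right after the arguments are read, for example `valid_source "$SOURCE" || { echo "bad SOURCE tag" >&2; exit 2; }`, so that nothing is downloaded or deleted when the input is malformed.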