Files
vreau-digital/services/seap-scraper/AFIR-HISTORICAL-PLAN.md
T
Claude VM a6c03a091e initial: split from gov-agreg — vreau.digital standalone platform
Moved from gov-agreg/src/pages/achizitii/* to root (drop prefix).
- 22 pages migrated, 127 files total
- All internal links: /achizitii/X → /X (176 occurrences fixed)
- AchizitiiLayout subnav rewritten: /X paths, top-right link to vreaudigital.ro hub
- BaseLayout new (vreau.digital branding, OG tags, site URL)
- astro.config.mjs: site https://vreau.digital, server output (was static)
- docker-compose: port 5096 (vreaudigital is 5095), container vreau-digital
- deploy.sh: paths /opt/vreau-digital, log /var/log/vreau-digital-deploy.log

Backend shared with gov-agreg:
- PostgreSQL satra (same schemas: seap, firms, anaf, anre, ...)
- Photon, Martin tiles
- Infisical /vreaudigital path (DATABASE_URL etc. shared)

build: PASS (npx astro check 0 errors, npm run build 5s vite + 10s server)
2026-05-13 00:10:32 +03:00

8.9 KiB
Raw Blame History

AFIR Historical Backfill — Plan & Status

Current state (2026-05-09)

source_year rows distinct beneficiars sum UE (EUR) fund
2023 474,720 320,230 1,411,870,796 FEADR
2024 563,310 316,304 1,373,722,134 FEADR
Total 1,037,930 ~2.79 mld EUR FEADR

Schema: fonduri.afir_plati (migration 017_fonduri_afir.sql). Importer: cron/import-afir-historical.sh + scripts/import-afir-historical.py.

Source survey

AFIR official portal — https://www.afir.ro/rapoarte/beneficiari-de-fonduri-europene/

Two complementary pages:

  1. /date-deschise/ — only the most recent two years are linked.
    • Currently exposes 2023 + 2024 for FEADR (xlsx) and 2023 + 2024 for FEGA (rar).
  2. /beneficiari-fega-si-feadr/ — ASP.NET portal at https://plati.afir.info/Plati/AfisareListaPlatii. Year selector currently exposes only 2023 and 2024. 3.7M total records in the live query interface but no programmatic XLSX dump older than 2023.

data.gov.ro CKAN — searched q=afir, q=fega, q=apia, q=feadr

Findings (relevant package IDs only):

Dataset URL Notes
Date privind proiectele PNDR (a2884dcf-…) proiectepndr2020.csv (2014-2020), proiectepndr2013.csv (2007-2013) Project-level, not payment-level. Useful for joining contracts/projects but does not replace plati. Worth ingesting separately.
Contracte AFIR (8845aa0d-…) contracte-achizitii-publice-peste-5000-euro-2000.xlsx, centralizator-…2021_2022.xlsx Procurement contracts >5K EUR run by AFIR itself; not beneficiary payments. Different schema.
Lista Fermierilor Campania APIA 2024 (39e5465d-…) lista-fermieri-apia-2024.xlsx One-off small dataset; APIA campaign list.
Parcele Agricole APIA LPIS 2025 etc. shapefiles (.zip) Geographic parcels, not payments. Useful later for map overlays.

Conclusion: data.gov.ro does not have listaplati_2020/2021/2022_* payment dumps. They exist nowhere public.

opendata.afir.info

A separate CKAN-style portal (http://opendata.afir.info/) lists ProiectePNDR2020 (53K views), ProiectePS2027, AchizitiiPrivate2020. The page itself doesn't expose direct download URLs without account login. Worth investigating in next session — it may contain the 2020-2022 payment data behind an export interface.

Importer architecture

Pipeline (FEADR XLSX)

AFIR XLSX  ──curl──▶  satra:/tmp/afir-historical-{YEAR}-{FUND}/
            │
            ▼
       openpyxl read_only  (skips 9 banner rows)
            │
            ▼
       pipe-delimited TSV  (RO decimals "12.345,67" → "12345.67")
            │
            ▼
       \\copy → fonduri.staging_afir
            │
            ▼
       DELETE FROM afir_plati WHERE source_year=YEAR  (idempotent)
            │
            ▼
       INSERT INTO afir_plati (source_year=YEAR, NULLIF + ::numeric casts)

Why pipe delimiter

Beneficiar names contain commas ("FULOP ZOLTAN, GERGELY"), Obiectiv contains both , and quote chars. Pipe is safer than comma + quoting and the loader already replaces any literal | in source text with / before serialization.

Idempotency

DELETE WHERE source_year = N runs only on full ingests (not when LIMIT is set for smoke tests). Re-running for the same year is safe and produces consistent counts.

Smoke test mode

./import-afir-historical.sh URL YEAR feadr 1000

The 4th arg (LIMIT) skips the DELETE step and truncates the TSV to N rows before COPY, so you can validate end-to-end without trampling production data.

Next-session work

1. FEGA ingest (HIGHEST IMPACT, 30-60 min)

Volume: 2,476,897 rows in 2023 alone, ~580 MB CSV inside 23 MB RAR. Source URLs:

  • 2023: https://www.afir.ro/media/sxcnuvwc/listaplati_2023_fega_corectat.rar
  • 2024: https://www.afir.ro/media/dqjddti2/lista-plati-beneificiari-fega-2024.rar

Schema differences vs FEADR XLSX (column-by-column):

FEADR XLSX (RO header) FEGA CSV (concat header) Notes
Numele beneficiarului DenumireBeneficiar same
Numele de familie NumeFamilie same
Denumirea societatii-mama si codul de inregistrare fiscala Cui FEGA CSV exposes a real CUI column (mostly empty for natural persons, populated for SRL/PFA — bonus enrichment vs FEADR XLSX)
Localitate Localicate (typo in source) same content
Codul masurii/tipului de interventie Masura same; FEGA codes look like MICA / scheme acronyms instead of M 06 etc
Obiectiv ObiectivSpecific longer descriptions
Data inceperii / Data incheierii DataIncepere / DataSfarsit usually empty
Cuantum {Operatiune,Total} {FEGA,FEADR} same 4 columns decimals already in . format (English-locale, no comma swap needed)
Cuantum aferent operatiunii CuantumAferentOperatiune same
Cuantum total cofinantare beneficiari CuantumTotalCofinantareBeneficiar same
Cuantum total UE Beneficiar CuantumtotalUEBenefeciar (typo in source) same

Implementation choices:

Option A — augment afir_plati with tip_fond discriminator. Add ALTER TABLE fonduri.afir_plati ADD COLUMN tip_fond text CHECK (tip_fond IN ('FEADR','FEGA')); Re-tag existing rows as 'FEADR'. Importer writes both. Uniform downstream query.

Option B — separate table fonduri.fega_plati. Different cardinality (5x rows), different measure code namespace; some queries naturally separate. But duplicates the index/MV maintenance burden.

Recommendation: Option A. The schema is identical, the differences are namespace-of-codes only. A single discriminator keeps things simple, fits the existing gin_trgm name index, and lets the recipe code do WHERE tip_fond='FEGA' cheaply (b-tree on tip_fond if needed).

FEGA importer changes vs current FEADR script:

  1. Download → unrar x (already installed on satra now: apt install unrar was run).
  2. New python normalizer import-afir-historical-fega.py — reads CSV not XLSX; column-name remapping; no RO-decimal swap.
  3. Pass new FUND=fega flag → script writes tip_fond='FEGA' and uses CSV path.
  4. Cui column passthrough — write directly into the existing cui column when non-empty, with cui_match_method='afir_self_reported' and cui_match_score=1.0. Skip fuzzy matcher for these.

Volume budget: 2.48M rows × 2 years = ~5M rows. Same staging table works (TRUNCATE between runs). Postgres COPY @ ~100K rows/s → ~25s/year for COPY, plus ~60s for INSERT. Total ~5 min per year.

2. Historical FEADR 2020/2021/2022 (BLOCKED on source)

Status: not publicly available.

Investigation outcome:

  • AFIR /date-deschise/ page shows only 2023+2024.
  • plati.afir.info portal shows only 2023+2024.
  • data.gov.ro CKAN has no listaplati_<year> resources.

Options to unblock (in order of cost):

  1. Email AFIR directcomunicare@afir.info and request the historical payment lists 2020-2022 under Law 544/2001 (FOIA equivalent). They are legally obligated to provide. Expected: 2-4 week response.
  2. Wayback Machine archive — check https://web.archive.org/web/2023*/afir.ro/rapoarte/beneficiari-de-fonduri-europene/date-deschise/ for snapshots that still link to old XLSX files. URLs may still resolve (AFIR media folder is content-addressed: /media/<hash>/file.xlsx).
  3. opendata.afir.info account — the dataset titles AchizitiiPrivate2020, ProiectePNDR2020 suggest historical exports may live here, but the download interface needs login. Apply for an open-data access account.

Estimated row counts when obtained: ~450K-500K per year (extrapolating from 2023 = 475K and 2024 = 563K).

3. APIA-specific datasets (LOWER PRIORITY)

Lista Fermierilor Campania APIA 2024 (small file, ~50K rows expected). This is a subset of FEGA payments (only certain campaigns), so once FEGA 2024 is ingested, this dataset is partially redundant. Worth ingesting into a separate fonduri.apia_fermieri table only if it carries the geographic columns (parcel codes) the FEGA dump lacks.

Geographic LPIS shapefiles (Parcele Agricole APIA LPIS 2025, Categorii de Folosință) are map data, not payment data — defer to when we add map overlays to /achizitii/firma/[cui] profile pages.

Files modified/added in this session

  • NEW services/seap-scraper/scripts/import-afir-historical.py — XLSX→TSV normalizer
  • NEW services/seap-scraper/cron/import-afir-historical.sh — orchestrator
  • NEW services/seap-scraper/AFIR-HISTORICAL-PLAN.md (this file)

fonduri.afir_plati schema unchanged — no migration. The DELETE+INSERT flow uses the existing table as-is. Adding tip_fond discriminator is a follow-up migration when FEGA ingest is implemented.