initial: split from gov-agreg — vreau.digital standalone platform
Moved from gov-agreg/src/pages/achizitii/* to root (drop prefix).
- 22 pages migrated, 127 files total
- All internal links: /achizitii/X → /X (176 occurrences fixed)
- AchizitiiLayout subnav rewritten: /X paths, top-right link to vreaudigital.ro hub
- BaseLayout new (vreau.digital branding, OG tags, site URL)
- astro.config.mjs: site https://vreau.digital, server output (was static)
- docker-compose: port 5096 (vreaudigital is 5095), container vreau-digital
- deploy.sh: paths /opt/vreau-digital, log /var/log/vreau-digital-deploy.log

Backend shared with gov-agreg:
- PostgreSQL satra (same schemas: seap, firms, anaf, anre, ...)
- Photon, Martin tiles
- Infisical /vreaudigital path (DATABASE_URL etc. shared)

build: PASS (npx astro check 0 errors, npm run build 5s vite + 10s server)
@@ -0,0 +1,186 @@
# AFIR Historical Backfill: Plan & Status

## Current state (2026-05-09)

| source_year | rows      | distinct beneficiaries | sum UE (EUR)  | fund  |
|-------------|-----------|------------------------|---------------|-------|
| **2023**    | 474,720   | 320,230                | 1,411,870,796 | FEADR |
| **2024**    | 563,310   | 316,304                | 1,373,722,134 | FEADR |
| **Total**   | 1,038,030 | -                      | ~2.79 bn EUR  | FEADR |

Schema: `fonduri.afir_plati` (migration `017_fonduri_afir.sql`).
Importer: `cron/import-afir-historical.sh` + `scripts/import-afir-historical.py`.

## Source survey

### AFIR official portal: `https://www.afir.ro/rapoarte/beneficiari-de-fonduri-europene/`

Two complementary pages:

1. **`/date-deschise/`**: links only the two most recent years.
   - Currently exposes 2023 + 2024 for **FEADR (xlsx)** and 2023 + 2024 for **FEGA (rar)**.
2. **`/beneficiari-fega-si-feadr/`**: ASP.NET portal at
   `https://plati.afir.info/Plati/AfisareListaPlatii`. The year selector
   currently exposes **only 2023 and 2024**. The live query interface reports
   3.7M records in total, but offers no bulk XLSX dump for years before 2023.

### data.gov.ro CKAN: searched `q=afir`, `q=fega`, `q=apia`, `q=feadr`

Findings (relevant package IDs only):

| Dataset | Resources | Notes |
|---|---|---|
| `Date privind proiectele PNDR` (`a2884dcf-…`) | `proiectepndr2020.csv` (2014-2020), `proiectepndr2013.csv` (2007-2013) | **Project-level, not payment-level.** Useful for joining contracts/projects but does not replace the payment (plati) data. Worth ingesting separately. |
| `Contracte AFIR` (`8845aa0d-…`) | `contracte-achizitii-publice-peste-5000-euro-2000.xlsx`, `centralizator-…2021_2022.xlsx` | Procurement contracts >5K EUR run by AFIR itself; not beneficiary payments. Different schema. |
| `Lista Fermierilor Campania APIA 2024` (`39e5465d-…`) | `lista-fermieri-apia-2024.xlsx` | One-off small dataset; APIA campaign list. |
| `Parcele Agricole APIA LPIS 2025` etc. | shapefiles (.zip) | Geographic parcels, not payments. Useful later for map overlays. |

**Conclusion**: data.gov.ro does **not** have `listaplati_2020/2021/2022_*` payment dumps. No public source checked so far carries them.

### opendata.afir.info

A separate CKAN-style portal (`http://opendata.afir.info/`) lists `ProiectePNDR2020` (53K views), `ProiectePS2027`, and `AchizitiiPrivate2020`. The dataset pages do not expose direct download URLs without an account login. **Worth investigating in the next session**: the portal may hold the 2020-2022 payment data behind an export interface.

## Importer architecture

### Pipeline (FEADR XLSX)

```
AFIR XLSX ──curl──▶ satra:/tmp/afir-historical-{YEAR}-{FUND}/
                        │
                        ▼
        openpyxl read_only (skips 9 banner rows)
                        │
                        ▼
        pipe-delimited TSV (RO decimals "12.345,67" → "12345.67")
                        │
                        ▼
        \copy → fonduri.staging_afir
                        │
                        ▼
        DELETE FROM afir_plati WHERE source_year=YEAR  (idempotent)
                        │
                        ▼
        INSERT INTO afir_plati (source_year=YEAR, NULLIF + ::numeric casts)
```
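
The decimal normalization step in the pipeline can be sketched as below. This is a minimal illustration, not the code in `import-afir-historical.py`; the function name `ro_decimal_to_plain` is hypothetical, and it assumes every amount cell uses `.` for thousands and `,` for decimals.

```python
def ro_decimal_to_plain(value: str) -> str:
    """Convert a Romanian-formatted number such as "12.345,67" to "12345.67".

    Hypothetical helper: assumes "." is always a thousands separator
    and "," is always the decimal mark.
    """
    text = value.strip()
    if not text:
        # Keep empty cells empty so a later NULLIF(..., '') can map them to NULL.
        return ""
    return text.replace(".", "").replace(",", ".")


print(ro_decimal_to_plain("12.345,67"))
```

Leaving empty strings untouched keeps the conversion compatible with the `NULLIF + ::numeric` cast in the final INSERT step.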

### Why pipe delimiter

Beneficiary names contain commas (`"FULOP ZOLTAN, GERGELY"`), and `Obiectiv`
values contain both `,` and quote characters. A pipe delimiter is safer than
comma plus quoting, and the loader already replaces any literal `|` in source
text with `/` before serialization.
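
A rough sketch of that serialization rule follows. The helper name `to_pipe_row` is hypothetical, and collapsing embedded whitespace (newlines, tabs) into single spaces is an added assumption that the plan does not state.

```python
def to_pipe_row(fields):
    """Serialize one record as a pipe-delimited line (hypothetical helper)."""
    cleaned = []
    for field in fields:
        text = "" if field is None else str(field)
        # A literal pipe in the source would shift every following column.
        text = text.replace("|", "/")
        # Assumption: flatten newlines/tabs so one record stays on one line.
        text = " ".join(text.split())
        cleaned.append(text)
    return "|".join(cleaned)


print(to_pipe_row(["FULOP ZOLTAN, GERGELY", 'modernizare "ferma" | etapa 2', None]))
```

Commas and quote characters pass through untouched, which is the point of avoiding CSV quoting here.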

### Idempotency

`DELETE WHERE source_year = N` runs only on full ingests (not when
`LIMIT` is set for smoke tests). Re-running a full ingest for the same year is
safe: the year's rows are replaced, so counts stay consistent.
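
The delete-then-insert pattern can be demonstrated in isolation. The sketch below uses an in-memory SQLite database as a stand-in for PostgreSQL, a simplified three-column table, and invented sample rows; wrapping both statements in one transaction is an assumption, not something the plan confirms about the real script.

```python
import sqlite3


def reload_year(conn, year, rows):
    """Replace all rows for one source_year (simplified stand-in table)."""
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute("DELETE FROM afir_plati WHERE source_year = ?", (year,))
        conn.executemany(
            "INSERT INTO afir_plati (source_year, beneficiar, suma) VALUES (?, ?, ?)",
            [(year, name, amount) for name, amount in rows],
        )


def count_year(conn, year):
    cur = conn.execute(
        "SELECT COUNT(*) FROM afir_plati WHERE source_year = ?", (year,)
    )
    return cur.fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE afir_plati (source_year INTEGER, beneficiar TEXT, suma REAL)")

sample = [("BENEFICIAR A", 10.0), ("BENEFICIAR B", 20.5)]  # invented rows
reload_year(conn, 2023, sample)
reload_year(conn, 2023, sample)      # second run replaces instead of duplicating
reload_year(conn, 2024, sample[:1])  # other years are untouched by a 2023 reload
print(count_year(conn, 2023), count_year(conn, 2024))
```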

### Smoke test mode

```
./import-afir-historical.sh URL YEAR feadr 1000
```

The 4th arg (LIMIT) skips the DELETE step and truncates the TSV to N rows
before COPY, so you can validate end-to-end without trampling production
data.
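
If the truncation were done inside the Python normalizer instead of the shell wrapper (an assumption; the plan does not say where it happens), it could look roughly like this. Both helper names are hypothetical.

```python
from itertools import islice


def parse_limit(argv):
    """Read the optional 4th positional argument (URL YEAR FUND [LIMIT])."""
    return int(argv[3]) if len(argv) > 3 and argv[3] else None


def take_rows(rows, limit=None):
    """Return all rows, or only the first `limit` rows in smoke-test mode."""
    if limit is None:
        return iter(rows)
    return islice(rows, limit)


limit = parse_limit(["URL", "2023", "feadr", "3"])
print(list(take_rows(range(10), limit)))
```

Using `islice` keeps the normalizer streaming, so a smoke test never has to read the whole workbook.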

## Next-session work

### 1. FEGA ingest (HIGHEST IMPACT, 30-60 min)

**Volume**: 2,476,897 rows in 2023 alone, ~580 MB CSV inside 23 MB RAR.

**Source URLs**:
- 2023: `https://www.afir.ro/media/sxcnuvwc/listaplati_2023_fega_corectat.rar`
- 2024: `https://www.afir.ro/media/dqjddti2/lista-plati-beneificiari-fega-2024.rar`

**Schema differences vs FEADR XLSX** (column-by-column):

| FEADR XLSX (RO header) | FEGA CSV (concat header) | Notes |
|---|---|---|
| Numele beneficiarului | `DenumireBeneficiar` | same |
| Numele de familie | `NumeFamilie` | same |
| Denumirea societatii-mama si codul de inregistrare fiscala | `Cui` | **FEGA CSV exposes a real CUI column** (mostly empty for natural persons, populated for SRL/PFA; bonus enrichment vs FEADR XLSX) |
| Localitate | `Localicate` *(typo in source)* | same content |
| Codul masurii/tipului de interventie | `Masura` | same; FEGA codes look like `MICA` / scheme acronyms instead of `M 06` etc. |
| Obiectiv | `ObiectivSpecific` | longer descriptions |
| Data inceperii / Data incheierii | `DataIncepere` / `DataSfarsit` | usually empty |
| Cuantum {Operatiune,Total} {FEGA,FEADR} | same 4 columns | **decimals already in `.` format** (English locale, no comma swap needed) |
| Cuantum aferent operatiunii | `CuantumAferentOperatiune` | same |
| Cuantum total cofinantare beneficiari | `CuantumTotalCofinantareBeneficiar` | same |
| Cuantum total UE Beneficiar | `CuantumtotalUEBenefeciar` *(typo in source)* | same |
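
The header remapping implied by the table could be sketched as follows. The FEGA header names are copied from the table, typos included; the normalized target names, the comma delimiter, and the sample row are all assumptions made for illustration.

```python
import csv
import io

# FEGA CSV header -> normalized name (target names are hypothetical).
FEGA_TO_NORMALIZED = {
    "DenumireBeneficiar": "nume_beneficiar",
    "NumeFamilie": "nume_familie",
    "Cui": "cui",
    "Localicate": "localitate",  # typo exists in the source header
    "Masura": "masura",
    "ObiectivSpecific": "obiectiv",
    "DataIncepere": "data_incepere",
    "DataSfarsit": "data_sfarsit",
    "CuantumAferentOperatiune": "cuantum_operatiune",
    "CuantumTotalCofinantareBeneficiar": "cuantum_cofinantare",
    "CuantumtotalUEBenefeciar": "cuantum_total_ue",  # typo exists in the source
}


def remap_fega_row(row):
    """Rename known FEGA columns and drop anything unrecognized."""
    return {
        FEGA_TO_NORMALIZED[key]: (value or "").strip()
        for key, value in row.items()
        if key in FEGA_TO_NORMALIZED
    }


# Invented sample; the real file's delimiter has not been verified.
sample = (
    "DenumireBeneficiar,Cui,Localicate,CuantumtotalUEBenefeciar\n"
    "EXEMPLU SRL,12345678,Cluj-Napoca,1500.25\n"
)
rows = [remap_fega_row(r) for r in csv.DictReader(io.StringIO(sample))]
print(rows[0])
```

Keying the map on the misspelled source headers means a future header correction by AFIR would surface as missing columns instead of silently shifting data.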

**Implementation choices**:

Option A: **augment afir_plati with a `tip_fond` discriminator**.
Run `ALTER TABLE fonduri.afir_plati ADD COLUMN tip_fond text CHECK (tip_fond IN ('FEADR','FEGA'));`
and re-tag existing rows as `'FEADR'`. The importer writes both values, and downstream queries stay uniform.

Option B: **separate table `fonduri.fega_plati`**.
Different cardinality (5x rows) and a different measure code namespace; some
queries separate naturally. But it duplicates the index/MV maintenance burden.

**Recommendation: Option A**. The target table layout is identical for both
funds; the differences are limited to the code namespace. A single discriminator
keeps things simple, fits the existing `gin_trgm` name index, and lets the
recipe code do `WHERE tip_fond='FEGA'` cheaply (b-tree on tip_fond if needed).

**FEGA importer changes vs current FEADR script**:

1. Download → `unrar x` (`unrar` is already installed on satra via `apt install unrar`).
2. New Python normalizer `import-afir-historical-fega.py`: reads CSV instead of XLSX, remaps column names, and does *no* RO-decimal swap.
3. Pass the new `FUND=fega` flag → the script writes `tip_fond='FEGA'` and uses the CSV path.
4. **Cui column passthrough**: when the value is non-empty, write it directly into
   the existing `cui` column with `cui_match_method='afir_self_reported'` and
   `cui_match_score=1.0`. Skip the fuzzy matcher for these rows.
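
The passthrough rule in step 4 might look like the sketch below. The column names and the `afir_self_reported` / `1.0` values come from the plan; the function name and the `needs_fuzzy_match` flag are hypothetical, and no CUI validation or prefix stripping is attempted.

```python
def cui_fields(raw_cui):
    """Decide how to fill the CUI columns for one FEGA row (hypothetical helper)."""
    cui = (raw_cui or "").strip()
    if cui:
        return {
            "cui": cui,
            "cui_match_method": "afir_self_reported",
            "cui_match_score": 1.0,
            "needs_fuzzy_match": False,  # hypothetical flag: skip the matcher
        }
    # Empty CUI (typical for natural persons): leave it to the fuzzy matcher.
    return {
        "cui": None,
        "cui_match_method": None,
        "cui_match_score": None,
        "needs_fuzzy_match": True,
    }


print(cui_fields(" 12345678 "))
```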

**Volume budget**: 2.48M rows × 2 years = ~5M rows. The same staging table
works (TRUNCATE between runs). Postgres COPY at ~100K rows/s → ~25s/year
for COPY, plus ~60s for INSERT. Total ~5 min per year; the database steps
account for about 85s of that, and the rest is presumably download,
extraction, and normalization.

### 2. Historical FEADR 2020/2021/2022 (BLOCKED on source)

Status: **not publicly available.**

Investigation outcome:
- AFIR `/date-deschise/` page shows only 2023+2024.
- `plati.afir.info` portal shows only 2023+2024.
- data.gov.ro CKAN has no `listaplati_<year>` resources.

**Options to unblock** (in order of cost):

1. **Email AFIR directly**: write to `comunicare@afir.info` and request the
   historical payment lists for 2020-2022 under Law 544/2001 (the Romanian FOIA
   equivalent). AFIR is legally obligated to provide them. Expected: 2-4 week response.
2. **Wayback Machine archive**: check
   `https://web.archive.org/web/2023*/afir.ro/rapoarte/beneficiari-de-fonduri-europene/date-deschise/`
   for snapshots that still link to old XLSX files. The URLs may still resolve
   (AFIR media folder is content-addressed: `/media/<hash>/file.xlsx`).
3. **opendata.afir.info account**: the dataset titles `AchizitiiPrivate2020` and
   `ProiectePNDR2020` suggest historical exports may live here, but the
   download interface needs a login. Apply for an open-data access account.
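
For option 2, the Wayback Machine's CDX search endpoint can list archived captures under a URL prefix without clicking through snapshots. The sketch below only builds the query URL and performs no network request; the prefix `afir.ro/media/` and the year range are assumptions about where old payment files would have been hosted.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url_prefix, year_from, year_to):
    """Build a CDX search URL listing successful captures under a prefix."""
    params = {
        "url": url_prefix,
        "matchType": "prefix",       # every capture below the prefix
        "from": str(year_from),
        "to": str(year_to),
        "output": "json",
        "filter": "statuscode:200",  # only captures that returned HTTP 200
        "collapse": "urlkey",        # one result line per distinct URL
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


print(build_cdx_query("afir.ro/media/", 2021, 2023))
```

Filtering the returned URLs for names containing `listaplati` would then show whether any 2020-2022 files were ever captured.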

**Estimated row counts when obtained**: ~450K-500K per year (extrapolating
from 2023 = 475K and 2024 = 563K).

### 3. APIA-specific datasets (LOWER PRIORITY)

`Lista Fermierilor Campania APIA 2024` (small file, ~50K rows expected).
This is a *subset* of FEGA payments (only certain campaigns), so once FEGA
2024 is ingested, this dataset is partially redundant. Worth ingesting
into a separate `fonduri.apia_fermieri` table only if it carries the
geographic columns (parcel codes) the FEGA dump lacks.

Geographic LPIS shapefiles (`Parcele Agricole APIA LPIS 2025`,
`Categorii de Folosință`) are **map data**, not payment data. Defer them until
we add map overlays to the /achizitii/firma/[cui] profile pages.

## Files modified/added in this session

- **NEW** `services/seap-scraper/scripts/import-afir-historical.py`: XLSX→TSV normalizer
- **NEW** `services/seap-scraper/cron/import-afir-historical.sh`: orchestrator
- **NEW** `services/seap-scraper/AFIR-HISTORICAL-PLAN.md` (this file)

`fonduri.afir_plati` schema unchanged, so no migration is needed. The DELETE+INSERT
flow uses the existing table as-is. Adding the `tip_fond` discriminator is
a follow-up migration for when FEGA ingest is implemented.