# APIA: Lista Fermieri (data.gov.ro CKAN ingest)

## Current state (2026-05-10)

| metric                        | value                                                                              |
| ----------------------------- | ---------------------------------------------------------------------------------- |
| Schema                        | `apia.fermieri` + `apia.staging_fermieri` + `apia.scrape_log` + `apia.mv_per_cui` |
| Migration                     | `services/seap-scraper/sql/036_apia.sql`                                           |
| Importer (python)             | `services/seap-scraper/scripts/import-apia-fermieri.py`                            |
| Importer (bash wrapper)       | `services/seap-scraper/cron/import-apia-fermieri.sh`                               |
| Rows ingested                 | **191** (Găgești, jud. Vaslui, campaign 2024)                                      |
| Resources                     | 1 / 1 discoverable on data.gov.ro                                                  |
| Communes (comune)             | 13 (resident vs. owner: Găgești + diaspora)                                        |
| Total area                    | 1 575.17 ha                                                                        |
| Legal persons (PJ, `is_legal_person`) | 2 (PFA, SRL)                                                               |
| CUI matched (firms.entities)  | 1 / 2 (50%): **SC WARDAMA SRL** (CUI 28501796)                                     |
| Cross-source AFIR FEGA hits   | **1 firm** (WARDAMA, 2 FEGA payments, 26.28 EUR)                                   |
| Cross-source ANAF debtors (datornici) | 0                                                                          |

## Reality check: data.gov.ro APIA scope

The prompt's expectation was 500–700K farmers in a single national XLSX. **That dataset does not exist on CKAN.** The only published "Lista fermieri APIA" XLSX on data.gov.ro covers a single comuna (Găgești, Vaslui, ~192 farmers).

### Why this matters

- AFIR's FEGA dump (`fonduri.afir_plati WHERE tip_fond='FEGA'`, **4 290 976 rows for 2023+2024**) is the actual national farmer-payment dataset. APIA "Lista fermieri" publishes **declarations** (area, responsabil UAT, centru APIA); APIA is the paying agency, AFIR records the actual payments.
- The two are complementary, not redundant:
  - APIA list → "who declared and how many ha"
  - AFIR FEGA → "who actually got paid and how much"
- We built a future-proof importer that auto-discovers any new `lista-fermieri-*` package on data.gov.ro. When more UATs publish, re-run it and it ingests them automatically (idempotent on `source_resource_id`).
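The auto-discovery step described above can be sketched roughly as follows. This is a minimal illustration, not the code in `import-apia-fermieri.py`: the function names are hypothetical, the endpoint is assumed to be the standard CKAN Action API path on data.gov.ro, and the response shape (`result.results[].resources[]`) is the generic CKAN `package_search` layout.

```python
import json
import urllib.parse
import urllib.request
from fnmatch import fnmatchcase

# Assumption: data.gov.ro exposes the standard CKAN Action API at this path.
CKAN_SEARCH_URL = "https://data.gov.ro/api/3/action/package_search"


def select_xlsx_resources(payload, name_glob="lista-fermier*"):
    """Return (dataset_id, resource_id, url) for each XLSX resource
    in packages whose name matches name_glob."""
    found = []
    for package in payload.get("result", {}).get("results", []):
        if not fnmatchcase(package.get("name", ""), name_glob):
            continue
        for resource in package.get("resources", []):
            # CKAN format strings are free text, so compare case-insensitively.
            if (resource.get("format") or "").upper() == "XLSX":
                found.append((package["id"], resource["id"], resource["url"]))
    return found


def discover(query="lista fermieri APIA", rows=100):
    """Query CKAN and return the matching XLSX resources (network call)."""
    params = urllib.parse.urlencode({"q": query, "rows": rows})
    with urllib.request.urlopen(f"{CKAN_SEARCH_URL}?{params}", timeout=30) as resp:
        return select_xlsx_resources(json.load(resp))
```

Each returned `resource_id` would become the `source_resource_id` used by the delete-then-insert step, which is what keeps a re-run idempotent. Keeping the filter separate from the HTTP call makes it easy to check against a saved CKAN response.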
### APIA national-level data (blocked)

The actual national list of beneficiaries lives at https://www.apia.org.ro/ but the site returns HTTP 403 for non-browser User-Agents. **Out of scope for this pass.**

Options to unblock (in cost order):

1. **Email APIA directly**: request structured data under Law 544/2001.
2. **Browserless / Playwright scraper**: render JS, fetch the table. Adds infra cost (one more Docker container, captcha risk).
3. **Fall back on AFIR FEGA**: already ingested; covers the question "who got subsidies in 2023/2024" at national scale, just without the per-farmer area breakdown.

## Schema highlights

```sql
CREATE TABLE apia.fermieri (
  id                 bigserial PRIMARY KEY,
  campaign_year      smallint NOT NULL,
  name               text NOT NULL,
  name_normalized    text,
  cui                text,
  cui_match_method   text,            -- 'exact_norm' | 'trgm_unique'
  cui_match_score    numeric(4,3),
  is_legal_person    boolean,         -- detected from name shape (SRL/SA/PFA/II/IF/SC/COOPERATIVA)
  judet              text,
  comuna_oras        text,
  sat                text,
  centru_apia        text,            -- e.g. 'MURGENI'
  responsabil_uat    text,            -- UAT employee, not the farmer
  suprafata_ha       numeric(12,4),   -- declared hectares (previous campaign)
  source_dataset_id  text NOT NULL,
  source_resource_id text NOT NULL,
  source_url         text NOT NULL,
  fetched_at         timestamptz NOT NULL DEFAULT now(),
  UNIQUE NULLS NOT DISTINCT (campaign_year, name, comuna_oras, sat)
);
```

### Importer pipeline

```
CKAN package_search?q=lista+fermieri+APIA
   │
   ▼  (jq filter dataset name `lista-fermier*`, format=XLSX)
download XLSX on satra (curl)
   │
   ▼
openpyxl read → header detect → pipe-TSV
   (NR.CRT, NUME PRENUME, RESPONSABIL UAT, COMUNA/ORAS, SAT, CENTRU APIA, SUPRAFATA)
   │
   ▼
TRUNCATE apia.staging_fermieri
\copy apia.staging_fermieri FROM ... pipe-delimited
   │
   ▼
DELETE FROM apia.fermieri WHERE source_resource_id = $RID   -- idempotent
   │
   ▼
INSERT ... DISTINCT ON (year, name, comuna, sat)            -- in-batch dedupe
       ON CONFLICT (...) DO UPDATE                          -- cross-batch dedupe
   │
   ▼
apia.match_cui()                                            -- exact_norm + trgm fallback
   │
   ▼
REFRESH MATERIALIZED VIEW apia.mv_per_cui
   │
   ▼
INSERT INTO apia.scrape_log (rows_seen, rows_inserted, duration_ms, ...)
```

## Operational

```bash
# Full discovery + ingest (default)
./cron/import-apia-fermieri.sh

# Specific year
./cron/import-apia-fermieri.sh 2024

# Smoke test (only first resource)
./cron/import-apia-fermieri.sh 2024 1
```

Idempotent: each run deletes the rows for a given `source_resource_id` and re-inserts them. Safe to put on a monthly cron; new UAT publications are picked up automatically.

## Cross-source recipes

### 1. "Farmer (legal person, PJ) receives subsidies and owes money to the state"

```sql
SELECT f.name, f.cui, f.comuna_oras,
       f.suprafata_ha AS ha_declarate,
       d.suma_datorata_lei
FROM apia.fermieri f
JOIN anaf.datornici d ON d.cui = f.cui
ORDER BY d.suma_datorata_lei DESC NULLS LAST;
-- Currently: 0 hits (only 1 PJ matched in this dataset). Will scale with more UATs.
```

### 2. "APIA farmer × AFIR FEGA actual payments"

```sql
SELECT f.name, f.cui, f.comuna_oras,
       f.suprafata_ha AS ha_declarate_apia,
       COUNT(a.id) AS plati_fega,
       ROUND(SUM(a.ue_total)::numeric, 2) AS total_fega_eur,
       ROUND((SUM(a.ue_total) / NULLIF(f.suprafata_ha, 0))::numeric, 2) AS eur_per_ha
FROM apia.fermieri f
JOIN fonduri.afir_plati a ON a.cui = f.cui AND a.tip_fond = 'FEGA'
GROUP BY f.name, f.cui, f.comuna_oras, f.suprafata_ha
ORDER BY total_fega_eur DESC;
-- Validated: SC WARDAMA SRL (28501796) → 2 FEGA payments, 26.28 EUR for 1.1 ha.
```

### 3. "EUR/ha outlier: farm with disproportionate payments"

```sql
SELECT * FROM (
  SELECT f.name, f.cui, f.suprafata_ha,
         SUM(a.ue_total) AS total_fega_eur,
         SUM(a.ue_total) / NULLIF(f.suprafata_ha, 0) AS eur_per_ha
  FROM apia.fermieri f
  JOIN fonduri.afir_plati a ON a.cui = f.cui AND a.tip_fond = 'FEGA'
  GROUP BY f.name, f.cui, f.suprafata_ha
) x
WHERE eur_per_ha > 500
ORDER BY eur_per_ha DESC
LIMIT 50;
-- A 500 EUR/ha threshold is high for direct FEGA payments (~150-300 EUR/ha is standard);
-- > 500 is atypical (coupled with environmental measures or special schemes).
```

### 4. "Natural-person farmer (PF) with a large area across several communes"

```sql
SELECT name,
       array_agg(DISTINCT comuna_oras) AS comune,
       COUNT(*) AS aparitii,
       SUM(suprafata_ha) AS total_ha
FROM apia.fermieri
WHERE is_legal_person IS NOT TRUE
GROUP BY name
HAVING COUNT(*) > 1
ORDER BY total_ha DESC;
-- Detects "ghost farmers" that appear under the same name in several UATs.
```

### 5. "Cross-UAT: APIA officers (responsabili) with the most farms"

```sql
SELECT responsabil_uat, centru_apia,
       COUNT(*) AS n_ferme,
       SUM(suprafata_ha) AS ha_totale
FROM apia.fermieri
WHERE responsabil_uat IS NOT NULL
GROUP BY responsabil_uat, centru_apia
ORDER BY ha_totale DESC NULLS LAST;
-- Operational view: who at APIA handles what volume.
```

## Files added in this pass

- **NEW** `services/seap-scraper/sql/036_apia.sql`
- **NEW** `services/seap-scraper/scripts/import-apia-fermieri.py`
- **NEW** `services/seap-scraper/cron/import-apia-fermieri.sh`
- **NEW** `services/seap-scraper/APIA-PLAN.md` (this file)

No edits to `lib/`, `pages/`, or any existing scraper. Slot 036 was chosen to avoid colliding with parallel agents who picked 035 for Curtea de Conturi and GNM (Garda Mediu). 022/023 remain reserved by other parallel agents.

## Next steps (low priority until more data)

1. **Watch CKAN for new resources**: set up a monthly cron to re-run discovery.
2. **Browserless scraper for apia.org.ro**: only worth it if national lists are needed for a specific recipe page. Otherwise FEGA covers the same question at national scale.
3. **Geographic enrichment**: the LPIS shapefiles (`Parcele Agricole APIA LPIS 2025`) could overlay on a map view of `/achizitii/firma/[cui]`; defer to map-feature work.
4. **judet field population**: currently NULL. When more UATs ingest, derive from a centru_apia mapping (centre APIA → judet is 1-N but enumerable).
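
For the judet backfill, a lookup-table approach might look like the sketch below. It is illustrative only: `CENTRU_TO_JUDET`, `normalize_centru` and `derive_judet` are hypothetical names, the single `MURGENI` entry is inferred from the one dataset ingested so far (Găgești, jud. Vaslui), and it assumes each centre resolves to one judet. Centres that turn out to be ambiguous would need a different structure.

```python
import unicodedata

# Hypothetical lookup table. Only the MURGENI entry is backed by the
# currently ingested dataset; the rest would be filled in as UATs arrive.
CENTRU_TO_JUDET = {
    "MURGENI": "Vaslui",
}


def normalize_centru(value):
    """Uppercase, strip diacritics and collapse whitespace so that
    spelling variants of a centre name share one lookup key."""
    decomposed = unicodedata.normalize("NFKD", value)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_marks.upper().split())


def derive_judet(centru_apia):
    """Return the judet for a centru_apia value, or None when unknown."""
    if not centru_apia:
        return None
    return CENTRU_TO_JUDET.get(normalize_centru(centru_apia))
```

Returning `None` for unmapped centres keeps the column NULL rather than guessing, so gaps stay visible in `apia.fermieri` until the table is extended.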