Files
Claude VM a6c03a091e initial: split from gov-agreg — vreau.digital standalone platform
Moved from gov-agreg/src/pages/achizitii/* to root (drop prefix).
- 22 pages migrated, 127 files total
- All internal links: /achizitii/X → /X (176 occurrences fixed)
- AchizitiiLayout subnav rewritten: /X paths, top-right link to vreaudigital.ro hub
- BaseLayout new (vreau.digital branding, OG tags, site URL)
- astro.config.mjs: site https://vreau.digital, server output (was static)
- docker-compose: port 5096 (vreaudigital is 5095), container vreau-digital
- deploy.sh: paths /opt/vreau-digital, log /var/log/vreau-digital-deploy.log

Backend shared with gov-agreg:
- PostgreSQL satra (same schemas: seap, firms, anaf, anre, ...)
- Photon, Martin tiles
- Infisical /vreaudigital path (DATABASE_URL etc. shared)

build: PASS (npx astro check 0 errors, npm run build 5s vite + 10s server)
2026-05-13 00:10:32 +03:00

8.6 KiB
Raw Permalink Blame History

APIA — Lista Fermieri (data.gov.ro CKAN ingest)

Current state (2026-05-10)

metric value
Schema apia.fermieri + apia.staging_fermieri + apia.scrape_log + apia.mv_per_cui
Migration services/seap-scraper/sql/036_apia.sql
Importer (python) services/seap-scraper/scripts/import-apia-fermieri.py
Importer (bash wrapper) services/seap-scraper/cron/import-apia-fermieri.sh
Rows ingested 191 (Găgești, jud. Vaslui, campaign 2024)
Resources 1 / 1 discoverable on data.gov.ro
Comune 13 (rezident vs. proprietar — Găgești + diaspora)
Suprafață totală 1 575,17 ha
PJ (is_legal_person) 2 (PFA, SRL)
CUI matched (firms.entities) 1 / 2 (50%) — SC WARDAMA SRL (CUI 28501796)
Cross-source AFIR FEGA hits 1 firmă (WARDAMA, 2 plăți FEGA, 26.28 EUR)
Cross-source ANAF datornici 0

Reality check: data.gov.ro APIA scope

The prompt's expectation was 500700K farmers in a single national XLSX. That dataset does not exist on CKAN. The only published "Lista fermieri APIA" XLSX on data.gov.ro covers a single comuna (Găgești, Vaslui, ~192 farmers).

Why this matters

  • AFIR's FEGA dump (fonduri.afir_plati WHERE tip_fond='FEGA', 4 290 976 rows for 2023+2024) is the actual national farmer-payment dataset. APIA "Lista fermieri" publishes declarations (suprafață, responsabil UAT, centru APIA) — APIA is the paying agency, AFIR records the actual payments.
  • The two are complementary, not redundant:
    • APIA list → "who declared and how many ha"
    • AFIR FEGA → "who actually got paid and how much"
  • A future-proof importer that auto-discovers any new lista-fermieri-* package on data.gov.ro is what we built. When more UATs publish, re-run and it ingests them automatically (idempotent on source_resource_id).

APIA national-level data (unblocked)

The actual national list of beneficiaries lives at https://www.apia.org.ro/ but the site returns HTTP 403 for non-browser User-Agents. Out of scope for this pass. Options to unblock (in cost order):

  1. Email APIA direct — request structured data under Law 544/2001.
  2. Browserless / Playwright scraper — render JS, fetch the table. Adds infra cost (one more Docker container, captcha risk).
  3. Fall back on AFIR FEGA — already ingested; covers the question "who got subsidies in 2023/2024" at national scale, just without the suprafață breakdown.

Schema highlights

CREATE TABLE apia.fermieri (
  id                   bigserial PRIMARY KEY,
  campaign_year        smallint NOT NULL,
  name                 text NOT NULL,
  name_normalized      text,
  cui                  text,
  cui_match_method     text,                  -- 'exact_norm' | 'trgm_unique'
  cui_match_score      numeric(4,3),
  is_legal_person      boolean,               -- detected from name shape (SRL/SA/PFA/II/IF/SC/COOPERATIVA)
  judet                text,
  comuna_oras          text,
  sat                  text,
  centru_apia          text,                  -- e.g. 'MURGENI'
  responsabil_uat      text,                  -- UAT employee, not the farmer
  suprafata_ha         numeric(12,4),         -- declared hectares (precedent campaign)
  source_dataset_id    text NOT NULL,
  source_resource_id   text NOT NULL,
  source_url           text NOT NULL,
  fetched_at           timestamptz NOT NULL DEFAULT now(),
  UNIQUE NULLS NOT DISTINCT (campaign_year, name, comuna_oras, sat)
);

Importer pipeline

CKAN package_search?q=lista+fermieri+APIA
       │
       ▼  (jq filter dataset name `lista-fermier*`, format=XLSX)
  download XLSX on satra (curl)
       │
       ▼
  openpyxl read → header detect → pipe-TSV
  (NR.CRT, NUME PRENUME, RESPONSABIL UAT, COMUNA/ORAS, SAT, CENTRU APIA, SUPRAFATA)
       │
       ▼
  TRUNCATE apia.staging_fermieri
  \\copy apia.staging_fermieri FROM ... pipe-delimited
       │
       ▼
  DELETE FROM apia.fermieri WHERE source_resource_id = $RID  -- idempotent
       │
       ▼
  INSERT ... DISTINCT ON (year, name, comuna, sat)  -- in-batch dedupe
  ON CONFLICT (...) DO UPDATE                       -- cross-batch dedupe
       │
       ▼
  apia.match_cui()    -- exact_norm + trgm fallback
       │
       ▼
  REFRESH MATERIALIZED VIEW apia.mv_per_cui
       │
       ▼
  INSERT INTO apia.scrape_log (rows_seen, rows_inserted, duration_ms, ...)

Operational

# Full discovery + ingest (default)
./cron/import-apia-fermieri.sh

# Specific year
./cron/import-apia-fermieri.sh 2024

# Smoke test (only first resource)
./cron/import-apia-fermieri.sh 2024 1

Idempotent: re-running re-deletes by source_resource_id and re-inserts. Safe to put on a monthly cron — new UAT publications are picked up automatically.

Cross-source recipes

1. "Fermier (PJ) primește subvenții și are datorii la stat"

SELECT
  f.name,
  f.cui,
  f.comuna_oras,
  f.suprafata_ha       AS ha_declarate,
  d.suma_datorata_lei
FROM apia.fermieri f
JOIN anaf.datornici  d ON d.cui = f.cui
ORDER BY d.suma_datorata_lei DESC NULLS LAST;
-- Currently: 0 hits (only 1 PJ matched in this dataset). Will scale with more UATs.

2. "Fermier APIA × FEGA AFIR plăți reale"

SELECT
  f.name,
  f.cui,
  f.comuna_oras,
  f.suprafata_ha            AS ha_declarate_apia,
  COUNT(a.id)                AS plati_fega,
  ROUND(SUM(a.ue_total)::numeric, 2) AS total_fega_eur,
  ROUND((SUM(a.ue_total) / NULLIF(f.suprafata_ha, 0))::numeric, 2) AS eur_per_ha
FROM apia.fermieri f
JOIN fonduri.afir_plati a
  ON a.cui = f.cui
 AND a.tip_fond = 'FEGA'
GROUP BY f.name, f.cui, f.comuna_oras, f.suprafata_ha
ORDER BY total_fega_eur DESC;
-- Validated: SC WARDAMA SRL (28501796) → 2 plăți FEGA, 26.28 EUR pentru 1.1 ha.

3. "Outlier EUR/ha — fermă cu plăți disproporționate"

SELECT *
FROM (
  SELECT
    f.name,
    f.cui,
    f.suprafata_ha,
    SUM(a.ue_total)               AS total_fega_eur,
    SUM(a.ue_total) / NULLIF(f.suprafata_ha, 0) AS eur_per_ha
  FROM apia.fermieri f
  JOIN fonduri.afir_plati a ON a.cui = f.cui AND a.tip_fond = 'FEGA'
  GROUP BY f.name, f.cui, f.suprafata_ha
) x
WHERE eur_per_ha > 500
ORDER BY eur_per_ha DESC
LIMIT 50;
-- Threshold 500 EUR/ha is high for plăți FEGA directe (~150-300 EUR/ha standard);
-- > 500 = atipic (cuplate cu măsuri de mediu sau scheme speciale).

4. "Fermier (PF) cu suprafață mare în mai multe comune"

SELECT
  name,
  array_agg(DISTINCT comuna_oras) AS comune,
  COUNT(*) AS aparitii,
  SUM(suprafata_ha) AS total_ha
FROM apia.fermieri
WHERE is_legal_person IS NOT TRUE
GROUP BY name
HAVING COUNT(*) > 1
ORDER BY total_ha DESC;
-- Detectează "fermieri-fantomă" cu același nume în mai multe UAT-uri.

5. "Cross UAT — responsabili APIA cu cele mai multe ferme"

SELECT
  responsabil_uat,
  centru_apia,
  COUNT(*) AS n_ferme,
  SUM(suprafata_ha) AS ha_totale
FROM apia.fermieri
WHERE responsabil_uat IS NOT NULL
GROUP BY responsabil_uat, centru_apia
ORDER BY ha_totale DESC NULLS LAST;
-- Operational view — cine la APIA gestionează ce volum.

Files added in this pass

  • NEW services/seap-scraper/sql/036_apia.sql
  • NEW services/seap-scraper/scripts/import-apia-fermieri.py
  • NEW services/seap-scraper/cron/import-apia-fermieri.sh
  • NEW services/seap-scraper/APIA-PLAN.md (this file)

No edits to lib/, pages/, or any existing scraper. Slot 036 chosen to avoid collision with parallel agents who picked 035 for Curtea de Conturi and GNM (Garda Mediu). 022/023 remain reserved by other parallel agents.

Next steps (low priority until more data)

  1. Watch CKAN for new resources — set up monthly cron to re-run discovery.
  2. Browserless scraper for apia.org.ro — only worth it if national lists are needed for a specific recipe page. Otherwise FEGA covers the same question at national scale.
  3. Geographic enrichment — the LPIS shapefiles (Parcele Agricole APIA LPIS 2025) could overlay on a map view of /achizitii/firma/[cui]; defer to map-feature work.
  4. judet field population — currently NULL. When more UATs ingest, derive from centru_apia mapping (centre APIA → judet is 1-N but enumerable).