Files
Claude VM a6c03a091e initial: split from gov-agreg — vreau.digital standalone platform
Moved from gov-agreg/src/pages/achizitii/* to root (drop prefix).
- 22 pages migrated, 127 files total
- All internal links: /achizitii/X → /X (176 occurrences fixed)
- AchizitiiLayout subnav rewritten: /X paths, top-right link to vreaudigital.ro hub
- BaseLayout new (vreau.digital branding, OG tags, site URL)
- astro.config.mjs: site https://vreau.digital, server output (was static)
- docker-compose: port 5096 (vreaudigital is 5095), container vreau-digital
- deploy.sh: paths /opt/vreau-digital, log /var/log/vreau-digital-deploy.log

Backend shared with gov-agreg:
- PostgreSQL satra (same schemas: seap, firms, anaf, anre, ...)
- Photon, Martin tiles
- Infisical /vreaudigital path (DATABASE_URL etc. shared)

build: PASS (npx astro check 0 errors, npm run build 5s vite + 10s server)
2026-05-13 00:10:32 +03:00

237 lines
8.6 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# APIA — Lista Fermieri (data.gov.ro CKAN ingest)
## Current state (2026-05-10)
| metric | value |
| --------------------------- | ---------------------------------------------------- |
| Schema | `apia.fermieri` + `apia.staging_fermieri` + `apia.scrape_log` + `apia.mv_per_cui` |
| Migration | `services/seap-scraper/sql/036_apia.sql` |
| Importer (python) | `services/seap-scraper/scripts/import-apia-fermieri.py` |
| Importer (bash wrapper) | `services/seap-scraper/cron/import-apia-fermieri.sh` |
| Rows ingested | **191** (Găgești, jud. Vaslui, campaign 2024) |
| Resources | 1 / 1 discoverable on data.gov.ro |
| Comune | 13 (rezident vs. proprietar — Găgești + diaspora) |
| Suprafață totală | 1 575,17 ha |
| PJ (is_legal_person) | 2 (PFA, SRL) |
| CUI matched (firms.entities)| 1 / 2 (50%) — **SC WARDAMA SRL** (CUI 28501796) |
| Cross-source AFIR FEGA hits | **1 firmă** (WARDAMA, 2 plăți FEGA, 26.28 EUR) |
| Cross-source ANAF datornici | 0 |
## Reality check: data.gov.ro APIA scope
The prompt's expectation was 500700K farmers in a single national XLSX. **That dataset
does not exist on CKAN.** The only published "Lista fermieri APIA" XLSX on data.gov.ro
covers a single comuna (Găgești, Vaslui, ~192 farmers).
### Why this matters
- AFIR's FEGA dump (`fonduri.afir_plati WHERE tip_fond='FEGA'`, **4 290 976 rows for
2023+2024**) is the actual national farmer-payment dataset. APIA "Lista fermieri"
publishes **declarations** (suprafață, responsabil UAT, centru APIA) — APIA is the
paying agency, AFIR records the actual payments.
- The two are complementary, not redundant:
- APIA list → "who declared and how many ha"
- AFIR FEGA → "who actually got paid and how much"
- A future-proof importer that auto-discovers any new `lista-fermieri-*` package on
data.gov.ro is what we built. When more UATs publish, re-run and it ingests them
automatically (idempotent on `source_resource_id`).
### APIA national-level data (unblocked)
The actual national list of beneficiaries lives at https://www.apia.org.ro/ but the
site returns HTTP 403 for non-browser User-Agents. **Out of scope for this pass.**
Options to unblock (in cost order):
1. **Email APIA direct** — request structured data under Law 544/2001.
2. **Browserless / Playwright scraper** — render JS, fetch the table. Adds infra cost
(one more Docker container, captcha risk).
3. **Fall back on AFIR FEGA** — already ingested; covers the question "who got
subsidies in 2023/2024" at national scale, just without the suprafață breakdown.
## Schema highlights
```sql
CREATE TABLE apia.fermieri (
id bigserial PRIMARY KEY,
campaign_year smallint NOT NULL,
name text NOT NULL,
name_normalized text,
cui text,
cui_match_method text, -- 'exact_norm' | 'trgm_unique'
cui_match_score numeric(4,3),
is_legal_person boolean, -- detected from name shape (SRL/SA/PFA/II/IF/SC/COOPERATIVA)
judet text,
comuna_oras text,
sat text,
centru_apia text, -- e.g. 'MURGENI'
responsabil_uat text, -- UAT employee, not the farmer
suprafata_ha numeric(12,4), -- declared hectares (precedent campaign)
source_dataset_id text NOT NULL,
source_resource_id text NOT NULL,
source_url text NOT NULL,
fetched_at timestamptz NOT NULL DEFAULT now(),
UNIQUE NULLS NOT DISTINCT (campaign_year, name, comuna_oras, sat)
);
```
### Importer pipeline
```
CKAN package_search?q=lista+fermieri+APIA
▼ (jq filter dataset name `lista-fermier*`, format=XLSX)
download XLSX on satra (curl)
openpyxl read → header detect → pipe-TSV
(NR.CRT, NUME PRENUME, RESPONSABIL UAT, COMUNA/ORAS, SAT, CENTRU APIA, SUPRAFATA)
TRUNCATE apia.staging_fermieri
\\copy apia.staging_fermieri FROM ... pipe-delimited
DELETE FROM apia.fermieri WHERE source_resource_id = $RID -- idempotent
INSERT ... DISTINCT ON (year, name, comuna, sat) -- in-batch dedupe
ON CONFLICT (...) DO UPDATE -- cross-batch dedupe
apia.match_cui() -- exact_norm + trgm fallback
REFRESH MATERIALIZED VIEW apia.mv_per_cui
INSERT INTO apia.scrape_log (rows_seen, rows_inserted, duration_ms, ...)
```
## Operational
```bash
# Full discovery + ingest (default)
./cron/import-apia-fermieri.sh
# Specific year
./cron/import-apia-fermieri.sh 2024
# Smoke test (only first resource)
./cron/import-apia-fermieri.sh 2024 1
```
Idempotent: re-running re-deletes by `source_resource_id` and re-inserts. Safe to put on
a monthly cron — new UAT publications are picked up automatically.
## Cross-source recipes
### 1. "Fermier (PJ) primește subvenții și are datorii la stat"
```sql
SELECT
f.name,
f.cui,
f.comuna_oras,
f.suprafata_ha AS ha_declarate,
d.suma_datorata_lei
FROM apia.fermieri f
JOIN anaf.datornici d ON d.cui = f.cui
ORDER BY d.suma_datorata_lei DESC NULLS LAST;
-- Currently: 0 hits (only 1 PJ matched in this dataset). Will scale with more UATs.
```
### 2. "Fermier APIA × FEGA AFIR plăți reale"
```sql
SELECT
f.name,
f.cui,
f.comuna_oras,
f.suprafata_ha AS ha_declarate_apia,
COUNT(a.id) AS plati_fega,
ROUND(SUM(a.ue_total)::numeric, 2) AS total_fega_eur,
ROUND((SUM(a.ue_total) / NULLIF(f.suprafata_ha, 0))::numeric, 2) AS eur_per_ha
FROM apia.fermieri f
JOIN fonduri.afir_plati a
ON a.cui = f.cui
AND a.tip_fond = 'FEGA'
GROUP BY f.name, f.cui, f.comuna_oras, f.suprafata_ha
ORDER BY total_fega_eur DESC;
-- Validated: SC WARDAMA SRL (28501796) → 2 plăți FEGA, 26.28 EUR pentru 1.1 ha.
```
### 3. "Outlier EUR/ha — fermă cu plăți disproporționate"
```sql
SELECT *
FROM (
SELECT
f.name,
f.cui,
f.suprafata_ha,
SUM(a.ue_total) AS total_fega_eur,
SUM(a.ue_total) / NULLIF(f.suprafata_ha, 0) AS eur_per_ha
FROM apia.fermieri f
JOIN fonduri.afir_plati a ON a.cui = f.cui AND a.tip_fond = 'FEGA'
GROUP BY f.name, f.cui, f.suprafata_ha
) x
WHERE eur_per_ha > 500
ORDER BY eur_per_ha DESC
LIMIT 50;
-- Threshold 500 EUR/ha is high for plăți FEGA directe (~150-300 EUR/ha standard);
-- > 500 = atipic (cuplate cu măsuri de mediu sau scheme speciale).
```
### 4. "Fermier (PF) cu suprafață mare în mai multe comune"
```sql
SELECT
name,
array_agg(DISTINCT comuna_oras) AS comune,
COUNT(*) AS aparitii,
SUM(suprafata_ha) AS total_ha
FROM apia.fermieri
WHERE is_legal_person IS NOT TRUE
GROUP BY name
HAVING COUNT(*) > 1
ORDER BY total_ha DESC;
-- Detectează "fermieri-fantomă" cu același nume în mai multe UAT-uri.
```
### 5. "Cross UAT — responsabili APIA cu cele mai multe ferme"
```sql
SELECT
responsabil_uat,
centru_apia,
COUNT(*) AS n_ferme,
SUM(suprafata_ha) AS ha_totale
FROM apia.fermieri
WHERE responsabil_uat IS NOT NULL
GROUP BY responsabil_uat, centru_apia
ORDER BY ha_totale DESC NULLS LAST;
-- Operational view — cine la APIA gestionează ce volum.
```
## Files added in this pass
- **NEW** `services/seap-scraper/sql/036_apia.sql`
- **NEW** `services/seap-scraper/scripts/import-apia-fermieri.py`
- **NEW** `services/seap-scraper/cron/import-apia-fermieri.sh`
- **NEW** `services/seap-scraper/APIA-PLAN.md` (this file)
No edits to `lib/`, `pages/`, or any existing scraper. Slot 036 chosen to
avoid collision with parallel agents who picked 035 for Curtea de Conturi
and GNM (Garda Mediu). 022/023 remain reserved by other parallel agents.
## Next steps (low priority until more data)
1. **Watch CKAN for new resources** — set up monthly cron to re-run discovery.
2. **Browserless scraper for apia.org.ro** — only worth it if national lists are needed
for a specific recipe page. Otherwise FEGA covers the same question at national
scale.
3. **Geographic enrichment** — the LPIS shapefiles (`Parcele Agricole APIA LPIS 2025`)
could overlay on a map view of /achizitii/firma/[cui]; defer to map-feature work.
4. **judet field population** — currently NULL. When more UATs ingest, derive from
centru_apia mapping (centre APIA → judet is 1-N but enumerable).