a6c03a091e
Moved from gov-agreg/src/pages/achizitii/* to root (drop prefix). - 22 pages migrated, 127 files total - All internal links: /achizitii/X → /X (176 occurrences fixed) - AchizitiiLayout subnav rewritten: /X paths, top-right link to vreaudigital.ro hub - BaseLayout new (vreau.digital branding, OG tags, site URL) - astro.config.mjs: site https://vreau.digital, server output (was static) - docker-compose: port 5096 (vreaudigital is 5095), container vreau-digital - deploy.sh: paths /opt/vreau-digital, log /var/log/vreau-digital-deploy.log Backend shared with gov-agreg: - PostgreSQL satra (same schemas: seap, firms, anaf, anre, ...) - Photon, Martin tiles - Infisical /vreaudigital path (DATABASE_URL etc. shared) build: PASS (npx astro check 0 errors, npm run build 5s vite + 10s server)
237 lines
8.6 KiB
Markdown
237 lines
8.6 KiB
Markdown
# APIA — Lista Fermieri (data.gov.ro CKAN ingest)
|
||
|
||
## Current state (2026-05-10)
|
||
|
||
| metric | value |
|
||
| --------------------------- | ---------------------------------------------------- |
|
||
| Schema | `apia.fermieri` + `apia.staging_fermieri` + `apia.scrape_log` + `apia.mv_per_cui` |
|
||
| Migration | `services/seap-scraper/sql/036_apia.sql` |
|
||
| Importer (python) | `services/seap-scraper/scripts/import-apia-fermieri.py` |
|
||
| Importer (bash wrapper) | `services/seap-scraper/cron/import-apia-fermieri.sh` |
|
||
| Rows ingested | **191** (Găgești, jud. Vaslui, campaign 2024) |
|
||
| Resources | 1 / 1 discoverable on data.gov.ro |
|
||
| Comune | 13 (rezident vs. proprietar — Găgești + diaspora) |
|
||
| Suprafață totală | 1 575,17 ha |
|
||
| PJ (is_legal_person) | 2 (PFA, SRL) |
|
||
| CUI matched (firms.entities)| 1 / 2 (50%) — **SC WARDAMA SRL** (CUI 28501796) |
|
||
| Cross-source AFIR FEGA hits | **1 firmă** (WARDAMA, 2 plăți FEGA, 26.28 EUR) |
|
||
| Cross-source ANAF datornici | 0 |
|
||
|
||
## Reality check: data.gov.ro APIA scope
|
||
|
||
The prompt's expectation was 500–700K farmers in a single national XLSX. **That dataset
|
||
does not exist on CKAN.** The only published "Lista fermieri APIA" XLSX on data.gov.ro
|
||
covers a single comuna (Găgești, Vaslui, ~192 farmers).
|
||
|
||
### Why this matters
|
||
|
||
- AFIR's FEGA dump (`fonduri.afir_plati WHERE tip_fond='FEGA'`, **4 290 976 rows for
|
||
2023+2024**) is the actual national farmer-payment dataset. APIA "Lista fermieri"
|
||
publishes **declarations** (suprafață, responsabil UAT, centru APIA) — APIA is the
|
||
paying agency, AFIR records the actual payments.
|
||
- The two are complementary, not redundant:
|
||
- APIA list → "who declared and how many ha"
|
||
- AFIR FEGA → "who actually got paid and how much"
|
||
- A future-proof importer that auto-discovers any new `lista-fermieri-*` package on
|
||
data.gov.ro is what we built. When more UATs publish, re-run and it ingests them
|
||
automatically (idempotent on `source_resource_id`).
|
||
|
||
### APIA national-level data (unblocked)
|
||
|
||
The actual national list of beneficiaries lives at https://www.apia.org.ro/ but the
|
||
site returns HTTP 403 for non-browser User-Agents. **Out of scope for this pass.**
|
||
Options to unblock (in cost order):
|
||
|
||
1. **Email APIA direct** — request structured data under Law 544/2001.
|
||
2. **Browserless / Playwright scraper** — render JS, fetch the table. Adds infra cost
|
||
(one more Docker container, captcha risk).
|
||
3. **Fall back on AFIR FEGA** — already ingested; covers the question "who got
|
||
subsidies in 2023/2024" at national scale, just without the suprafață breakdown.
|
||
|
||
## Schema highlights
|
||
|
||
```sql
|
||
CREATE TABLE apia.fermieri (
|
||
id bigserial PRIMARY KEY,
|
||
campaign_year smallint NOT NULL,
|
||
name text NOT NULL,
|
||
name_normalized text,
|
||
cui text,
|
||
cui_match_method text, -- 'exact_norm' | 'trgm_unique'
|
||
cui_match_score numeric(4,3),
|
||
is_legal_person boolean, -- detected from name shape (SRL/SA/PFA/II/IF/SC/COOPERATIVA)
|
||
judet text,
|
||
comuna_oras text,
|
||
sat text,
|
||
centru_apia text, -- e.g. 'MURGENI'
|
||
responsabil_uat text, -- UAT employee, not the farmer
|
||
suprafata_ha numeric(12,4), -- declared hectares (precedent campaign)
|
||
source_dataset_id text NOT NULL,
|
||
source_resource_id text NOT NULL,
|
||
source_url text NOT NULL,
|
||
fetched_at timestamptz NOT NULL DEFAULT now(),
|
||
UNIQUE NULLS NOT DISTINCT (campaign_year, name, comuna_oras, sat)
|
||
);
|
||
```
|
||
|
||
### Importer pipeline
|
||
|
||
```
|
||
CKAN package_search?q=lista+fermieri+APIA
|
||
│
|
||
▼ (jq filter dataset name `lista-fermier*`, format=XLSX)
|
||
download XLSX on satra (curl)
|
||
│
|
||
▼
|
||
openpyxl read → header detect → pipe-TSV
|
||
(NR.CRT, NUME PRENUME, RESPONSABIL UAT, COMUNA/ORAS, SAT, CENTRU APIA, SUPRAFATA)
|
||
│
|
||
▼
|
||
TRUNCATE apia.staging_fermieri
|
||
\\copy apia.staging_fermieri FROM ... pipe-delimited
|
||
│
|
||
▼
|
||
DELETE FROM apia.fermieri WHERE source_resource_id = $RID -- idempotent
|
||
│
|
||
▼
|
||
INSERT ... DISTINCT ON (year, name, comuna, sat) -- in-batch dedupe
|
||
ON CONFLICT (...) DO UPDATE -- cross-batch dedupe
|
||
│
|
||
▼
|
||
apia.match_cui() -- exact_norm + trgm fallback
|
||
│
|
||
▼
|
||
REFRESH MATERIALIZED VIEW apia.mv_per_cui
|
||
│
|
||
▼
|
||
INSERT INTO apia.scrape_log (rows_seen, rows_inserted, duration_ms, ...)
|
||
```
|
||
|
||
## Operational
|
||
|
||
```bash
|
||
# Full discovery + ingest (default)
|
||
./cron/import-apia-fermieri.sh
|
||
|
||
# Specific year
|
||
./cron/import-apia-fermieri.sh 2024
|
||
|
||
# Smoke test (only first resource)
|
||
./cron/import-apia-fermieri.sh 2024 1
|
||
```
|
||
|
||
Idempotent: re-running re-deletes by `source_resource_id` and re-inserts. Safe to put on
|
||
a monthly cron — new UAT publications are picked up automatically.
|
||
|
||
## Cross-source recipes
|
||
|
||
### 1. "Fermier (PJ) primește subvenții și are datorii la stat"
|
||
|
||
```sql
|
||
SELECT
|
||
f.name,
|
||
f.cui,
|
||
f.comuna_oras,
|
||
f.suprafata_ha AS ha_declarate,
|
||
d.suma_datorata_lei
|
||
FROM apia.fermieri f
|
||
JOIN anaf.datornici d ON d.cui = f.cui
|
||
ORDER BY d.suma_datorata_lei DESC NULLS LAST;
|
||
-- Currently: 0 hits (only 1 PJ matched in this dataset). Will scale with more UATs.
|
||
```
|
||
|
||
### 2. "Fermier APIA × FEGA AFIR plăți reale"
|
||
|
||
```sql
|
||
SELECT
|
||
f.name,
|
||
f.cui,
|
||
f.comuna_oras,
|
||
f.suprafata_ha AS ha_declarate_apia,
|
||
COUNT(a.id) AS plati_fega,
|
||
ROUND(SUM(a.ue_total)::numeric, 2) AS total_fega_eur,
|
||
ROUND((SUM(a.ue_total) / NULLIF(f.suprafata_ha, 0))::numeric, 2) AS eur_per_ha
|
||
FROM apia.fermieri f
|
||
JOIN fonduri.afir_plati a
|
||
ON a.cui = f.cui
|
||
AND a.tip_fond = 'FEGA'
|
||
GROUP BY f.name, f.cui, f.comuna_oras, f.suprafata_ha
|
||
ORDER BY total_fega_eur DESC;
|
||
-- Validated: SC WARDAMA SRL (28501796) → 2 plăți FEGA, 26.28 EUR pentru 1.1 ha.
|
||
```
|
||
|
||
### 3. "Outlier EUR/ha — fermă cu plăți disproporționate"
|
||
|
||
```sql
|
||
SELECT *
|
||
FROM (
|
||
SELECT
|
||
f.name,
|
||
f.cui,
|
||
f.suprafata_ha,
|
||
SUM(a.ue_total) AS total_fega_eur,
|
||
SUM(a.ue_total) / NULLIF(f.suprafata_ha, 0) AS eur_per_ha
|
||
FROM apia.fermieri f
|
||
JOIN fonduri.afir_plati a ON a.cui = f.cui AND a.tip_fond = 'FEGA'
|
||
GROUP BY f.name, f.cui, f.suprafata_ha
|
||
) x
|
||
WHERE eur_per_ha > 500
|
||
ORDER BY eur_per_ha DESC
|
||
LIMIT 50;
|
||
-- Threshold 500 EUR/ha is high for plăți FEGA directe (~150-300 EUR/ha standard);
|
||
-- > 500 = atipic (cuplate cu măsuri de mediu sau scheme speciale).
|
||
```
|
||
|
||
### 4. "Fermier (PF) cu suprafață mare în mai multe comune"
|
||
|
||
```sql
|
||
SELECT
|
||
name,
|
||
array_agg(DISTINCT comuna_oras) AS comune,
|
||
COUNT(*) AS aparitii,
|
||
SUM(suprafata_ha) AS total_ha
|
||
FROM apia.fermieri
|
||
WHERE is_legal_person IS NOT TRUE
|
||
GROUP BY name
|
||
HAVING COUNT(*) > 1
|
||
ORDER BY total_ha DESC;
|
||
-- Detectează "fermieri-fantomă" cu același nume în mai multe UAT-uri.
|
||
```
|
||
|
||
### 5. "Cross UAT — responsabili APIA cu cele mai multe ferme"
|
||
|
||
```sql
|
||
SELECT
|
||
responsabil_uat,
|
||
centru_apia,
|
||
COUNT(*) AS n_ferme,
|
||
SUM(suprafata_ha) AS ha_totale
|
||
FROM apia.fermieri
|
||
WHERE responsabil_uat IS NOT NULL
|
||
GROUP BY responsabil_uat, centru_apia
|
||
ORDER BY ha_totale DESC NULLS LAST;
|
||
-- Operational view — cine la APIA gestionează ce volum.
|
||
```
|
||
|
||
## Files added in this pass
|
||
|
||
- **NEW** `services/seap-scraper/sql/036_apia.sql`
|
||
- **NEW** `services/seap-scraper/scripts/import-apia-fermieri.py`
|
||
- **NEW** `services/seap-scraper/cron/import-apia-fermieri.sh`
|
||
- **NEW** `services/seap-scraper/APIA-PLAN.md` (this file)
|
||
|
||
No edits to `lib/`, `pages/`, or any existing scraper. Slot 036 chosen to
|
||
avoid collision with parallel agents who picked 035 for Curtea de Conturi
|
||
and GNM (Garda Mediu). 022/023 remain reserved by other parallel agents.
|
||
|
||
## Next steps (low priority until more data)
|
||
|
||
1. **Watch CKAN for new resources** — set up monthly cron to re-run discovery.
|
||
2. **Browserless scraper for apia.org.ro** — only worth it if national lists are needed
|
||
for a specific recipe page. Otherwise FEGA covers the same question at national
|
||
scale.
|
||
3. **Geographic enrichment** — the LPIS shapefiles (`Parcele Agricole APIA LPIS 2025`)
|
||
could overlay on a map view of /achizitii/firma/[cui]; defer to map-feature work.
|
||
4. **judet field population** — currently NULL. When more UATs ingest, derive from
|
||
centru_apia mapping (centre APIA → judet is 1-N but enumerable).
|