initial: split from gov-agreg — vreau.digital standalone platform
Moved from gov-agreg/src/pages/achizitii/* to root (drop prefix). - 22 pages migrated, 127 files total - All internal links: /achizitii/X → /X (176 occurrences fixed) - AchizitiiLayout subnav rewritten: /X paths, top-right link to vreaudigital.ro hub - BaseLayout new (vreau.digital branding, OG tags, site URL) - astro.config.mjs: site https://vreau.digital, server output (was static) - docker-compose: port 5096 (vreaudigital is 5095), container vreau-digital - deploy.sh: paths /opt/vreau-digital, log /var/log/vreau-digital-deploy.log Backend shared with gov-agreg: - PostgreSQL satra (same schemas: seap, firms, anaf, anre, ...) - Photon, Martin tiles - Infisical /vreaudigital path (DATABASE_URL etc. shared) build: PASS (npx astro check 0 errors, npm run build 5s vite + 10s server)
This commit is contained in:
@@ -0,0 +1,236 @@
|
||||
# APIA — Lista Fermieri (data.gov.ro CKAN ingest)
|
||||
|
||||
## Current state (2026-05-10)
|
||||
|
||||
| metric | value |
|
||||
| --------------------------- | ---------------------------------------------------- |
|
||||
| Schema | `apia.fermieri` + `apia.staging_fermieri` + `apia.scrape_log` + `apia.mv_per_cui` |
|
||||
| Migration | `services/seap-scraper/sql/036_apia.sql` |
|
||||
| Importer (python) | `services/seap-scraper/scripts/import-apia-fermieri.py` |
|
||||
| Importer (bash wrapper) | `services/seap-scraper/cron/import-apia-fermieri.sh` |
|
||||
| Rows ingested | **191** (Găgești, jud. Vaslui, campaign 2024) |
|
||||
| Resources | 1 / 1 discoverable on data.gov.ro |
|
||||
| Comune | 13 (rezident vs. proprietar — Găgești + diaspora) |
|
||||
| Suprafață totală | 1 575,17 ha |
|
||||
| PJ (is_legal_person) | 2 (PFA, SRL) |
|
||||
| CUI matched (firms.entities)| 1 / 2 (50%) — **SC WARDAMA SRL** (CUI 28501796) |
|
||||
| Cross-source AFIR FEGA hits | **1 firmă** (WARDAMA, 2 plăți FEGA, 26.28 EUR) |
|
||||
| Cross-source ANAF datornici | 0 |
|
||||
|
||||
## Reality check: data.gov.ro APIA scope
|
||||
|
||||
The prompt's expectation was 500–700K farmers in a single national XLSX. **That dataset
|
||||
does not exist on CKAN.** The only published "Lista fermieri APIA" XLSX on data.gov.ro
|
||||
covers a single comuna (Găgești, Vaslui, ~192 farmers).
|
||||
|
||||
### Why this matters
|
||||
|
||||
- AFIR's FEGA dump (`fonduri.afir_plati WHERE tip_fond='FEGA'`, **4 290 976 rows for
|
||||
2023+2024**) is the actual national farmer-payment dataset. APIA "Lista fermieri"
|
||||
publishes **declarations** (suprafață, responsabil UAT, centru APIA) — APIA is the
|
||||
paying agency, AFIR records the actual payments.
|
||||
- The two are complementary, not redundant:
|
||||
- APIA list → "who declared and how many ha"
|
||||
- AFIR FEGA → "who actually got paid and how much"
|
||||
- A future-proof importer that auto-discovers any new `lista-fermieri-*` package on
|
||||
data.gov.ro is what we built. When more UATs publish, re-run and it ingests them
|
||||
automatically (idempotent on `source_resource_id`).
|
||||
|
||||
### APIA national-level data (unblocked)
|
||||
|
||||
The actual national list of beneficiaries lives at https://www.apia.org.ro/ but the
|
||||
site returns HTTP 403 for non-browser User-Agents. **Out of scope for this pass.**
|
||||
Options to unblock (in cost order):
|
||||
|
||||
1. **Email APIA direct** — request structured data under Law 544/2001.
|
||||
2. **Browserless / Playwright scraper** — render JS, fetch the table. Adds infra cost
|
||||
(one more Docker container, captcha risk).
|
||||
3. **Fall back on AFIR FEGA** — already ingested; covers the question "who got
|
||||
subsidies in 2023/2024" at national scale, just without the suprafață breakdown.
|
||||
|
||||
## Schema highlights
|
||||
|
||||
```sql
|
||||
CREATE TABLE apia.fermieri (
|
||||
id bigserial PRIMARY KEY,
|
||||
campaign_year smallint NOT NULL,
|
||||
name text NOT NULL,
|
||||
name_normalized text,
|
||||
cui text,
|
||||
cui_match_method text, -- 'exact_norm' | 'trgm_unique'
|
||||
cui_match_score numeric(4,3),
|
||||
is_legal_person boolean, -- detected from name shape (SRL/SA/PFA/II/IF/SC/COOPERATIVA)
|
||||
judet text,
|
||||
comuna_oras text,
|
||||
sat text,
|
||||
centru_apia text, -- e.g. 'MURGENI'
|
||||
responsabil_uat text, -- UAT employee, not the farmer
|
||||
suprafata_ha numeric(12,4), -- declared hectares (precedent campaign)
|
||||
source_dataset_id text NOT NULL,
|
||||
source_resource_id text NOT NULL,
|
||||
source_url text NOT NULL,
|
||||
fetched_at timestamptz NOT NULL DEFAULT now(),
|
||||
UNIQUE NULLS NOT DISTINCT (campaign_year, name, comuna_oras, sat)
|
||||
);
|
||||
```
|
||||
|
||||
### Importer pipeline
|
||||
|
||||
```
|
||||
CKAN package_search?q=lista+fermieri+APIA
|
||||
│
|
||||
▼ (jq filter dataset name `lista-fermier*`, format=XLSX)
|
||||
download XLSX on satra (curl)
|
||||
│
|
||||
▼
|
||||
openpyxl read → header detect → pipe-TSV
|
||||
(NR.CRT, NUME PRENUME, RESPONSABIL UAT, COMUNA/ORAS, SAT, CENTRU APIA, SUPRAFATA)
|
||||
│
|
||||
▼
|
||||
TRUNCATE apia.staging_fermieri
|
||||
\\copy apia.staging_fermieri FROM ... pipe-delimited
|
||||
│
|
||||
▼
|
||||
DELETE FROM apia.fermieri WHERE source_resource_id = $RID -- idempotent
|
||||
│
|
||||
▼
|
||||
INSERT ... DISTINCT ON (year, name, comuna, sat) -- in-batch dedupe
|
||||
ON CONFLICT (...) DO UPDATE -- cross-batch dedupe
|
||||
│
|
||||
▼
|
||||
apia.match_cui() -- exact_norm + trgm fallback
|
||||
│
|
||||
▼
|
||||
REFRESH MATERIALIZED VIEW apia.mv_per_cui
|
||||
│
|
||||
▼
|
||||
INSERT INTO apia.scrape_log (rows_seen, rows_inserted, duration_ms, ...)
|
||||
```
|
||||
|
||||
## Operational
|
||||
|
||||
```bash
|
||||
# Full discovery + ingest (default)
|
||||
./cron/import-apia-fermieri.sh
|
||||
|
||||
# Specific year
|
||||
./cron/import-apia-fermieri.sh 2024
|
||||
|
||||
# Smoke test (only first resource)
|
||||
./cron/import-apia-fermieri.sh 2024 1
|
||||
```
|
||||
|
||||
Idempotent: re-running re-deletes by `source_resource_id` and re-inserts. Safe to put on
|
||||
a monthly cron — new UAT publications are picked up automatically.
|
||||
|
||||
## Cross-source recipes
|
||||
|
||||
### 1. "Fermier (PJ) primește subvenții și are datorii la stat"
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
f.name,
|
||||
f.cui,
|
||||
f.comuna_oras,
|
||||
f.suprafata_ha AS ha_declarate,
|
||||
d.suma_datorata_lei
|
||||
FROM apia.fermieri f
|
||||
JOIN anaf.datornici d ON d.cui = f.cui
|
||||
ORDER BY d.suma_datorata_lei DESC NULLS LAST;
|
||||
-- Currently: 0 hits (only 1 PJ matched in this dataset). Will scale with more UATs.
|
||||
```
|
||||
|
||||
### 2. "Fermier APIA × FEGA AFIR plăți reale"
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
f.name,
|
||||
f.cui,
|
||||
f.comuna_oras,
|
||||
f.suprafata_ha AS ha_declarate_apia,
|
||||
COUNT(a.id) AS plati_fega,
|
||||
ROUND(SUM(a.ue_total)::numeric, 2) AS total_fega_eur,
|
||||
ROUND((SUM(a.ue_total) / NULLIF(f.suprafata_ha, 0))::numeric, 2) AS eur_per_ha
|
||||
FROM apia.fermieri f
|
||||
JOIN fonduri.afir_plati a
|
||||
ON a.cui = f.cui
|
||||
AND a.tip_fond = 'FEGA'
|
||||
GROUP BY f.name, f.cui, f.comuna_oras, f.suprafata_ha
|
||||
ORDER BY total_fega_eur DESC;
|
||||
-- Validated: SC WARDAMA SRL (28501796) → 2 plăți FEGA, 26.28 EUR pentru 1.1 ha.
|
||||
```
|
||||
|
||||
### 3. "Outlier EUR/ha — fermă cu plăți disproporționate"
|
||||
|
||||
```sql
|
||||
SELECT *
|
||||
FROM (
|
||||
SELECT
|
||||
f.name,
|
||||
f.cui,
|
||||
f.suprafata_ha,
|
||||
SUM(a.ue_total) AS total_fega_eur,
|
||||
SUM(a.ue_total) / NULLIF(f.suprafata_ha, 0) AS eur_per_ha
|
||||
FROM apia.fermieri f
|
||||
JOIN fonduri.afir_plati a ON a.cui = f.cui AND a.tip_fond = 'FEGA'
|
||||
GROUP BY f.name, f.cui, f.suprafata_ha
|
||||
) x
|
||||
WHERE eur_per_ha > 500
|
||||
ORDER BY eur_per_ha DESC
|
||||
LIMIT 50;
|
||||
-- Threshold 500 EUR/ha is high for plăți FEGA directe (~150-300 EUR/ha standard);
|
||||
-- > 500 = atipic (cuplate cu măsuri de mediu sau scheme speciale).
|
||||
```
|
||||
|
||||
### 4. "Fermier (PF) cu suprafață mare în mai multe comune"
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
name,
|
||||
array_agg(DISTINCT comuna_oras) AS comune,
|
||||
COUNT(*) AS aparitii,
|
||||
SUM(suprafata_ha) AS total_ha
|
||||
FROM apia.fermieri
|
||||
WHERE is_legal_person IS NOT TRUE
|
||||
GROUP BY name
|
||||
HAVING COUNT(*) > 1
|
||||
ORDER BY total_ha DESC;
|
||||
-- Detectează "fermieri-fantomă" cu același nume în mai multe UAT-uri.
|
||||
```
|
||||
|
||||
### 5. "Cross UAT — responsabili APIA cu cele mai multe ferme"
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
responsabil_uat,
|
||||
centru_apia,
|
||||
COUNT(*) AS n_ferme,
|
||||
SUM(suprafata_ha) AS ha_totale
|
||||
FROM apia.fermieri
|
||||
WHERE responsabil_uat IS NOT NULL
|
||||
GROUP BY responsabil_uat, centru_apia
|
||||
ORDER BY ha_totale DESC NULLS LAST;
|
||||
-- Operational view — cine la APIA gestionează ce volum.
|
||||
```
|
||||
|
||||
## Files added in this pass
|
||||
|
||||
- **NEW** `services/seap-scraper/sql/036_apia.sql`
|
||||
- **NEW** `services/seap-scraper/scripts/import-apia-fermieri.py`
|
||||
- **NEW** `services/seap-scraper/cron/import-apia-fermieri.sh`
|
||||
- **NEW** `services/seap-scraper/APIA-PLAN.md` (this file)
|
||||
|
||||
No edits to `lib/`, `pages/`, or any existing scraper. Slot 036 chosen to
|
||||
avoid collision with parallel agents who picked 035 for Curtea de Conturi
|
||||
and GNM (Garda Mediu). 022/023 remain reserved by other parallel agents.
|
||||
|
||||
## Next steps (low priority until more data)
|
||||
|
||||
1. **Watch CKAN for new resources** — set up monthly cron to re-run discovery.
|
||||
2. **Browserless scraper for apia.org.ro** — only worth it if national lists are needed
|
||||
for a specific recipe page. Otherwise FEGA covers the same question at national
|
||||
scale.
|
||||
3. **Geographic enrichment** — the LPIS shapefiles (`Parcele Agricole APIA LPIS 2025`)
|
||||
could overlay on a map view of /achizitii/firma/[cui]; defer to map-feature work.
|
||||
4. **judet field population** — currently NULL. When more UATs ingest, derive from
|
||||
centru_apia mapping (centre APIA → judet is 1-N but enumerable).
|
||||
Reference in New Issue
Block a user