initial: split from gov-agreg — vreau.digital standalone platform

Moved from gov-agreg/src/pages/achizitii/* to root (drop prefix).
- 22 pages migrated, 127 files total
- All internal links: /achizitii/X → /X (176 occurrences fixed)
- AchizitiiLayout subnav rewritten: /X paths, top-right link to vreaudigital.ro hub
- BaseLayout new (vreau.digital branding, OG tags, site URL)
- astro.config.mjs: site https://vreau.digital, server output (was static)
- docker-compose: port 5096 (vreaudigital is 5095), container vreau-digital
- deploy.sh: paths /opt/vreau-digital, log /var/log/vreau-digital-deploy.log

Backend shared with gov-agreg:
- PostgreSQL satra (same schemas: seap, firms, anaf, anre, ...)
- Photon, Martin tiles
- Infisical /vreaudigital path (DATABASE_URL etc. shared)

build: PASS (npx astro check 0 errors, npm run build 5s vite + 10s server)
This commit is contained in:
Claude VM
2026-05-13 00:10:32 +03:00
commit a6c03a091e
352 changed files with 75295 additions and 0 deletions
+236
View File
@@ -0,0 +1,236 @@
# APIA — Lista Fermieri (data.gov.ro CKAN ingest)
## Current state (2026-05-10)
| metric | value |
| --------------------------- | ---------------------------------------------------- |
| Schema | `apia.fermieri` + `apia.staging_fermieri` + `apia.scrape_log` + `apia.mv_per_cui` |
| Migration | `services/seap-scraper/sql/036_apia.sql` |
| Importer (python) | `services/seap-scraper/scripts/import-apia-fermieri.py` |
| Importer (bash wrapper) | `services/seap-scraper/cron/import-apia-fermieri.sh` |
| Rows ingested | **191** (Găgești, jud. Vaslui, campaign 2024) |
| Resources | 1 / 1 discoverable on data.gov.ro |
| Comune | 13 (rezident vs. proprietar — Găgești + diaspora) |
| Suprafață totală | 1 575,17 ha |
| PJ (is_legal_person) | 2 (PFA, SRL) |
| CUI matched (firms.entities)| 1 / 2 (50%) — **SC WARDAMA SRL** (CUI 28501796) |
| Cross-source AFIR FEGA hits | **1 firmă** (WARDAMA, 2 plăți FEGA, 26.28 EUR) |
| Cross-source ANAF datornici | 0 |
## Reality check: data.gov.ro APIA scope
The prompt's expectation was 500700K farmers in a single national XLSX. **That dataset
does not exist on CKAN.** The only published "Lista fermieri APIA" XLSX on data.gov.ro
covers a single comuna (Găgești, Vaslui, ~192 farmers).
### Why this matters
- AFIR's FEGA dump (`fonduri.afir_plati WHERE tip_fond='FEGA'`, **4 290 976 rows for
2023+2024**) is the actual national farmer-payment dataset. APIA "Lista fermieri"
publishes **declarations** (suprafață, responsabil UAT, centru APIA) — APIA is the
paying agency, AFIR records the actual payments.
- The two are complementary, not redundant:
- APIA list → "who declared and how many ha"
- AFIR FEGA → "who actually got paid and how much"
- A future-proof importer that auto-discovers any new `lista-fermieri-*` package on
data.gov.ro is what we built. When more UATs publish, re-run and it ingests them
automatically (idempotent on `source_resource_id`).
### APIA national-level data (unblocked)
The actual national list of beneficiaries lives at https://www.apia.org.ro/ but the
site returns HTTP 403 for non-browser User-Agents. **Out of scope for this pass.**
Options to unblock (in cost order):
1. **Email APIA direct** — request structured data under Law 544/2001.
2. **Browserless / Playwright scraper** — render JS, fetch the table. Adds infra cost
(one more Docker container, captcha risk).
3. **Fall back on AFIR FEGA** — already ingested; covers the question "who got
subsidies in 2023/2024" at national scale, just without the suprafață breakdown.
## Schema highlights
```sql
CREATE TABLE apia.fermieri (
id bigserial PRIMARY KEY,
campaign_year smallint NOT NULL,
name text NOT NULL,
name_normalized text,
cui text,
cui_match_method text, -- 'exact_norm' | 'trgm_unique'
cui_match_score numeric(4,3),
is_legal_person boolean, -- detected from name shape (SRL/SA/PFA/II/IF/SC/COOPERATIVA)
judet text,
comuna_oras text,
sat text,
centru_apia text, -- e.g. 'MURGENI'
responsabil_uat text, -- UAT employee, not the farmer
suprafata_ha numeric(12,4), -- declared hectares (precedent campaign)
source_dataset_id text NOT NULL,
source_resource_id text NOT NULL,
source_url text NOT NULL,
fetched_at timestamptz NOT NULL DEFAULT now(),
UNIQUE NULLS NOT DISTINCT (campaign_year, name, comuna_oras, sat)
);
```
### Importer pipeline
```
CKAN package_search?q=lista+fermieri+APIA
▼ (jq filter dataset name `lista-fermier*`, format=XLSX)
download XLSX on satra (curl)
openpyxl read → header detect → pipe-TSV
(NR.CRT, NUME PRENUME, RESPONSABIL UAT, COMUNA/ORAS, SAT, CENTRU APIA, SUPRAFATA)
TRUNCATE apia.staging_fermieri
\\copy apia.staging_fermieri FROM ... pipe-delimited
DELETE FROM apia.fermieri WHERE source_resource_id = $RID -- idempotent
INSERT ... DISTINCT ON (year, name, comuna, sat) -- in-batch dedupe
ON CONFLICT (...) DO UPDATE -- cross-batch dedupe
apia.match_cui() -- exact_norm + trgm fallback
REFRESH MATERIALIZED VIEW apia.mv_per_cui
INSERT INTO apia.scrape_log (rows_seen, rows_inserted, duration_ms, ...)
```
## Operational
```bash
# Full discovery + ingest (default)
./cron/import-apia-fermieri.sh
# Specific year
./cron/import-apia-fermieri.sh 2024
# Smoke test (only first resource)
./cron/import-apia-fermieri.sh 2024 1
```
Idempotent: re-running re-deletes by `source_resource_id` and re-inserts. Safe to put on
a monthly cron — new UAT publications are picked up automatically.
## Cross-source recipes
### 1. "Fermier (PJ) primește subvenții și are datorii la stat"
```sql
SELECT
f.name,
f.cui,
f.comuna_oras,
f.suprafata_ha AS ha_declarate,
d.suma_datorata_lei
FROM apia.fermieri f
JOIN anaf.datornici d ON d.cui = f.cui
ORDER BY d.suma_datorata_lei DESC NULLS LAST;
-- Currently: 0 hits (only 1 PJ matched in this dataset). Will scale with more UATs.
```
### 2. "Fermier APIA × FEGA AFIR plăți reale"
```sql
SELECT
f.name,
f.cui,
f.comuna_oras,
f.suprafata_ha AS ha_declarate_apia,
COUNT(a.id) AS plati_fega,
ROUND(SUM(a.ue_total)::numeric, 2) AS total_fega_eur,
ROUND((SUM(a.ue_total) / NULLIF(f.suprafata_ha, 0))::numeric, 2) AS eur_per_ha
FROM apia.fermieri f
JOIN fonduri.afir_plati a
ON a.cui = f.cui
AND a.tip_fond = 'FEGA'
GROUP BY f.name, f.cui, f.comuna_oras, f.suprafata_ha
ORDER BY total_fega_eur DESC;
-- Validated: SC WARDAMA SRL (28501796) → 2 plăți FEGA, 26.28 EUR pentru 1.1 ha.
```
### 3. "Outlier EUR/ha — fermă cu plăți disproporționate"
```sql
SELECT *
FROM (
SELECT
f.name,
f.cui,
f.suprafata_ha,
SUM(a.ue_total) AS total_fega_eur,
SUM(a.ue_total) / NULLIF(f.suprafata_ha, 0) AS eur_per_ha
FROM apia.fermieri f
JOIN fonduri.afir_plati a ON a.cui = f.cui AND a.tip_fond = 'FEGA'
GROUP BY f.name, f.cui, f.suprafata_ha
) x
WHERE eur_per_ha > 500
ORDER BY eur_per_ha DESC
LIMIT 50;
-- Threshold 500 EUR/ha is high for plăți FEGA directe (~150-300 EUR/ha standard);
-- > 500 = atipic (cuplate cu măsuri de mediu sau scheme speciale).
```
### 4. "Fermier (PF) cu suprafață mare în mai multe comune"
```sql
SELECT
name,
array_agg(DISTINCT comuna_oras) AS comune,
COUNT(*) AS aparitii,
SUM(suprafata_ha) AS total_ha
FROM apia.fermieri
WHERE is_legal_person IS NOT TRUE
GROUP BY name
HAVING COUNT(*) > 1
ORDER BY total_ha DESC;
-- Detectează "fermieri-fantomă" cu același nume în mai multe UAT-uri.
```
### 5. "Cross UAT — responsabili APIA cu cele mai multe ferme"
```sql
SELECT
responsabil_uat,
centru_apia,
COUNT(*) AS n_ferme,
SUM(suprafata_ha) AS ha_totale
FROM apia.fermieri
WHERE responsabil_uat IS NOT NULL
GROUP BY responsabil_uat, centru_apia
ORDER BY ha_totale DESC NULLS LAST;
-- Operational view — cine la APIA gestionează ce volum.
```
## Files added in this pass
- **NEW** `services/seap-scraper/sql/036_apia.sql`
- **NEW** `services/seap-scraper/scripts/import-apia-fermieri.py`
- **NEW** `services/seap-scraper/cron/import-apia-fermieri.sh`
- **NEW** `services/seap-scraper/APIA-PLAN.md` (this file)
No edits to `lib/`, `pages/`, or any existing scraper. Slot 036 chosen to
avoid collision with parallel agents who picked 035 for Curtea de Conturi
and GNM (Garda Mediu). 022/023 remain reserved by other parallel agents.
## Next steps (low priority until more data)
1. **Watch CKAN for new resources** — set up monthly cron to re-run discovery.
2. **Browserless scraper for apia.org.ro** — only worth it if national lists are needed
for a specific recipe page. Otherwise FEGA covers the same question at national
scale.
3. **Geographic enrichment** — the LPIS shapefiles (`Parcele Agricole APIA LPIS 2025`)
could overlay on a map view of /achizitii/firma/[cui]; defer to map-feature work.
4. **judet field population** — currently NULL. When more UATs ingest, derive from
centru_apia mapping (centre APIA → judet is 1-N but enumerable).