# GNM — Garda Națională de Mediu (Hand-off Plan) **Status:** PARTIAL. Source publishes only aggregate stats. We capture the publicly-named violators (the headline cases) — full per-CUI fines history is NOT available without an OUG 109/2007 access-to-info request. ## Sources investigated (2026-05-10) | Source | URL | Verdict | |---|---|---| | gnm.ro homepage | https://www.gnm.ro/ | Only links to PDFs + press releases | | Annual reports | gnm.ro/rapoarte-si-note-de-activitate/ | Aggregate stats only — `raport_activitate_.pdf` (2012-2024) | | Monthly synthesis | gnm.ro/wp-content/.../sinteza__.pdf | 3-page PDFs: per-judet TOTALS only, no per-firm rows | | Press releases | gnm.ro/noutati/ | ~358 articles, ~10% enforcement, sporadic firm names | | RSS feed | gnm.ro/feed/?paged=N | Same articles, structured XML, 36 pages × 10 items | | data.gov.ro `q=mediu` | 45 datasets | Air quality, IPPC, SEVESO inventories — **no fines dataset** | | ANPM rapoarte | anpm.ro | IPPC/SEVESO only (already covered by other agents) | **Why per-CUI is impossible:** GNM is exempt from the open-data registry obligation (OUG 109/2007). They cite "secret de serviciu" + "operatori economici personali date" for not publishing the contravention register. The only legal path is per-firm FOIA requests. ## Schema applied `services/seap-scraper/sql/037_gnm.sql` — three tables in schema `gnm`: | Table | Purpose | Rows after first run | |---|---|---| | `gnm.comunicate` | Raw archive of every press release (RSS) | **348** | | `gnm.amenzi_extrase` | Regex-extracted (firm, fine_lei) tuples | **1** (after dedup) | | `gnm.scrape_log` | Run history (mirrors anre/ancom) | 4 | `is_enforcement` flag = 36/348 (10.3%) of articles match the `/amenz|sancțiun|sistare|confiscat|sesizare penal/i` filter. ## Files added ``` services/seap-scraper/sql/037_gnm.sql (130 lines) services/seap-scraper/src/scrape-gnm.ts (~440 lines) services/seap-scraper/cron/scrape-gnm.sh ( 90 lines) services/seap-scraper/GNM-PLAN.md (this file) ``` ## Sample ingest stats First full backfill (2026-05-10): ``` seen=348 inserted=348 updated=0 skipped=0 enforcement=36 violators=2 → 1 after dedup duration=58s ``` After running the Stage-B fuzzy matcher against `firms.entities`: ``` gnm.amenzi_extrase id=1 contravenient_name = "Retim Ecologic Service SA" contravenient_cui = 9112229 (RETIM ECOLOGIC SERVICE SA, jud. BIHOR) cui_match_score = 1.0 suma_lei = 150000 context = "Depozitul de Deșeuri Nepericuloase Ghizela, operat de Retim Ecologic Service SA. Operatorul a fost sancționat cu 150.000 lei amendă..." ``` ## Realistic yield estimate Press-release named violators per year ≈ 50-200 firms (out of ~5,000 actual fines). Coverage = 1-4%. Acceptable trade-off: the firms that appear in press releases are the **biggest** offenders (refineries, large landfills, mining operators) — exactly the firms most likely to also win SEAP contracts. The tail is invisible but the top of the distribution is captured. ## Cross-source SQL recipes ### 1. Firms with GNM environmental fines that win SEAP construction contracts ```sql -- Environmental violators winning state contracts. -- Construction CPV codes start with 45; mining/extraction CPV 14/77. SELECT ge.contravenient_cui, ge.contravenient_name, ge.suma_lei AS gnm_amenda_lei, ge.fapta, c.titlu AS gnm_articol, c.publicat_la AS gnm_data, COUNT(DISTINCT a.id) AS seap_contracte_castigate, SUM(a.contract_value_lei) AS seap_valoare_totala_lei, STRING_AGG(DISTINCT LEFT(a.cpv_code, 2), ',') AS seap_cpv_prefixes FROM gnm.amenzi_extrase ge JOIN gnm.comunicate c ON ge.comunicat_id = c.id LEFT JOIN seap.announcements a ON a.supplier_cui = ge.contravenient_cui AND a.cpv_code LIKE '45%' -- construction WHERE ge.contravenient_cui IS NOT NULL GROUP BY ge.contravenient_cui, ge.contravenient_name, ge.suma_lei, ge.fapta, c.titlu, c.publicat_la HAVING COUNT(DISTINCT a.id) > 0 ORDER BY ge.suma_lei DESC NULLS LAST; ``` ### 2. EU funds POIM-Mediu beneficiaries with GNM fines (the double-irony) ```sql -- POIM = Programul Operațional Infrastructură Mare (Mediu axis). -- A firm that receives EU money for environmental projects WHILE being fined -- by GNM for environmental violations is the headline scandal pattern. SELECT ge.contravenient_cui, ge.contravenient_name, ge.suma_lei AS gnm_amenda_lei, ge.fapta AS gnm_fapta, fb.proiect_titlu AS eu_proiect, fb.valoare_eligibila_eur AS eu_valoare_eur, fb.program_finantator AS eu_program FROM gnm.amenzi_extrase ge JOIN fonduri.beneficiar_proiect fb ON fb.beneficiar_cui = ge.contravenient_cui WHERE ge.contravenient_cui IS NOT NULL AND fb.program_finantator ILIKE '%POIM%' -- or ILIKE '%mediu%' for broader ORDER BY fb.valoare_eligibila_eur DESC NULLS LAST; ``` ### 3. Top GNM violators sorted by total fines mentioned across press releases ```sql SELECT contravenient_cui, MIN(contravenient_name) AS firma, COUNT(*) AS nr_mentions, SUM(suma_lei) AS total_amenzi_lei, STRING_AGG(DISTINCT judet, ', ') AS judete_implicate FROM gnm.amenzi_extrase WHERE contravenient_cui IS NOT NULL GROUP BY contravenient_cui ORDER BY total_amenzi_lei DESC NULLS LAST LIMIT 50; ``` ## Stage-B fuzzy matcher The scraper stores `contravenient_name_norm` but leaves `contravenient_cui` NULL. To populate CUIs, run the following after each scrape (idempotent — only updates rows where CUI is NULL): ```sql WITH unmatched AS ( SELECT id, contravenient_name_norm FROM gnm.amenzi_extrase WHERE contravenient_cui IS NULL AND contravenient_name_norm IS NOT NULL ) UPDATE gnm.amenzi_extrase a SET contravenient_cui = m.cui, cui_match_method = 'fuzzy_name', cui_match_score = m.score, matched_at = now() FROM ( SELECT u.id, f.cui, similarity(u.contravenient_name_norm, firms.normalize_company_name(f.name)) AS score FROM unmatched u CROSS JOIN LATERAL ( SELECT cui, name FROM firms.entities WHERE firms.normalize_company_name(name) % u.contravenient_name_norm ORDER BY similarity(firms.normalize_company_name(name), u.contravenient_name_norm) DESC LIMIT 1 ) f ) m WHERE a.id = m.id AND m.score >= 0.85; ``` ## Operational guidance * **Cron schedule:** weekly (Sundays 03:00) — RSS rarely changes, ~5-10 new articles per week. Use `SINCE_DAYS=14` for incremental runs after the first full backfill. * **Rate limits:** gnm.ro returns `RateLimit-Limit: 100/min`, `1000/hr`. We use ~36 requests per full scrape with 800 ms sleep — well within budget. * **Idempotency:** `gnm.comunicate` UPSERTs on `guid` (WordPress post ID, immutable). Skip when `raw_hash` unchanged. Re-extraction wipes only the child rows for changed articles. * **404 on page 36:** harmless — currently 35.8 pages so we 404 on the trailing empty fetch. Captured by retry loop, exits cleanly. ## Future enhancements (not in this hand-off) 1. **OCR of monthly synthesis PDFs** — IF in future they add per-judet tabular detail (currently 3 pages, totals only, OCR adds nothing). 2. **Annual report PDF** has more granular judet × sector breakdowns (waste / air / water / biodiversity) — could add a second extractor for `gnm.amenzi_per_judet_sector` aggregates. 3. **Local press archives** (e.g. monitoruldebuzau.ro, focuspress.ro) often name specific firms when GNM does press conferences regionally — could harvest via a curated whitelist of regional outlets that beat-cover GNM. Estimated +50-100 named firms/year. Risk: licensing. 4. **FOIA submissions** via the `gnm@gnm.ro` legea-544 path — could request the contravention register annually. Civic-tech precedent: prefectura.ro data was successfully unblocked this way in 2024. ## Time spent ~75 minutes: - 20 min investigation (gnm.ro / data.gov.ro / RSS reconnaissance) - 5 min schema design + apply - 35 min scraper write + 3 iterations to tune the regex extractors - 5 min Stage-B fuzzy match validation - 10 min documentation