Files

T

Claude VM a6c03a091e initial: split from gov-agreg — vreau.digital standalone platform

Moved from gov-agreg/src/pages/achizitii/* to root (drop prefix).
- 22 pages migrated, 127 files total
- All internal links: /achizitii/X → /X (176 occurrences fixed)
- AchizitiiLayout subnav rewritten: /X paths, top-right link to vreaudigital.ro hub
- BaseLayout new (vreau.digital branding, OG tags, site URL)
- astro.config.mjs: site https://vreau.digital, server output (was static)
- docker-compose: port 5096 (vreaudigital is 5095), container vreau-digital
- deploy.sh: paths /opt/vreau-digital, log /var/log/vreau-digital-deploy.log

Backend shared with gov-agreg:
- PostgreSQL satra (same schemas: seap, firms, anaf, anre, ...)
- Photon, Martin tiles
- Infisical /vreaudigital path (DATABASE_URL etc. shared)

build: PASS (npx astro check 0 errors, npm run build 5s vite + 10s server)

2026-05-13 00:10:32 +03:00

8.3 KiB

Raw Blame History

GNM — Garda Națională de Mediu (Hand-off Plan)

Status: PARTIAL. Source publishes only aggregate stats. We capture the publicly-named violators (the headline cases) — full per-CUI fines history is NOT available without an OUG 109/2007 access-to-info request.

Sources investigated (2026-05-10)

Source	URL	Verdict
gnm.ro homepage	https://www.gnm.ro/	Only links to PDFs + press releases
Annual reports	gnm.ro/rapoarte-si-note-de-activitate/	Aggregate stats only — `raport_activitate_<an>.pdf` (2012-2024)
Monthly synthesis	gnm.ro/wp-content/.../sinteza__.pdf	3-page PDFs: per-judet TOTALS only, no per-firm rows
Press releases	gnm.ro/noutati/	~358 articles, ~10% enforcement, sporadic firm names
RSS feed	gnm.ro/feed/?paged=N	Same articles, structured XML, 36 pages × 10 items
data.gov.ro `q=mediu`	45 datasets	Air quality, IPPC, SEVESO inventories — no fines dataset
ANPM rapoarte	anpm.ro	IPPC/SEVESO only (already covered by other agents)

Why per-CUI is impossible: GNM is exempt from the open-data registry obligation (OUG 109/2007). They cite "secret de serviciu" + "operatori economici personali date" for not publishing the contravention register. The only legal path is per-firm FOIA requests.

Schema applied

services/seap-scraper/sql/037_gnm.sql — three tables in schema gnm:

Table	Purpose	Rows after first run
`gnm.comunicate`	Raw archive of every press release (RSS)	348
`gnm.amenzi_extrase`	Regex-extracted (firm, fine_lei) tuples	1 (after dedup)
`gnm.scrape_log`	Run history (mirrors anre/ancom)	4

is_enforcement flag = 36/348 (10.3%) of articles match the /amenz|sancțiun|sistare|confiscat|sesizare penal/i filter.

Files added

services/seap-scraper/sql/037_gnm.sql                 (130 lines)
services/seap-scraper/src/scrape-gnm.ts               (~440 lines)
services/seap-scraper/cron/scrape-gnm.sh              ( 90 lines)
services/seap-scraper/GNM-PLAN.md                     (this file)

Sample ingest stats

First full backfill (2026-05-10):

seen=348 inserted=348 updated=0 skipped=0
enforcement=36 violators=2 → 1 after dedup
duration=58s

After running the Stage-B fuzzy matcher against firms.entities:

gnm.amenzi_extrase id=1
  contravenient_name = "Retim Ecologic Service SA"
  contravenient_cui  = 9112229   (RETIM ECOLOGIC SERVICE SA, jud. BIHOR)
  cui_match_score    = 1.0
  suma_lei           = 150000
  context            = "Depozitul de Deșeuri Nepericuloase Ghizela, operat de
                        Retim Ecologic Service SA. Operatorul a fost
                        sancționat cu 150.000 lei amendă..."

Realistic yield estimate

Press-release named violators per year ≈ 50-200 firms (out of ~5,000 actual fines). Coverage = 1-4%. Acceptable trade-off: the firms that appear in press releases are the biggest offenders (refineries, large landfills, mining operators) — exactly the firms most likely to also win SEAP contracts. The tail is invisible but the top of the distribution is captured.

Cross-source SQL recipes

1. Firms with GNM environmental fines that win SEAP construction contracts

-- Environmental violators winning state contracts.
-- Construction CPV codes start with 45; mining/extraction CPV 14/77.
SELECT
  ge.contravenient_cui,
  ge.contravenient_name,
  ge.suma_lei                                  AS gnm_amenda_lei,
  ge.fapta,
  c.titlu                                      AS gnm_articol,
  c.publicat_la                                AS gnm_data,
  COUNT(DISTINCT a.id)                         AS seap_contracte_castigate,
  SUM(a.contract_value_lei)                    AS seap_valoare_totala_lei,
  STRING_AGG(DISTINCT LEFT(a.cpv_code, 2), ',') AS seap_cpv_prefixes
FROM gnm.amenzi_extrase ge
JOIN gnm.comunicate c           ON ge.comunicat_id = c.id
LEFT JOIN seap.announcements a  ON a.supplier_cui = ge.contravenient_cui
                                AND a.cpv_code LIKE '45%'    -- construction
WHERE ge.contravenient_cui IS NOT NULL
GROUP BY ge.contravenient_cui, ge.contravenient_name, ge.suma_lei, ge.fapta,
         c.titlu, c.publicat_la
HAVING COUNT(DISTINCT a.id) > 0
ORDER BY ge.suma_lei DESC NULLS LAST;

2. EU funds POIM-Mediu beneficiaries with GNM fines (the double-irony)

-- POIM = Programul Operațional Infrastructură Mare (Mediu axis).
-- A firm that receives EU money for environmental projects WHILE being fined
-- by GNM for environmental violations is the headline scandal pattern.
SELECT
  ge.contravenient_cui,
  ge.contravenient_name,
  ge.suma_lei                       AS gnm_amenda_lei,
  ge.fapta                          AS gnm_fapta,
  fb.proiect_titlu                  AS eu_proiect,
  fb.valoare_eligibila_eur          AS eu_valoare_eur,
  fb.program_finantator             AS eu_program
FROM gnm.amenzi_extrase ge
JOIN fonduri.beneficiar_proiect fb ON fb.beneficiar_cui = ge.contravenient_cui
WHERE ge.contravenient_cui IS NOT NULL
  AND fb.program_finantator ILIKE '%POIM%'    -- or ILIKE '%mediu%' for broader
ORDER BY fb.valoare_eligibila_eur DESC NULLS LAST;

3. Top GNM violators sorted by total fines mentioned across press releases

SELECT
  contravenient_cui,
  MIN(contravenient_name) AS firma,
  COUNT(*)                AS nr_mentions,
  SUM(suma_lei)           AS total_amenzi_lei,
  STRING_AGG(DISTINCT judet, ', ') AS judete_implicate
FROM gnm.amenzi_extrase
WHERE contravenient_cui IS NOT NULL
GROUP BY contravenient_cui
ORDER BY total_amenzi_lei DESC NULLS LAST
LIMIT 50;

Stage-B fuzzy matcher

The scraper stores contravenient_name_norm but leaves contravenient_cui NULL. To populate CUIs, run the following after each scrape (idempotent — only updates rows where CUI is NULL):

WITH unmatched AS (
  SELECT id, contravenient_name_norm
  FROM gnm.amenzi_extrase
  WHERE contravenient_cui IS NULL AND contravenient_name_norm IS NOT NULL
)
UPDATE gnm.amenzi_extrase a
SET contravenient_cui  = m.cui,
    cui_match_method   = 'fuzzy_name',
    cui_match_score    = m.score,
    matched_at         = now()
FROM (
  SELECT u.id, f.cui,
         similarity(u.contravenient_name_norm, firms.normalize_company_name(f.name)) AS score
  FROM unmatched u
  CROSS JOIN LATERAL (
    SELECT cui, name
    FROM firms.entities
    WHERE firms.normalize_company_name(name) % u.contravenient_name_norm
    ORDER BY similarity(firms.normalize_company_name(name), u.contravenient_name_norm) DESC
    LIMIT 1
  ) f
) m
WHERE a.id = m.id AND m.score >= 0.85;

Operational guidance

Cron schedule: weekly (Sundays 03:00) — RSS rarely changes, ~5-10 new articles per week. Use SINCE_DAYS=14 for incremental runs after the first full backfill.
Rate limits: gnm.ro returns RateLimit-Limit: 100/min, 1000/hr. We use ~36 requests per full scrape with 800 ms sleep — well within budget.
Idempotency: gnm.comunicate UPSERTs on guid (WordPress post ID, immutable). Skip when raw_hash unchanged. Re-extraction wipes only the child rows for changed articles.
404 on page 36: harmless — currently 35.8 pages so we 404 on the trailing empty fetch. Captured by retry loop, exits cleanly.

Future enhancements (not in this hand-off)

OCR of monthly synthesis PDFs — IF in future they add per-judet tabular detail (currently 3 pages, totals only, OCR adds nothing).
Annual report PDF has more granular judet × sector breakdowns (waste / air / water / biodiversity) — could add a second extractor for gnm.amenzi_per_judet_sector aggregates.
Local press archives (e.g. monitoruldebuzau.ro, focuspress.ro) often name specific firms when GNM does press conferences regionally — could harvest via a curated whitelist of regional outlets that beat-cover GNM. Estimated +50-100 named firms/year. Risk: licensing.
FOIA submissions via the gnm@gnm.ro legea-544 path — could request the contravention register annually. Civic-tech precedent: prefectura.ro data was successfully unblocked this way in 2024.

Time spent

~75 minutes:

20 min investigation (gnm.ro / data.gov.ro / RSS reconnaissance)
5 min schema design + apply
35 min scraper write + 3 iterations to tune the regex extractors
5 min Stage-B fuzzy match validation
10 min documentation

8.3 KiB Raw Blame History Unescape Escape