initial: split from gov-agreg — vreau.digital standalone platform
Moved from gov-agreg/src/pages/achizitii/* to root (drop prefix).

- 22 pages migrated, 127 files total
- All internal links: /achizitii/X → /X (176 occurrences fixed)
- AchizitiiLayout subnav rewritten: /X paths, top-right link to vreaudigital.ro hub
- BaseLayout new (vreau.digital branding, OG tags, site URL)
- astro.config.mjs: site https://vreau.digital, server output (was static)
- docker-compose: port 5096 (vreaudigital is 5095), container vreau-digital
- deploy.sh: paths /opt/vreau-digital, log /var/log/vreau-digital-deploy.log

Backend shared with gov-agreg:

- PostgreSQL satra (same schemas: seap, firms, anaf, anre, ...)
- Photon, Martin tiles
- Infisical /vreaudigital path (DATABASE_URL etc. shared)

build: PASS (npx astro check 0 errors, npm run build 5s vite + 10s server)
@@ -0,0 +1,356 @@
# Freshness and completeness audit: gov-agreg DB
**Date:** 2026-05-10
**Sub-agent:** G3 (data quality)
**Database:** `architools_db` @ 10.10.10.166, **total size 29 GB**
**Audit coverage:** 17 schemas / 33 data tables (staging and scrape_log excluded)
**Total rows reconciled:** **17,907,148** (~17.9M, vs. ~6.94M cited previously; the main change comes from `fonduri.afir_plati` with 5.33M rows and `firms.entities` at 3.99M).

---

## 1. Executive summary: 17-schema overview table

| Schema | Rows | Latest record | Last scrape | Source (frequency) | Gap | Action | Priority |
|---|---:|---|---|---|---|---|---|
| **seap** | 4,011,832 | 2026-05-30 | 2026-05-10 | live API + WSP | live OK; gaps in 2020-21, 2024, and DA pre-2025 | Backfill DA 2017-24 (~8M) + WSP retake 2020-21 | 🔴 |
| **firms** | 8,640,978 | 2026-05-09 | 2026-05-09 | ONRC weekly | OK | keep weekly cron | 🟢 |
| **fonduri** | 5,430,381 | 2026-05-10 | 2026-05-10 | data.gov.ro | OK | (AFIR 2025 not yet published) | 🟢 |
| **regas** | 78,546 | 2026-05-07 | 2026-05-09 | Consiliul Concurenței, monthly | OK | keep monthly cron | 🟢 |
| **anaf** | 140,777 | 2016-03-31 (datornici!) | 2026-05-09 (no-op) | data.gov.ro, quarterly | **3,693 days** | scrape Q4 2025 (new data requires captcha) | 🔴 |
| **aep** | 379,977 | 2024-12-27 | 2026-05-09 | banipartide.ro | ~499 days | re-scrape 2025 (annual is OK) | 🟡 |
| **ani** | 25 PDFs / 0 parsed | 2023 | n/a | live ANI | parser not implemented | build the ANI parser (1.3M PDFs) | 🔴 |
| **bugetar** | 18,822 entities / 0 execution rows | n/a | 2026-05-09 | mfinante.gov.ro | execution has 0 rows | repair the `bugetar.executie` pipeline | 🔴 |
| **anre** | 29,536 | 2027-11-20 (data_emitere) | 2026-05-10 | live ANRE | OK; 2025 fresh | add the electricians (`electricieni`) pipeline | 🟢 |
| **ancom** | 3,054 | live | 2026-05-10 | live ANCOM | OK | keep cron | 🟢 |
| **cnsc** | 29,488 | 2026 listing | 2026-05-10 | live CNSC | listing OK; 0% PDF parse | extract decision_type from PDF (medium) | 🟡 |
| **cnas** | 36,244 (61 docs + 36k providers) | 2025-03-31 | 2026-05-10 | WP media CNAS | OK; 0% CUI match | enable the CUI matcher | 🟡 |
| **asf** | 849 | 2022-12-19 | 2026-05-10 | live ASF | OK (nightly) | keep | 🟢 |
| **aaas** | 11 | n/a (last_action_date NULL) | 2026-05-10 | aaas.ro portfolio | only 11 firms, incomplete | backfill ORDIN 278/2005 PDF (~150 firms) | 🟡 |
| **curteacont** | 1,133 | 2026-05-15 | 2026-05-10 | live curteadeconturi.ro | listing OK; 0% PDF + 0 CUI | Stage 2 detail-page resolve | 🟡 |
| **apia** | 191 | 2024 (campaign) | 2026-05-10 | data.gov.ro CKAN | only 1 CUI matched (190 of 191 are PF) | re-run the matcher with fuzzy matching + add the 2025 campaign | 🔴 |
| **gnm** | 349 (348 press releases + 1 fine) | 2026-03-18 | 2026-05-10 | live gnm.ro RSS | listing OK; 0.6% of fines parsed | finish Stage B (fuzzy matcher live) | 🟡 |

Legend: 🟢 healthy · 🟡 has gaps fixable in <2 days · 🔴 structural problem or large backlog

---
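
The Gap column in the summary table can be recomputed from the "latest record" dates and the audit date. The sketch below is illustrative only: the function names and the `max_age_days` threshold are assumptions, not part of the audited pipeline.

```python
from datetime import date

AUDIT_DATE = date(2026, 5, 10)  # audit date stated in the report header


def gap_days(latest_record: date, as_of: date = AUDIT_DATE) -> int:
    """Days elapsed between the newest record and the audit date."""
    return (as_of - latest_record).days


def is_stale(latest_record: date, max_age_days: int, as_of: date = AUDIT_DATE) -> bool:
    """True when the newest record is older than a tolerated age (hypothetical threshold)."""
    return gap_days(latest_record, as_of) > max_age_days


# Latest-record dates copied from the summary table.
aep_gap = gap_days(date(2024, 12, 27))  # 499
gnm_gap = gap_days(date(2026, 3, 18))   # 53
```

Running the same function per schema keeps the gap figures reproducible when the audit is repeated on a later date.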

## 2. Per-schema deep dive

### 2.1 SEAP (`seap.*`)

| Table | Rows | Min - max date | Distinct CUIs (authority / supplier) |
|---|---:|---|---|
| `announcements` | **781,029** | 2015-04-29 → 2026-05-30 | 14,616 / 65,643 |
| `direct_acquisitions` | **2,229,285** | 2025-01-01 → 2025-12-31 | 14,642 / 74,239 |
| `cui_location` | 96,523 | updated 2026-04-13 → 2026-05-09 | 96,523 |
| `entities` | 432 | 2026-04-13 (one shot) | 430 |
| `cpv_codes` | ~9,500 | static | n/a |
| `public_notices`, `notice_contracts` | **0 / 0** | empty | (empty legacy tables) |

**Yearly distribution of announcements:**

```
2015:   4,368   2016:      39   2017:  26,871   2018:  17,871
2019:  16,570   2020:       0   2021:       0   2022:  24,676
2023:  46,996   2024:     750   2025: 607,256   2026:  26,178
```

**Problems observed:**

- ❌ **2020 + 2021 completely missing** (a 2-year gap, confirmed in CLAUDE.md). Cause: the WSP scraper skipped that window when it was launched in 2022.
- ❌ **2024 almost absent** (only 750 rows, in March). The backfill did not recover 2024.
- ❌ **direct_acquisitions covers 2025 only** (2.2M rows); the 2017-2024 history, ~8M rows, is missing. CLAUDE.md confirms "direct procurement 2017-2024 not ingested (~8M rows pending)".
- ❌ `seap.sync_state` shows feed `da` in `running` since **2025-10-16**, last update 2026-04-13: the historical backfill is stuck and no longer progressing.
- ❌ `wsp_sync_state` has not run since **2026-05-07** (3 days stale; the scraper normally runs from cron 2-4 times a day).
- ❌ `seap.public_notices` and `seap.notice_contracts` are completely empty (legacy schema or disabled pipeline).
- ⚠️ TED import: in `import_ted.py`, lines 22-38, the `FIELDS` array **does NOT contain `'publication-date'`**, although the code reads it at line 152. Every TED `publication_date` is therefore **NULL** (1-line fix).
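
The TED issue in the last bullet is a mismatch between the fields requested and the fields read. A guard like the following would catch it at startup. This is only a sketch: the real `FIELDS` list in `import_ted.py` is not shown in this report, so both lists below are hypothetical stand-ins, and the helper names are invented for illustration.

```python
# Hypothetical stand-ins: the actual contents of FIELDS are not shown in the report.
FIELDS = ["notice-title", "buyer-name"]  # illustrative values only
FIELDS_READ_BY_IMPORTER = ["notice-title", "buyer-name", "publication-date"]


def missing_fields(requested: list[str], used: list[str]) -> list[str]:
    """Return fields the importer reads but never requests from the API."""
    requested_set = set(requested)
    return [name for name in used if name not in requested_set]


def with_required(requested: list[str], used: list[str]) -> list[str]:
    """Return the requested list extended with any field the importer reads."""
    return requested + missing_fields(requested, used)


gaps = missing_fields(FIELDS, FIELDS_READ_BY_IMPORTER)
fixed_fields = with_required(FIELDS, FIELDS_READ_BY_IMPORTER)
```

Failing fast when `gaps` is non-empty would surface this class of bug before a full import writes NULLs.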

**Recent completeness:** announcements in the last 30 days = 3,474 rows ✅. DA in the last 30 days = **0** ❌.

### 2.2 firms (`firms.*`)

| Table | Rows | Coverage |
|---|---:|---|
| `entities` | **3,985,967** | 3.99M total · 3.32M ANAF-active · 3.74M with CAEN · 3.64M geocoded · 2.62M with representatives |
| `financials` | 4,245,749 | 2020-2024 · 1.18M distinct CUIs |
| `financials_banks` | 66 | 2024 |
| `financials_ong` | 286,240 | 2020-2024 · 74,862 NGOs |
| `reprezentanti_if` | 122,956 | EU branches |

**Completeness:**

- ~93.8% have a CAEN code (`caen_principal NOT NULL`; 3.74M of 3.99M)
- 91.3% are geocoded (`lat NOT NULL`)
- 65.7% have legal representatives in JSON
- 83.4% are ANAF-active (the rest are deregistered or suspended)

**Problems:** none critical. Last update 2026-05-09. Weekly cron OK. The `staging_onrc_*` tables (~3 GB) still exist and can probably be dropped after the backfill.

### 2.3 fonduri (`fonduri.*`)

| Table | Rows | Date range | CUIs matched |
|---|---:|---|---|
| `afir_plati` | **5,329,006** | source_year 2023-2024 | 37,647 distinct |
| `beneficiar_anunt` | 41,494 | 2013-10 → 2026-05-08 | 8,772 |
| `beneficiar_anunt_lot` | 48,392 | n/a | n/a |
| `beneficiar_proiect` | 11,489 | 2010-05 → 2026-05-08 | **0 matched** ⚠️ |

**Problems:**

- `beneficiar_proiect` has 11,489 rows but **0 CUIs matched**: `count(distinct cui)` = 0, so the `cui` column is probably NULL on every row. Needs investigation.
- AFIR payment history for 2007-2022 is not published (data.gov.ro publishes only the unified 2023-2024 set).
- AFIR 2025: the source usually publishes in Q1 of the following year, so this is a timing issue rather than a real gap.

### 2.4 regas (`regas.ajutoare`)

- **78,546 rows**, 2016-01-13 → 2026-05-07 (live, monthly)
- 23,805 distinct CUIs with state aid
- Distribution: 2020-2023 are the peak years (12k-21k/year), 2024 = 10,245, 2025 barely started
- ✅ **Healthy**: last fetch 2026-05-09

### 2.5 anaf (`anaf.*`)

| Table | Rows | Min/max date | Status |
|---|---:|---|---|
| `datornici` | 140,777 | **2016-03-31** *(static!)* | 🔴 stale ~10 years |
| `lista_alba` | **0** | n/a | empty |
| `datornici_latest` | view | n/a | reflects the static table |

**Critical problems:**

- `anaf.datornici` holds **only Q1 2016** (publication_date = 2016-03-31). data.gov.ro publishes quarterly; the latest release, Q4 2025, should be ingested.
- `anaf.lista_alba` is completely empty: 0 rows.
- CLAUDE.md confirms the blocker: "ANAF datornici via 2captcha". The current ANAF site requires a captcha, and automated ingestion has been blocked since 2016.

### 2.6 aep (`aep.*`)

| Table | Rows | Min/max date | Notes |
|---|---:|---|---|
| `donatii_pf` | 30,173 | 1997-03-29 → 2024-12-27 | individuals (persoane fizice) |
| `donatii_pj` | 3,567 | 2000-05-16 → 2024-12-13 | legal entities (2,148 distinct CUIs) |
| `donatii_rvc` | **346,237** | 2000-01-11 → **2034-01-31** ⚠️ | revenue (erroneous future dates) |
| `partide` | 64 | n/a | active parties |

**Problems:**

- ✅ 2024 coverage is present.
- ⚠️ `donatii_rvc` has dates up to **2034-01-31**: a few rows carry an erroneous future date (probably an OCR error on banipartide.ro).
- ⚠️ 2025 data is missing for all AEP sources (party reports for 2025 are only published in Q2 2026).
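
Future-dated rows such as the 2034-01-31 entries can be neutralized at ingestion time instead of being patched afterwards. The helper below is a minimal sketch of that rule; its name and signature are assumptions, not code from the scraper.

```python
from datetime import date
from typing import Optional


def sanitize_donation_date(value: Optional[date], today: date) -> Optional[date]:
    """Return None for dates in the future; keep past, present and missing dates unchanged."""
    if value is not None and value > today:
        return None
    return value


audit_day = date(2026, 5, 10)
cleaned_future = sanitize_donation_date(date(2034, 1, 31), audit_day)   # None
cleaned_valid = sanitize_donation_date(date(2024, 12, 27), audit_day)   # unchanged
```

Setting the value to NULL preserves the row while removing the impossible date, so it no longer distorts min/max range checks.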

### 2.7 ani (`ani.*`)

| Table | Rows |
|---|---:|
| `declaratii` | **25** (all `parse_status='pending'`) |
| `officials`, `bunuri`, `donatii`, `functii`, `shareholdings` | **0** |

**Status:** schema defined, **pipeline not implemented**. CLAUDE.md confirms: "ANI 1.3M PDFs", a multi-week effort.

### 2.8 bugetar (`bugetar.*`)

| Table | Rows |
|---|---:|
| `entitate` | 18,822 (6,564 with a matched CUI, 12,258 without) |
| `executie` | **0** ❌ |
| `crawl_job` | **0** ❌ |

**Critical problems:**

- `bugetar.entitate` is populated with 18,822 public entities, but `executie` and `crawl_job` are completely empty.
- The mfinante.gov.ro pipeline for budget execution is not running (or it runs but rejects all data).

### 2.9 anre (`anre.*`)

| Table | Rows | Status |
|---|---:|---|
| `licente` | **29,536** | 1999-09-20 → 2027-11-20 (future authorizations included) |
| `electricieni` | **0** | not running |
| Source breakdown | atestat: 23,996 · electricitate: 4,541 · gaze: 999 | |

**Status distribution:** 11,957 expired · 8,077 certified · 3,436 withdrawn · 1,332 granted · ~5k other statuses.

**Problem:** `anre.electricieni` is completely empty: the pipeline for the electricians register is either not implemented or failing.

### 2.10 ancom (`ancom.*`)

| Table | Rows |
|---|---:|
| `operatori` | 518 (all with a matched CUI ✅) |
| `drepturi` | 2,536 (1,311 services + 1,225 network) |

✅ **Healthy**: live register, 100% CUI match. Last fetch 2026-05-10.

### 2.11 cnsc (`cnsc.decizii`)

- **29,488 rows**, spread over 2015-2026 (average ~2,800/year)
- **0% have `decision_type`, `decision_summary`, or `pdf_text_sha1`**: the listing is OK, but the PDFs are **completely unparsed**
- CLAUDE.md target: "50/page × 617 pages = ~30,850". The current capture (29,488) is ≈ 96% of target. ✅ Nearly complete.
- Last fetch 2026-05-10.

### 2.12 cnas (`cnas.*`)

| Table | Rows | Status |
|---|---:|---|
| `documents` | 61 (46 ok · 14 no_table · 1 unsupported) | 2022-03 → 2025-03 |
| `furnizori` | **36,183** | **0 CUIs matched** ⚠️ |

**Problems:**

- 100% of providers extracted, **0% matched to a CUI**: the `cui_match_method` field is empty on every row.
- 25% of PDFs (15/61) failed parsing (no_table or unknown format).

### 2.13 asf (`asf.entitati`)

- **849 rows** (788 brokers + 61 insurers)
- Live nightly, `data_autorizare` 1900-2022 (1900 = date missing in the source)
- ✅ Healthy.

### 2.14 aaas (`aaas.firme`)

- **11 firms** (all `aaas_status='active_holding'`)
- **`last_action_date` = NULL for all**: the field is not populated
- CLAUDE.md target: "12-15 firme active portfolio". The current capture (11) is ≈ 73-92% of target.
- ❌ Backfill of the **ORDIN 278/2005** PDF (~150 historical firms) is **deferred**.

### 2.15 curteacont (`curteacont.rapoarte`)

- **1,133 rows**: 500 compliance + 499 financial + 114 follow-up + 20 performance
- Last finished_at: 2026-05-10 (Stage 1 = listing OK)
- ❌ **0% have `pdf_path`** (zero PDFs downloaded)
- ❌ **0% have `audited_entity_cui`** (the audited entity is not extracted)
- ❌ **0% have `parsed_at`**: Stage 2 (detail-page resolve) is not implemented
- audit_year: 2021 (1), 2022 (5), 2023 (74), 2024 (415), 2025 (4)

### 2.16 apia (`apia.fermieri`)

- **191 rows**, 2024 campaign
- ⚠️ **Only 1 CUI matched**: 190/191 are PF (individuals without a CUI), which is legitimate, but the **PJ (legal-entity) rows are not matched either**
- CLAUDE.md target: "monthly via CKAN", but the source publishes only the annual list
- The 2025 campaign is missing (normally available from March 2026)
- Under-used: the real APIA dataset has ~800k farmers/year, while our capture has 191 (probably a sample)

### 2.17 gnm (`gnm.*`)

| Table | Rows |
|---|---:|
| `comunicate` | 348 |
| `amenzi_extrase` | **1** |

- Distribution: 2016 (23), 2020-2023 (8-51/year), 2024 (51), 2025 (92), **2026 (5)**
- Last `publicat_la` = 2026-03-18 (~7 weeks stale relative to the 2026-05-10 scrape)
- 36/348 (10%) flagged `is_enforcement=true`, 20/348 (5.7%) with `total_amenzi_lei`
- The Stage B fuzzy matcher was committed recently (cf. commit `82b64b3`) but produced only 1 fine; the pipeline needs testing.

---

## 3. Quick wins (≤2h fixes, ranked by impact)

| # | Fix | Schema | Effort | Impact | Command/path |
|---|---|---|--:|---|---|
| 1 | **Add `'publication-date'` to the `FIELDS` array (TED import)** | seap (TED) | 5 min | 100% of TED publication_date populated | `services/seap-scraper/import_ted.py` lines 22-38 |
| 2 | **Re-run the SEAP WSP scraper** (3 days stale, sync_state stuck since 2025-10-16) | seap | 30 min | restores the daily live feed + unblocks the historical backfill | `services/seap-scraper/wsp/` + manual `seap.sync_state` reset |
| 3 | **Re-run the CUI matcher for `cnas.furnizori`** (36k rows, 0% matched) | cnas | 20 min | 36k providers linkable to firm entities | `services/seap-scraper/cron/match-cui-external.sh` (extension) |
| 4 | **Re-run the CUI matcher for `apia.fermieri`** | apia | 10 min | match PJ rows (with explicit CUI) to firms.entities | `cron/match-cui-external.sh` |
| 5 | **Clean up erroneous dates in `aep.donatii_rvc`** (dates up to 2034-01-31) | aep | 10 min | UPDATE … SET data_donatie = NULL WHERE data_donatie > now() | direct SQL |
| 6 | **Re-run the AEP donations scrape** for 2025 | aep | 1 h | adds the final 2024 financial reports | `cron/scrape-aep-donatii.sh` |
| 7 | **Drop the firms.staging_onrc_* staging tables** (~3 GB freed) | firms | 5 min | reclaims DB space after the backfill | manual DROP TABLE |
| 8 | **Drop seap.public_notices, seap.notice_contracts** (empty legacy tables) | seap | 1 min | schema cleanup | DROP TABLE |
| 9 | **Restart the GNM scraper** (last press release 2026-03-18, 53-day gap) | gnm | 15 min | brings the March-May 2026 press releases up to date | `cron/scrape-gnm.sh` |

**Total recommended quick wins: ~3h** to resolve 9 issues with directly visible impact.

---

## 4. Medium effort (1-2 days each)

| # | Fix | Schema | Effort | Impact |
|---|---|---|--:|---|
| 1 | **CNSC PDF parse for `decision_type` + `decision_summary`** | cnsc | 1-2 days | 29,488 decisions become filterable by type (upheld/rejected) |
| 2 | **Curtea de Conturi Stage 2**: detail-page resolve + extract `audited_entity_cui` + download PDFs | curteacont | 2 days | 1,133 reports linked to a CUI + PDFs available |
| 3 | **AAAS ORDIN 278/2005 backfill**: parse the PDF with the historical list of ~150 firms | aaas | 1 day | coverage 11 → ~150 firms (12-13× growth) |
| 4 | **Repair the bugetar.executie pipeline**: entitate is populated but executie has 0 rows | bugetar | 1-2 days | adds execution data for ~6,564 institutions with a matched CUI |
| 5 | **APIA 2025 campaign** + **fix the volume** (191 rows looks small vs. ~800k real farmers) | apia | 1 day | the dataset becomes genuinely representative |
| 6 | **CNAS PDF parse upgrade** for the 14 docs with `parse_status='no_table'` | cnas | 1 day | +25% CNAS provider coverage |
| 7 | **Finish GNM Stage B**: fuzzy matcher active on all 348 press releases (currently 1/348 captured) | gnm | 1 day | actual extraction of environmental violators |
| 8 | **ANRE electricians**: pipeline not implemented | anre | 1 day | adds the electricians register (~10k entries) |
| 9 | **Reset `seap.sync_state` for `da`** (stuck in `running` since 2025-10-16) | seap | 30 min + replay | unblocks the direct_acquisitions backfill |
| 10 | **Populate anaf.lista_alba** from data.gov.ro | anaf | 1 day | complete white list (in parallel with datornici) |
| 11 | **CUI matcher for `fonduri.beneficiar_proiect`** (11,489 rows, 0 matched) | fonduri | 1 day | POIM/POR projects become filterable by CUI |

---

## 5. Heavy lifts (multi-week)

| # | Investment | Schema | Effort | Impact |
|---|---|---|--:|---|
| 1 | **ANI 1.3M PDFs**: asset and interest declarations, parser + official matching | ani | **4-6 weeks** | unlocks politicians' declarations, the flagship feature |
| 2 | **SEAP direct_acquisitions backfill 2017-2024**: ~8M rows | seap | **2-3 weeks** | complete direct-acquisition coverage (currently 2025 only) |
| 3 | **SEAP announcements backfill 2020-2021** + **missing 2024** | seap | **1-2 weeks** | closes the historical announcements gap |
| 4 | **ANAF datornici via 2captcha**: re-cover 2017-2025 (33 stale quarters) | anaf | **2-3 weeks** | reactivates datornici (currently static at Q1 2016) |
| 5 | **Curtea de Conturi PDF text extraction + entity resolution** | curteacont | **3-4 weeks** | audit reports become text-searchable + linked to firms |
| 6 | **Complete ONRC raw → entities pipeline** (unused staging of 791 MB + 938 MB + 443 MB exists) | firms | **2 weeks** | weekly refresh of `firms.entities` from a fresh ONRC dump |

---

## 6. Refresh cadence recommendation (sustainable cron schedule)

Proposed `/etc/cron.d/govagreg-refresh` for steady state:

```cron
# === LIVE / NEAR-REAL-TIME (several times a day) ===
0 */4 * * *       satra scrape-seap-wsp.sh         # SEAP live feed (4h cycle, ~3-4k rows/day)
30 2 * * *        satra scrape-cnsc.sh             # CNSC daily (~30 new decisions/day)

# === DAILY (once a day, off-peak 02:00-06:00) ===
0 3 * * *         satra scrape-anre.sh             # ANRE licenses (live registry)
0 4 * * *         satra scrape-ancom.sh            # ANCOM operators (live)
0 5 * * *         satra scrape-asf.sh              # ASF entities (rebuilt nightly)
30 5 * * *        satra scrape-curteacont.sh       # Curtea de Conturi listing (Stage 1)
0 6 * * *         satra refresh-mvs.sh             # MV refresh (after all scrapes)

# === WEEKLY (Monday morning) ===
0 2 * * 1         satra scrape-gnm.sh              # GNM weekly RSS (~5-15 new)
0 3 * * 1         satra scrape-aaas.sh             # AAAS portfolio (rarely changes)
0 4 * * 1         satra scrape-cnas.sh             # CNAS WP media (monthly source, but cheap to run weekly)
0 5 * * 1         satra import-onrc-fresh.sh       # ONRC weekly update

# === MONTHLY (1st of the month) ===
0 2 1 * *         satra scrape-regas.sh            # RegAS, monthly publish
0 3 1 * *         satra scrape-bugetar.sh          # Bugetar mfinante (monthly)
0 5 1 * *         satra import-apia-fermieri.sh    # APIA CKAN

# === QUARTERLY (1st of the quarter) ===
0 2 1 1,4,7,10 *  satra scrape-anaf-datornici.sh   # ANAF datornici, quarterly (once 2captcha is enabled)
0 3 15 1,4,7,10 * satra scrape-aep-donatii.sh      # AEP, quarterly party reports

# === YEARLY (15 January) ===
0 2 15 1 *        satra import-afir-historical.sh  # AFIR payments for the previous year (CSV)
0 4 15 1 *        satra import-financials.sh       # ANAF balance sheets for the previous year
```

### Runtime estimates per scraper (best-effort, observed)

| Scraper | Freq. | Runtime | Notes |
|---|---|---|---|
| scrape-seap-wsp | 4h | 5-15 min | depends on daily volume |
| scrape-cnsc | daily | 2-5 min | (full re-scan of ~617 pages = 30 min) |
| scrape-anre | daily | 3-5 min | 3 sources (atestat/electricitate/gaze) |
| scrape-ancom | daily | 1-2 min | 518 operators |
| scrape-asf | daily | 2-3 min | 849 entities |
| scrape-curteacont | daily | 1-3 min | listing only |
| scrape-gnm | weekly | 1-2 min | RSS feed |
| scrape-aaas | weekly | 30 sec | 11 firms |
| scrape-cnas | weekly | 5-10 min | 61 PDFs + parse |
| import-onrc-fresh | weekly | 30-60 min | 4M rows ETL |
| scrape-regas | monthly | 10-15 min | 78k rows update |
| scrape-bugetar | monthly | 30-60 min | 6.5k reports |
| import-apia-fermieri | monthly | 5-10 min | CKAN API |
| scrape-anaf-datornici | quarterly | 30-60 min | captcha-dependent |
| import-afir-historical | yearly | 2-4 hours | 5M rows CSV |

**Total cron load:** ~30 min CPU/day in steady state, ~2h/month in monthly bursts. Sustainable on the `satra` Docker host.

---
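
The per-run runtime ranges in the table can be summed to get the daily wall-clock window occupied by the sub-daily and daily jobs. This is a rough sketch under stated assumptions: the runtimes are wall-clock ranges copied from the table, `scrape-seap-wsp` is assumed to run 6 times a day (every 4 hours), and `refresh-mvs.sh` is left out because the table gives no estimate for it. Scrapers that mostly wait on the network use far less CPU time than wall-clock time, so the result is not directly comparable with a CPU figure.

```python
# (runs per day, min minutes per run, max minutes per run), copied from the runtime table.
DAILY_JOBS = {
    "scrape-seap-wsp": (6, 5, 15),  # every 4 hours
    "scrape-cnsc": (1, 2, 5),
    "scrape-anre": (1, 3, 5),
    "scrape-ancom": (1, 1, 2),
    "scrape-asf": (1, 2, 3),
    "scrape-curteacont": (1, 1, 3),
}


def daily_runtime_range(jobs: dict) -> tuple:
    """Sum the best-case and worst-case wall-clock minutes per day."""
    low = sum(runs * lo for runs, lo, _ in jobs.values())
    high = sum(runs * hi for runs, _, hi in jobs.values())
    return low, high


low_minutes, high_minutes = daily_runtime_range(DAILY_JOBS)  # (39, 108)
```

Under these assumptions the daily jobs occupy roughly 39 to 108 minutes of wall-clock time, most of it from the 4-hourly SEAP run.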

## Executive conclusion (200 words)

The `architools_db` database (29 GB) holds 17.9M rows across 17 schemas. **6 schemas are healthy** (firms, fonduri, regas, anre, ancom, asf), **6 have gaps fixable in under 2 days** (aep, cnsc, cnas, aaas, curteacont, gnm), and **5 have structural problems** (seap history, anaf datornici stale for 10 years, ani not implemented, bugetar executie at 0 rows, apia under-volume).

**Quick wins (3h total):** (1) add `'publication-date'` to `FIELDS` in `import_ted.py`, (2) reset `seap.sync_state` to unblock the DA backfill, (3) re-run the CUI matcher for `cnas.furnizori` (36k rows, 0% match) and `apia.fermieri`.

**Critical priorities:** (a) backfill SEAP DA 2017-2024 = ~8M missing rows (confirmed in CLAUDE.md), (b) reactivate ANAF datornici via 2captcha (data frozen at Q1 2016), (c) repair the `bugetar.executie` pipeline (entities populated but execution at 0).

The proposed cron runs in 30 min CPU/day in steady state. ANI 1.3M PDFs remains the 4-6 week flagship: it is the only source truly blocked for technical reasons (complex PDF parser); the rest are operational.

---

**Full report:** `/home/orchestrator/Code/gov-agreg/chatGPT/data-quality/freshness-audit-2026-05-10.md`
@@ -0,0 +1,62 @@
# Geocoding strategy: firms.entities

Date: 2026-05-11. Sub-agent A2.

## Final coverage

| Source | Rows | Accuracy | Notes |
|---|---:|---|---|
| `geonames_postal` | 2,128,990 | ~100m–2km | Exact 5/6-digit RO postal match against geonames RO.zip (firms.postal_codes). |
| `photon` | 839,643 | ~50–500m | Komoot Photon OSM geocoder, free-text `adr_full`. Earlier batch (services/seap-scraper/src/geocode-photon.ts). |
| `uat_centroid` | 670,657 | 5–30km | UAT polygon centroid match by locality+county. |
| `judet_centroid` | 346,675 | 30–150km | Median of all postal codes within the judet. Filled the 2026-05-11 gap where `judet_fallback` was tagged but lat/lng never written. |
| `seap_siruta_centroid` | 4,681 | 5–30km | NEW stub rows for SEAP-only CUIs (not present in ONRC firme dataset) using SIRUTA → gis_uats centroid. |
| `seap_judet_centroid` | 2,497 | 30–150km | NEW stub rows for SEAP-only CUIs with city/county data in seap.cui_location. |
| _unmapped_ | 2 | n/a | Two firms with literally zero address fields. Out of reach. |

**Total: 3,993,143 / 3,993,145 = 100.00 %.**

## Fallback chain (priority order)

For any new row entering firms.entities, apply in this order, stop at first hit:

1. **Postal-code exact match** → `firms.postal_codes.postal_code = adr_cod_postal` (5/6 digit). Source = `geonames_postal`.
2. **Postal-code normalized** (strip non-digit), same lookup. (Adds ~9K to the bucket, already covered in the current dataset.)
3. **Photon free-text** on `adr_full` (OSM geocoder, requires network; see geocode-photon.ts).
4. **UAT centroid** by `(adr_localitate, adr_judet)` → `firms.postal_codes` median of matching place_name + county_code, OR `public.gis_uats` polygon centroid.
5. **Judet centroid**: median of all `firms.postal_codes` rows for the normalized judet name (`upper(unaccent(replace(adr_judet,'MUNICIPIUL ','')))`). 42 distinct judet keys cover all of RO + București.
6. **SIRUTA centroid**: for SEAP-mentioned CUIs only, where the firms.entities row didn't exist: `seap.announcements.{authority,supplier}_siruta` → `gis_uats.siruta` centroid (transformed 3844→4326).
7. **City+county from seap.cui_location** → judet centroid fallback (`seap_judet_centroid`).
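
The "stop at first hit" rule above can be sketched as an ordered list of resolvers. This is an illustration, not the code used for the backfill: the resolver callables and the firm record shape are hypothetical, and `normalize_judet` is only a Python analogue of the SQL expression quoted in step 5.

```python
import unicodedata
from typing import Callable, Optional, Sequence, Tuple

Coords = Tuple[float, float]
# Each resolver takes a firm record and returns (lat, lng) or None.
Resolver = Callable[[dict], Optional[Coords]]


def normalize_judet(name: str) -> str:
    """Python analogue of upper(unaccent(replace(adr_judet, 'MUNICIPIUL ', '')))."""
    stripped = name.replace("MUNICIPIUL ", "")
    decomposed = unicodedata.normalize("NFKD", stripped)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_accents.upper()


def geocode(firm: dict, chain: Sequence[Tuple[str, Resolver]]) -> Optional[Tuple[str, Coords]]:
    """Try resolvers in priority order; return (source, coords) for the first hit."""
    for source, resolver in chain:
        coords = resolver(firm)
        if coords is not None:
            return source, coords
    return None


# Stub resolvers standing in for the real lookups (hypothetical coordinates).
chain = [
    ("geonames_postal", lambda firm: None),
    ("judet_centroid", lambda firm: (44.43, 26.10)),
]
result = geocode({"adr_judet": "MUNICIPIUL București"}, chain)
```

Keeping the source label next to the coordinates lets downstream consumers weigh a postal-code hit differently from a judet centroid.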

## Authority / supplier coverage (downstream)

After backfill, JOIN-based coverage from SEAP:

| Bucket | Total distinct CUIs | Geocoded | Pct |
|---|---:|---:|---:|
| authority_cui | 14,617 | 14,119 | 96.6 % |
| supplier_cui | 65,675 | 64,793 | 98.7 % |

Residual: 498 authorities + 882 suppliers (~1,373 unique). These CUIs appear nowhere with address data (no siruta, no city/county in seap.cui_location, no usable address in any announcement). Most are malformed CUI strings (commas, semicolons, trailing punctuation) and should be cleaned up at SEAP ingestion. Out of scope for geocoding.
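
A cleanup step at SEAP ingestion could look roughly like the sketch below. It is an assumption-laden illustration: the function name is hypothetical, the accepted length of 2 to 10 digits is an assumed plausibility range, an optional `RO` prefix is assumed to be droppable, and no checksum is verified.

```python
import re
from typing import Optional

_CUI_DIGITS = re.compile(r"[0-9]{2,10}")  # assumed plausible length range


def clean_cui(raw: str) -> Optional[str]:
    """Strip surrounding punctuation and an optional RO prefix; return digits or None."""
    text = raw.strip().upper().strip(" \t,;.:-/")
    if text.startswith("RO"):
        text = text[2:].strip()
    if _CUI_DIGITS.fullmatch(text):
        return text
    # Anything with inner separators (for example two CUIs in one field) is rejected,
    # not guessed at.
    return None
```

Rejecting ambiguous strings instead of stripping every non-digit avoids silently merging two identifiers into one.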

## Cross-schema enrichment

- `aaas.firme`: 11 rows total, all 11 have a geocoded parent in firms.entities via CUI. No action needed; UI agents JOIN.
- `anre.licente`: 27,275 rows with titular_cui populated, 11,043 distinct. All 11,043 CUIs match a geocoded firm. UI agents JOIN on `firms.entities.cui = anre.licente.titular_cui`.
- `seap.announcements`: `supplier_address`, `authority_address`, `supplier_siruta`, `authority_siruta` are populated. After this batch, almost every announcement can render on a map via firms.entities lookup.

## Geom integrity

- `firms.entities.geom` (geography 4326) is now 1:1 with lat/lng (12,735 prior mismatches fixed where judet_fallback had stale geom from an older run).
- 2 unmapped firms have NULL on both. PostGIS spatial indexes still valid.

## Forward maintenance

1. Anyone ingesting new firms (ANAF/ONRC weekly refresh) must apply the fallback chain in code before INSERT.
2. The seap_siruta_centroid and seap_judet_centroid stubs should be **upgraded** the moment an ANAF/ONRC record arrives for the same CUI: re-run the chain with the real `adr_full`.
3. If the SEAP CUI hygiene gets fixed (A1's domain), the 1,373 residual can be re-attempted.
4. `judet_centroid` (and the two seap variants) have only `geocode_score = 0.1` and `0.3`. UI clustering should down-weight or hide these at high zoom.

## Queries used

All idempotent UPDATEs filtered on `lat IS NULL`. Centroid sources read from `firms.postal_codes` and `public.gis_uats` (SRID 3844 → 4326). Saved in-line in the agent transcript; the strategy itself is the artifact.
@@ -0,0 +1,624 @@
# Refresh cadence master strategy: gov-agreg / vreaudigital.ro
**Date:** 2026-05-11
**Sub-agent:** S1 (refresh cadence master strategy)
**Database:** `architools_db` @ 10.10.10.166, 29 GB
**Covers:** 17 schemas, 2 sub-pipelines (ANAF v9 + ANAF datornici), captcha strategy, monitoring, idempotency, DR
**Previous freshness audit:** `chatGPT/data-quality/freshness-audit-2026-05-10.md`

---

## 0. Context & constraints

| Constraint | Current state |
|---|---|
| Orchestration host | `satra` (10.10.10.166), Docker, Ubuntu, **disk at 85% (299/371 GB)** ⚠️ |
| Scheduling system | systemd timers (3 active) + ad-hoc shell wrappers; **there is no aggregated crontab for all 13 scrapers** |
| Secrets | Infisical Machine Identity (`/opt/vreaudigital/.infisical-mi`), refreshed per wrapper |
| Forbidden anti-pattern | `docker run -e $DATABASE_URL` (leaks via `ps`); we use a mode-600 `--env-file` and delete it afterwards |
| Run-as | `bulibasa` (systemd), `root` (current eterra/backup cron) |
| Captcha sources | ANAF datornici live, Bugetar Phase 2, ANI e-DAI 2022+ (Cloudflare Turnstile) |
| Budget | Small: 2captcha (~$1/1000), headless playwright OK, headed on Orchi only when needed |
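
The `--env-file` pattern from the constraints table can be sketched as a small context manager. This is a minimal illustration under assumptions: the helper names and the image name are hypothetical, and the file-mode guarantee relies on `tempfile.mkstemp` creating files readable and writable only by the owner on POSIX systems.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Dict, Iterator, List


@contextmanager
def private_env_file(env: Dict[str, str]) -> Iterator[str]:
    """Write KEY=VALUE lines to an owner-only temp file and delete it on exit."""
    fd, path = tempfile.mkstemp(prefix="scraper-", suffix=".env")  # created with mode 0600 on POSIX
    try:
        with os.fdopen(fd, "w") as handle:
            for key, value in env.items():
                handle.write(f"{key}={value}\n")
        yield path
    finally:
        os.remove(path)


def docker_run_command(image: str, env_file: str) -> List[str]:
    """Build the argv; the secret value never appears on the command line."""
    return ["docker", "run", "--rm", "--env-file", env_file, image]
```

A wrapper would call `subprocess.run(docker_run_command(image, path), check=True)` inside the `with` block, so the secret exists on disk only for the lifetime of the container start.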

**Current systemd state (verified today):**

- `vreaudigital-anaf-daily.timer` → 02:00 daily, enrich-anaf.sh tier=daily, concurrency=2
- `vreaudigital-onrc-weekly.timer` → Tuesdays 03:00, import-onrc-fresh.sh
- `vreaudigital-mvs.timer` → 04:00 daily, refresh-mvs.sh (9 seap MVs)

**13 existing wrappers NOT scheduled through systemd** (they run only manually or via a cron that is not yet aggregated):
`scrape-aaas`, `scrape-aep-donatii`, `scrape-anaf-datornici`, `scrape-ancom`, `scrape-anre`, `scrape-asf`, `scrape-bugetar`, `scrape-cnas`, `scrape-cnsc`, `scrape-curteacont`, `scrape-gnm`, `scrape-regas`, `import-afir-historical`, `import-apia-fermieri`, `import-financials*`.

The `scrape_log` audit nevertheless confirms that **all 9 scrapers with a dedicated schema ran in the last 24h**, so a hidden cron exists (probably in the `bulibasa` user crontab, not in `sudo crontab`). The strategy below **replaces the hidden cron with a visible /etc/cron.d/ file + per-scraper systemd timers**.

---
|
||||
|
||||
## 1. Per-schema cadence table

Columns: Schema · Source (publication rhythm) · Recommended cadence · Wrapper · Runtime · Risk · Monitor signal (max tolerated age)

| # | Schema | Upstream source (rhythm) | Recommended cadence | Wrapper | Runtime | Risk | Monitor signal (max age) |
|---|---|---|---|---|---|---|---|
| 1 | **seap.announcements** (WSP) | live | every 4h | `scrape-seap-wsp` (wrapper missing!) | 5-15 min | F5 WAF, ASP session | `wsp_sync_state.last_run_at` ≤ 6h |
| 2 | **seap.direct_acquisitions** | live | every 6h | `scrape-seap-da` (wrapper missing!) | 10-30 min | session expiry, retry storms | `sync_state[source=da].updated_at` ≤ 8h |
| 3 | **seap.entities + cui_location** | after WSP/DA refresh | evening, after the daily run | included in WSP wrapper | (incl.) | n/a | `entities.fetched_at` ≤ 24h |
| 4 | **anaf** (v9 enrichment, daily delta) | live API | daily 02:00 | `enrich-anaf.sh` TIER=daily | 1-2h | ANAF rate limit (503) | `firms.entities WHERE anaf_fetched_at > now-2d` count ≥ 1000 |
| 5 | **anaf.datornici** (data.gov.ro Q) | quarterly | quarterly: 15 Jan / 15 Apr / 15 Jul / 15 Oct | `scrape-anaf-datornici` SOURCE=datagov<Q> | 30-60 min | NEW: captcha needed only for live | `anaf.datornici WHERE publication_date > now-180d` ≥ 1 |
| 6 | **anaf.datornici** (anaf.ro live) | live, captcha | quarterly, **optional if we pay for 2captcha** | `scrape-anaf-datornici` SOURCE=live | 2-4h | reCAPTCHA v2 | (decided in §3) |
| 7 | **firms.entities** (ONRC weekly) | weekly | Tuesday 03:00 | `import-onrc-fresh.sh` | 30-60 min | bulk diff fail | `firms.entities.updated_at` ≤ 8 days |
| 8 | **firms.financials** (ANAF balance sheets) | annual (year N-1 published on 15 Jul) | 15 Jul + rerun on 15 Aug | `import-financials.sh` | 2-4h | CSV size ~3GB | `firms.financials WHERE source_year = year(now)-1` ≥ 800k |
| 9 | **firms.financials_ong / banks** | annual | 20 Jul | `import-financials-ong-banks.sh` | 1h | n/a | same |
| 10 | **fonduri.afir_plati** | annual, data.gov.ro | 15 Feb (year N-1 data) | `import-afir-historical.sh` | 2-4h | large CSV | `fonduri.afir_plati WHERE source_year = year(now)-1` ≥ 1M |
| 11 | **fonduri.beneficiar_anunt / proiect** (FEADR + FEGA) | live data.gov.ro | weekly Mon 02:00 | `import-fonduri-beneficiari` (missing!) | 15-30 min | n/a | `fonduri.beneficiar_anunt.fetched_at` ≤ 8d |
| 12 | **regas.ajutoare** (Consiliul Concurenței) | monthly | 1st of the month 02:00 | `scrape-regas` | 10-15 min | n/a | `regas.ajutoare.fetched_at` ≤ 35d |
| 13 | **bugetar.entitate** (mfinante public registry) | monthly | 1st of the month 03:00 | `scrape-bugetar` | 30-60 min | n/a | `bugetar.entitate.fetched_at` ≤ 35d |
| 14 | **bugetar.executie** (Phase 2, captcha) | monthly (reporting lags 30 days) | **deferred**, see §3 | `scrape-bugetar-executie` (missing) | 4-8h for 1000 entities | captcha + 1000 detail pages | (deferred) |
| 15 | **anre.licente** (3 sources: atestat/electricitate/gaze) | live | daily 03:00 | `scrape-anre` SOURCE=all | 3-5 min | TLS intermediate cert | `anre.licente.fetched_at` ≤ 36h |
| 16 | **anre.electricieni** | live (~100k entries) | weekly Sunday 04:00 | `scrape-anre` SOURCE=electricieni | 30-60 min | pagination volume | `anre.electricieni.fetched_at` ≤ 8d *(when implemented)* |
| 17 | **ancom.operatori + drepturi** | live registry | daily 04:00 | `scrape-ancom` | 1-2 min | n/a | `ancom.operatori.fetched_at` ≤ 36h |
| 18 | **asf.entitati** | live (rebuild nightly) | daily 05:00 | `scrape-asf` | 2-3 min | "omit g-recaptcha" trick must hold | `asf.entitati.fetched_at` ≤ 36h |
| 19 | **cnsc.decizii** (listing) | live | daily 02:30 | `scrape-cnsc` MAX_PAGES=10 (incremental) | 2-5 min | session-based | `cnsc.decizii.fetched_at` ≤ 36h |
| 20 | **cnsc Stage 2** (PDF parse → decision_type) | after listing | weekly Saturday 02:00 | `cnsc-parse-pdfs` (missing) | 4-8h for 30k | PDF storage I/O | % of decisions `WHERE decision_type IS NOT NULL` ≥ 90% |
| 21 | **cnas.documents** | monthly, on WP media | weekly Mon 04:00 | `scrape-cnas` | 5-10 min | CNAS format may change | `cnas.documents.fetched_at` ≤ 8d |
| 22 | **cnas.furnizori** (parsed from PDF) | included in .documents | weekly | (incl.) | (incl.) | 25% parser failure | % of docs `parse_status='ok'` ≥ 75% |
| 23 | **aaas.firme** | live portal | weekly Mon 04:30 | `scrape-aaas` | 30s | small list (11 firms) | `aaas.firme.fetched_at` ≤ 8d |
| 24 | **curteacont.rapoarte** (Stage 1 listing) | live, weekly | daily 05:30 | `scrape-curteacont` | 1-3 min | n/a | `curteacont.rapoarte.fetched_at` ≤ 36h |
| 25 | **curteacont Stage 2** (detail + PDF + audited CUI) | after Stage 1 | weekly Sunday 03:00 | `curteacont-detail` (missing) | 4-6h for 1133 | n/a | % of reports `WHERE audited_entity_cui IS NOT NULL` ≥ 50% |
| 26 | **aep.donatii_pf/pj/rvc + partide** | quarterly (filings) | quarterly: 15 Jan / 15 Apr / 15 Jul / 15 Oct + monthly smoke check | `scrape-aep-donatii` | 1h | banipartide.ro mortality | `aep.donatii_pj.fetched_at` ≤ 95d |
| 27 | **ani.declaratii** (PDFs) | live at ANI, but **parser not implemented** | **deferred** | n/a | n/a | Cloudflare Turnstile | (deferred, multi-week) |
| 28 | **apia.fermieri** (CKAN data.gov.ro) | annual (campaign year N published 1 Mar of year N+1) | 15 Mar + monthly smoke | `import-apia-fermieri` | 5-10 min | currently small volume (191 rows, needs investigation) | `apia.fermieri.fetched_at` ≤ 35d |
| 29 | **gnm.comunicate** (RSS) | weekly | daily 06:00 | `scrape-gnm` | 1-2 min | RSS format change | `gnm.comunicate.fetched_at` ≤ 36h AND `publicat_la_max > now-30d` |
| 30 | **gnm.amenzi_extrase** (Stage B fuzzy) | after Stage A | weekly Sunday 05:00 | `gnm-extract-amenzi` (post-A2) | 30 min | NLP false positives | % of enforcement-flagged releases with an extracted fine ≥ 50% |
| 31 | **seap MV refresh** (9 materialized views) | after all SEAP scrapes | daily 06:00 (after WSP+DA) | `refresh-mvs.sh` | 5-15 min | depends on WSP/DA | `mv_authority_concentration` last refresh ≤ 26h |

**Critical notes:**

- **Missing wrappers:** `scrape-seap-wsp`, `scrape-seap-da`, `import-fonduri-beneficiari`, `scrape-bugetar-executie`, `cnsc-parse-pdfs`, `curteacont-detail`, `gnm-extract-amenzi`. The TypeScript scrapers exist in `src/`, but they have no `cron/scrape-*.sh` wrapper following the Infisical MI → env-file → docker run pattern. **This is gap #1 to close before any new scheduling.**
- ANRE already runs daily via the hidden cron, but not via a visible systemd unit; the strategy moves everything to per-scraper systemd timers, like **mvs.timer** today.

---

## 2. Cron schedule recommendation

Two implementable options:

- **(A) /etc/cron.d/govagreg-refresh**: a single visible file, easy to audit.
- **(B) systemd timers per scraper**: matches the existing pattern (`vreaudigital-*.timer`), allows `journalctl -u`, uniform status.

**Recommendation: B (systemd timers)**, because:

1. The pattern already exists (3 timers), and `journalctl` is more useful than `/var/log/cron`.
2. Per-unit `OnFailure=` enables native alerting.
3. `Persistent=true` catches up on runs missed while the host was down (the cron on satra has no anacron).
4. `RandomizedDelaySec=` avoids contention in the 02:00-06:00 peak.

### 2.1 Timer skeleton (canonical pattern)

One template per scraper:

```ini
# /etc/systemd/system/vreaudigital-<scraper>.service
[Unit]
Description=vreaudigital — <scraper> refresh
Wants=network.target docker.service
After=network.target docker.service vreaudigital-prerequisites.service
# OnFailure= is a [Unit] option; systemd ignores it under [Service]
OnFailure=vreaudigital-alert@%n.service

[Service]
Type=oneshot
User=bulibasa
ExecStart=/opt/vreaudigital/services/seap-scraper/cron/scrape-<scraper>.sh
StandardOutput=journal
StandardError=journal
TimeoutStartSec=4h

# /etc/systemd/system/vreaudigital-<scraper>.timer
[Unit]
Description=vreaudigital — <scraper> at <time>

[Timer]
OnCalendar=<schedule>
Persistent=true
RandomizedDelaySec=600

[Install]
WantedBy=timers.target
```

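
The timer half of the skeleton above can be stamped out per scraper with a small helper. This is only a sketch: `render_timer` is a hypothetical function (it is not in the repo), and the 600-second default jitter is simply copied from the skeleton.

```bash
#!/usr/bin/env bash
set -euo pipefail

# render_timer SCRAPER SCHEDULE [DELAY_SEC]
# Hypothetical helper: prints a .timer unit following the canonical pattern.
render_timer() {
  local scraper="$1" schedule="$2" delay="${3:-600}"
  cat <<EOF
[Unit]
Description=vreaudigital: ${scraper} timer

[Timer]
OnCalendar=${schedule}
Persistent=true
RandomizedDelaySec=${delay}

[Install]
WantedBy=timers.target
EOF
}

# Example: print the CNSC timer to stdout.
render_timer cnsc '*-*-* 02:30:00'
```

Redirecting the output to `/etc/systemd/system/vreaudigital-<scraper>.timer` and running `systemctl daemon-reload` would install it; `systemd-analyze calendar '<schedule>'` is a quick way to check an `OnCalendar=` expression first.
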
### 2.2 Recommended schedule (staggered to avoid contention on satra)

```
# === LIVE / NEAR-REAL-TIME (several times a day) ===
vreaudigital-seap-wsp.timer OnCalendar=*-*-* 00,04,08,12,16,20:15:00 # 6× per day
vreaudigital-seap-da.timer OnCalendar=*-*-* 02,10,18:30:00 # 3× per day (heavier)

# === DAILY off-peak (02:00-06:00, staggered by 5-15 min) ===
vreaudigital-anaf-daily.timer OnCalendar=*-*-* 02:00:00 # exists (enrich v9 daily)
vreaudigital-cnsc.timer OnCalendar=*-*-* 02:30:00
vreaudigital-anre.timer OnCalendar=*-*-* 03:00:00
vreaudigital-curteacont.timer OnCalendar=*-*-* 03:30:00
vreaudigital-ancom.timer OnCalendar=*-*-* 04:00:00
vreaudigital-asf.timer OnCalendar=*-*-* 04:30:00
vreaudigital-gnm.timer OnCalendar=*-*-* 05:00:00
vreaudigital-mvs.timer OnCalendar=*-*-* 06:00:00 # exists; moved from 04:00 to 06:00 so it runs after all scrapers

# === WEEKLY (Monday 04:00-06:00, Saturday/Sunday for heavy jobs) ===
vreaudigital-cnas.timer OnCalendar=Mon *-*-* 04:00:00
vreaudigital-aaas.timer OnCalendar=Mon *-*-* 04:30:00
vreaudigital-onrc-weekly.timer OnCalendar=Tue *-*-* 03:00:00 # exists
vreaudigital-fonduri-week.timer OnCalendar=Mon *-*-* 02:00:00
vreaudigital-anre-electricieni.timer OnCalendar=Sun *-*-* 04:00:00
vreaudigital-cnsc-pdfs.timer OnCalendar=Sat *-*-* 02:00:00 # Stage 2 heavy
vreaudigital-curteacont-detail.timer OnCalendar=Sun *-*-* 03:00:00 # Stage 2
vreaudigital-gnm-amenzi.timer OnCalendar=Sun *-*-* 05:00:00 # Stage B

# === MONTHLY (1st of the month, 02:00-04:00) ===
vreaudigital-regas.timer OnCalendar=*-*-01 02:00:00
vreaudigital-bugetar.timer OnCalendar=*-*-01 03:00:00
vreaudigital-apia.timer OnCalendar=*-*-01 04:00:00 # smoke check, full run in March

# === QUARTERLY (15th of months 1/4/7/10) ===
vreaudigital-anaf-datornici.timer OnCalendar=*-01,04,07,10-15 02:00:00
vreaudigital-aep-donatii.timer OnCalendar=*-01,04,07,10-15 03:00:00

# === ANNUAL ===
vreaudigital-afir-historical.timer OnCalendar=*-02-15 02:00:00
vreaudigital-financials.timer OnCalendar=*-07-15 02:00:00
vreaudigital-financials-ong.timer OnCalendar=*-07-20 02:00:00
vreaudigital-apia-full.timer OnCalendar=*-03-15 02:00:00

# === DEAD-MAN'S SWITCH (see §4) ===
vreaudigital-heartbeat.timer OnCalendar=*-*-* 07:00:00 # alert if fresh data is missing
```

**Estimated total load:** ~35 min CPU/day in the steady-state daily slot, ~2-4h on Saturday/Sunday (heavy stages), ~6-10h on the 15th of quarter months (datornici + AEP), ~8h on 15 Jul (annual financials).

### 2.3 Resource contention checklist

- **02:00-04:00 daily:** anaf (1-2h) + cnsc (2-5 min) + anre (3-5 min) + curteacont (1-3 min). ANAF runs long; the rest are short ticks, and RandomizedDelaySec=300-600 avoids overlap.
- **04:00-06:00 daily:** ancom + asf + gnm + mvs. All under 15 min total. mvs (5-15 min) runs last.
- **Monday 02:00-05:00:** fonduri + cnas + aaas + apia smoke. ONRC runs on TUESDAY so that it collides with nothing.
- **Weekend:** cnsc-pdfs + curteacont-detail + anre-electricieni + gnm-amenzi. Heavy lifts, no overlap.
- **Disk:** at 85% on satra ⚠️. **Before adding any new PDF scraper (cnsc-pdfs, curteacont-detail), resolve the §6 disk issue.**

---

## 3. CAPTCHA-blocked sources strategy

### 3.1 ANAF datornici live (anaf.ro/restante)

**Status:** The only public bulk source (data.gov.ro Q1 2016) is static. Updating 2016-Q2 → 2026-Q1 (40 quarters, counted inclusively) requires a live scrape with captcha.

**Two paths:**

| Path | Cost | Implementation time | Coverage |
|---|---|---|---|
| **(a) data.gov.ro Q-snapshots** | $0 | 2 days (need to verify whether the source publishes new quarters) | depends on mfinante |
| **(b) 2captcha on anaf.ro/restante live** | $1-3/1000 captchas | 1 week + Playwright | all quarters, on demand |

**Recommendation:** path (a) first: check the data.gov.ro listing for `anaf-datornici-202X` datasets. If they are published, the existing scraper (`SOURCE=datagovYYYY-QN`) already handles them. Use path (b) only if data.gov.ro does not publish or lags heavily.

**2captcha budget for path (b): backfill 5 years × 4 Q × 1 captcha per fetch = 20 captchas total** (one Q = one download of the full set, not per entity). **Budget: ~$0.10/year** (negligible). The real cost is dev time for the Playwright + 2captcha SDK integration: 2-3 days.

**Pre-req decision:**

```
IF Q4-2025 is published on data.gov.ro
THEN we do not pay for 2captcha; extend `scrape-anaf-datornici.sh` with SOURCE=datagov2025Q4
ELSE we pay for 2captcha (~$0.10/year) AND invest 2-3 dev days
```

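
The backfill range in §3.1 can be enumerated mechanically, which also gives the quarter count directly. A minimal sketch: `list_quarters` is a hypothetical helper, and the `YYYY-QN` label format is assumed from the `SOURCE=datagovYYYY-QN` convention mentioned above.

```bash
#!/usr/bin/env bash
set -euo pipefail

# list_quarters FROM_YEAR FROM_Q TO_YEAR TO_Q
# Prints one "YYYY-QN" label per quarter, inclusive on both ends.
list_quarters() {
  local y="$1" q="$2" to_y="$3" to_q="$4"
  while (( y < to_y || (y == to_y && q <= to_q) )); do
    echo "${y}-Q${q}"
    if (( q == 4 )); then
      y=$(( y + 1 )); q=1
    else
      q=$(( q + 1 ))
    fi
  done
}

list_quarters 2016 2 2026 1 | wc -l   # 40 quarters, both ends included
list_quarters 2025 3 2026 1           # the three most recent labels
```

Each label could then be passed to the wrapper as its `SOURCE=` value, one run per quarter.
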
### 3.2 Bugetar Phase 2: budget execution per entity

**Status:** `bugetar.entitate` = 18,822 entities; `bugetar.executie` = **0 rows**.

**Captcha analysis:** mfinante.gov.ro/static/10/Mfp/sit-Trezor/situatie_trezorerie.html: the per-entity detail page requires a captcha (Google reCAPTCHA v2). One fetch = 1 captcha.

**Scope strategy:**

- Total entities × 60 months = 18,822 × 60 = **1,129,320 fetches** if we cover ALL entities × the whole history (5 years).
- With 2captcha at $1/1000: **$1,129 total**, ~$226/year amortized over 5 years.
- Reduced to the **top-1000 entities by budget** × 60 months = **60,000 fetches = $60 total**, ~$12/year. ← **RECOMMENDED**.

**Total budget for Bugetar Phase 2: $60-100 one-shot for the top-1000 entities.** Monthly incremental refresh: 1000 × 1 month = 1000 fetches/month = $1/month.

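
The scope arithmetic above is easy to re-run for other entity counts or price points. A sketch under the same assumption the text makes (one captcha per entity per month); `captcha_cost_usd` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
set -euo pipefail

# captcha_cost_usd ENTITIES MONTHS USD_PER_1000
# Prints "<fetches> <cost in USD>", assuming one captcha per (entity, month) fetch.
captcha_cost_usd() {
  local entities="$1" months="$2" usd_per_1000="$3"
  local fetches=$(( entities * months ))
  # awk handles the fractional dollars that bash integer math cannot.
  awk -v f="$fetches" -v p="$usd_per_1000" \
    'BEGIN { printf "%d %.2f\n", f, f * p / 1000 }'
}

captcha_cost_usd 18822 60 1   # all entities, 5 years  -> 1129320 1129.32
captcha_cost_usd 1000 60 1    # top-1000, 5 years      -> 60000 60.00
captcha_cost_usd 1000 1 1     # monthly refresh        -> 1000 1.00
```
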
### 3.3 ANI new e-DAI 2022+ (Cloudflare Turnstile)

**Status:** ANI is moving `e-DAI` to the new platform (post-2022), which is protected by Cloudflare Turnstile (not reCAPTCHA). 2captcha **supports Turnstile** ($3/1000) but is less reliable; **headed** Playwright (with a real browser) is the fallback.

**Volume:** ~1.3M PDFs (CLAUDE.md). Even at $3/1000 that is **$3,900** for a full backfill, which exceeds the budget. An incremental refresh of ~50k/year = $150/year.

**Recommendation:**

- **Phase 0:** the PDF parser is not implemented yet. Invest 4-6 weeks of dev in the parser BEFORE spending on captcha.
- **Phase 1:** current scraping: use **headed Playwright on Orchi** (RTX A4000, idle at night) for a sample of 10k declarations, with manual solving / fingerprint rotation. **Direct cost: $0.**
- **Phase 2 (if it scales):** 2captcha Turnstile for annual deltas, ~$150/year.

### 3.4 ASF "omit g-recaptcha-response" trick

ASF does not need 2captcha: the current scraper omits the `g-recaptcha-response` parameter from the POST and the server responds anyway (a bug in ASF's implementation). **Risk:** ASF can fix this bug at any time. Monitor: if `scrape-asf.sh` starts returning 0 rows consistently, investigate.

### 3.5 Decision rubric: do we invest in captcha or not?

| Source | 2captcha cost/year | Unlock value | Vote |
|---|---|---|---|
| ANAF datornici live | ~$0.10 | medium (path (a) probably solves it) | **NOT a priority**: check path (a) first |
| Bugetar top-1000 | ~$12 (incremental) | high (public money flows) | **YES**, after the execution parser is repaired |
| ANI e-DAI 2022+ | ~$150 | flagship | **DEFER** until the PDF parser is implemented |
| Bugetar all 18,822 | ~$226 | high but redundant with top-1000 | **NO**: top-1000 is enough |

**Total yearly 2captcha budget for the recommended full coverage:** **$15-25/year** (Bugetar top-1000 incremental + ANAF safety net + ASF backup if the trick breaks).

**Total 2captcha budget for the one-shot backfill:** **$60-100** (Bugetar top-1000 × 5 years of history).

**Extended budget if ANI is included:** **+$150/year for deltas**, $3,900 backfill (deferred).

---

## 4. Monitoring & alerting

### 4.1 Dead-man's switch: daily heartbeat

**Concept:** a single query runs daily at 07:00, checks `max(fetched_at)` per key table, and alerts if the age exceeds expected_cadence × 1.5.

**Implementation:** `vreaudigital-heartbeat.service` + Brevo SMTP (already configured).

```bash
#!/bin/bash
# /opt/vreaudigital/services/seap-scraper/cron/heartbeat.sh
set -euo pipefail
LOG=/var/log/vreaudigital-heartbeat.log

source /opt/vreaudigital/.infisical-mi
# ... (fetch DATABASE_URL + SMTP creds via Infisical, same pattern as refresh-mvs.sh)

# Define expected freshness (hours)
declare -A EXPECTED=(
  ["seap.announcements"]="6"
  ["seap.direct_acquisitions"]="8"
  ["anre.licente"]="36"
  ["ancom.operatori"]="36"
  ["asf.entitati"]="36"
  ["cnsc.decizii"]="36"
  ["curteacont.rapoarte"]="36"
  ["gnm.comunicate"]="36"
  ["firms.entities"]="192" # 8 days (weekly cron + buffer)
  ["cnas.documents"]="192"
  ["aaas.firme"]="192"
  ["fonduri.beneficiar_anunt"]="192"
  ["regas.ajutoare"]="840" # 35 days (monthly)
  ["bugetar.entitate"]="840"
  ["apia.fermieri"]="840"
  ["anaf.datornici"]="4320" # 180 days (quarterly)
  ["aep.donatii_pj"]="2280" # 95 days
)

ALERTS=()
for table in "${!EXPECTED[@]}"; do
  # firms.entities tracks freshness in updated_at (see §4.4); the rest use fetched_at
  col=fetched_at
  [[ "$table" == "firms.entities" ]] && col=updated_at
  max_age=$(psql "$DATABASE_URL" -tA -c "SELECT EXTRACT(EPOCH FROM (now() - max(${col})))/3600 FROM ${table}")
  threshold="${EXPECTED[$table]}"
  # An empty result (NULL from an empty table) also counts as stale
  if [[ -z "$max_age" ]] || (( $(echo "$max_age > $threshold * 1.5" | bc -l) )); then
    ALERTS+=("$table: ${max_age:-n/a}h stale (threshold ${threshold}h)")
  fi
done

if [ ${#ALERTS[@]} -gt 0 ]; then
  BODY=$(printf '%s\n' "${ALERTS[@]}")
  echo "$BODY" | mail -s "[vreaudigital] heartbeat: ${#ALERTS[@]} schemas stale" \
    -S smtp="smtps://$BREVO_SMTP_HOST:$BREVO_SMTP_PORT" \
    -S smtp-auth=login \
    -S smtp-auth-user="$BREVO_SMTP_USER" \
    -S smtp-auth-password="$BREVO_SMTP_KEY" \
    -S from="alerts@beletage.ro" \
    m.tarau@beletage.ro
fi
```

**Alerting alternative:** an n8n webhook (already deployed at `https://n8n.beletage.ro`): a simple POST, and n8n forwards it to Telegram/Slack/email with a single workflow.

```bash
curl -fsS -X POST https://n8n.beletage.ro/webhook/vreaudigital-heartbeat \
  -H 'Content-Type: application/json' \
  -d "$(jq -n --arg body "$BODY" '{type:"stale-data", alerts:$body}')"
```

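
The staleness rule from §4.1 (age > expected cadence × 1.5) can be isolated in a helper so it is testable without a database. A sketch, assuming a hypothetical `is_stale` function; it uses `awk` rather than `bc` for the float comparison and treats an empty age (what `psql -tA` prints for NULL) as stale.

```bash
#!/usr/bin/env bash
set -euo pipefail

# is_stale AGE_HOURS THRESHOLD_HOURS [FACTOR]
# Returns 0 (stale) when AGE > THRESHOLD * FACTOR, or when AGE is empty.
is_stale() {
  local age="$1" threshold="$2" factor="${3:-1.5}"
  [[ -z "$age" ]] && return 0
  awk -v a="$age" -v t="$threshold" -v f="$factor" 'BEGIN { exit !(a > t * f) }'
}

is_stale 10.2 6  && echo "seap.announcements: stale"    # 10.2h > 9h
is_stale 30   36 || echo "anre.licente: fresh"          # 30h <= 54h
is_stale ""   36 && echo "empty table: treated as stale"
```

Keeping the comparison in one function means the factor can be tuned per call, for example a tighter `1.0` for the SEAP tables.
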
### 4.2 Per-scraper OnFailure alert

Add `OnFailure=vreaudigital-alert@%n.service` to the `[Unit]` section of each scraper's service unit (the directive belongs to the service that can fail, not to its timer). Template service:

```ini
# /etc/systemd/system/vreaudigital-alert@.service
[Unit]
Description=vreaudigital alert for %i

[Service]
Type=oneshot
User=bulibasa
ExecStart=/opt/vreaudigital/services/seap-scraper/cron/alert.sh %i
```

`alert.sh %i` extracts the last 50 lines via `journalctl -u %i -n 50` and sends them to the n8n webhook.

### 4.3 Top blind spots that need a monitor today

1. **`seap.sync_state[source=da].status = pending` since 2025-10-16** (208 days!): the DA backfill is blocked and nobody gets an alert. **A dedicated heartbeat is needed for `sync_state` and `wsp_sync_state`, alerting if `updated_at < now() - 24h` or `consecutive_errors > 5`.**
2. **WSP `last_run_at = 2026-05-07`** (4 days stale; it should run every 4h). The audit already describes this monitor as missing; the heartbeat fixes that.
3. **Disk at 85% on satra**: the heartbeat must check `df -h /` and alert at 90%.

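
The disk check from item 3 could be folded into the heartbeat script along these lines. A minimal sketch: `disk_used_pct` and `disk_alert` are hypothetical helper names, and `df -P` is used instead of `df -h` because its POSIX output format is stable enough to parse.

```bash
#!/usr/bin/env bash
set -euo pipefail

# disk_used_pct MOUNTPOINT: prints the integer use percentage reported by df -P.
disk_used_pct() {
  df -P "$1" | awk 'NR == 2 { gsub(/%/, "", $5); print $5 }'
}

# disk_alert PCT [LIMIT]: prints an alert line and returns 0 when PCT >= LIMIT.
disk_alert() {
  local pct="$1" limit="${2:-90}"
  if (( pct >= limit )); then
    echo "disk: ${pct}% used (limit ${limit}%)"
    return 0
  fi
  return 1
}

pct=$(disk_used_pct /)
disk_alert "$pct" 90 || echo "disk OK: ${pct}% used"
```

In the heartbeat, the line printed by `disk_alert` would simply be appended to the `ALERTS` array alongside the stale-table entries.
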
### 4.4 Sample monitor query: copy-paste as a single SQL statement

```sql
SELECT 'STALE: '||t AS alert FROM (
  SELECT 'seap.announcements' AS t, max(fetched_at) AS f FROM seap.announcements
  UNION ALL SELECT 'seap.direct_acquisitions', max(fetched_at) FROM seap.direct_acquisitions
  UNION ALL SELECT 'firms.entities', max(updated_at) FROM firms.entities
  UNION ALL SELECT 'fonduri.afir_plati', max(fetched_at) FROM fonduri.afir_plati
  UNION ALL SELECT 'regas.ajutoare', max(fetched_at) FROM regas.ajutoare
  UNION ALL SELECT 'anre.licente', max(fetched_at) FROM anre.licente
  UNION ALL SELECT 'ancom.operatori', max(fetched_at) FROM ancom.operatori
  UNION ALL SELECT 'asf.entitati', max(fetched_at) FROM asf.entitati
  UNION ALL SELECT 'cnsc.decizii', max(fetched_at) FROM cnsc.decizii
  UNION ALL SELECT 'cnas.documents', max(fetched_at) FROM cnas.documents
  UNION ALL SELECT 'aaas.firme', max(fetched_at) FROM aaas.firme
  UNION ALL SELECT 'curteacont.rapoarte', max(fetched_at) FROM curteacont.rapoarte
  UNION ALL SELECT 'apia.fermieri', max(fetched_at) FROM apia.fermieri
  UNION ALL SELECT 'aep.donatii_pj', max(fetched_at) FROM aep.donatii_pj
  UNION ALL SELECT 'gnm.comunicate', max(fetched_at) FROM gnm.comunicate
  UNION ALL SELECT 'bugetar.entitate', max(fetched_at) FROM bugetar.entitate
  UNION ALL SELECT 'anaf.datornici', max(fetched_at) FROM anaf.datornici
) x
WHERE f < now() - (
  CASE
    WHEN t LIKE 'seap.%' THEN interval '12 hours'
    WHEN t IN ('anre.licente','ancom.operatori','asf.entitati','cnsc.decizii','curteacont.rapoarte','gnm.comunicate') THEN interval '54 hours'
    WHEN t IN ('firms.entities','cnas.documents','aaas.firme','fonduri.afir_plati') THEN interval '12 days'
    WHEN t IN ('regas.ajutoare','apia.fermieri','bugetar.entitate') THEN interval '52 days'
    WHEN t = 'aep.donatii_pj' THEN interval '143 days'
    WHEN t = 'anaf.datornici' THEN interval '270 days'
  END
);
```

Run it daily at 07:00. If it returns rows → email/n8n.

---

## 5. Idempotency contract per source

**Requirement:** every scraper MUST be idempotent: re-running it must NOT duplicate rows, only refresh `fetched_at`.

| Schema | Idempotency key | Mechanism (verified in existing code or mentioned in the audit) | Status |
|---|---|---|---|
| seap.announcements | `(source, source_id)` | `ON CONFLICT (source, source_id) DO UPDATE` (confirmed by audit) | ✅ |
| seap.direct_acquisitions | similar | similar | ✅ |
| firms.entities | `cui` PK | `ON CONFLICT (cui) DO UPDATE` | ✅ |
| firms.financials | `(cui, source_year)` | UPSERT | ✅ |
| fonduri.afir_plati | `(cnp_cui_hash, source_year, suma)` | hash unique | ✅ (audit) |
| fonduri.beneficiar_anunt | `(announcement_id)` | UPSERT | ✅ |
| regas.ajutoare | `(cui, an, masura)` | UPSERT | ✅ |
| anaf.datornici | `(cui, publication_date)` | `ON CONFLICT (cui, publication_date) DO UPDATE` (confirmed in wrapper) | ✅ |
| anaf.lista_alba | TBD | empty; pipeline not implemented | ⚠️ |
| aep.donatii_pf | `(partid, donator_nume, data, suma)` | composite UNIQUE | ✅ |
| aep.donatii_pj | similar | composite UNIQUE | ✅ |
| aep.donatii_rvc | similar | composite UNIQUE | ⚠️ has erroneous 2034 dates; needs cleanup, but the UPSERT works |
| bugetar.entitate | `cif` | UPSERT | ✅ |
| bugetar.executie | TBD | empty | ⚠️ |
| anre.licente | `(source, nr_autorizare)` or sha1 | UPSERT on sha1 (confirmed by wrapper) | ✅ |
| anre.electricieni | `UNIQUE(nr_autorizare, nume_prenume)` (wrapper) | UPSERT | ✅ (when it runs) |
| ancom.operatori | `cui` | UPSERT | ✅ |
| ancom.drepturi | `(cui, tip_drept)` | UPSERT | ✅ |
| asf.entitati | `cui` | UPSERT | ✅ |
| cnsc.decizii | `(decision_no, decision_year)` | `ON CONFLICT (decision_no, decision_year) DO UPDATE` (confirmed in wrapper) | ✅ |
| cnas.documents | `source_url_sha1` | UPSERT | ✅ |
| cnas.furnizori | `(document_id, row_index)` | UPSERT | ✅ |
| aaas.firme | `cui` | UPSERT | ✅ |
| curteacont.rapoarte | `(audit_year, report_no)` or URL | UPSERT | ✅ |
| apia.fermieri | `(cnp_cui, campania)` | UPSERT | ✅ |
| ani.declaratii | `pdf_sha1` | UPSERT | ✅ (once the parser works) |
| gnm.comunicate | `URL sha1` | UPSERT | ✅ |
| gnm.amenzi_extrase | `(comunicat_id, violator_cui, suma)` | UPSERT | ✅ |

**Non-idempotent suspects (need code review):**

- `anaf.lista_alba`: empty, the pipeline does not exist. When implemented, UPSERT on `cui`.
- `bugetar.executie`: empty. When implemented, UPSERT on `(cif, an, luna, indicator)`.
- TED import (`import_ted.py`): the `publication-date` bug is confirmed in the audit; the UPSERT probably works, but the 1-line fix is a prerequisite.

**Action item:** after implementing bugetar.executie and anaf.lista_alba, explicitly verify `ON CONFLICT DO UPDATE/DO NOTHING` in the INSERT statements and add idempotency tests (run the scraper twice in a row and check that `count(*)` stays constant).

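
The run-twice test from the action item can be sketched generically. `check_idempotent` is a hypothetical helper that takes the scraper command and a row-count command as strings; the demo uses a temp file as a stand-in for a table, so nothing here touches the real database.

```bash
#!/usr/bin/env bash
set -euo pipefail

# check_idempotent RUN_CMD COUNT_CMD
# Runs RUN_CMD twice and compares the COUNT_CMD output after each run.
check_idempotent() {
  local run_cmd="$1" count_cmd="$2" first second
  bash -c "$run_cmd" >/dev/null
  first=$(bash -c "$count_cmd")
  bash -c "$run_cmd" >/dev/null
  second=$(bash -c "$count_cmd")
  if [[ "$first" == "$second" ]]; then
    echo "idempotent: count stayed at ${first}"
  else
    echo "NOT idempotent: ${first} -> ${second}"
    return 1
  fi
}

# Demo: a file stands in for the table; "insert if absent" vs plain append.
demo=$(mktemp)
check_idempotent "grep -qx row1 '$demo' || echo row1 >> '$demo'" "wc -l < '$demo'"
check_idempotent "echo row1 >> '$demo'" "wc -l < '$demo'" || true
rm -f "$demo"

# Against a real scraper the call might look like (paths and query assumed):
#   check_idempotent './cron/scrape-aaas.sh' \
#     'psql "$DATABASE_URL" -tAc "SELECT count(*) FROM aaas.firme"'
```
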
---

## 6. Disaster recovery

### 6.1 RTO/RPO

**Components:**

- DB `architools_db` @ 10.10.10.166: 29 GB
- Code on `gitadmin/gov-agreg` Gitea: recoverable in <1 min
- `.infisical-mi` files: the secrets live in Infisical, recoverable with an MI restart
- Crons/timers: in the git repo (path `services/seap-scraper/cron/`)

**RTO (Recovery Time Objective):** ~2 hours: git clone + restore dump + restart timers.

**RPO (Recovery Point Objective):** depends on the backup cadence; see 6.2.

### 6.2 DB backup status (verified today)

`sudo crontab -l` on satra shows **ONLY**:

- `/opt/pug-tracker-scripts/scripts/backup-db.sh` at 03:00
- `/home/bulibasa/backup.sh` at 05:45
- eterra stats email at 06:30

**There is NO explicit backup for `architools_db`**; it still has to be verified whether `pug-tracker-scripts/backup-db.sh` or `bulibasa/backup.sh` includes `architools_db`. **This is a critical gap in DR.**

**Immediate action:** check the contents of `/opt/pug-tracker-scripts/scripts/backup-db.sh` and `/home/bulibasa/backup.sh`. If `architools_db` is missing, add:

```bash
#!/bin/bash
# /opt/vreaudigital/services/seap-scraper/cron/backup-db.sh
set -euo pipefail
BACKUP_DIR=/backups/architools_db
mkdir -p "$BACKUP_DIR"
DATE=$(date +%Y%m%d_%H%M)

source /opt/vreaudigital/.infisical-mi
# ... (fetch DATABASE_URL pattern)

# pg_dump custom format (compressed, parallelizable restore).
# No --jobs here: pg_dump supports parallel dumps only with --format=directory.
pg_dump -h "$PGHOST" -p "$PGPORT" -U "$PGUSER" -d "$PGDATABASE" \
  --format=custom \
  --no-owner --no-acl \
  --exclude-table='*staging_*' \
  --exclude-table-data='*log*' \
  --file="$BACKUP_DIR/architools_${DATE}.dump"

# Retention: delete dumps older than 90 days
# (a 7 daily / 4 weekly / 12 monthly rotation would need extra logic)
find "$BACKUP_DIR" -name 'architools_*.dump' -mtime +90 -delete
```

**Scheduling:** `vreaudigital-backup.timer OnCalendar=*-*-* 23:00:00` (before the 02:00 scrapes).

**Estimated size:** 29GB DB → ~6-8GB compressed (custom format ratio ~4×). Satra disk: 57GB free, enough for ~7 days of retention on satra + rotation to **shop** or a **Synology NAS** via rclone/rsync.

### 6.3 Restore procedure (documented)

```
# 1. On satra (or a new host):
git clone https://git.beletage.ro/gitadmin/gov-agreg.git /opt/vreaudigital
cd /opt/vreaudigital/services/seap-scraper
npm install --omit=optional

# 2. Restore .infisical-mi
scp <safe-source>:/opt/vreaudigital/.infisical-mi /opt/vreaudigital/
chmod 600 /opt/vreaudigital/.infisical-mi

# 3. Restore DB
createdb architools_db
pg_restore --jobs=4 --no-owner --no-acl \
  --dbname=architools_db \
  /backups/architools_db/architools_<latest>.dump

# 4. Restart timers (the unit files must already be installed in /etc/systemd/system;
#    running from that directory lets the shell expand the glob to real unit names)
cd /etc/systemd/system
sudo systemctl enable --now vreaudigital-*.timer
sudo systemctl list-timers | grep vreaudigital
```

### 6.4 Off-site backup

Recommendation: a daily rsync of `/backups/architools_db/` to **shop.avizero.ro:/srv/backups/satra-architools/**, or to a Synology NAS if one exists. **Do NOT rsync directly to GitHub/Gitea** (29GB > limit).

```
# /etc/systemd/system/vreaudigital-backup-offsite.timer OnCalendar=*-*-* 23:30:00
rsync -avz --delete /backups/architools_db/ shop:/srv/backups/satra-architools/
```

---

## 7. Recommended action items, prioritized

### 7.1 This week (low effort, high ROI)

| # | Item | Effort | Impact |
|---|---|---|---|
| 1 | **Fix the TED `publication-date` field** in `import_ted.py` (1-line) | 5 min | 100% TED publication_date populated |
| 2 | **Reset `seap.sync_state[source=da].status` from pending → null** + relaunch the DA backfill | 15 min | unlock 208-day-old backfill (potential ~8M rows) |
| 3 | **Investigate the WSP stall**: `wsp_sync_state.last_run_at = 2026-05-07`. Check the hidden cron; if it is missing, create `vreaudigital-seap-wsp.timer` per §2.2 | 1h | live SEAP daily feed restored |
| 4 | **Verify the DB backup**: read `/opt/pug-tracker-scripts/scripts/backup-db.sh` and `/home/bulibasa/backup.sh`. If `architools_db` is missing, install `backup-db.sh` from §6.2 | 1h | DR readiness, RPO ≤ 24h |
| 5 | **Implement `vreaudigital-heartbeat.timer`** from §4.1 + the query in §4.4 | 2h | dead-man's switch over 17 schemas |

**Total week 1:** ~5h work, unlocks 4 critical paths.

### 7.2 This month (medium effort)

| # | Item | Effort | Impact |
|---|---|---|---|
| 1 | **Create the missing wrappers** for `scrape-seap-wsp`, `scrape-seap-da`, `import-fonduri-beneficiari`, `gnm-extract-amenzi`, `curteacont-detail`, `cnsc-parse-pdfs` (6 wrappers using the Infisical MI pattern) | 1 day | uniform scheduling |
| 2 | **Migrate all 13 existing wrappers to visible systemd timers** per §2.2 (replaces the hidden cron) | 1 day | `journalctl -u` observability, retry on failure |
| 3 | **Check whether ANAF datornici Q4 2025 is published on data.gov.ro**: if it is, run `scrape-anaf-datornici SOURCE=datagov2025Q4`. Otherwise start the 2captcha integration | 1 day | datornici becomes fresh |
| 4 | **Disk cleanup on satra**: 3GB of staging tables (firms.staging_onrc_*) + log rotation, so that offsite backups can be installed | 4h | disk < 80%, room for cnsc PDFs Stage 2 |
| 5 | **CUI matcher rerun for cnas.furnizori, apia.fermieri, fonduri.beneficiar_proiect** (3 schemas with 0% match) | 4h | unlock cross-source recipes |

### 7.3 Next quarter (high effort or lower priority)

| # | Item | Effort | Impact |
|---|---|---|---|
| 1 | **CNSC Stage 2 PDF parser**: extract decision_type/summary for 29k decisions | 1-2 weeks | filterable decisions |
| 2 | **Curtea de Conturi Stage 2**: detail page + audited_cui + PDF | 2 weeks | reports linked to CUI |
| 3 | **Bugetar.executie Phase 2** + 2captcha for the top-1000 entities (~$60 one-shot) | 2 weeks | public financial flows |
| 4 | **ANI declaratii parser** (1.3M PDFs): deferral recommended until the minor ANRE/AAAS parser backlogs are confirmed cleared | 4-6 weeks | flagship: politicians |
| 5 | **SEAP DA backfill 2017-2024** (~8M rows), after the DA sync_state reset | 2-3 weeks | complete direct-acquisition coverage |

---

## Anexa A — Snapshot scrape_log azi (2026-05-11)
|
||||
|
||||
| Schema | Last successful run | OK runs 7d |
|
||||
|---|---|---:|
|
||||
| aaas | 2026-05-10 17:51 | 6 |
|
||||
| aep | 2026-05-09 20:58 | 4 |
|
||||
| ancom | 2026-05-10 18:06 | 3 |
|
||||
| anre | 2026-05-10 14:47 | 3 (4 errors) ⚠️ |
|
||||
| apia | 2026-05-10 18:53 | 1 |
|
||||
| asf | 2026-05-10 18:19 | 1 |
|
||||
| cnas | 2026-05-10 18:08 | 67 (multiple PDF parses) |
|
||||
| cnsc | 2026-05-10 19:19 | 4 |
|
||||
| gnm | 2026-05-10 19:02 | 5 |
|
||||
| **seap.wsp_sync_state** | **2026-05-07 03:01** (3 zile stale!) | n/a |
|
||||
| **seap.sync_state[da]** | **2025-10-16** (208 zile stale!) | n/a |
|
||||
|
||||
**Concluzie:** 9 din 11 schemas live au rulat în ultimele 24h. SEAP WSP + DA sunt blind spots — heartbeat trebuie să le acopere explicit.

---

## Annex B: Quick reference for existing systemd timers (current state)

```
/etc/systemd/system/vreaudigital-anaf-daily.timer → 02:00 daily → enrich-anaf.sh TIER=daily
/etc/systemd/system/vreaudigital-onrc-weekly.timer → Tue 03:00 → import-onrc-fresh.sh
/etc/systemd/system/vreaudigital-mvs.timer → 04:00 daily → refresh-mvs.sh
```

**Recommendation:** keep these 3 as they are, and add another 18-20 timers to cover the remaining schemas.

---

**Strategy doc complete.** Implementation can begin immediately with the §7.1 items.

---

## Annex C: AEP donatii (banipartide.ro), lag pattern confirmed 2026-05-12

**Direct check of the source** (`https://www.banipartide.ro/app/json.php?mode=dt&ssid=<base64-SQL>`):

| Dataset | Total source rows | Max year at source | DB rows | DB max year / max `data_donatie` |
|---|---:|---|---:|---|
| PJ donors (legal entities; Monitorul Oficial 10k+) | 3,612 | **2024** (114) | 3,567 | 2024 / 2024-12-13 |
| PF donors (individuals; Monitorul Oficial 10k+) | 30,792 | **2024** (1,859) | 30,173 | 2024 / 2024-12-27 |
| RVC (revenue/expenditure reports) | 353,473 | **2023** (42,791) | 346,237 | 2023 / 2034-01-31 (OCR errors) |

**Conclusion:** the source has **NO data for 2025 or 2026**. The last cron run (2026-05-11 09:15, satra) already imported every existing row (`seen=3612/30792/353473`). The DB vs source difference (45/619/7236 rows) comes from:
- PJ: 572 rows with `data_donatie IS NULL` (multi-date strings such as `"11.10.2019; 13.11.2019"`); the parser does not retain `an` in those cases.
- PF: similar, 9,268 NULL values in `data_donatie`.
- RVC: 7,236 skips on upsert (rows with a Romanian-language date format that cannot be parsed, e.g. `"septembrie 2019"`).
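
The two failure modes listed above (several dates in one field, and month-name dates in Romanian) could be handled by a more tolerant parser. The sketch below is one possible approach and not the importer's actual code. The function names are hypothetical, and mapping a month-only value to the first day of that month is an assumption the maintainers may not want.

```python
import re
from datetime import date

# Romanian month names, lowercase.
RO_MONTHS = {
    "ianuarie": 1, "februarie": 2, "martie": 3, "aprilie": 4,
    "mai": 5, "iunie": 6, "iulie": 7, "august": 8,
    "septembrie": 9, "octombrie": 10, "noiembrie": 11, "decembrie": 12,
}


def parse_donation_dates(raw):
    """Return every date found in a raw field such as '11.10.2019; 13.11.2019'."""
    dates = []
    for part in re.split(r"[;,]", raw):
        part = part.strip().lower()
        numeric = re.fullmatch(r"(\d{1,2})\.(\d{1,2})\.(\d{4})", part)
        if numeric:
            day, month, year = map(int, numeric.groups())
            try:
                dates.append(date(year, month, day))
            except ValueError:
                pass  # impossible calendar date, e.g. 31.02.2019
            continue
        named = re.fullmatch(r"([a-z]+)\s+(\d{4})", part)
        if named and named.group(1) in RO_MONTHS:
            # Assumption: a month-only value is stored as the 1st of the month.
            dates.append(date(int(named.group(2)), RO_MONTHS[named.group(1)], 1))
    return dates


def donation_year(raw):
    """Year of the latest parsed date, or None when nothing could be parsed."""
    dates = parse_donation_dates(raw)
    return max(dates).year if dates else None
```

With a parser of this shape, a multi-date row still yields a usable year even when no single `data_donatie` value can be chosen for it.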

### Why the source has no 2025/2026 data

**Legal mechanism (Legea 334/2006 + HG 10/2016):**
- Political parties report **donations above 10× the minimum wage** to AEP, which publishes them in **Monitorul Oficial, Part I**.
- Legal deadline: by **30 April of year N+1** for donations made in year N (annual revenue/expenditure report).
- For election campaigns: separate reporting within 15 days of the end of the campaign.
- Expert Forum (the banipartide.ro project) scans the MO, parses the PDFs, and updates the table roughly 1-3 months after publication.

**Expected calendar:**

| Donation dates | AEP report in MO | Appears on banipartide.ro | Estimated availability in gov-agreg |
|---|---|---|---|
| 2024 (annual) | Apr 2025 | May-Aug 2025 | ✅ already in DB |
| 2025 (annual) | Apr 2026 | **May-Aug 2026** | 🕒 window **now** (May 2026) to Aug 2026 |
| 2026 (annual) | Apr 2027 | May-Aug 2027 | 🕒 May 2027+ |
| 2024 election campaigns (European Parliament, presidential, local, parliamentary) | 15-30 days post-campaign | 1-3 months later | ✅ in DB with `data_donatie` close to the election round |

**RVC note:** The annual revenue/expenditure reports (RVC) are slower; 2023 probably appeared in 2025. We expect 2024 at the source in **June-October 2026**.

### Cadence recommendation (revised)

The current cron `vreaudigital-aep-donatii.timer` runs on the 1st of the month at 03:30 (i.e. **monthly**, more often than §1 #26, which said quarterly). That is **fine for the May-August 2026 window**, when 2025 is most likely to appear: the first run after publication picks it up.

**We are not changing the cadence.** The heartbeat (§4.1) should tolerate **95 days** of staleness (as currently set), because nothing new appears between January and April, and that is normal.

### Next check

Next automated check: **15 June 2026** (about a month after this one). If the source still has not published 2025, treat it as a false alarm; if it has, the 1 July 03:30 cron run will pick up the inserts. Optional manual check: `curl` the same SQL as here, then `python3 -c "..."` to count years.

@@ -0,0 +1,486 @@
# GovTech Commons Portal for AI and Civic Tools

## Executive summary

A citizen-friendly govtech aggregator that *hosts runnable demos and MVPs* can become a practical accelerator for digitalization, provided it behaves less like a “showcase website” and more like a **trusted, inspectable distribution channel** with a **security-first sandbox**, **standardized metadata**, and a **clear trust ladder**. The timing is favorable in the European Union because reuse infrastructure has matured (e.g., the EU Open Source Solutions Catalogue launched in 2025 and is expanding to include more individual modules and libraries), and public-sector metadata standards like publiccode.yml are already operational at national scale (Italy) and being adopted across Europe.

A “foolproof” plan is less about a perfect product spec and more about **enforceable constraints**: (1) demo environments that default to **no personal data and no outbound network**; (2) **supply-chain controls** (SBOM + provenance + signing) on everything that runs; (3) a **badge system** that makes risk legible to ordinary citizens while also giving administrations procurement-grade evidence; and (4) governance rules that map cleanly to EU obligations (GDPR, DSA, AI Act, accessibility).

The portal should be open source end-to-end (code + policy + schemas), but operationally it must behave like a **multi-tenant platform**. That means treating every uploaded demo/tool as untrusted until proven otherwise, and *never* letting “it’s open source” substitute for verification. This aligns with the direction of NIST SSDF (secure-by-design practices, provenance/SBOM, and controlled build/release processes) and modern supply-chain frameworks like SLSA and Sigstore.

### Prioritized next ten steps

1. Define the portal’s **scope boundaries** and “hard rules” (no citizen PII in demos by default; sandbox profiles; outbound network policy; takedown policy).
2. Adopt **publiccode.yml as the base**, and publish a **superset schema** (govtech + AI + security + privacy + demo runtime descriptors).
3. Implement ingestion as “metadata-first”: publiccode.yml validation + minimal listing before any runnable demo.
4. Build the initial trust ladder and badges (Demo-safe → Pilot-verified → Production-adopted) with objective criteria and required artifacts (SBOM, signatures, docs).
5. Stand up a secure demo runner MVP using **Wasm-first** (safe-by-default, low cost) + plan for microVM expansion for heavier workloads.
6. Establish CI policy: reproducible builds, SBOM generation, signing/attestations, baseline SAST/SCA, and publish-only-signed artifacts.
7. Create “pilot packs” that administrations can evaluate quickly (security pack, DPIA pack, deployment pack, procurement notes).
8. Launch with 2–3 Romanian “anchor” categories (payments, identity, open data) and invite projects that already exist in the ecosystem to list.
9. Formalize governance: maintainers, security response process, moderation/DSA workflow, and transparent metrics/reporting.
10. Run a first cohort: “one-click pilot hack-week” with at least one city/agency partner and publish results as reusable modules.

## Ecosystem scan with actionable patterns

Europe already provides “parts of the stack” you want, but split across multiple initiatives; the opportunity is to **compose** them into a single citizen-readable experience while staying compatible with EU reuse infrastructure.

The EU OSS Catalogue, hosted via the Interoperable Europe Portal, was launched in 2025 to help public administrations discover and reuse OSS solutions, and is evolving to include more individual components and libraries beyond federated national catalogues.

Italy is the strongest “operational reuse” reference model: publiccode.yml is mandatory for public software developed in Italy, and is used to populate the national catalogue via automated crawling; the standard is explicitly intended to be understandable for both technical and non-technical audiences.

Germany’s openCode demonstrates a second important pattern: a platform-level badge program that communicates security/maintenance/reuse qualities of listed projects, and a “publiccode.yml as gate” approach for the directory.

France’s code.gouv.fr shows a central government unit supporting publishing source code and increasing free/open-source usage across administrations, with an explicit action plan and catalog references (e.g., SILL list).

At the EU institutional level, the European Commission adopted an internal open source software strategy (2020–2023) positioning open source as a key lever for internal processes and collaboration, reinforcing that public-sector OSS is not “experimental” but mainstreamed.

Globally, the Digital Public Goods Alliance provides a registry-shaped pattern: a public listing that is anchored in a formal standard and verification process, oriented to public-benefit digital goods. This is directly relevant to your portal’s “trust layer,” even if your scope is narrower (EU/Romania civic tools rather than all DPG categories).

For sandboxed experimentation and “building blocks,” GovStack’s sandbox concept is an example of a shared environment to test digital government components, which aligns with the Interoperable Europe Act’s push toward interoperability solutions and regulatory sandboxes.

Romania has real anchor services that can seed your portal and make it immediately legible to citizens and administrations: Autoritatea pentru Digitalizarea României (ADR) has announced ROePAS as a single access point for digital public services; Romania operates the national online payment system Ghișeul.ro (officially operated by ADR); the national open data portal data.gov.ro acts as a central access point for open datasets; and the national digital identity SSO solution ROeID is positioned for citizen authentication across services.

## Product concept and information architecture

The portal should be designed as **two interlocking products**:

A. A developer-facing “govtech GitHub layer” that standardizes publication and reproducibility (metadata, builds, attestations).
B. A citizen-facing “app gallery layer” that translates that evidence into **plain-language trust signals** and safe demos.

This two-layer model matches how public-sector reuse initiatives already operate: machine-readable metadata (publiccode.yml) for indexing and discoverability, plus human-friendly presentation and governance.

### Personas and primary journeys

Developers: need a fast path from repository → listing → demo. They respond to low-friction onboarding (automated linting, templates, GitHub/GitLab integration) and strong incentives (visibility, interoperability adoption). This is consistent with publiccode.yml’s goal of being discoverable and understandable across audiences.

Citizens: want simple answers: “What does it do?”, “Is it safe to try?”, “Is the state using it?”, “Can I report an issue?”. The Interoperable Europe Act explicitly expects portals to be accessible to all citizens and to allow citizen feedback.

Institutions and evaluators: need procurement-grade artifacts and low-risk pilot pathways. Italy’s public administration acquisition guidance explicitly privileges open source/reuse and mandates comparative evaluation, providing a model for “evaluation packets” that your portal can pre-assemble.

### Information architecture

A practical IA that maps to how citizens think, while still indexing like a catalog:

Top-level navigation:
- **Services** (citizen tasks): pay, identify, request documents, permits, reporting, transparency, benefits.
- **Building blocks** (for institutions/devs): identity, payments, forms, document processing, notifications, workflow, interoperability, AI assistants, search.
- **Demos** (safe sandbox): runnable, read-only by default.
- **Adoption**: pilots, deployments, case studies, “used by” listings.
- **Standards & trust**: badges, compliance, security model, reporting.

This aligns with the EU OSS Catalogue’s framing as a centralized platform to discover OSS solutions for public administrations.

### Metadata and taxonomy: publiccode.yml superset

publiccode.yml is already a Europe-aligned, public-administration-oriented metadata standard intended to make software discoverable and understandable for technical and non-technical users.

A portal like yours should treat publiccode.yml as the “minimum contract,” and extend it with an explicit, versioned **govtech extension**. Suggested additional fields (conceptually; adapt naming to YAML conventions):

- **demo**: `runnable: true/false`, `sandboxProfile: wasm|container|microvm`, `internet: none|egress-allowlist`, `piiPolicy: synthetic-only|no-storage|user-provided`, `maxRuntimeSeconds`, `resources`.
- **security**: `sbom: SPDX|CycloneDX + artifact ref`, `provenance: SLSA predicate ref`, `signing: sigstore|key-based`, `vulnPolicy: thresholds`, `pentest: date/summary`.
- **privacy**: `dataCategories`, `controller/processor`, `dpia: link/ref`, `retention`, `dpaAvailable`.
- **ai** (if applicable): `aiUsed: yes/no`, `modelType`, `modelSource`, `riskClassHint`, `humanOversight`, `limitations`, `knownFailureModes`.
- **adoption**: `usedBy`, `pilots`, `productionDeployments`, `supportModel`.
- **interoperability**: `standards`, `apis`, `openapi`, `eventSchemas`, `exportFormats`.

Why these are “load-bearing”: the EU OSS Catalogue uses publiccode.yml as its reference specification and requires a valid publiccode.yml for onboarding, so building your superset as a compatible extension keeps you future-proof and interoperable with EU catalog infrastructure.
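
To make the extension concrete, here is a minimal sketch of how the `demo` fields suggested above might be linted at ingestion time. It assumes the block has already been parsed from YAML into a plain dictionary, and it treats every field except `resources` as required, which is an assumption rather than part of any published schema. The names `ALLOWED_VALUES` and `validate_demo` are hypothetical.

```python
# Allowed values copied from the suggested `demo` fields; the field names are
# conceptual and may change once a real schema is published.
ALLOWED_VALUES = {
    "sandboxProfile": {"wasm", "container", "microvm"},
    "internet": {"none", "egress-allowlist"},
    "piiPolicy": {"synthetic-only", "no-storage", "user-provided"},
}


def validate_demo(demo):
    """Return a list of human-readable problems; an empty list means it passed."""
    problems = []
    if not isinstance(demo.get("runnable"), bool):
        problems.append("runnable must be true or false")
    for field, allowed in ALLOWED_VALUES.items():
        value = demo.get(field)
        if value not in allowed:
            problems.append(f"{field} must be one of {sorted(allowed)}, got {value!r}")
    seconds = demo.get("maxRuntimeSeconds")
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(seconds, bool) or not isinstance(seconds, int) or seconds <= 0:
        problems.append("maxRuntimeSeconds must be a positive integer")
    return problems


good = {
    "runnable": True,
    "sandboxProfile": "wasm",
    "internet": "none",
    "piiPolicy": "synthetic-only",
    "maxRuntimeSeconds": 60,
}
bad = dict(good, sandboxProfile="docker", maxRuntimeSeconds=0)
```

Returning a list of problems instead of raising on the first one lets a submission form show every issue at once, which fits the low-friction onboarding goal.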

## Trust, badges, governance, and transparency

Trust must be expressed as a ladder because a “single trust stamp” fails both citizens (too vague) and institutions (not evidence-based). The openCode badge program illustrates how criteria-based badges can communicate security/maintenance/reuse qualities.

### Trust badge ladder

The ladder below is designed so that (a) early-stage projects still get listed; (b) runnable demos are gated by sandbox security; and (c) administrations can identify “pilot-ready” candidates quickly.

| Badge level | What it means | Minimum evidence required |
|---|---|---|
| Listed | discoverable entry | valid publiccode.yml; license; contacts; short citizen description |
| Demo-safe | runnable in constrained sandbox | no PII default; sandboxProfile declared; security scans pass; clear limitations |
| Verified supply chain | artifacts are verifiable | SBOM (SPDX/CycloneDX); signed artifacts; provenance/attestation (SLSA-style) |
| Pilot-verified | tested with a public body in controlled scope | pilot report; DPIA summary if personal data; deployment notes; incident channel |
| Production-adopted | used in real service delivery | named deployments; uptime/SLO disclosure; support model; change management summary |

Supply-chain “verified” is not optional if you host runnable artifacts: modern guidance emphasizes SBOM/provenance, and NIST SSDF explicitly calls out collecting and sharing provenance data, including SBOMs, as part of protecting releases.
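
A minimal sketch of how the ladder could be evaluated mechanically. It assumes the ladder is strictly cumulative (each level requires every level below it), which the table does not state outright. The evidence keys are hypothetical names, and the conditional “DPIA summary if personal data” requirement is deliberately left out to keep the example short.

```python
# Ordered from lowest to highest level. Evidence keys are illustrative only.
LADDER = [
    ("Listed", {"publiccode_valid", "license", "contacts", "citizen_description"}),
    ("Demo-safe", {"no_pii_default", "sandbox_profile_declared",
                   "security_scans_pass", "limitations_documented"}),
    ("Verified supply chain", {"sbom", "signed_artifacts", "provenance"}),
    ("Pilot-verified", {"pilot_report", "deployment_notes", "incident_channel"}),
    ("Production-adopted", {"named_deployments", "slo_disclosure",
                            "support_model", "change_management_summary"}),
]


def highest_badge(evidence):
    """Return the highest badge whose requirements, and all lower ones, are met."""
    earned = None
    for badge, required in LADDER:
        if not required <= evidence:  # some required evidence is missing
            break
        earned = badge
    return earned


listed_only = {"publiccode_valid", "license", "contacts", "citizen_description"}
# Supply-chain evidence without the Demo-safe evidence does not skip a level.
skipped_level = listed_only | {"sbom", "signed_artifacts", "provenance"}
```

Because the result is recomputed from the evidence set, removing a piece of evidence (for example after a failed scan) lowers the badge automatically, which is one way to implement revocation.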

### Governance model that fits open source and EU reality

A minimal model that avoids “governance theater”:

- Maintainers + technical steering group (open, recorded decisions).
- Security response team (private intake, coordinated disclosure, SLA).
- Moderation team (DSA-aligned notice/takedown, transparency reports).
- Public schema governance (versioned metadata extension; deprecation policy).

This is consistent with the Standard for Public Code’s emphasis on accountable, sustainable collaboration for public codebases.

## Legal and compliance map for EU and Romania

This portal is a “compliance intersection”: it hosts software listings + potentially user-generated content + runnable demos + AI-related disclosures.

### GDPR and privacy by design

If demos collect or process personal data, GDPR obligations trigger immediately (lawful basis, minimization, security measures, transparency). GDPR requires data protection by design and by default and requires DPIAs for processing likely to result in high risk, including certain profiling/large-scale scenarios.

Design implication: your default demo posture should be **no personal data** (synthetic datasets; no account creation to try demos; no persistent storage unless strictly necessary). This also aligns with Interoperable Europe portal constraints that solutions accessible through the portal should not contain personal data or confidential information.

### EU AI Act obligations that matter for a govtech tool portal

The AI Act includes outright prohibited AI practices and a structured regime for high-risk AI systems, including requirements around lifecycle performance, cybersecurity, and provider obligations (documentation, logs, conformity processes).

Practical portal implication: don’t try to legally “classify” each tool on behalf of developers; instead require **structured self-declaration** fields and publish them, plus disclaimers and human-oversight notes. This reduces ambiguity, and aligns with the Act’s emphasis on trustworthy AI and clear obligations.

### Digital Services Act obligations

Because the portal hosts third-party submissions and may expose demos, it should assume it is at least a hosting service under the DSA and implement: clear terms/conditions, notice-and-action mechanisms, reasoned decisions for moderation actions, and transparency reporting obligations depending on platform classification.

Even if you remain below “very large platform” thresholds, the operational pattern is the same: documented moderation, user reporting, audit-friendly logs.

### Interoperable Europe Act and regulatory sandboxes

The Interoperable Europe Act requires the Commission to provide a portal as a single point of entry for interoperability solutions and explicitly includes citizen/business feedback functions; it also frames interoperability regulatory sandboxes with openness and reporting.

A Commission implementing regulation (2025/1420) sets out operational rules for interoperability regulatory sandboxes and includes expectations like publishing calls and eligibility criteria and making sandbox/project information available via a dedicated interface.

Your portal can function as a **national/independent complement**: either a feeder into EU mechanisms (metadata-compatible) or a sandbox entry ramp for Romanian public bodies.

### Accessibility: legal baseline and operational standard

EU law requires public-sector websites and mobile applications to meet accessibility requirements, and EN 301 549 provides functional accessibility requirements, test procedures, and methodology usable in procurement and compliance contexts.

Portal implication: treat accessibility as a release gate for the portal UI and as a badge criterion for hosted demos (especially anything citizen-facing).

### Romania-specific operational anchors

Romania’s digital ecosystem already includes national-scale platforms that can seed the portal’s categories and “real usage” stories: Ghișeul.ro as the national online payment system operated by ADR; data.gov.ro as the national open datasets portal and central access point; and ROeID positioned as a national SSO solution for citizen digital interactions.

Where institutions are already producing digital services, your portal’s value is making components reusable and testable, not duplicating official portals.

## Secure sandbox architecture and the CI build-scan-run-observe pipeline

The security architecture must assume adversarial submissions (malware, crypto-miners, data exfiltration, phishing, prompt-injection style abuse in AI tools). The goal is to make “click-to-try” safe by default.

### Sandbox runtime comparison

Each runtime has a role; avoid a premature “one runtime for everything” decision.

| Candidate runtime | Core security property | Best fit in this portal | Key trade-offs |
|---|---|---|---|
| WebAssembly sandbox | Modules execute in a sandboxed environment and can’t escape without going through appropriate APIs | default “demo-safe” for lightweight compute, text transforms, policy simulators | limited OS-level compatibility; needs careful capability design |
| Firecracker microVM | purpose-built for secure, multi-tenant container and function-based workloads | medium-risk demos requiring Linux userland, stronger isolation | higher ops complexity; VM image management |
| Kata Containers | containers with VM-level isolation using hardware virtualization as second layer of defense | Kubernetes-integrated multi-tenant workloads where compatibility matters | overhead vs plain containers; runtime complexity |
| gVisor | application kernel that limits host kernel surface accessible to containers | “middle isolation” for container workloads when microVM overhead is too high | syscall compatibility limits for some apps |

A practical approach is **Wasm-first** for MVP, then add microVM-backed runners for “heavier” demos once governance and scanning are mature.

### Supply chain controls: SSDF, SLSA, SBOM, Sigstore

NIST SSDF organizes secure development into four groups (Prepare the Organization, Protect the Software, Produce Well-Secured Software, Respond to Vulnerabilities) and explicitly includes collecting and sharing provenance data like SBOMs as part of protecting releases.

SLSA provides a framework and attestation formats (provenance) that support verification of how artifacts were built and the need to verify provenance against expectations.

Sigstore provides an ecosystem for artifact signing and verification, including keyless signing and transparency logs, and explicitly targets signing/verifying artifacts including SBOMs.

SBOM standards have credible “minimum elements” guidance (NTIA) and well-established machine-readable formats like SPDX (ISO/IEC 5962:2021) and CycloneDX (ECMA-424).

### CI/build/scan/run/observe pipeline

The platform should have no “manual exceptions” for runnable artifacts: if it runs, it must be buildable and verifiable.

```mermaid
flowchart LR
A[Repo or upload] --> B[Metadata lint: publiccode.yml + portal extension]
B --> C[Build in isolated runner]
C --> D[Generate SBOM + provenance]
D --> E[Static scans: SAST + SCA + secrets]
E --> F["Sign + attest (Sigstore)"]
F --> G[Publish artifacts to registry]
G --> H["Deploy to sandbox runner (Wasm / gVisor / microVM)"]
H --> I[Runtime controls: no PII default, egress policy, quotas]
I --> J[Observability: logs, metrics, traces]
J --> K[Trust badge evaluation + publish demo]
K --> L[Ongoing monitoring + vuln intake + revocation]
```

This pipeline is directly motivated by SSDF’s emphasis on protected releases and provenance/SBOM practices and by Sigstore/SLSA’s attestation and verification approach.

### Reference architecture: key components

```mermaid
flowchart TB
subgraph Portal
UI[Citizen UI + Dev UI]
API[Portal API]
META[(Metadata store)]
SEARCH[(Search index)]
end

subgraph SupplyChain
CI[CI builders]
REG[(Artifact registry)]
LOG[Transparency log]
end

subgraph Sandbox
ORCH[Sandbox orchestrator]
R1[Wasm runner]
R2[Container runner]
R3[MicroVM runner]
OBS[Observability stack]
end

UI --> API
API --> META
API --> SEARCH

API --> CI
CI --> REG
CI --> LOG
REG --> ORCH
ORCH --> R1
ORCH --> R2
ORCH --> R3
ORCH --> OBS
API --> OBS
```

Core design constraint: demos are “sealed” artifacts pulled from a registry, not arbitrary code executed from a web form; this supports verifiable supply chain controls (SLSA/Sigstore) and reduces attack surface.
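
The “sealed artifact” constraint can be illustrated with a plain integrity check: the runner refuses to start a demo unless the bytes it pulled match the digest recorded at publish time. This is a deliberately simplified sketch. It covers only the digest comparison and is not a substitute for Sigstore signature verification or SLSA provenance checks. The function names are hypothetical.

```python
import hashlib
import hmac


def sha256_digest(data: bytes) -> str:
    """Digest in the common 'sha256:<hex>' notation used by artifact registries."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def is_sealed(artifact: bytes, pinned_digest: str) -> bool:
    """True only if the pulled bytes match the digest pinned at publish time."""
    # compare_digest performs a constant-time comparison of the two strings.
    return hmac.compare_digest(sha256_digest(artifact), pinned_digest)


# Example: the digest is recorded once at publish time, then checked at run time.
published = b"demo artifact bytes"
pinned = sha256_digest(published)
tampered = published + b"!"
```

In a real deployment the pinned digest would itself come from a signed attestation, so that an attacker who can replace the artifact cannot also replace the expected digest.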

### Entity relationships: catalog model

```mermaid
erDiagram
DEVELOPER ||--o{ PROJECT : submits
PROJECT ||--o{ RELEASE : publishes
PROJECT ||--o{ DEMO : exposes
RELEASE ||--o{ ARTIFACT : contains
ARTIFACT ||--o{ ATTESTATION : has
PROJECT ||--o{ BADGE : earns
INSTITUTION ||--o{ EVALUATION : performs
PROJECT ||--o{ EVALUATION : receives
CITIZEN ||--o{ FEEDBACK : files
DEMO ||--o{ FEEDBACK : receives
PROJECT ||--o{ ADOPTION : has
INSTITUTION ||--o{ ADOPTION : uses
```

This model supports Interoperable Europe expectations about citizen feedback and discoverability, while remaining compatible with catalog-first discovery patterns.

## Adoption pathway, sustainability, roadmap, and risks

### Adoption pathway for administrations

Adoption is mostly blocked by evaluation cost and procurement friction, not by lack of prototypes. Your portal should ship “pilot packs” that reduce evaluation time:

- Technical pack: deployment topology, APIs, data flows, integration points.
- Security pack: SBOM, signed artifacts, provenance, scan summaries, threat model.
- Privacy pack: DPIA template/summary, lawful basis assumptions, retention, data categories.
- Interop pack: standards supported, schema/export formats, mapping to EIF-style concerns.
- Procurement fit notes: comparable offerings, support options, exit strategy, licensing. Italy’s comparative evaluation and preference for reuse/open source is a strong pattern to mirror.

This gives administrations a “one-click” evaluation path that is consistent with EU reuse and interoperability goals.

### Sustainability and monetization options

The portal’s credibility increases if listing and baseline demos remain free. Monetization should target **enterprise-grade operational needs**, not citizen access.

| Option | What’s paid | Why it’s compatible with “mostly free” | Risks |
|---|---|---|---|
| Managed hosting for agencies | dedicated tenant, uptime, backups, SSO, audit logs | agencies pay for operations, not code | must avoid lock-in; publish infra-as-code |
| Security & compliance services | pentests, DPIA assistance, conformity documentation packs | aligns with admin needs, improves trust | needs strict conflict-of-interest policy |
| Private sandbox for sensitive pilots | isolated environment, custom egress allowlists, on-prem connectors | supports real pilots without exposing data | higher security liability |
| Vendor support marketplace | paid support contracts around open tools | mirrors existing public procurement patterns | must prevent “pay to win” discovery |

The Interoperable Europe Act explicitly values openness and reuse, and requires that portal-accessible solutions not contain personal data or confidential information, which pushes your paid layer toward operations and private pilots rather than public demo monetization.

### Phased roadmap

```mermaid
gantt
title GovTech Commons Portal Roadmap
dateFormat YYYY-MM-DD
axisFormat %b %Y

section Foundation
Governance, schema, policies :a1, 2026-04-10, 45d
Portal MVP (catalog + search) :a2, after a1, 60d

section Demo capability
Wasm demo runner MVP :b1, after a2, 45d
Trust badges v1 + moderation workflow :b2, after a2, 45d

section Pilot readiness
Supply-chain verification (SBOM+sign) :c1, after b1, 45d
Pilot packs + first agency pilots :c2, after c1, 60d

section Scale
MicroVM runner + multi-tenant harden :d1, after c2, 75d
Federation with EU catalog patterns :d2, after c2, 75d
```

This roadmap is shaped by EU-level reuse infrastructure maturity (EU OSS Catalogue), the Interoperable Europe portal/sandbox direction, and the need to build trust controls before scaling runnable hosting.
|
||||
|
||||
### Risk register with mitigations
|
||||
|
||||
1. **Untrusted code execution leads to compromise** → Default Wasm sandbox; microVM for higher-risk workloads; strict outbound policy; resource quotas; signed-only artifacts; continuous monitoring.
|
||||
2. **Phishing / social engineering via “citizen demos”** → UI warnings, origin transparency, no credential collection in demos, content moderation workflow, takedown SLAs.
|
||||
3. **GDPR violations via accidental PII collection** → Synthetic datasets only by default; explicit DPIA gating for any tool that stores personal data; retention limits; privacy review checklist.
|
||||
4. **“Trust badge inflation” reduces credibility** → Criteria-based badges with revocation; publish evidence artifacts (SBOM, provenance); external audits for higher badges.
|
||||
5. **Low admin adoption due to procurement friction** → Pilot packs aligned to comparative evaluation patterns; clear licensing; support marketplace.
|
||||
6. **Maintainer burnout / governance capture** → transparent governance; contribution guidelines; security process; rotate roles; publish metrics.
|
||||
7. **AI-related legal ambiguity** → structured AI disclosures; conservative labeling; require human-oversight notes; avoid hosting prohibited AI practices.
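The mitigations for the first and third risks (signed-only artifacts, strict outbound policy, resource quotas, synthetic data by default) amount to a default-deny admission check. A minimal sketch of such a check follows. The plan does not define a manifest schema, so every field name, the `DemoManifest` type, and the quota values here are hypothetical.

```python
from dataclasses import dataclass

# Assumed quotas for illustration only; the plan does not fix concrete numbers.
MAX_MEMORY_MB = 256
MAX_TIMEOUT_S = 30


@dataclass(frozen=True)
class DemoManifest:
    """Hypothetical demo manifest; field names are not taken from the plan."""
    name: str
    signed: bool = False
    outbound_hosts: tuple = ()
    memory_mb: int = 128
    timeout_s: int = 10
    synthetic_data_only: bool = True


def admission_errors(manifest):
    """Return the reasons a demo must be rejected; an empty list means admit."""
    errors = []
    if not manifest.signed:
        errors.append("artifact is not signed")
    if manifest.outbound_hosts:
        errors.append("outbound network access requested")
    if manifest.memory_mb > MAX_MEMORY_MB:
        errors.append("memory quota exceeded")
    if manifest.timeout_s > MAX_TIMEOUT_S:
        errors.append("time quota exceeded")
    if not manifest.synthetic_data_only:
        errors.append("non-synthetic data requires DPIA gating")
    return errors


if __name__ == "__main__":
    risky = DemoManifest(name="form-helper", outbound_hosts=("example.org",))
    print(admission_errors(risky))
```

Collecting every reason, rather than stopping at the first failure, would also give the moderation workflow something concrete to show a submitter.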
|
||||
|
||||
## File B: concise implementation plan, launch checklist, and technical stack
|
||||
|
||||
```markdown
|
||||
# Implementation plan and launch checklist
|
||||
|
||||
## Target outcome
|
||||
Launch an open-source govtech portal that:
|
||||
- lists civic/AI tools with publiccode.yml-based metadata
|
||||
- allows safe, runnable demos (default: no PII, no outbound network)
|
||||
- exposes a trust ladder (Demo-safe → Pilot-verified)
|
||||
- supports administrations with pilot packs (security + privacy + deployment)
|
||||
|
||||
## Baseline technical stack (no vendor lock-in)
|
||||
- Frontend: Next.js (or equivalent SSR), static-first pages for catalog entries
|
||||
- Backend API: FastAPI or Node (NestJS), REST + optional GraphQL
|
||||
- Data: Postgres (source of truth), OpenSearch/Meilisearch (search)
|
||||
- Storage: S3-compatible object storage for artifacts and logs
|
||||
- Artifact registry: OCI registry (Harbor or registry:2)
|
||||
- CI: GitHub Actions / GitLab CI; isolated self-hosted runners for builds
|
||||
- Signing: Sigstore Cosign (keyless where possible) + Rekor transparency
|
||||
- SBOM: SPDX and/or CycloneDX (Syft/Trivy generation)
|
||||
- SCA/SAST: Trivy + Semgrep + secret scanning
|
||||
- Sandbox orchestration: Kubernetes + dedicated runner service
|
||||
- Runner v1: WebAssembly (WasmEdge/Wasmtime) for “demo-safe”
|
||||
- Runner v2: gVisor/Kata for container workloads
|
||||
- Runner v3: Firecracker microVM pool for stronger isolation
|
||||
- Observability: Prometheus + Loki + OpenTelemetry
|
||||
- Moderation: ticket system + audit log, DSA-style notice/action
|
||||
|
||||
## Workstreams and staffing (estimated effort)
|
||||
Roles:
|
||||
- Product lead (PL)
|
||||
- Tech lead (TL)
|
||||
- Security lead (Sec)
|
||||
- DevOps/SRE (SRE)
|
||||
- UX/content (UX)
|
||||
- Community & partnerships (Comms)
|
||||
- Legal/privacy (Legal)
|
||||
|
||||
Total MVP team size: 6–8 people part-time or 4–5 full-time equivalents.
|
||||
|
||||
## MVP scope (8–12 weeks)
|
||||
- Catalog listings: publiccode.yml validation + indexing
|
||||
- Citizen view: plain-language summaries + “what data does it use”
|
||||
- Developer onboarding: templates + CLI “publish” tool (optional)
|
||||
- Trust badges v1: Listed, Demo-safe
|
||||
- Wasm demo runner MVP:
|
||||
- no outbound network
|
||||
- time + memory quotas
|
||||
- read-only filesystem
|
||||
- synthetic datasets only
|
||||
- Moderation basics:
|
||||
- report button on every page/demo
|
||||
- takedown workflow + transparency log
|
||||
|
||||
## 100% launch checklist (with responsibilities, effort, budget ranges)
|
||||
Budget ranges are minimal and assume you already have hardware; currency unspecified.
|
||||
|
||||
### Governance and legal
|
||||
- [ ] (Legal, 0.25 pm, 0–1k) Terms of service + acceptable use + demo disclaimers
|
||||
- [ ] (Legal, 0.25 pm, 0–2k) Privacy notice + cookie policy + DPIA template
|
||||
- [ ] (PL+Comms, 0.25 pm, 0–1k) Governance: maintainers, decision process, security policy
|
||||
- [ ] (Sec, 0.25 pm, 0–5k) Vulnerability disclosure policy + intake channel + SLA
|
||||
|
||||
### Metadata and catalog
|
||||
- [ ] (TL, 0.5 pm, 0–1k) publiccode.yml validator + portal extension schema v0.1
|
||||
- [ ] (UX, 0.25 pm, 0–1k) Citizen-readable “tool card” template
|
||||
- [ ] (TL, 0.5 pm, 0–2k) Search indexing + filters (service domain, maturity, trust level)
|
||||
|
||||
### Demo runtime
|
||||
- [ ] (Sec+SRE, 0.75 pm, 0–3k) Wasm runtime chosen + hardening profile documented
|
||||
- [ ] (TL, 0.75 pm, 0–2k) Demo packaging format (OCI artifact or zip with manifest)
|
||||
- [ ] (SRE, 0.5 pm, 0–2k) Resource quotas + isolated namespaces + per-demo sandbox ID
|
||||
- [ ] (Sec, 0.5 pm, 0–2k) Outbound network policy enforcement (default deny)
|
||||
|
||||
### Supply chain
|
||||
- [ ] (Sec+SRE, 0.75 pm, 0–3k) CI isolated builders + minimal base images
|
||||
- [ ] (Sec, 0.5 pm, 0–2k) SBOM generation in pipeline (SPDX/CycloneDX)
|
||||
- [ ] (Sec, 0.5 pm, 0–2k) Signing + attestation with Cosign
|
||||
- [ ] (Sec, 0.5 pm, 0–2k) Admission policy: only signed artifacts can run
|
||||
|
||||
### Observability and ops
|
||||
- [ ] (SRE, 0.5 pm, 0–2k) Central logging + retention policy
|
||||
- [ ] (SRE, 0.5 pm, 0–2k) Metrics dashboards for sandbox and portal
|
||||
- [ ] (SRE, 0.25 pm, 0–1k) Backup + restore test for core databases
|
||||
|
||||
### Content and launch readiness
|
||||
- [ ] (Comms, 0.5 pm, 0–2k) Seed 30–50 listings (including Romania anchors)
|
||||
- [ ] (UX, 0.25 pm, 0–1k) Accessibility audit of portal UI
|
||||
- [ ] (PL, 0.25 pm, 0–1k) “How to publish” guide + example repo
|
||||
|
||||
### Partnerships and adoption
|
||||
- [ ] (Comms, 0.5 pm, 0–5k) Identify 3 pilot institutions and sign lightweight pilot MoUs
|
||||
- [ ] (PL+Legal, 0.5 pm, 0–3k) Pilot pack template: security + privacy + deployment
|
||||
- [ ] (PL, 0.25 pm, 0–2k) Publish pilot evaluation rubric (scoring + evidence)
|
||||
|
||||
## Post-launch (weeks 12–24)
|
||||
- Expand trust ladder: Verified supply chain, Pilot-verified
|
||||
- Add microVM runner for higher-risk demos
|
||||
- Federation: export compatible feeds for EU OSS Catalogue patterns
|
||||
- Run first “pilot cohort” and publish case studies
|
||||
```
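The checklist above lists effort and budget per item but no totals. The sketch below adds them up, assuming `pm` means person-months and taking the upper bound of each budget range in thousands (the checklist leaves the currency unspecified). The numbers are copied from the checklist; the grouping and names are this example's own.

```python
# (effort in pm, upper budget bound in thousands) per checklist item.
CHECKLIST = {
    "Governance and legal": [(0.25, 1), (0.25, 2), (0.25, 1), (0.25, 5)],
    "Metadata and catalog": [(0.5, 1), (0.25, 1), (0.5, 2)],
    "Demo runtime": [(0.75, 3), (0.75, 2), (0.5, 2), (0.5, 2)],
    "Supply chain": [(0.75, 3), (0.5, 2), (0.5, 2), (0.5, 2)],
    "Observability and ops": [(0.5, 2), (0.5, 2), (0.25, 1)],
    "Content and launch readiness": [(0.5, 2), (0.25, 1), (0.25, 1)],
    "Partnerships and adoption": [(0.5, 5), (0.5, 3), (0.25, 2)],
}


def totals(checklist):
    """Return (total effort in pm, total upper budget bound in thousands)."""
    effort = sum(pm for items in checklist.values() for pm, _ in items)
    budget = sum(cap for items in checklist.values() for _, cap in items)
    return effort, budget


if __name__ == "__main__":
    for section, items in CHECKLIST.items():
        section_pm = sum(pm for pm, _ in items)
        section_cap = sum(cap for _, cap in items)
        print(f"{section}: {section_pm} pm, up to {section_cap}k")
    print("Total:", totals(CHECKLIST))
```

On these figures the 24 items sum to 10.5 pm and an upper budget bound of 50k. Every range starts at 0, so the lower bound is 0.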
|
||||
|
||||
## Multi-AI review prompt for iterative consolidation
|
||||
|
||||
```text
|
||||
You are an expert panel reviewing a plan for an open-source govtech aggregator portal (EU/Romania) that lists and hosts runnable AI/civic tools with a secure sandbox and a trust badge ladder.
|
||||
|
||||
INPUTS:
|
||||
1) The attached report (treat it as the baseline).
|
||||
2) Your task: produce an alternative analysis and improvements.
|
||||
|
||||
REQUIRED OUTPUT FORMAT:
|
||||
A. Critical gaps (top 10) — include why each matters.
|
||||
B. Architecture critique — specifically: sandbox isolation, multi-tenancy threats, supply-chain controls, and the run pipeline.
|
||||
C. Compliance critique — GDPR, EU AI Act, DSA, accessibility, Interoperable Europe Act; identify missed obligations and propose mitigations.
|
||||
D. Product critique — personas, IA, taxonomy/metadata (publiccode.yml superset), trust badges; propose simplifications.
|
||||
E. Feasibility — identify the MVP that can ship in 8–12 weeks with strong safety.
|
||||
F. Risk register — add 10 risks not covered and mitigations.
|
||||
G. Recommendations — a prioritized list of changes (must be actionable).
|
||||
|
||||
COMPARISON INSTRUCTIONS (IMPORTANT):
|
||||
- Identify where your conclusions differ from the baseline; label each difference as:
|
||||
(i) Correction (baseline is wrong),
|
||||
(ii) Enhancement (baseline is good but incomplete),
|
||||
(iii) Alternative (different but viable approach).
|
||||
- If you propose removing something from baseline, propose what replaces it.
|
||||
|
||||
MERGE INSTRUCTIONS (FOR FINAL CONSOLIDATION STEP):
|
||||
- After generating your response, produce a “Merged Plan Delta” section:
|
||||
- Keep: items you agree with.
|
||||
- Change: items to modify (include new wording).
|
||||
- Add: items missing from baseline.
|
||||
- Remove: items to drop and why.
|
||||
- Your goal is to help converge to a single unified plan that is safer, simpler, and more adoptable.
|
||||
|
||||
CONSTRAINTS:
|
||||
- Assume no vendor lock-in; must be open source friendly.
|
||||
- Assume attackers will submit malicious demos; security must be default-deny.
|
||||
- Prefer primary/official standards and laws; cite them when possible.
|
||||
```
|
||||
|
||||
## Primary sources and references
|
||||
|
||||
Key EU-level reuse and interoperability anchors: EU OSS Catalogue pages and its evolution, plus publiccode.yml as a prerequisite and reference spec.
|
||||
|
||||
Legal obligations: GDPR, DSA, AI Act, Web Accessibility Directive, and Interoperable Europe Act plus implementing rules for interoperability regulatory sandboxes.
|
||||
|
||||
Security framework anchors: NIST SSDF and container security guidance; SLSA provenance and verification; Sigstore keyless signing and transparency.
|
||||
|
||||
Romanian ecosystem anchors: ADR platforms and national services (payments, identity, open data, ROePAS single access point).
|
||||
@@ -0,0 +1,527 @@
|
||||
# Killer Findings — Cross-Source Hub Investigation
|
||||
|
||||
*G4 / vreaudigital.ro — sub-agent report, 2026-05-10*
|
||||
|
||||
> Each finding below is reproducible from the live database via `/tmp/govq.sh`. The entities are real legal persons with real CUIs as registered in `firms.entities` (3.97M canonical RO firm records). Numbers are aggregates from production materialized views (`seap.mv_top_suppliers`, `firms.mv_eu_funds_per_cui`, `regas.mv_ajutoare_per_cui`, `aep.mv_donatii_per_cui`, `cnsc.mv_per_authority_cui`, `cnsc.mv_per_contestator_cui`, `anaf.datornici_latest`, `aaas.firme`, `asf.entitati`).
|
||||
|
||||
---
|
||||
|
||||
## Executive Summary — Top 3 most explosive findings
|
||||
|
||||
1. **HIDROELECTRICA SA (CUI 13267213)** — A state-owned company is on the public ANAF debtors list with a **214M RON tax debt** while simultaneously winning **562M RON in SEAP contracts** from 39 distinct public buyers. The biggest electricity producer in Romania is filed under the *small debtor* category (`mici`) in the tax authority's own debtor list. State-on-state circular money on an industrial scale.
|
||||
|
||||
2. **AVIOANE CRAIOVA SA (CUI 2326144)** — Owes the state **98.6M RON** (50.3M principal) and at the same time wins **105.3M RON in SEAP contracts** from just 2 distinct buyers (ARMAMENT). Company on the public debt list as `mijlocii`, yet contracted by the Ministry of Defense at scale. Net public-money flow is *into* the company while it owes the public budget.
|
||||
|
||||
3. **SSAB-AG SRL (CUI 2816022, Sector 1 Bucureşti)** — The pure 4-pipe winner: **475M RON SEAP contracts** + **12.1M RON EU funds (15 announcements)** + **23.1M RON state aid (regas, 8 aid records)** + **PDL political donor in 2008** + **on the ANAF debtor list (86K RON)**. Hits every single state-money source the hub currently tracks. The textbook case of a politically-connected supplier living entirely off the public budget.
|
||||
|
||||
---
|
||||
|
||||
## Storyline 1 — QUADRA-PIPE EXTREMES (SEAP ∩ EU funds ∩ state aid)
|
||||
|
||||
Firms appearing simultaneously in the three biggest public-money pipes, ranked by combined RON. None of these companies build anything outside the state.
|
||||
|
||||
### CONCELEX SRL · CUI 6544184 · Bucureşti
|
||||
|
||||
| Source | Volume | Detail |
|
||||
|---|---|---|
|
||||
| seap.mv_top_suppliers | **4,222 mn RON** | 1st place by SEAP value among all suppliers in the hub |
|
||||
| firms.mv_eu_funds_per_cui | 1.66 mn RON | EU-funded announcements |
|
||||
| regas.mv_ajutoare_per_cui | 111.6 mn RON | state-aid (regas) |
|
||||
|
||||
The single biggest beneficiary of public construction tenders in our database: **4.22 BILLION RON** in awarded SEAP contracts as a private-sector supplier. State aid adds a further 111.6 mn RON, roughly 2.6% of the SEAP total, so SEAP wins dominate the picture.
|
||||
Profile: `/achizitii/firma/6544184`
|
||||
|
||||
### M.I.S-GRUP SRL (SEAP name: ARCHIPRO-DEVELOPMENT) · CUI 12472562 · Bistriţa-Năsăud
|
||||
|
||||
| Source | Volume | Detail |
|
||||
|---|---|---|
|
||||
| seap | 644.6 mn RON | SEAP suppliers |
|
||||
| EU funds | 11.8 mn RON | EU announcements |
|
||||
| regas | 28.6 mn RON | state aid |
|
||||
|
||||
A Bistriţa firm with 645M RON in SEAP wins, EU funds and regas state aid simultaneously. Notable that the firms.entities canonical name (`M.I.S-GRUP SRL`) differs from the SEAP supplier name (`ARCHIPRO-DEVELOPMENT`) — sanity-check passed: same CUI, same firm.
|
||||
Profile: `/achizitii/firma/12472562`
|
||||
|
||||
### CHIMCOMPLEX SA BORZESTI · CUI 960322 · Bacău
|
||||
|
||||
| Source | Volume |
|
||||
|---|---|
|
||||
| seap | 34.4 mn RON |
|
||||
| EU funds | **217 mn RON** |
|
||||
| regas | **427.3 mn RON** |
|
||||
|
||||
Inverted-pipe profile: a chemical industrial player whose state-money flows are dominated by EU funds and regas state aid, with comparatively modest SEAP. EU funds and regas together come to 644.3 mn RON in non-tender public funds; including SEAP, the total is about 678.7 mn RON.
|
||||
Profile: `/achizitii/firma/960322`
|
||||
|
||||
### LIDAS SRL · CUI 4611791 · Tulcea
|
||||
|
||||
| Source | Volume |
|
||||
|---|---|
|
||||
| seap | 0.54 mn RON (397 contracts) |
|
||||
| EU funds | **91.4 mn RON** (23 announcements) |
|
||||
| regas | **188.8 mn RON** state aid |
|
||||
|
||||
A construction firm where SEAP is irrelevant: the entire public-money exposure is regas (188.8M) + EU funds (91.4M). Funded by Eximbank, MFP, MDRAP, and the Smart Growth Directorate (the funders recorded in regas). Almost zero competitive procurement footprint, all subsidy/aid pipes.
|
||||
Profile: `/achizitii/firma/4611791`
|
||||
|
||||
### INVITE SYSTEMS SRL · CUI 22935583 · Ilfov (registered 2023!)
|
||||
|
||||
| Source | Volume |
|
||||
|---|---|
|
||||
| seap | 21.3 mn RON |
|
||||
| EU funds | **192.2 mn RON** |
|
||||
| regas | **151.5 mn RON** |
|
||||
|
||||
A company registered in 2023, about three years old at the time of this report, already pulling 192M in EU funds + 151M in state aid. The newest firm in the entire quadra-pipe top-30 list and already in the multi-hundred-million RON club. Worth investigative attention into ownership and partner network.
|
||||
Profile: `/achizitii/firma/22935583`
|
||||
|
||||
---
|
||||
|
||||
## Storyline 2 — POLITICAL DONORS WINNING STATE MONEY
|
||||
|
||||
Firms in `aep.donatii_pj` (party donations registry) that simultaneously appear in 3+ state-money sources. The contract between donor and state is overt.
|
||||
|
||||
### SSAB-AG SRL · CUI 2816022 · Bucureşti — *the perfect 4/4*
|
||||
|
||||
| Source | Volume | Detail |
|
||||
|---|---|---|
|
||||
| aep.donatii_pj | 20,230 RON | PDL donation (2008) |
|
||||
| seap | **475.3 mn RON** | 2 SEAP contracts |
|
||||
| EU funds | 12.1 mn RON | 15 EU announcements |
|
||||
| regas | 23.2 mn RON | 8 state-aid records |
|
||||
| anaf.datornici | 86,007 RON | currently on debtor list |
|
||||
|
||||
The only firm in the entire database that lights up simultaneously in donations, SEAP, EU funds, regas state aid, AND the ANAF debtor list. A 20K donation made in 2008 sits next to ~510 mn RON in subsequent state contracts and aid. Cause-and-effect not implied — but the optics are extraordinary.
|
||||
Profile: `/achizitii/firma/2816022`
|
||||
|
||||
### EKY-SAM SRL · CUI 9672080
|
||||
|
||||
| Source | Volume |
|
||||
|---|---|
|
||||
| aep.donatii_pj | 13,708 RON to PC + PD |
|
||||
| seap | 432.0 mn RON |
|
||||
| EU funds | 10.5 mn RON |
|
||||
| regas | 14.5 mn RON |
|
||||
|
||||
Donor to two parties (now-defunct PC/Conservatives and PD), now sitting on 432M RON in SEAP wins with 11M EU funds and 15M state aid. Total state-money exposure 457M RON for under 14K of declared donations.
|
||||
Profile: `/achizitii/firma/9672080`
|
||||
|
||||
### CIS GAZ SA · CUI 1210493
|
||||
|
||||
| Source | Volume |
|
||||
|---|---|
|
||||
| aep.donatii_pj | 3,500 RON to PD |
|
||||
| seap | 208.6 mn RON (5 contracts) |
|
||||
| EU funds | 0.83 mn RON |
|
||||
| regas | **60.5 mn RON** |
|
||||
|
||||
A gas-sector firm with 208M SEAP wins and 60M state aid. Donation: 3,500 RON. Money-flow ratio: ~77,000:1 in the firm's favor.
|
||||
Profile: `/achizitii/firma/1210493`
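The money-flow ratio quoted for CIS GAZ can be reproduced from the table. The sketch below assumes the ratio is defined as total state money (SEAP + EU funds + regas) divided by declared donations. That definition is inferred from the numbers, and the function name is chosen for this example.

```python
def money_flow_ratio(donations_ron, *state_money_mn_ron):
    """State money received (given in mn RON) per RON of declared donations."""
    if donations_ron <= 0:
        raise ValueError("donations must be positive")
    total_ron = sum(state_money_mn_ron) * 1_000_000
    return total_ron / donations_ron


if __name__ == "__main__":
    # CIS GAZ: 3,500 RON donated; 208.6 (SEAP) + 0.83 (EU funds) + 60.5 (regas) mn RON.
    ratio = money_flow_ratio(3_500, 208.6, 0.83, 60.5)
    print(f"about {ratio:,.0f} RON of state money per donated RON")
```

The same helper applies to the other donor profiles in this storyline, for example EKY-SAM with 13,708 RON donated against 432.0 + 10.5 + 14.5 mn RON.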
|
||||
|
||||
### UTILNAVOREP SA · CUI 1905300
|
||||
|
||||
| Source | Volume |
|
||||
|---|---|
|
||||
| aep.donatii_pj | 50,000 RON to PNL |
|
||||
| seap | 217.0 mn RON (8 contracts) |
|
||||
| EU funds | 5.9 mn RON |
|
||||
| regas | 15.6 mn RON |
|
||||
|
||||
A 50K PNL donation sits with 217M SEAP wins.
|
||||
Profile: `/achizitii/firma/1905300`
|
||||
|
||||
### ROMAQUA GROUP SA (Borsec) · CUI 402911
|
||||
|
||||
| Source | Volume |
|
||||
|---|---|
|
||||
| aep.donatii_pj | 170,000 RON to ALDE + UDMR |
|
||||
| seap | 2.4 mn RON |
|
||||
| EU funds | 0.04 mn RON |
|
||||
| regas | **90.4 mn RON** |
|
||||
|
||||
Mineral-water giant: small SEAP exposure but 90.4M state aid. Donations 170K to ALDE and UDMR.
|
||||
Profile: `/achizitii/firma/402911`
|
||||
|
||||
### ROMBAT SA · CUI 564638 · Bistriţa
|
||||
|
||||
| Source | Volume |
|
||||
|---|---|
|
||||
| aep.donatii_pj | 145,500 RON across PD, PDL, PNL, PSD, UDMR |
|
||||
| seap | 0.04 mn RON |
|
||||
| EU funds | 23.7 mn RON |
|
||||
| regas | 5.2 mn RON |
|
||||
|
||||
The pluralist donor — 5 different parties received money from this car-battery manufacturer. EU funds 23.7M. Profile: `/achizitii/firma/564638`
|
||||
|
||||
---
|
||||
|
||||
## Storyline 3 — STATE-OWNED CIRCULAR MONEY (state→state→state)
|
||||
|
||||
AAAS state-owned firms (active_holding portfolio) that win SEAP contracts from other public buyers and also appear on the ANAF debtor list. Pure circular flow.
|
||||
|
||||
### RADIOACTIV MINERAL MAGURELE SA · CUI 16695222 · 100% state-owned
|
||||
|
||||
| Source | Volume | Detail |
|
||||
|---|---|---|
|
||||
| aaas.firme | 100% state share | active_holding portfolio |
|
||||
| seap | 0.49 mn RON | 5 contracts; buyers: Compania Nationala a Uraniului, Apele Minerale, CONVERSMIN, Slanic Moldova |
|
||||
| anaf.datornici | **3.98 mn RON debt** | mici category |
|
||||
|
||||
The only AAAS state-owned firm with material SEAP traction. State owns 100%, state pays it via uranium agency and the salt-mining authority, AND state is collecting from it as a tax debtor. Self-licking ice cream cone with a leaky bottom.
|
||||
Profile: `/achizitii/firma/16695222`
|
||||
|
||||
### COMALEX · CUI 1384767 · 53.6% state-owned
|
||||
|
||||
| Source | Volume |
|
||||
|---|---|
|
||||
| aaas.firme | 53.6% state share |
|
||||
| anaf.datornici | 2.46 mn RON debt |
|
||||
|
||||
State-owned firm on ANAF debt list. No SEAP traction visible (the firm's commercial activity is minor).
|
||||
Profile: `/achizitii/firma/1384767`
|
||||
|
||||
---
|
||||
|
||||
## Storyline 4 — BIG SEAP SUPPLIER + BIG ANAF DEBTOR
|
||||
|
||||
Suppliers winning >50M RON in SEAP while owing >1M RON to the state's tax authority. The cynical case: public money flows in *while* taxes owed to the same state go unpaid.
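The two thresholds can be applied as a simple filter. The sketch below uses the SEAP and debt figures reported in this document's tables (the seven firms of this storyline plus SELINA SRL from Storyline 7 as a counter-example); the function name and the debt-per-award ranking are illustrative choices, not the hub's actual query.

```python
# (name, SEAP awards in mn RON, ANAF debt in mn RON), values copied from this report.
FIRMS = [
    ("POSTA ROMANA RA", 803.8, 25.1),
    ("HIDROELECTRICA SA", 561.9, 214.4),
    ("AVIOANE CRAIOVA SA", 105.3, 98.6),
    ("IOR SA", 413.2, 2.17),
    ("ELECTROPUTERE VFU PASCANI SA", 279.2, 2.89),
    ("ENERGOMONTAJ SA", 314.2, 2.36),
    ("CFR-MARFA SA", 232.4, 5.29),
    ("SELINA SRL", 44.7, 35.4),  # Storyline 7; below the SEAP threshold
]


def big_supplier_big_debtor(firms, min_seap_mn=50.0, min_debt_mn=1.0):
    """Keep firms above both thresholds, ranked by debt per RON of SEAP awards."""
    hits = [
        (name, seap, debt, debt / seap)
        for name, seap, debt in firms
        if seap > min_seap_mn and debt > min_debt_mn
    ]
    return sorted(hits, key=lambda row: row[3], reverse=True)


if __name__ == "__main__":
    for name, seap, debt, ratio in big_supplier_big_debtor(FIRMS):
        print(f"{name}: {seap} mn SEAP, {debt} mn debt, ratio {ratio:.2f}")
```

Ranking by debt per awarded RON, rather than by raw debt, puts AVIOANE CRAIOVA SA first even though HIDROELECTRICA SA owes more in absolute terms.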
|
||||
|
||||
### POSTA ROMANA RA · CUI 427410 · State-owned utility
|
||||
|
||||
| Source | Volume |
|
||||
|---|---|
|
||||
| seap | **803.8 mn RON** (2,330 contracts, 901 distinct buyers) |
|
||||
| anaf.datornici | **25.1 mn RON debt** (`mari` category) |
|
||||
|
||||
Top SEAP supplier of postal/courier services to virtually every Romanian public buyer (901 distinct authorities). Simultaneously owes the state 25M RON. STS, Min Finanţe, CNAIR, ministries — all keep buying from a registered tax debtor.
|
||||
Profile: `/achizitii/firma/427410`
|
||||
|
||||
### HIDROELECTRICA SA · CUI 13267213 (top finding)
|
||||
|
||||
| Source | Volume |
|
||||
|---|---|
|
||||
| seap | 561.9 mn RON (83 contracts, 39 buyers) |
|
||||
| anaf.datornici | **214.4 mn RON debt** |
|
||||
|
||||
A listed state-controlled energy giant — listed on Bucharest Stock Exchange, paid ~3 BN RON in dividends to MS in 2024 — simultaneously appears in the public ANAF debtor list owing 214M and on the SEAP supplier side winning 562M from public buyers. The "small debtor" category label (`mici`) on a tax debt of this size suggests the publication category may be defective or that part of the debt is contested.
|
||||
Profile: `/achizitii/firma/13267213`
|
||||
|
||||
### AVIOANE CRAIOVA SA · CUI 2326144 (top finding)
|
||||
|
||||
| Source | Volume |
|
||||
|---|---|
|
||||
| seap | 105.3 mn RON (6 contracts, 2 buyers) |
|
||||
| anaf.datornici | **98.6 mn RON debt** (`mijlocii` category) |
|
||||
|
||||
State-owned aircraft manufacturer. The math is striking: nearly 1:1 — for every RON it gets in SEAP awards, it owes a RON in tax debt to the same state.
|
||||
Profile: `/achizitii/firma/2326144`
|
||||
|
||||
### IOR SA · CUI 340312 · Bucureşti
|
||||
|
||||
| Source | Volume |
|
||||
|---|---|
|
||||
| seap | 413.2 mn RON (3 contracts, 1 buyer) |
|
||||
| anaf.datornici | 2.17 mn RON debt |
|
||||
|
||||
Defense optics supplier — 413M SEAP from a single buyer (procurement concentration ratio = 1.0) plus debt to ANAF.
|
||||
Profile: `/achizitii/firma/340312`
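The report does not define the "procurement concentration ratio". One plausible reading, assumed here, is the share of a supplier's SEAP value that comes from its largest buyer; the Herfindahl-Hirschman index (sum of squared shares) is shown alongside as a common alternative. Function names and the two-buyer split are hypothetical.

```python
def top_buyer_share(value_by_buyer):
    """Share of total awarded value that comes from the single largest buyer."""
    total = sum(value_by_buyer.values())
    if total <= 0:
        raise ValueError("total awarded value must be positive")
    return max(value_by_buyer.values()) / total


def hhi(value_by_buyer):
    """Herfindahl-Hirschman index over buyer shares (1.0 means a single buyer)."""
    total = sum(value_by_buyer.values())
    if total <= 0:
        raise ValueError("total awarded value must be positive")
    return sum((value / total) ** 2 for value in value_by_buyer.values())


if __name__ == "__main__":
    # IOR SA: all 413.2 mn RON comes from one buyer, per the table above.
    ior = {"single buyer": 413.2}
    print(top_buyer_share(ior), hhi(ior))

    # Hypothetical 75/25 split between two buyers, for contrast only.
    split = {"buyer A": 75.0, "buyer B": 25.0}
    print(top_buyer_share(split), hhi(split))
```

Both measures equal 1.0 for a single-buyer supplier, which matches the figure quoted for IOR SA under either definition.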
|
||||
|
||||
### ELECTROPUTERE VFU PAŞCANI SA · CUI 1996928 · Iaşi
|
||||
|
||||
| Source | Volume |
|
||||
|---|---|
|
||||
| seap | 279.2 mn RON |
|
||||
| anaf.datornici | 2.89 mn RON debt |
|
||||
|
||||
Rail-rolling-stock manufacturer; SEAP wins concentrated on 2 buyers.
|
||||
Profile: `/achizitii/firma/1996928`
|
||||
|
||||
### ENERGOMONTAJ SA · CUI 1555468 · Bucureşti
|
||||
|
||||
| Source | Volume |
|
||||
|---|---|
|
||||
| seap | 314.2 mn RON |
|
||||
| anaf.datornici | 2.36 mn RON debt |
|
||||
|
||||
Energy-construction supplier on debt list with 314M in SEAP wins.
|
||||
Profile: `/achizitii/firma/1555468`
|
||||
|
||||
### SOCIETATEA NATIONALA DE TRANSPORT FEROVIAR DE MARFA "CFR-MARFA" SA · CUI 11054537
|
||||
|
||||
| Source | Volume |
|
||||
|---|---|
|
||||
| seap | 232.4 mn RON |
|
||||
| anaf.datornici | 5.29 mn RON debt |
|
||||
|
||||
State-owned cargo rail. Indebted to the state, contracted by the state.
|
||||
Profile: `/achizitii/firma/11054537`
|
||||
|
||||
---
|
||||
|
||||
## Storyline 5 — SINGLE-BIDDER STORM (no real competition)
|
||||
|
||||
Firms winning massive SEAP contracts where they were the only bidder. Cross-referenced with debtor / aid lists.
|
||||
|
||||
### METAMINDS SA · CUI 34770594 · Bucureşti
|
||||
|
||||
| Source | Volume |
|
||||
|---|---|
|
||||
| seap.mv_top_suppliers | **1,219.6 mn RON** (28 contracts) |
|
||||
| seap.v_single_bidder | 113.1 mn across 8 distinct authorities (single-bidder rate: ~9% of value but spread across ministries) |
|
||||
| regas | 6.5 mn RON |
|
||||
| anaf.datornici | 43,552 RON debt |
|
||||
|
||||
The hottest single-bidder name of 2026 — registered 2015 as SA, recently won an **835M RON contract from Serviciul de Telecomunicaţii Speciale (STS)** on 11 Feb 2026, plus contracts at MoJ, MinFin, Tribunalul Bucureşti. ONRC says it sits in Bucharest. Single-bidder concentration plus rapid award flow plus a presence on the ANAF debtor list (small but present). Worth a deep journalistic file.
|
||||
Profile: `/achizitii/firma/34770594`
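The single-bidder rate in the METAMINDS table follows from the two values listed there. A short sketch, assuming the rate is single-bidder award value divided by total SEAP value (the function name is chosen for this example):

```python
def single_bidder_share(single_bidder_mn, total_mn):
    """Fraction of a supplier's SEAP value won in single-bidder procedures."""
    if total_mn <= 0:
        raise ValueError("total SEAP value must be positive")
    return single_bidder_mn / total_mn


if __name__ == "__main__":
    # METAMINDS SA: 113.1 mn RON single-bidder out of 1,219.6 mn RON total.
    share = single_bidder_share(113.1, 1219.6)
    print(f"{share:.1%} of SEAP value came from single-bidder awards")
```

A by-value share can hide how widely single-bidder awards are spread, so the count of distinct authorities (8 in this case) is worth reporting next to it.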
|
||||
|
||||
### FCSA JV · CUI 125644669 (foreign joint venture)
|
||||
|
||||
| Source | Volume |
|
||||
|---|---|
|
||||
| seap (single bidder) | **3,167.5 mn RON** in a single-bidder award |
|
||||
|
||||
The largest single-bidder award in our SEAP database — over 3 BILLION RON. International joint venture (CUI format suggests foreign tax id rather than Romanian fiscal code).
|
||||
|
||||
### Kalyon Insaat (Turkey) · CUI 4930083621
|
||||
|
||||
A 2.13 BILLION RON single-bidder award. Foreign contractor profile.
|
||||
|
||||
These two foreign single-bidder mega-awards are worth investigating: which Romanian authority awarded them, what infrastructure project, and whether procedures should have allowed competing bidders.
|
||||
|
||||
---
|
||||
|
||||
## Storyline 6 — AUTHORITIES THAT INVITE THE MOST CONTESTATIONS
|
||||
|
||||
Public buyers with the highest count of CNSC contestations against their procurement processes, plus the SEAP scale at which they procure.
|
||||
|
||||
### CNAIR (Compania Naţională de Administrare a Infrastructurii Rutiere) · CUI 16054368
|
||||
|
||||
| Source | Volume |
|
||||
|---|---|
|
||||
| cnsc.mv_per_authority_cui | **368 contestations** filed against |
|
||||
| seap (as authority) | **73,653 mn RON** total awarded value (1,111 distinct suppliers) |
|
||||
|
||||
CNAIR is the king of contested procurement. With 73.7 BILLION RON in awards across its history and 368 formal CNSC contestations, it accounts for the largest single share of procurement disputes in the country. Worth a permanent watchlist.
|
||||
Profile: `/achizitii/autoritate/16054368`
|
||||
|
||||
### CNI · Compania Națională de Investiții · CUI 14273221
|
||||
|
||||
| Source | Volume |
|
||||
|---|---|
|
||||
| cnsc | 266 contestations |
|
||||
| seap (as authority) | 22,176 mn RON awarded |
|
||||
|
||||
The state's school/sport-hall/civic-center construction agency — 266 contestations, 22.2 billion RON in tenders, 731 distinct suppliers. Profile: `/achizitii/autoritate/14273221`
|
||||
|
||||
### REGIA NATIONALA A PADURILOR ROMSILVA · CUI 1590120
|
||||
|
||||
| Source | Volume |
|
||||
|---|---|
|
||||
| cnsc | 171 contestations |
|
||||
| seap (as authority) | 1,936 mn RON awarded across **2,733 distinct suppliers** |
|
||||
|
||||
Forest agency — moderate contract value but enormous supplier diversity (2733!). High contestation count signals problematic individual lots.
|
||||
Profile: `/achizitii/autoritate/1590120`
|
||||
|
||||
### Distribuţie Energie Electrică Romania SA · CUI 14476722
|
||||
|
||||
90 contestations, 4,224 mn RON awarded across 483 suppliers.
|
||||
|
||||
### Complexul Energetic Oltenia SA · CUI 30267310
|
||||
|
||||
82 contestations, 2,224 mn RON across 581 suppliers.
|
||||
|
||||
---
|
||||
|
||||
## Storyline 7 — POLITICAL DONORS NOW ON ANAF DEBT LIST
|
||||
|
||||
Companies that gave money to political parties and ended up failing to pay their own taxes.
|
||||
|
||||
### B&B BUSINESS SOLUTIONS INVESTMENT SRL · CUI 21820372
|
||||
|
||||
| Source | Volume |
|
||||
|---|---|
|
||||
| aep.donatii_pj | 10,000 RON to PNL (2007) |
|
||||
| anaf.datornici | **281.8 mn RON debt** (`mici` category) |
|
||||
|
||||
The most extreme ratio in the dataset: a 10K donation to PNL in 2007 sits with a 281.8 MILLION RON tax debt today. Profile: `/achizitii/firma/21820372`
|
||||
|
||||
### EUROAVIPO SA · CUI 2809076
|
||||
|
||||
| Source | Volume |
|
||||
|---|---|
|
||||
| aep.donatii_pj | 110,000 RON to PNL (2007) |
|
||||
| anaf.datornici | **217.8 mn RON debt** |
|
||||
|
||||
A 217.8M tax debt sits next to a 110K donation. Profile: `/achizitii/firma/2809076`
|
||||
|
||||
### DOLY-COM SRL · CUI 9194636 · meat producer (BSE outbreak fame)
|
||||
|
||||
| Source | Volume |
|
||||
|---|---|
|
||||
| aep.donatii_pj | 50,000 RON to PNL (2012, 2014) |
|
||||
| anaf.datornici | 104.0 mn RON debt (`mari` category) |
|
||||
|
||||
A meat processor known publicly for a 2018-2019 swine-fever scandal, on the donor list (PNL, 50K) and on the debt list with 104M RON unpaid. Profile: `/achizitii/firma/9194636`
|
||||
|
||||
### ASTRA BETTINGS SRL · CUI 13829753 (gambling)
|
||||
|
||||
| Source | Volume |
|
||||
|---|---|
|
||||
| aep.donatii_pj | 189,000 RON to PDL (2008) |
|
||||
| anaf.datornici | 65.6 mn RON debt |
|
||||
|
||||
A betting company on debt list. Donation: 189K to PDL in 2008. Profile: `/achizitii/firma/13829753`
|
||||
|
||||
### SELINA SRL · CUI 6649997 — *donor + debtor + SEAP supplier*
|
||||
|
||||
| Source | Volume |
|
||||
|---|---|
|
||||
| aep.donatii_pj | 10,000 RON to PDL (2012) |
|
||||
| anaf.datornici | 35.4 mn RON debt |
|
||||
| seap | **44.7 mn RON wins** |
|
||||
|
||||
The triple lock: donor, debtor, AND SEAP supplier. State buys ~45M from a 35M tax debtor that gave 10K to PDL.
|
||||
Profile: `/achizitii/firma/6649997`
|
||||
|
||||
### MODUL PROIECT SA · CUI 2696473
|
||||
|
||||
| Source | Volume |
|
||||
|---|---|
|
||||
| aep.donatii_pj | 518,000 RON to PSD (2 donations) |
|
||||
| anaf.datornici | 509,642 RON debt |
|
||||
| seap | 0.48 mn RON wins (4 contracts) |
|
||||
|
||||
A donor who gave PSD almost exactly what it owes ANAF today. Profile: `/achizitii/firma/2696473`
|
||||
|
||||
---
|
||||
|
||||
## Storyline 8 — INSURANCE CONCENTRATION ON STATE BUYERS
|
||||
|
||||
ASF-licensed insurers winning state procurement, with implicit concentration risk.
|
||||
|
||||
### ASIROM Vienna Insurance Group SA · CUI 336290
|
||||
|
||||
| Source | Volume |
|
||||
|---|---|
|
||||
| asf.entitati | active insurer (`asigurator`) |
|
||||
| seap (as supplier) | **282.9 mn RON** across 519 contracts to **254 distinct public buyers** |
|
||||
|
||||
The dominant insurer for the Romanian public sector. 254 distinct authorities buy from ASIROM — basically, ASIROM insures most of public-sector Romania. Top of insurance concentration on state procurement.
|
||||
Profile: `/achizitii/firma/336290`
|
||||
|
||||
### Allianz-Țiriac Asigurari SA · CUI 6120740
|
||||
|
||||
282M RON across 466 contracts at 256 distinct buyers (similar concentration to ASIROM).
|
||||
Profile: `/achizitii/firma/6120740`
|
||||
|
||||
### FAST BROKERS · CUI 14785760 — *sanctioned broker*
|
||||
|
||||
| Source | Volume |
|
||||
|---|---|
|
||||
| asf.entitati | section_status=`radiat`, sanctioned April 2024 (`SANCT. CU RETR. AUTORIZ`) |
|
||||
| seap | 81.8 mn RON across 125 contracts at 66 distinct public buyers (all pre-sanction) |
|
||||
|
||||
A broker whose authorization ASF withdrew as a formal sanction in April 2024. Cumulative pre-sanction state-sector wins reach 81.8M RON across 66 distinct authorities, including RATBV, TRANSELECTRICA, CNAIR, ROMGAZ, MAPN.
|
||||
Profile: `/achizitii/firma/14785760`
|
||||
|
||||
### CITY INSURANCE SA · CUI 10392742 — *defunct insurer + PNL donor + ANAF debtor*
|
||||
|
||||
| Source | Volume |
|
||||
|---|---|
|
||||
| asf.entitati | section_status=`radiat`, license retracted Sept 2021 |
|
||||
| aep.donatii_pj | 30,000 RON to PNL (2009) |
|
||||
| anaf.datornici | 18.6 mn RON debt (`mari` category) |
|
||||
|
||||
Famous market-failure insurer. Donor before collapse, debtor after. Worth a "what was the systemic warning" timeline article.
|
||||
Profile: `/achizitii/firma/10392742`
|
||||
|
||||
---
|
||||
|
||||
## Storyline 9 — CONTESTATORS-IN-CHIEF (vexatious or right?)
|
||||
|
||||
Firms filing the highest counts of CNSC contestations.
|
||||
|
||||
### G.B. INDCO SRL · CUI 10421821
|
||||
|
||||
| Source | Volume |
|
||||
|---|---|
|
||||
| cnsc | **637 contestations filed** (highest in DB) |
|
||||
| seap (as supplier) | 12.0 mn RON wins (78 contracts) |
|
||||
|
||||
637 contestations is several times the count of any other firm listed here (the next highest is 178). Either the firm is a procurement watchdog or a vexatious litigant. Win record: 12M RON in SEAP wins.
|
||||
Profile: `/achizitii/firma/10421821`
|
||||
|
||||
### STRABAG SRL · CUI 6891914

| Source | Volume |
|---|---|
| cnsc | 178 contestations filed |
| seap | **2,371.8 mn RON wins** (25 contracts) |

A construction multinational that contests *and* wins big: 178 contestations and 2.37 billion RON in awarded SEAP value.
Profile: `/achizitii/firma/6891914`

### EUSKADI SRL · CUI 17021083

107 contestations, 1,196.9 mn RON in SEAP wins.
Profile: `/achizitii/firma/17021083`

### MEDIST SRL · CUI 6705884 · medical equipment supplier

| Source | Volume |
|---|---|
| cnsc | 161 contestations filed |
| seap | 98.3 mn RON across 248 contracts |

Medical-tech contestator with 248 SEAP wins.
Profile: `/achizitii/firma/6705884`

---

## Recipe ideas worth implementing on /achizitii/retete

The following query patterns surfaced repeatedly during this investigation, and each would make a compelling, public-interest one-click recipe on the hub. All can be built from the existing matviews + base tables; no new ETL is needed.

### Recipe A — *"Quadra-Pipe Top 100"*

SQL: rank firms by combined (seap + EU funds + regas) RON, with a full per-source breakdown plus ANAF debt and AEP donation columns.
Story: who lives entirely off the public budget across multiple state-money channels.

```sql
-- Firms present in all three state-money channels, ranked by combined RON.
SELECT s.cui_norm, s.name,
       s.total_value AS seap_ron,
       f.buget_total AS eu_ron,
       r.total_ron   AS regas_ron,
       d.debt_total  AS anaf_debt,               -- NULL when the firm is not a listed debtor
       a.total_lei   AS aep_donatii, a.partide   -- NULL when the firm made no donations
FROM seap.mv_top_suppliers s
LEFT JOIN firms.mv_eu_funds_per_cui f ON f.cui = s.cui_norm
LEFT JOIN regas.mv_ajutoare_per_cui r ON r.cui = s.cui_norm
LEFT JOIN anaf.datornici_latest     d ON d.cui = s.cui_norm
LEFT JOIN aep.mv_donatii_per_cui    a ON a.cui = s.cui_norm
-- The filters on f and r discard rows where those joins found no match,
-- so only firms with SEAP, EU-funds and regas money all > 0 remain.
WHERE s.total_value > 0 AND f.buget_total > 0 AND r.total_ron > 0
ORDER BY (s.total_value + f.buget_total + r.total_ron) DESC
LIMIT 100;
```

### Recipe B — *"Donor → State Money: pay-to-play index"*

For each `aep.donatii_pj` donor, compute the ratio of state money received (SEAP + EU + regas) to donations made. List the top 50 by absolute state money received (not by the ratio, to avoid tiny-donation noise).

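The ranking rule for Recipe B can be sketched in a few lines of Python. This is only an illustration under assumptions: `DonorRow`, its field names and the two sample rows are hypothetical and not taken from the database schema.

```python
from dataclasses import dataclass


@dataclass
class DonorRow:
    cui: str
    donated_ron: float   # total donations (assumed to come from aep.donatii_pj)
    seap_ron: float      # SEAP contract value
    eu_ron: float        # EU funds received
    regas_ron: float     # state aid received

    @property
    def state_money_ron(self) -> float:
        return self.seap_ron + self.eu_ron + self.regas_ron

    @property
    def ratio(self) -> float:
        # State money received per RON donated; undefined without donations.
        if self.donated_ron <= 0:
            return float("nan")
        return self.state_money_ron / self.donated_ron


def pay_to_play_top(rows: list[DonorRow], limit: int = 50) -> list[DonorRow]:
    # Rank by absolute state money, not by ratio, so that a tiny donation
    # followed by a mid-sized contract does not dominate the list.
    return sorted(rows, key=lambda r: r.state_money_ron, reverse=True)[:limit]


# Hypothetical rows, for illustration only.
rows = [
    DonorRow("111", donated_ron=100.0, seap_ron=50_000.0, eu_ron=0.0, regas_ron=0.0),
    DonorRow("222", donated_ron=30_000.0, seap_ron=2_000_000.0, eu_ron=500_000.0, regas_ron=0.0),
]
top = pay_to_play_top(rows)
```

In this toy data the first row has the larger ratio but the second row ranks first, which is the behaviour the recipe asks for.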
### Recipe C — *"Datornic care vinde statului"*

The title translates as "Debtor who sells to the state". Join `anaf.datornici_latest ⨝ seap.mv_top_suppliers`. Show debt vs. SEAP wins, sorted by default on debt as a % of SEAP awards. The premise: public buyers should not buy from defaulting taxpayers.

### Recipe D — *"State-of-State circular flow"*

`aaas.firme ⨝ seap.announcements (supplier_cui)` shows which state-owned residual firms still draw public-sector contract revenue. Map buyer → supplier chains for the 11 firms.

### Recipe E — *"CNSC Authority of Shame"*

For each `cnsc.mv_per_authority_cui` row, attach the total SEAP buying value and the number of distinct suppliers. Sort by `contestation_count / sqrt(seap_buying_ron)` to weight contestations against scale. Once `decision_type` parsing improves, switch to "admis %" (share of upheld contestations) weighting.

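A minimal sketch of the Recipe E sort key follows. The authority labels and figures are hypothetical; only the formula `contestation_count / sqrt(seap_buying_ron)` comes from the recipe text.

```python
import math


def contestation_score(contestation_count: int, seap_buying_ron: float) -> float:
    # Dividing by the square root damps the advantage of very large buyers
    # without ignoring scale altogether.
    if seap_buying_ron <= 0:
        return 0.0
    return contestation_count / math.sqrt(seap_buying_ron)


# Hypothetical authorities: (contestation count, total SEAP buying value in RON).
authorities = {
    "A": (90, 900_000_000.0),
    "B": (30, 25_000_000.0),
}
ranked = sorted(
    authorities,
    key=lambda name: contestation_score(*authorities[name]),
    reverse=True,
)
```

Authority A has three times as many contestations, but B ranks higher because its contestations are dense relative to what it buys.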
### Recipe F — *"Single-bidder hot list"*

Aggregate `seap.v_single_bidder` by `supplier_cui`, with a "monopolist" flag column set when `distinct_auth = 1`. Cross with the ANAF debtor list and the AEP donor list. Single-bidder + monopolist + debtor = max suspicion score.

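One way to read the Recipe F scoring rule is as a count of three flags. The sketch below assumes that reading; `SupplierFlags` and its fields are hypothetical names, and the donor flag is carried as an extra column rather than scored, since the recipe defines the maximum from three signals only.

```python
from dataclasses import dataclass


@dataclass
class SupplierFlags:
    cui: str
    single_bidder_awards: int   # rows found in the single-bidder view
    distinct_auth: int          # distinct authorities behind those awards
    is_anaf_debtor: bool
    is_aep_donor: bool          # informational column, not part of the score

    @property
    def is_monopolist(self) -> bool:
        return self.distinct_auth == 1


def suspicion_score(s: SupplierFlags) -> int:
    # 0..3: single-bidder + monopolist + debtor, as listed in the recipe.
    return int(s.single_bidder_awards > 0) + int(s.is_monopolist) + int(s.is_anaf_debtor)


# Hypothetical suppliers, for illustration only.
worst = SupplierFlags("1", single_bidder_awards=4, distinct_auth=1,
                      is_anaf_debtor=True, is_aep_donor=False)
mild = SupplierFlags("2", single_bidder_awards=2, distinct_auth=5,
                     is_anaf_debtor=False, is_aep_donor=True)
```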
---

## Methodology notes & caveats

- All CUI joins use `seap.mv_top_suppliers.cui_norm` (already normalized to digits-only) and the per-CUI matviews keyed on the same normalized CUI. SEAP's raw `supplier_cui` column has variants (`RO 12751583`, etc.) which inflate apparent "no licence" / "no operator" hits. The "energy without licence" storyline was downgraded after CUI normalization showed only 2 candidates, one of which (NOVA POWER) is a known CUI mismatch in the source data.
- `anaf.datornici_latest` is the most recent quarterly snapshot; firms may have settled debts since. Numbers cited are from the latest publication captured.
- `aep.donatii_pj` covers 2003–present; old donations (2007–2012) reflect parties that may no longer exist (PDL, PD, PC, USL).
- ASF "radiat" entries: `cui` may map to either the legal person (firm) or to an obsolete entity record; results were sanity-checked against `data_radiere` vs `seap.publication_date`.
- All entities listed are *legal persons* (companies and public authorities) per the CLAUDE.md privacy guidance; no natural persons are profiled.

Generated 2026-05-10 by sub-agent G4 against the live `architools_user` PostgreSQL hub.

@@ -0,0 +1,402 @@

# Sector Deep-Dive: ENERGY × TELECOM × FINANCIAL

**Date:** 2026-05-10
**Scope:** systemic investigation, sub-agent G5
**Cross-referenced sources:** anre.licente · ancom.operatori/drepturi · asf.entitati · seap.announcements · regas.ajutoare · aaas.firme · anaf.datornici_latest · firms.entities · firms.financials · fonduri.beneficiar_anunt

---

## Executive summary

**Energy (CPV 09 + 65.31, 17.36 bn RON):** The market splits between fuel distributors (Rompetrol, OMV) and electricity/gas suppliers (Tinmar, Hidroelectrica, E.ON, Engie). HHI=814 points to a market that is fragmented overall but oligopolistic by segment: the top 5 account for 54.9% of public contract value. The most serious finding: **of the 108 suppliers active in CPV 09310 (electricity) + 09123 (gas) + 65.31 (distribution), 67 (62%) have NO active ANRE licence**. These suppliers received 1.35 bn RON without a confirmed authorization. Most of them are retail brands operating on the parent company's licence (PPC Energie Muntenia 24387371 vs. PPC Energie 22000460), which is a real risk of "phantom licences" spread across company groups.

**Telecom (CPV 32 + 64, 7.37 bn RON):** Extreme geographic concentration: **94% of contract value goes to suppliers headquartered in Bucharest**. HHI=661 suggests fragmentation, but in the broad segment of IT/ICT integrators serving STS and the ministries, **METAMINDS S.A. (CUI 34770594, 46 employees, 180 mn RON turnover in 2024) won a single 835 mn RON contract with STS in Feb. 2026**, 4.6× its annual turnover, without being authorized by ANCOM as a telecom operator. The top 10 suppliers hold 63% of the market. Poșta Română (1.0 bn RON, CPV 64.1) does NOT appear as an ANCOM operator (correct: it falls under a separate regulator), while Telekom Romania Communications (CUI 427320, 74.5 mn RON post-2021) already operates under Orange and is no longer listed as an active operator.

**Financial (CPV 66, 2.24 bn RON):** The most concentrated market analysed: **HHI=1,029; the top 3 (BCR, Omniasig, Asirom) control 51%, the top 5 control 60.5%, the top 10 control 77.6%**. Bucharest accounts for 93% of value. ASF data is dirty (CUIs stored with a `/<date>` suffix, e.g. `14360018/19.12.2001`), which produced false "unauthorized" positives for the top 3 insurance companies. **Concrete case of a firm continuing after deregistration: FAST BROKERS S.R.L. (CUI 14785760)**. Its broker authorization was withdrawn through an ASF sanction on 30.04.2024 (MO 403/30.04.2024); it collected 81.8 mn RON from 125 SEAP contracts before the withdrawal, and afterwards carried on as an active company (CAEN changed to 6820, real estate).

---

## SECTOR 1: ENERGY (CPV 09 + 65.31)

### Scope
- **CPV 09**: fuels, electricity (09.310), natural gas (09.123)
- **CPV 65.31**: electricity distribution
- **Regulator:** ANRE (29,536 licence records, of which 23,996 attestations, 4,541 electricity, 999 gas)
- **Active licences:** 7,269 distinct CUIs hold at least one granted/attested ANRE licence

### A. Market concentration

| Indicator | Value |
|---|---:|
| Total SEAP RON (CPV 09 + 65.31) | 17.36 bn |
| Distinct suppliers with SEAP contracts | 1,473 |
| HHI (points, 0-10,000) | 814 |
| Top 5 cumulative share | 54.9% |
| Top 10 cumulative share | 76.2% |

**Top 10 energy suppliers by SEAP value** (after normalizing the `RO`/`RO ` prefix in the CUI):

| CUI | Supplier | Contracts | Mn RON |
|---|---|---:|---:|
| 12751583 | ROMPETROL DOWNSTREAM SRL | 949 | 3,268.3 |
| 11201891 | OMV PETROM MARKETING SRL | 859 | 2,071.0 |
| 13991630 | OSCAR DOWNSTREAM SRL | 99 | 1,562.2 |
| 1860712 | ROMPETROL RAFINARE SA | 3 | 1,464.5 |
| 890561467 | Cameco Corporation (uranium, Nuclearelectrica contract) | 1 | 1,168.6 |
| 34620961 | TINMAR ENERGY S.A. | 212 | 952.9 |
| 7562758 | GETICA 95 COM SRL | 225 | 929.3 |
| 13267213 | HIDROELECTRICA S.A. | 103 | 796.3 |
| 18680651 | AMGAZ FURNIZARE / NOVA POWER & GAS SRL | 550 | 602.8 |
| 22043010 | E.ON ENERGIE ROMÂNIA SA | 353 | 407.2 |

**Commentary:** The market looks fragmented in aggregate (HHI below 1,000), but it is really a set of **3 overlapping oligopolistic markets**: fuel for the public fleet (Rompetrol/OMV/Lukoil/Mol, ~75% held by the top players), gas and electricity (Tinmar/E.ON/Engie/Electrica), and nuclear (Cameco, a single Nuclearelectrica contract).

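The concentration figures in this subsection can be re-derived from the top-10 values. The sketch below uses only numbers printed in this section; the function names are illustrative. Because the full HHI needs the shares of all 1,473 suppliers, the top-10 figure computed here is only a lower bound on the reported 814.

```python
def top_k_share(values_mn: list[float], total_mn: float, k: int) -> float:
    """Cumulative share (in %) of the k largest values against the market total."""
    return 100.0 * sum(sorted(values_mn, reverse=True)[:k]) / total_mn


def hhi_contribution(values_mn: list[float], total_mn: float) -> float:
    """Sum of squared percentage shares for the listed suppliers only."""
    return sum((100.0 * v / total_mn) ** 2 for v in values_mn)


# Top 10 energy suppliers in mn RON, as listed in this subsection.
top10 = [3268.3, 2071.0, 1562.2, 1464.5, 1168.6, 952.9, 929.3, 796.3, 602.8, 407.2]
total = 17_360.0  # 17.36 bn RON, the rounded market total

top5_share = round(top_k_share(top10, total, 5), 1)
top10_share = round(top_k_share(top10, total, 10), 1)
hhi_lower_bound = hhi_contribution(top10, total)
```

The two shares reproduce the 54.9% and 76.2% reported in the indicator table, and the top 10 alone contribute most of the HHI, which is why the long tail of small suppliers barely moves the index.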
### B. Authorization-vs-contracting gap

Restricted to strictly ANRE-regulated CPV codes (electricity 09.310, gas 09.123, distribution 65.31):

| Category | Distinct suppliers | Bn RON |
|---|---:|---:|
| With active ANRE licence | 41 | 4.20 |
| Without active ANRE licence | 67 | 1.35 |
| **Total** | **108** | **5.54** |

**Top 5 without an active ANRE licence** (licence statuses are quoted as stored: `Retrasa` = withdrawn, `Expirata` = expired, `Inregistrare Dosar` = application registered):

| CUI | Supplier | Mn RON | Comment |
|---|---|---:|---|
| 18680651 | NOVA POWER & GAS SRL | 599.3 | Operates as AMGAZ FURNIZARE; the licence is held by a different entity |
| 24387371 | PPC ENERGIE MUNTENIA S.A. | 299.5 | All licences `Retrasa`/`Expirata`; operates on the parent brand PPC Energie 22000460 |
| 28909028 | ELECTRICA FURNIZARE SA | 286.7 | Electricity licence `Inregistrare Dosar` + `Expirata`; renewal in progress |
| 7127592 | PREMIER ENERGY TRADING S.R.L. | 74.2 | Separate unregulated trading |
| 25834869 | CURENT ALTERNATIV S.R.L. | 18.6 | All licences `Retrasa` / `Incetat sub 1 MW` |

**Critical observation:** the 1.35 bn RON gap does not automatically indicate licence fraud: many entities operate as retail brands on the licences of parent companies in the same group. But SEAP does not record this, so without a manual check **contracting authorities have no way of knowing whether the supplier is legally entitled to deliver**.

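The distinction drawn here (own licence, licence held by a group parent, no licence at all) can be sketched as a three-way bucketing. This is a hypothetical illustration: the `parent_of` mapping does not exist in SEAP and would have to be curated by hand, and the CUIs and amounts below are placeholders.

```python
from collections import defaultdict


def licence_gap(
    awards: list[tuple[str, float]],   # (supplier CUI, contract value in mn RON)
    licensed: set[str],                # CUIs with an active licence
    parent_of: dict[str, str],         # brand CUI -> parent CUI (manually curated)
) -> dict[str, float]:
    totals: dict[str, float] = defaultdict(float)
    for cui, value_mn in awards:
        if cui in licensed:
            bucket = "own_licence"
        elif parent_of.get(cui) in licensed:
            bucket = "group_licence"   # retail brand covered by the parent's licence
        else:
            bucket = "no_licence"      # candidates for manual investigation
        totals[bucket] += value_mn
    return dict(totals)


# Placeholder data, for illustration only.
awards = [("100", 10.0), ("200", 5.0), ("300", 2.5), ("100", 1.5)]
result = licence_gap(awards, licensed={"100", "900"}, parent_of={"200": "900"})
```

Splitting out the `group_licence` bucket is the design choice that matters: it separates the benign brand-on-parent-licence cases from the residue that actually needs review.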
### C. Regulatory mortality (post-licence)

Filtered to SEAP contracts after 2022-01-01 awarded to CUIs with **no active ANRE licence at all**:

| CUI | Supplier | Contracts 2022+ | Mn RON 2022+ |
|---|---|---:|---:|
| 28909028 | ELECTRICA FURNIZARE SA | 376 | 255.9 |
| 24387371 | PPC ENERGIE MUNTENIA S.A. | 19 | 3.5 |
| 30855230 | COMPLEXUL ENERGETIC HUNEDOARA SA | 1 | 0.1 |

Complexul Energetic Hunedoara (CUI 30855230), an ANAF debtor owing **477 mn lei** (as of 2016-03-31, the last truly comprehensive publication), is still signing contracts. The ANAF debtor datasets date from 2016, so freshness is a problem, but the case remains symbolically significant.

### D. Cross-source co-financing (regas + AAAS + EU funds)

State aid (regas.ajutoare) to SEAP energy suppliers:

| CUI | Beneficiary | No. of aid grants | Mn RON aid |
|---|---|---:|---:|
| 1590082 | OMV PETROM SA | 1 | 408.4 |
| 1284717 | CN APDF SA GIURGIU | 3 | 344.4 |
| 14476722 | DISTRIBUȚIE ENERGIE ELECTRICA ROMÂNIA SA | 1 | 225.0 |
| 18680651 | NOVA POWER & GAS SRL | 4 | 45.1 |
| 14491102 | DISTRIBUȚIE ENERGIE OLTENIA SA | 3 | 44.0 |

OMV Petrom: **408 mn RON in state aid** (OMV PETROM SA, CUI 1590082) + **2.07 bn RON in SEAP contracts** (OMV PETROM MARKETING SRL, CUI 11201891) + zero debts to the state → the textbook case of the state as both sponsor and client.

AAAS overlap: **0 energy firms listed with AAAS debts**, so the sector is clean from this angle. EU funds overlap: **1 firm** (Poszet SRL, not relevant).

### E. Geography

| County | Firms | Mn RON |
|---|---:|---:|
| BUCUREȘTI | 34 | 3,060.7 |
| BUZĂU | 2 | 916.9 |
| CLUJ | 7 | 615.5 |
| MUREȘ | 5 | 407.3 |
| DOLJ | 3 | 137.6 |
| SIBIU | 2 | 110.5 |

**Anomaly:** BUZĂU has only 2 firms but 916 mn RON → this is Cameco Corporation listed with a Buzău address + another one-off outlier. The distribution is Bucharest-dominated (3.06 bn), but far less extreme than in telecom/financial (Bucharest's subtotal is ~52% of the geographically identified total).

### F. Emblematic case: ELECTRICA FURNIZARE SA (CUI 28909028)

- **Profile:** supply subsidiary of the Electrica SA group; the historical operator for the Muntenia Nord, Transilvania Nord and Transilvania Sud branches (although those subsidiaries have since been deregistered/merged)
- **Cumulative SEAP:** 493 contracts, 287.9 mn RON, 237 distinct contracting authorities
- **ANRE status:** electricity supply licence in status "Inregistrare Dosar" + "Expirata", plus "Inregistrare Dosar" for attestations; ALL of its licences are officially either expired or being re-processed
- **Top buyers:** ROMGAZ Depogaz (74.2 mn), MAI (56.8 mn), Camera Deputaților (29.1 mn), UM 0929 (21.6 mn)
- **Profile URL:** `/achizitii/firma/28909028`

**Why it matters:** even though renewal is a legitimate procedure, the fact that **376 contracts (255.9 mn RON) were signed while the licence was not shown as actively granted in ANRE's public data** suggests a gap between what SEAP accepts and what the ANRE register confirms. Proposed fix: SEAP should require the CUI of the licence holder and cross-validate it in real time.

---

## SECTOR 2: TELECOMMUNICATIONS (CPV 32 + 64)

### Scope
- **CPV 32**: network equipment (32.4), IT/A/V equipment (32.5)
- **CPV 64**: postal services (64.1), telecommunications (64.2)
- **Regulator:** ANCOM (518 authorized operators, all with a CUI; 2,536 rights granted, covering networks + services)
- **Top 10 ANCOM by rights:** Digi Romania, Orange, Vodafone, Digital Cable Systems + local ISPs

### A. Market concentration

| Indicator | Value |
|---|---:|
| Total SEAP RON (CPV 32 + 64) | 7.37 bn |
| Distinct suppliers | 1,874 |
| HHI | 661 |
| Top 5 cumulative share | 46.0% |
| Top 10 cumulative share | 63.2% |

**Top 10 telecom suppliers by SEAP value:**

| CUI | Supplier | Contracts | Mn RON |
|---|---|---:|---:|
| 34770594 | METAMINDS S.A. | 16 | 1,255.0 |
| 427410 | POȘTA ROMÂNĂ RA | 1,812 | 1,008.7 |
| 5573351 | CENTRUL PT. SERVICII DE RADIOCOMUNICATII SRL | 26 | 465.4 |
| 10881986 | SOCIETATEA NAȚIONALĂ DE RADIOCOMUNICAȚII SA | 13 | 354.2 |
| 31340215 | STARC4SYS SRL | 10 | 309.4 |
| 11973883 | DENDRIO SOLUTIONS S.R.L. | 17 | 304.0 |
| 38114908 | ARCTIC STREAM S.A. | 26 | 271.7 |
| 9010105 | ORANGE ROMANIA SA | 684 | 259.8 |
| 10363046 | DATANET SYSTEMS SRL | 33 | 220.4 |
| 3804492 | ADISAM TELECOM SA | 6 | 208.3 |

**Commentary:** The classic operators (Orange, Vodafone, Telekom) only occupy ranks 8-13, each with 180-260 mn RON. The leaders are **IT/ICT integrators** (METAMINDS, STARC4SYS, DENDRIO, ARCTIC STREAM), which are not telecom operators in the strict ANCOM sense but supply network equipment to the public sector.

### B. Authorization-vs-contracting gap (CPV 64.1 + 64.2 strictly)

Restricted to postal and telecommunications services (equipment under CPV 32 is excluded):

| Category | Distinct suppliers | Mn RON |
|---|---:|---:|
| With ANCOM authorization | 39 | 1,061 |
| Without ANCOM authorization | 274 | 1,196 |
| **Total** | **313** | **2,257** |

**Top suppliers without ANCOM authorization (CPV 64):**

| CUI | Supplier | Mn RON | Comment |
|---|---|---:|---|
| 427410 | POȘTA ROMÂNĂ RA | 1,008.7 | Postal operator, separate regulator (not ANCOM): false positive |
| 427320 | TELEKOM ROMANIA COMMUNICATIONS SA | 74.5 | Acquired by Orange in 2021, no longer listed as an ANCOM operator |
| 9030790 | INFORM LYKOS S.A. | 36.3 | Printing/courier CAEN, not telecom |
| 28646126 | PINK POST SOLUTIONS S.R.L. | 10.8 | Courier services, under a separate regulator |

In practice, the real gap is less about authorization-vs-contracting and more about the **lack of CPV granularity**. SEAP CPV 64.1 mixes postal services (regulator: MTC) with courier services (regulator: MTC), while CPV 64.2 is strictly ANCOM.

### C. Regulatory mortality

In ancom.operatori, status='autorizat' is the only status present (all 518 are active; the dataset keeps no history of ANCOM withdrawals). The only visible case of a "vanished operator" is Telekom Romania Communications (CUI 427320). The dataset needs to be enriched with the history of authorization withdrawals (feature request for G3).

### D. Cross-source co-financing

Top SEAP telecom beneficiaries that received state aid (regas):

| CUI | Beneficiary | No. | Mn RON aid |
|---|---|---:|---:|
| 4021138 | INES GROUP SRL | 4 | 92.3 |
| 4785178 | INTERSAT SRL | 2 | 92.3 |
| 26361386 | THREE PHARM S.R.L. | 15 | 45.1 |
| 39230536 | BANAT NETWORK INTEGRATED COMMUNICATIONS SRL | 1 | 45.0 |
| 23327045 | AUDIT IT&C S.R.L. | 7 | 32.8 |
| 28239696 | SAFETECH INNOVATIONS SA | 13 | 30.0 |

AAAS overlap: **0 telecom firms** in AAAS. The sector has no company with state participation in the scraped AAAS dataset.

### E. Geography: TELECOM

| County | Firms | Mn RON |
|---|---:|---:|
| MUNICIPIUL BUCUREȘTI | 487 | 6,924.5 |
| BOTOȘANI | 18 | 78.2 |
| TIMIȘ | 86 | 58.8 |
| ILFOV | 71 | 54.6 |
| PRAHOVA | 50 | 40.4 |
| IAȘI | 60 | 30.1 |
| CLUJ | 115 | 27.0 |
| BIHOR | 42 | 21.3 |

**Bucharest concentration = 94% of value** (6.92 bn out of 7.37 bn), extreme even by Romanian standards. Cluj, with 115 firms (over 6× as many as Botoșani's 18), generates roughly 3× less RON: fragmented SMEs versus a few large champions in Bucharest.

### F. Emblematic case: METAMINDS S.A. (CUI 34770594)

- **Profile:** S.A. founded 2015-07-13, headquartered in Bucharest, main CAEN 4650 (wholesale of ICT equipment), 46 employees (2024)
- **Turnover:** 156 mn (2020), 97 mn (2021), 178 mn (2022), 243 mn (2023), 180 mn (2024), i.e. roughly 100-250 mn RON a year
- **Net profit:** 7-9 mn RON a year (margin ~4%)
- **NOT authorized by ANCOM**: it does not appear in the operator register
- **Cumulative SEAP:** 16 contracts, **1.255 bn RON**, 17% of the entire SEAP telecom market
- **Top contract:** 11 Feb. 2026, STS contract for "Cloud Guvernamental", **835 mn RON** = 4.6× annual turnover
- **Other STS contracts:** 103.8 mn (Jan. 2026), 103.8 mn (Jan. 2026), 103.8 mn (Nov. 2023), a WAN framework-agreement pattern
- **Profile URL:** `/achizitii/firma/34770594`

**Why it matters:** a firm with 46 employees and 180 mn RON turnover wins, with no visible competition, **an 835 mn RON contract for the Government Cloud**, 4.6× its annual turnover. Through this single contract the Romanian state becomes its largest client on record. STS (Serviciul de Telecomunicații Speciale) has been METAMINDS's main client since 2019. The long-standing relationship plus the size of the contract raise legitimate questions: what is the real execution capacity, is there subcontracting, and how does a 46-employee firm deliver a government cloud project?

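The 4.6× figure is plain arithmetic on the two numbers quoted in this case, and the same check generalizes to any supplier. The sketch below assumes a 3× threshold, which is the cut-off this report proposes for its contract-to-turnover recipe; the function names are illustrative.

```python
def turnover_ratio(contract_mn: float, annual_turnover_mn: float) -> float:
    """Contract value expressed as a multiple of annual turnover."""
    if annual_turnover_mn <= 0:
        raise ValueError("annual turnover must be positive")
    return contract_mn / annual_turnover_mn


def exceeds_capacity(contract_mn: float, annual_turnover_mn: float,
                     threshold: float = 3.0) -> bool:
    # Flags contracts larger than `threshold` times the firm's annual turnover.
    return turnover_ratio(contract_mn, annual_turnover_mn) > threshold


# METAMINDS: 835 mn RON contract vs. 180 mn RON turnover (2024), both from this section.
ratio = turnover_ratio(835.0, 180.0)
```

The 835 mn RON contract is flagged, while each of the 103.8 mn RON STS contracts stays well under the threshold.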
---

## SECTOR 3: FINANCIAL (CPV 66)

### Scope
- **CPV 66.1**: banking services (66.110)
- **CPV 66.5**: insurance (66.510 - 66.518)
- **Regulator:** ASF for insurance (849 entities, of which 269 active after cleaning the `/<date>` suffix from the CUI)
- **Dirty ASF data:** ~30% of CUIs are stored as `<CUI>/<data_inmatriculare>`; recommendation for G3: normalize in the pipeline

### A. Market concentration

| Indicator | Value |
|---|---:|
| Total SEAP RON (CPV 66) | 2.24 bn |
| Distinct suppliers | 325 |
| HHI | 1,029 |
| Top 5 cumulative share | 60.5% |
| Top 10 cumulative share | 77.6% |

**Top 10 financial suppliers:**

| CUI | Supplier | Contracts | Mn RON |
|---|---|---:|---:|
| 361757 | BANCA COMERCIALA ROMANA SA | 70 | 452.3 |
| 14360018 | OMNIASIG VIENNA INSURANCE GROUP S.A. | 1,120 | 400.1 |
| 336290 | ASIROM VIENNA INSURANCE GROUP SA | 519 | 283.0 |
| 361536 | UNICREDIT BANK S.A. | 5 | 110.3 |
| 361579 | BRD - GROUPE SOCIETE GENERALE SA | 12 | 106.0 |
| 6291812 | GROUPAMA ASIGURARI SA | 343 | 105.2 |
| 14785760 | FAST BROKERS S.R.L. | 125 | 81.8 |
| 10801286 | ASITO KAPITAL SA | 9 | 69.4 |
| 361897 | CEC BANK SA | 61 | 64.7 |
| 5022670 | BANCA TRANSILVANIA SA | 175 | 61.3 |

**Commentary:** **The most concentrated market analysed.** HHI=1,029 sits right at the "moderately concentrated" threshold (>1,000), and the top 3 (BCR + Omniasig + Asirom) control 51%. Banks occupy ranks 1, 4, 5, 9, 10; Vienna Insurance Group insurers occupy ranks 2 + 3 (Omniasig + Asirom, both VIG subsidiaries → de facto a single group with 30.6% of the market).

### B. Authorization-vs-contracting gap (CPV 66.5, insurance)

After cleaning the `/<date>` suffix:

| Category | Suppliers | Mn RON |
|---|---:|---:|
| With active ASF authorization | 117 | 1,060 |
| Without active ASF authorization | 122 | 151 |
| **Total** | **239** | **1,211** |

**Top "unauthorized" suppliers (after CUI cleaning):**

| CUI | Supplier | Mn RON | Comment |
|---|---|---:|---|
| 14785760 | FAST BROKERS S.R.L. | 81.8 | Authorization withdrawn 30.04.2024 (sanction) |
| 17206294 | (unknown) | 34.1 | Orphan CUI in SEAP, with no name and no match in firms.entities |
| 4720429 | Nuclear Risk Insurance Ltd | 11.3 | Foreign insurer; Romanian ASF authorization not applicable |
| 211924 | HDI Global Specialty SE | 9.8 | German insurer; Romanian ASF authorization not applicable |

After cleaning, the real gap is **small in value** (~150 mn RON out of 1.21 bn) and concentrated in foreign insurers (legitimate) and the broker whose authorization was withdrawn. The sector is relatively "clean" compared with energy/telecom.

### C. Regulatory mortality

Insurers/brokers whose ASF authorization was withdrawn and that still show SEAP contracts:

| Clean CUI | Name | Deregistration date | Contracts | Mn RON |
|---|---|---|---:|---:|
| 14785760 | FAST BROKERS - BROKER DE ASIGURARE | 2024-04-30 | 125 | 81.8 |
| 18892336 | ALLIANZ-TIRIAC UNIT ASIGURARI (formerly Gothaer) | 2025-12-31 | 11 | 0.2 |
| 12408250 | CERTASIG | 2020-02-20 | 1 | 0.1 |
| 25906272 | TITAN BROKER (insolvency) | 2026-05-06 | 9 | 0.0 |
| 4134668 | GENERALI ASIGURARI (deregistered) | 2011-09-01 | 3 | 0.0 |
| 5328123 | EUROINS ROMANIA (authorization withdrawn) | 2023-03-17 | 2 | 0.0 |

**EUROINS** is a publicly known case: it was one of Romania's largest RCA insurers until its authorization was withdrawn on 17 March 2023. Our database holds only 2 SEAP contracts for it in total (0.0 mn RON after rounding). The SEAP dataset may be limited here because EUROINS appeared under different names, or because most of its business was B2C rather than B2G.

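Whether a firm actually contracted after deregistration comes down to comparing each contract's publication date with the deregistration date. A minimal sketch follows, using the FAST BROKERS dates reported in this document (first and last SEAP contract, and the 30.04.2024 withdrawal) plus one clearly hypothetical entity; the function name is illustrative.

```python
from datetime import date


def contracts_after(deregistered_on: date, publication_dates: list[date]) -> list[date]:
    """Return the contract dates that fall strictly after the deregistration date."""
    return [d for d in publication_dates if d > deregistered_on]


# FAST BROKERS: SEAP contracts span 2017-01-30 to 2023-11-15, withdrawal on 2024-04-30.
fast_brokers = contracts_after(date(2024, 4, 30), [date(2017, 1, 30), date(2023, 11, 15)])

# Hypothetical entity with one contract published after its deregistration.
hypothetical = contracts_after(date(2023, 3, 17), [date(2022, 6, 1), date(2023, 9, 1)])
```

For FAST BROKERS the result is empty, consistent with the contracts all predating the sanction; only rows like the hypothetical one would indicate buying from a firm that had already lost its authorization.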
### D. Cross-source co-financing

Top SEAP financial-sector beneficiaries that received regas state aid:

| CUI | Beneficiary | No. of aid grants | Mn RON aid |
|---|---|---:|---:|
| 10933694 | B.N. BUSINESS SRL | 6 | 13.8 |
| 9482566 | DANCO PRO COMMUNICATION S.R.L. | 8 | 13.7 |
| 28647300 | INTER BROKER DE ASIGURARE SRL | 10 | 12.4 |
| 17929585 | SCALA ASSISTANCE SRL | 8 | 12.1 |
| 17926970 | TRAVEL TIME D&R SRL | 6 | 9.8 |

These are not large insurers but brokers or adjacent firms; there is no major overlap with the sector's top 10. AAAS overlap: **0 financial firms** listed in AAAS.

### E. Geography

| County | Firms | Mn RON |
|---|---:|---:|
| MUNICIPIUL BUCUREȘTI | 160 | 2,076.8 |
| CLUJ | 13 | 69.3 |
| UNKNOWN | 17 | 37.3 |
| PRAHOVA | 5 | 32.1 |
| CONSTANȚA | 7 | 4.0 |
| BIHOR | 4 | 3.5 |

**93% Bucharest concentration** (2.08 bn out of 2.24 bn), comparable to telecom. The large banks and insurers are historically headquartered in the capital; only Banca Transilvania (Cluj) breaks the trend.

### F. Emblematic case: FAST BROKERS S.R.L. (CUI 14785760)

- **ONRC profile:** S.R.L. founded 2002-07-31, headquartered in Bucharest, CAEN 6820 (real estate; changed after deregistration)
- **ASF profile:** insurance-reinsurance broker, authorization **withdrawn through sanction** on 30.04.2024 (Monitorul Oficial 403/30.04.2024)
- **Cumulative SEAP:** 125 contracts, 81.8 mn RON, period 2017-01-30 → 2023-11-15
- **ANAF status:** still active (is_active_anaf = true)
- **Profile URL:** `/achizitii/firma/14785760`

**Why it matters:** the firm collected 81.8 mn RON from public contracts as an insurance broker over 7 years (2017-2023), then had its authorization withdrawn through an ASF sanction in 2024. The fact that its **main CAEN was changed after deregistration from 6622 (insurance broker) to 6820 (real estate)** suggests a "second life" for the firm. This is an interesting pattern for follow-up research: how many firms sanctioned by ASF change their CAEN code in order to keep operating?

---

## Cross-sector meta-observations

### What do the 3 sectors have in common?

1. **Bucharest concentration is pathological.** Energy 53%, telecom 94%, financial 93% of value goes to firms headquartered in the capital. For regulated sectors (where licensing is centralized) this is natural; for decentralized procurement (town halls, county hospitals) it is an anomaly. Hospitals in Bistrița contract gas from a Bucharest supplier not because there are no local suppliers, but because **the national supply market has been centralized at holding level**.

2. **Regulator data is partial.** ANRE keeps only the current licence state (no history of change dates); ANCOM only has status='autorizat' (no deregistrations); ASF stores CUIs with `/<date>` suffixes that break JOINs. All 3 registers **make cross-source auditing harder**, so a major step for G3 is normalizing this data at pipeline level.

3. **The authorization-vs-contracting gap is inflated by data problems.** After routine cleaning (`RO ` prefix, `/<date>` suffix), 70-90% of the initial "gap" disappears. The remaining **real gap** (Electrica Furnizare, PPC Energie Muntenia, FAST BROKERS post-deregistration) consists of cases that legitimately deserve investigation.

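The two cleaning steps named in point 3 are simple enough to show in full. This is a minimal sketch rather than the pipeline's actual code: `normalize_cui` is a hypothetical helper, and the sample inputs are the two raw formats quoted in these reports.

```python
import re
from typing import Optional


def normalize_cui(raw: Optional[str]) -> Optional[str]:
    """Reduce a raw CUI string to digits only, or None if no digits remain."""
    if not raw:
        return None
    head = raw.split("/", 1)[0]        # drop an ASF-style "/<date>" suffix
    digits = re.sub(r"\D", "", head)   # drop "RO", spaces and any other non-digits
    return digits or None


seap_variant = normalize_cui("RO 12751583")
asf_variant = normalize_cui("14360018/19.12.2001")
```

Splitting on the slash before stripping non-digits matters: stripping first would glue the registration date onto the CUI and produce a longer, wrong identifier.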
### What is different?

| Dimension | Energy | Telecom | Financial |
|---|---:|---:|---:|
| Total SEAP market | 17.4 bn | 7.4 bn | 2.2 bn |
| HHI (concentration) | 814 | 661 | 1,029 |
| Top 5 share | 54.9% | 46.0% | 60.5% |
| Bucharest concentration | 53% | 94% | 93% |
| Distinct suppliers | 1,473 | 1,874 | 325 |
| Licence gap (post-clean) | medium | low | very low |
| AAAS overlap | 0 | 0 | 0 |

**Energy** is the largest market, but it has the most diverse supplier base and lower Bucharest concentration. **Telecom** has the most suppliers (1,874 distinct) but the most extreme Bucharest concentration. **Financial** is the smallest but the most concentrated: 3 institutions (BCR, Omniasig, Asirom) hold 51% of the market.

### Patterns revealed in Romanian public spending

1. **The state buys utilities through a few large intermediaries, not directly.** The dominant energy suppliers are not producers (Hidroelectrica, Romgaz) but traders (Tinmar, Nova/Amgaz, E.ON Furnizare); the pattern is similar in telecom (METAMINDS, STARC4SYS, integrators) and financial (BCR, OMNIASIG acting as implicit brokers for public fleets).

2. **Sector regulation is de facto absent from the process.** No dataset indicates an automatic check of ANRE/ANCOM/ASF authorization at award time. SEAP does not require the CUI of the licence holder.

3. **AAAS is blind to entire sectors.** Zero overlap between AAAS and the 3 sectors studied suggests that AAAS mainly manages debts of firms that went bankrupt after 1990, not current debts of utility / telecom / financial operators. What is missing is a dataset of **current debts to the state in regulated sectors**.

---

## Recipe ideas for `/achizitii/retete`

1. **Energy suppliers without an active ANRE licence, post-2024**
   - List CPV 09310/09123/6531 suppliers with contracts from 2024 onwards and 0 active ANRE licences
   - SQL: `WITH lic AS (...) JOIN seap.announcements ... WHERE lic.n_active = 0 AND publication_date >= '2024-01-01'`

2. **Insurers with withdrawn ASF authorization and SEAP contracts in the last 12 months**
   - JOIN asf.entitati (section_status='radiat') × seap.announcements with publication date > data_radiere
   - Useful for CNAS, CASA OPSNAJ, town halls: they are buying from unlicensed firms

3. **Top public suppliers with a contract-to-turnover ratio > 3×**
   - Flags execution-capacity risk: a SEAP contract > 3× the firm's annual turnover
   - Filter on firms.financials × seap.announcements

4. **Oligopolistic concentration by 2-digit CPV**
   - HHI per CPV2 + top-3 share → bar chart: which CPVs are monopolized
   - Derived from mv_top_cpv_divisions

5. **Geo-anomalies: counties with public spending disproportionate to population**
   - Value of contracts won by firms headquartered in county X / county population
   - JOIN firms.entities × mv_county_totals × SIRUTA population

---

*Source documents for later cross-checking: `chatGPT/journalism/killer-findings-2026-05-10.md`. All figures were extracted on 2026-05-10 from the local database; they are refreshed periodically by the scrape pipelines.*

@@ -0,0 +1,151 @@
|
||||
# Session 2026-05-11 — vreaudigital.ro
|
||||
|
||||
Extended session after the Phase 5 UI merge. It started as an autonomous tick and evolved into 15 consecutive productive cycles. Live SHA at the end: **`7ca4aa4`** (49 recipes, 17 systemd timers, 100% geocoding).
|
||||
|
||||
## Cycle timeline

| Tick | Focus | Commits | Highlights |
|---:|---|---|---|
| Phase 5 (pre-tick) | G1-G2-G3-G4-G5 sub-agents | 8 commits | 6 helper functions, 7 firma badges, 5 sections, 6 recipes, 3 investigative reports |
| Phase 5 merge | UI integration + commit cleanup | 2 commits | `57af3a6` + `c1d90bf` |
| Tick #1 | A1-A2-A3 sub-agents (fixes/geocoding/completions) + A4-A5 (browse UIs) + S1 (refresh strategy) | 6 commits | Geocoding 91→100%, ASF cleanup, ANRE electricians 0→73K, 2 new browse pages |
| Tick #1.5 | Disk cleanup + heartbeat monitoring | 4 commits | 89%→45% disk, heartbeat.sh + systemd timer (20 sources daily 07:00) |
| Tick #2 | 11 systemd timer pairs | 1 commit | Weekly + monthly timers for all scrape-*.sh wrappers |
| Tick #3 | Autoritate profile badges | 1 commit | 5 cross-source badges + getBugetarStatus helper |
| Tick #4 | Autoritate profile sections | 1 commit | 4 sections (ANAF/CNSC/Curtea Conturi/RegAS), parity with firma |
| Tick #5 | Bugetar UAT pattern match | 1 commit | +961 matches (58.3% → 63.4%), strip-parens insight |
| Tick #6 | Curteacont CUI backfill | 1 commit | 0% → 64.4% (+730 matches), prefix-bug data fix |
| Tick #7 | CNSC authority CUI backfill | 1 commit | 42% → 77.5% (+10,328 matches), biggest single backfill |
| Tick #8 | SEAP DA wrapper + timer (was missing!) | 1 commit | Daily 02:30, 4h timeout for ~7-month catch-up |
| Tick #9 | Firma bugetar badge + recipe refactor | 2 commits | autoritati-audited-repetitiv: 5s → <500ms |
| Tick #10 | Recipe dubla-alerta-cdc-cnsc | 1 commit | 50 entities, MUNICIPIUL CONSTANTA on top (93 signals) |
| Tick #11 | Recipe donatori-datornici (moral hazard) | 1 commit | 360 firms, B&B BUSINESS at a 1:28,184 ratio |
| Tick #12 | Recipe energie-anre-datornici | 1 commit | 875 operators, 3.14 bn RON aggregate debt |
| Tick #13 | Red-flags landing 6→13 cards + 3 KPI tiles | 1 commit | Surfacing for the new investigative recipes |
| Tick #14 | Recipe donatori-contestatori (political leverage) | 1 commit | 185 firms, SHERIFF GUARD: 62 challenges vs a 27K donation |
| Tick #15 | Audit + this doc | 1 commit | System health verified, summary written |
|
||||
|
||||
## Final statistics

### CUI matching coverage

| Source | Pre-session | Post-session | Delta |
|---|---:|---:|---:|
| firms.entities geocoding | 91.3% | **100.00%** | +346,675 |
| ASF CUI clean | 51% | **100%** | +412 cleaned |
| cnsc.decizii authority | 42% | **77.5%** | +10,328 |
| curteacont.rapoarte | 0% | **64.4%** | +730 |
| bugetar.entitate | 58.3% | **63.4%** | +961 |
| cnas.furnizori | 0% | 9% | +3,255 (dirty data residue) |
|
||||
|
||||
### Total aggregated public data

17 schemas integrated cross-source via the CUI hub (firms.entities = 3.99M):

- **~17.9M rows** of unique public data (per the G3 audit)
- **75 SEAP contracts** active now vs 8 months stale before (DA pipeline)
- **49 recipes** on /achizitii/retete (39 at the start)
- **23 gotchas** documented in memory
|
||||
|
||||
## Recipes shipped (Phase 5 + autonomous run)
|
||||
|
||||
| Slug | Source pair | Yield | Tier |
|---|---|---:|---|
| `energie-fara-licenta` | SEAP ∖ ANRE | red-flags | T3 |
| `telco-fara-licenta` | SEAP ∖ ANCOM | red-flags | T3 |
| `autoritati-contestate-cnsc` | CNSC × SEAP | 4,192 authorities | T2 |
| `asiguratori-furnizori-stat` | ASF × SEAP | 63 firms | T4 |
| `stat-actionar-seap` | AAAS × SEAP | red-flags | T3 |
| `autoritati-audited-repetitiv` | Curtea × SEAP | red-flags | T4 |
| `autoritati-dubla-alerta-cdc-cnsc` | Curtea × CNSC | **50** | T2 |
| `donatori-politici-care-datoreaza-statului` | AEP × ANAF | **360** | T2 |
| `energie-licentiati-anre-datornici-anaf` | ANRE × ANAF | **875** | T2 |
| `donatori-politici-care-contesta-la-cnsc` | AEP × CNSC | **185** | T2 |
|
||||
|
||||
## Top killer findings (journalism-ready)

1. **B&B BUSINESS SOLUTIONS**: 10K RON donated to parties vs **281.8M RON owed to ANAF** (ratio 1:28,184)
2. **HIDROELECTRICA**: 214M ANAF debt + 4 active ANRE licences (circular state-to-state)
3. **MUNICIPIUL CONSTANTA**: 3 Curtea de Conturi audits + 90 CNSC challenges = 93 converging signals
4. **SHERIFF GUARD PROTECTION**: 62 CNSC challenges vs a 27K donation (uses the legal route as its main instrument)
5. **VICTOR CONSTRUCT**: 670K donation + 23 challenges + active on SEAP (political-legal combination)
|
||||
|
||||
## Infrastructure delivered
|
||||
|
||||
### 17 systemd timers active
|
||||
|
||||
| Cadence | Timer | Next fire |
|---|---|---|
| Daily 02:00 | anaf-daily | Tue 02:02 |
| Daily 02:30 | **da (NEW)** | Tue 02:32 |
| Daily 04:00 | mvs | Tue 04:04 |
| Daily 07:00 | **heartbeat (NEW)** | Tue 07:02 |
| Weekly Sun 01:00 | anre | Sun 01:06 |
| Weekly Mon 01:00 | ancom | Mon 01:00 |
| Weekly Tue 01:00 | asf | Tue 01:07 |
| Weekly Wed 01:00 | aaas | Wed 01:05 |
| Weekly Thu 01:00 | curteacont | Thu 01:06 |
| Weekly Fri 01:00 | gnm | Fri 01:00 |
| Weekly Sat 01:00 | cnsc | Sat 01:03 |
| Weekly Tue 03:00 | onrc-weekly | Tue 03:03 |
| Monthly 1st 03:00 | regas | Jun 1 03:06 |
| Monthly 1st 03:30 | aep-donatii | Jun 1 03:30 |
| Monthly 1st 05:00 | cnas | Jun 1 05:06 |
| Monthly 15th 03:00 | apia-fermieri | May 15 03:02 |
|
||||
|
||||
### Heartbeat monitoring
|
||||
- Probes 20 sources, posts to n8n satra-backup-alert webhook when STALE
|
||||
- Currently 19/20 OK, 1 STALE: ani.declaratii (known unimplemented)
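
A minimal sketch of the staleness check behind such a heartbeat, assuming a per-cadence freshness budget. The thresholds in `MAX_AGE`, the function names, and every payload field other than `service: "data-heartbeat"` are hypothetical; the real `heartbeat.sh` may work differently.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical freshness budgets: a source is STALE once its last
# successful scrape is older than the budget for its cadence.
MAX_AGE = {
    "daily": timedelta(days=2),
    "weekly": timedelta(days=9),
    "monthly": timedelta(days=35),
}


def classify(sources, now):
    """Return one {"source", "status"} record per probed source."""
    report = []
    for src in sources:
        last = src["last_success"]  # datetime, or None if never scraped
        stale = last is None or now - last > MAX_AGE[src["cadence"]]
        report.append({"source": src["name"], "status": "STALE" if stale else "OK"})
    return report


def build_alert(report):
    """Build a webhook payload, or None when nothing is stale."""
    stale = [r["source"] for r in report if r["status"] == "STALE"]
    if not stale:
        return None
    return {
        "service": "data-heartbeat",
        "stale": stale,
        "ok": len(report) - len(stale),
        "total": len(report),
    }


now = datetime(2026, 5, 11, 7, 0, tzinfo=timezone.utc)
sources = [
    {"name": "anaf", "cadence": "daily",
     "last_success": datetime(2026, 5, 11, 2, 0, tzinfo=timezone.utc)},
    {"name": "cnsc", "cadence": "weekly",
     "last_success": datetime(2026, 5, 9, 1, 0, tzinfo=timezone.utc)},
    {"name": "ani.declaratii", "cadence": "monthly", "last_success": None},
]
report = classify(sources, now)
alert = build_alert(report)
```

Returning `None` when every source is fresh keeps the "post only when STALE" behaviour explicit: the caller sends nothing unless a payload exists.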
|
||||
|
||||
### Disk
|
||||
- 89% → 45% (156 GB freed via `docker builder prune -a -f` + `docker image prune -a -f`)
|
||||
|
||||
## Documents written
|
||||
|
||||
| Path | Author | Purpose |
|---|---|---|
| `chatGPT/data-quality/freshness-audit-2026-05-10.md` | G3 sub-agent | 17.9M row reconciliation + per-schema cadence |
| `chatGPT/data-quality/geocoding-strategy-2026-05-11.md` | A2 sub-agent | Fallback chain documentation |
| `chatGPT/data-quality/refresh-cadence-strategy-2026-05-11.md` | S1 sub-agent | Master cron schedule + 2captcha budget |
| `chatGPT/journalism/killer-findings-2026-05-10.md` | G4 sub-agent | 5 lead findings + 7 storylines |
| `chatGPT/journalism/sectorial-deep-dive-2026-05-10.md` | G5 sub-agent | ENERGIE/TELECOM/FINANCIAR analysis |
| `services/seap-scraper/HANDOFF-aaas-ordin-278.md` | A3 sub-agent | AAAS PDF backfill plan |
| `services/seap-scraper/HANDOFF-asf-other-registers.md` | A3 sub-agent | ASF pension/AIFM/UCITS plan |
| `services/seap-scraper/HANDOFF-cnas-layout-b.md` | A3 sub-agent | CNAS 9 PDFs layout-B parser plan |
| `services/seap-scraper/systemd/README.md` | tick #2 | Systemd unit install procedure |
| **This doc** | tick #15 | Session retrospective |
|
||||
|
||||
## Reusable patterns discovered
|
||||
|
||||
### 1. Strip-parens + UAT-pattern (3-source proven)
|
||||
ONRC stores comune/orașe (communes/towns) with a " (Primaria Y)" suffix. Stripping the suffix and comparing the normalized names yields an exact match. Used for:

- bugetar (sql/039) → +961 matches in 1m 46s
- curteacont (sql/040 + 041) → +730 matches in <2 min
- cnsc (sql/042) → +10,328 matches in 1m 25s
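
The pattern can be sketched as follows. The actual work was done in SQL (sql/039 to sql/042); this Python version only illustrates the idea, and `normalize_uat_name`, `match_by_name`, the exact normalization steps, and the placeholder CUI are assumptions rather than the shipped logic.

```python
import re
import unicodedata

# Trailing parenthetical such as " (Primaria Y)".
_PAREN_SUFFIX = re.compile(r"\s*\([^)]*\)\s*$")


def normalize_uat_name(name: str) -> str:
    """Strip the trailing parenthetical, drop diacritics, collapse spaces, uppercase."""
    stripped = _PAREN_SUFFIX.sub("", name)
    decomposed = unicodedata.normalize("NFKD", stripped)
    ascii_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"\s+", " ", ascii_only).strip().upper()


def match_by_name(onrc_rows, target_names):
    """Map each target name to a CUI via exact match on the normalized name.

    onrc_rows: iterable of (cui, name) pairs; the first CUI seen wins.
    """
    index = {}
    for cui, name in onrc_rows:
        index.setdefault(normalize_uat_name(name), cui)
    return {t: index.get(normalize_uat_name(t)) for t in target_names}


# Placeholder CUI, for illustration only.
onrc = [("0000001", "COMUNA CÂRLOGANI (Primaria Cârlogani)")]
matches = match_by_name(onrc, ["Comuna Carlogani", "Oras Inexistent"])
```

Diacritic stripping maps `Cârlogani` to `CARLOGANI`, so it does not reconcile the `Cârlogani ↔ Cirlogani` spelling variants; that would need an extra orthography rule.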
|
||||
|
||||
### 2. Sub-agent isolation via dedicated helper files
|
||||
G1 + G2 wrote separate `profile-queries-utilities.ts` + `profile-queries-financial.ts` to avoid merge conflicts. Pattern reusable for any parallel codegen task.
|
||||
|
||||
### 3. Cross-source RATIO mismatches surface real signal
|
||||
- B&B: 10K donation vs 281M debt → 1:28,184 ratio = lever-amount mismatch
|
||||
- SHERIFF GUARD: 27K donation vs 62 contestations → cheap-donation-buys-aggressive-juridical-strategy
|
||||
|
||||
Single-source counts are explained away by "high volume". Cross-source ratios force a specific narrative.
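
A worked example of the ratio arithmetic, using the rounded figures quoted in this document (10K RON donated, 281.8M RON owed). The helper names are hypothetical. With rounded inputs the result is 1:28,180; the reported 1:28,184 presumably comes from the unrounded amounts in the database.

```python
def debt_to_donation(donation_ron: float, debt_ron: float) -> float:
    """Return how many RON of debt correspond to each RON donated."""
    if donation_ron <= 0:
        raise ValueError("donation must be positive")
    return debt_ron / donation_ron


def ratio_label(donation_ron: float, debt_ron: float) -> str:
    """Format the ratio in the '1:N' style used in the findings."""
    return f"1:{round(debt_to_donation(donation_ron, debt_ron)):,}"


label = ratio_label(10_000, 281_800_000)
```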
|
||||
|
||||
## Known limitations / next-session candidates
|
||||
|
||||
### Critical (DR/observability)
|
||||
- DB backup runs from root's crontab (NOT bulibasa's) — confirmed working but undocumented elsewhere
|
||||
- Heartbeat hits n8n webhook but n8n routing for `service:"data-heartbeat"` field not verified — first alert email needs validation
|
||||
|
||||
### High-impact (3-15h each)
|
||||
- CNSC Stage 2 PDF parse → decision_type (admis/respins, i.e. upheld/rejected), which unlocks the killer recipe "authorities with a high rate of lost challenges"
|
||||
- Curtea Conturi Stage 2 → findings_count + key amounts per audit
|
||||
- CNAS layout-B parser (9 remaining PDFs)
|
||||
- ASF pension funds + AIFM + UCITS register ingest
|
||||
|
||||
### Medium-effort (4-8h)
|
||||
- TED full re-import (publication-date backfill — fix shipped tick #1)
|
||||
- normalize_company_name v2 for orthography (Cârlogani ↔ Cirlogani)
|
||||
- ANRE 92.3% residue (commercial firms — need different match strategy)
|
||||
|
||||
### Speculative
|
||||
- 2captcha integration (~$60-100 one-shot for Bugetar Faza 2 + ANAF datornici quarterly refresh)
|
||||
- ANI parser MVP (1.3M PDFs, 15-day effort)
|
||||
@@ -0,0 +1,375 @@
|
||||
# GovTech Commons Portal: Deep Research Blueprint for an Open-Source, Citizen-Friendly GovTech Aggregator
|
||||
|
||||
**File A (full research report):** `full-research-report.md`
|
||||
**Assumed date:** 2026-04-06 (Europe/Bucharest)
|
||||
**Scope:** EU/Romania-first; no vendor lock‑in (explicitly avoided unless unspecified); currency unspecified.
|
||||
|
||||
## Executive summary
|
||||
|
||||
I propose an open-source “GovTech Commons Portal” that combines a public-sector software catalog (metadata-driven, reuse-first) with citizen-legible **one-click runnable demos** hosted in a hardened sandbox. The core hypothesis is that “adoption friction” is currently higher than “innovation friction”: prototypes exist, but they are hard to *discover, trust, and pilot* in public administration contexts. This aligns with the EU expansion of reuse infrastructure (the EU Open Source Solutions Catalogue is a centralized discovery layer and was launched in 2025, initially with hundreds of solutions and a plan to include more repositories and building blocks).

The strategy is to treat **publiccode.yml** as the baseline contract for discoverability and reuse (it is explicitly designed for public administration software discovery and reuse, and it has operational precedents across Europe, including national catalog crawling patterns). I then expand it with a strictly versioned “portal superset schema” that adds demo/runtime descriptors, security artifacts (SBOM/provenance/signatures), privacy declarations, and AI disclosure fields.

The portal must ship with a **trust ladder** whose criteria are objective, auditable, and legible to non-technical citizens: **Demo-safe → Pilot-verified → Production-adopted**. Europe already provides a strong reference pattern for “badges in government catalogs” (e.g., a criteria-based badge program describing security/maintenance/reuse qualities).

The highest-risk part is executing untrusted demos; I therefore recommend a **WASM-first demo runner** as the default “safe mode” (WASM modules execute in a sandboxed environment and can’t escape without going through appropriate APIs), and I add a graduated path toward stronger isolation for heavier demos using microVM- or VM-backed runtimes (Firecracker, Kata, gVisor) once supply-chain controls and ops maturity are in place.

Compliance is not a bolt-on; it is a design constraint. In the EU/Romania context, the portal’s baseline obligations map to GDPR, the EU AI Act, the DSA, the Interoperable Europe Act (and its implementing rules for interoperability regulatory sandboxes), plus public-sector accessibility expectations (Directive (EU) 2016/2102 and EN 301 549).
|
||||
|
||||
### Prioritized next ten concrete steps
|
||||
|
||||
1. I will publish the platform’s **hard safety rules** (default: no personal data; default: no outbound network; signed-only runnable artifacts; transparent takedown; vulnerability disclosure policy).
2. I will adopt **publiccode.yml** as the minimum metadata gate and publish a versioned “portal superset” extension schema.
3. I will implement ingestion as “metadata-first”: listings can exist as *Listed* before any runnable demo is allowed.
4. I will ship a WASM-first demo runner with deterministic quotas, no PII, default-deny egress, and per-demo isolation.
5. I will implement supply-chain controls: SBOM generation, provenance attestations (SLSA-style), and signing/verification using Sigstore.
6. I will define the trust ladder (Demo-safe → Pilot-verified → Production-adopted) as criteria + evidence artifacts, patterned after public-sector badge programs.
7. I will ship “pilot packs” (security/privacy/deployment/procurement notes) aligned with real public-administration acquisition processes that already prioritize reuse/open source (Italy provides a strong reference model).
8. I will launch with Romania anchor categories (SSO/identity, payments, open data discovery) using official national platforms as “reference ecosystems” for what citizens already understand.
9. I will implement DSA-style foundation mechanics: notice-and-action, moderation logging, and transparency reporting posture (scaled to size).
10. I will pilot with at least one public institution and publish the outcomes as reusable, evidence-backed modules to move beyond “showcase” into “adoption engine.”
|
||||
|
||||
## Ecosystem scan
|
||||
|
||||
The reused-components ecosystem already exists, but it is fragmented between (a) catalogs that optimize for inter-administration reuse, (b) demo/sandbox systems that optimize for learning, and (c) community platforms that optimize for visibility rather than trust. Your portal’s differentiated value is **composing** these into a single, citizen-readable experience while staying compatible with EU reuse infrastructure.

A credible global reference for “catalog + standards-driven eligibility” is the Digital Public Goods Alliance: its Digital Public Goods Standard defines what qualifies as a digital public good (open source software, open data, open AI models, open standards, open content; must adhere to privacy and other applicable laws and do no harm), and the DPG Registry emphasizes that listed goods have been reviewed against that standard and require reassessment over time. This maps cleanly to your portal’s need for “trust tiers,” even if your scope and review process differ.

At EU level, the Interoperable Europe Portal positions itself as a one-stop shop for discovering, sharing, and reusing IT solutions and good practices across public administrations, businesses, and citizens. The EU Open Source Solutions Catalogue is a concrete instantiation of that idea: it is a centralized platform to discover open-source solutions from public administrations, and its launch communications highlight scale, areas covered, and planned expansion.

Europe also provides mature national patterns for metadata-driven catalogs:

- Developers Italia provides reuse and publication guidance where publiccode.yml is required to populate the catalog, and the standard is intended to be usable by both developers and less technical audiences.
- openCode is positioned as a public-sector open source platform and automatically imports software directory entries from publiccode.yml; it also runs a badge program that evaluates projects on criteria related to security, maintenance, and reuse.
- code.gouv.fr supports government agencies increasing FOSS usage and publishing source code, and France maintains a government-recommended FOSS list (SILL) for public administration usage.

A separate but relevant reference model is the Foundation for Public Code: the Standard for Public Code frames what “good public codebases” look like (open, legible, accountable, accessible, sustainable), which is useful as governance criteria for your portal and for “Production-adopted” standards.

Romania has strong “anchor services” that citizens already recognize, and these anchors can be used to seed categorization and demonstrate immediate relevance. Autoritatea pentru Digitalizarea României (ADR) describes operating essential digital platforms and implementing government cloud infrastructure. Romania also has official, widely used citizen-facing platforms: data.gov.ro is the national open datasets portal; ROePAS is positioned as a single access point to services/procedures; ROeID is positioned as the national SSO solution for citizens’ digital interactions; and Ghișeul.ro is the state’s official online payment platform.

A cautionary-but-useful reference for your discovery UX is Product Hunt: it explicitly frames launch discovery as a leaderboard driven by upvotes and engagement. That mechanism is valuable for “energy” and community flow, but it is insufficient for public-sector trust; your portal should treat popularity as a weak signal and trust artifacts as strong signals.
|
||||
|
||||
## Product concept and information architecture
|
||||
|
||||
I design the portal as two interlocking surfaces: a “developer publishing surface” that is strict about metadata and runnable artifacts, and a “citizen and institution surface” that is strict about legibility, safety, and evidence. The Standard for Public Code provides a useful value lens: public code should be usable, open, legible, accountable, accessible, and sustainable.

The information architecture should start from citizen mental models (“I need to do X”) rather than internal government org charts. Romania’s ROePAS framing (“services and documents you need” in one place) is a strong pattern that citizens already understand. Therefore, I suggest a top navigation that stays stable across countries and institutions:

- **Life events and tasks** (citizen-first): pay, identify/authenticate, request certificates, permits, report issues, transparency/open data, benefits.
- **Building blocks** (system-first): identity/SSO, payments, forms, document processing, workflow, notifications, data exchange/interop. The GovStack sandbox exists because government services often assemble from reusable building blocks, and the Interoperable Europe ecosystem explicitly supports reusable solutions.
- **Demos** (safety-first): runnable, non-destructive, synthetic-data, read-only by default.
- **Adoption evidence** (institution-first): pilot packs, deployments, “used by,” compliance artifacts. Italy’s reuse publication and acquisition guidance provides the operational template for how administrations want evidence and comparability.

Personas must map to these surfaces:

Developers need fast onboarding (“submit a repo + publiccode.yml + demo artifact”) and clear value (visibility, adoption pipeline). Citizens need “what it does, can I try it safely, is it used by government, where do I report issues” in two screens. Institutions and evaluators need non-negotiable artifacts: license clarity, security posture, privacy posture, deployment notes, and support model.
|
||||
|
||||
## Taxonomy and metadata design
|
||||
|
||||
A portal like this will fail if metadata is optional. A recurring European success pattern is that catalogs become useful once they are **machine-indexable** and **consistent**. publiccode.yml is explicitly designed as a metadata standard for repositories of software developed or acquired by public administrations, aimed at making them discoverable and reusable.

Operational precedents matter: publiccode.yml is mandatory for public software developed in Italy and supports catalog crawling/building; openCode’s directory depends on valid publiccode.yml files; and the Interoperable Europe Portal promotes publiccode.yml as a standard for documenting and sharing public-sector open source.

I recommend a “publiccode.yml superset” via a versioned extension, not by forking the standard. The Italian documentation explicitly notes interoperability goals and a separation between core keys and country-specific keys, which is the right design principle for a portal that might later federate with EU catalogs. In addition, EU catalog ecosystems are increasingly structured around publiccode.yml as the “catalog contract.”
|
||||
|
||||
### Proposed metadata extension fields
|
||||
|
||||
I treat the following as “must-have extensions” because your portal hosts runnable demos and AI-adjacent tools; catalogs that do not execute code can omit most of these.
|
||||
|
||||
- **Demo/runtime descriptor:** `demo.runnable`, `demo.sandboxProfile`, `demo.egressPolicy`, `demo.dataPolicy`, quotas, and session lifecycle. This enforces your “demo-safe” badge in a machine-checkable way.
- **Security artifacts:** SBOM location/format, provenance/attestation, signature verification policy. NIST SSDF explicitly treats artifact integrity, provenance, and vulnerability response as core secure development practices.
- **Privacy declarations:** data categories, retention, DPIA status if relevant, whether personal data is processed. GDPR obligations around lawful processing and DPIA risk evaluation dictate that privacy posture is not optional if any personal data appears in pilots or production deployments.
- **AI disclosure:** whether AI is used, model source/type, known limitations, and an “AI Act risk hints” section intended as a disclosure artifact (not a formal legal classification). The EU AI Act creates obligations that vary by risk category, so structured disclosure reduces institutional uncertainty and improves safe adoption.
- **Adoption evidence:** pilots, references, support model. Developers Italia and other reuse catalogs demonstrate that software reuse becomes real when publication metadata and adoption pathways are explicit.
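
To illustrate what “machine-checkable” could mean for the demo descriptor, here is a minimal validation sketch. Only the four `demo.*` field names come from the list above; the allowed values (`"deny"`, `"synthetic-only"`, the profile names) and the quota keys `maxSeconds` and `maxMemoryMb` are assumptions made for the example, not part of publiccode.yml or of any published extension schema.

```python
# Hypothetical sandbox profile names for the portal extension.
ALLOWED_PROFILES = {"wasm", "gvisor", "kata", "firecracker"}


def demo_safe_violations(demo: dict) -> list:
    """Return the reasons a demo descriptor fails the Demo-safe checks."""
    problems = []
    if demo.get("runnable") is not True:
        problems.append("demo.runnable must be true")
    if demo.get("sandboxProfile") not in ALLOWED_PROFILES:
        problems.append("demo.sandboxProfile is missing or unknown")
    if demo.get("egressPolicy") != "deny":
        problems.append("demo.egressPolicy must be 'deny' (default-deny egress)")
    if demo.get("dataPolicy") != "synthetic-only":
        problems.append("demo.dataPolicy must be 'synthetic-only'")
    quotas = demo.get("quotas") or {}
    for key in ("maxSeconds", "maxMemoryMb"):
        value = quotas.get(key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool) or value <= 0:
            problems.append(f"demo.quotas.{key} must be a positive integer")
    return problems


good = {
    "runnable": True,
    "sandboxProfile": "wasm",
    "egressPolicy": "deny",
    "dataPolicy": "synthetic-only",
    "quotas": {"maxSeconds": 60, "maxMemoryMb": 256},
}
bad = {"runnable": True, "sandboxProfile": "docker", "egressPolicy": "allow-all"}
```

Returning a list of violations rather than a boolean lets the portal show submitters exactly which fields block the badge.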
|
||||
|
||||
## Trust and badge ladder
|
||||
|
||||
Public-sector tool discovery needs trust signals that are both legible and evidence-backed. A strong EU precedent is that badges can be generated from objective criteria and displayed in a software directory to communicate qualities such as security, maintenance, and reuse.

I recommend a ladder that allows early-stage innovation without pretending early-stage artifacts are “production safe.” The DPG Registry demonstrates that “review against a standard” can be a public trust mechanism and that compliance must be periodically reassessed, which is a useful governance concept for your higher badge tiers.
|
||||
|
||||
### Trust badge ladder
|
||||
|
||||
| Badge | Intended audience meaning | Minimum objective criteria | Evidence artifacts |
|---|---|---|---|
| Listed | “This exists and is described consistently.” | Valid publiccode.yml + portal extension; clear license; maintainer contact | Metadata validation output; LICENSE reference |
| Demo-safe | “I can try this without risking my data.” | Runs in constrained sandbox; synthetic data default; default-deny egress; time/memory quotas | Demo manifest; sandbox policy; runtime logs summary |
| Supply-chain verified | “The runnable artifact is verifiable.” | SBOM generated; provenance attested; signed artifacts; signature verified at run time | SBOM (SPDX/CycloneDX); provenance; Sigstore signature record |
| Pilot-verified | “A public institution tested it in a scoped pilot.” | Pilot report + scope + metrics; DPIA note if personal data; incident channel | Pilot pack; DPIA summary if applicable; deployment notes |
| Production-adopted | “This is used for real service delivery.” | Named deployment(s); support model; change management and security reporting expectations | Public deployment evidence; support/SLA statement; security update policy |
|
||||
|
||||
I treat “Supply-chain verified” as non-optional for running third-party artifacts at scale because SSDF emphasizes protecting releases and responding to vulnerabilities as core practices, and modern SBOM + signing ecosystems exist exactly to reduce supply-chain risk.
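
The ladder can be evaluated mechanically once each badge is tied to a set of evidence items. The sketch below assumes the ladder is cumulative (a badge counts only if every lower badge is satisfied), which the table implies but does not state. The evidence keys are hypothetical shorthand for the criteria in the table, and the conditional DPIA requirement is left out for brevity.

```python
# Ordered ladder: (badge, required evidence keys). Keys are illustrative.
LADDER = [
    ("Listed", {"valid_publiccode", "license", "maintainer_contact"}),
    ("Demo-safe", {"sandboxed_demo", "synthetic_data", "deny_egress", "quotas"}),
    ("Supply-chain verified", {"sbom", "provenance", "signature_verified"}),
    ("Pilot-verified", {"pilot_report", "incident_channel"}),
    ("Production-adopted", {"named_deployment", "support_model", "security_update_policy"}),
]


def highest_badge(evidence):
    """Return (earned_badge, next_badge, missing_evidence_for_next)."""
    earned = None
    for badge, required in LADDER:
        missing = required - set(evidence)
        if missing:
            return earned, badge, sorted(missing)
        earned = badge
    return earned, None, []


partial = {
    "valid_publiccode", "license", "maintainer_contact",
    "sandboxed_demo", "synthetic_data", "deny_egress", "quotas",
    "sbom",
}
status = highest_badge(partial)
```

Reporting the missing evidence for the next rung gives maintainers a concrete to-do list instead of a bare badge name.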
|
||||
|
||||
## Legal and compliance analysis for EU and Romania
|
||||
|
||||
I assume the portal is a platform hosting third-party submissions and allowing interaction/testing; therefore, compliance is a system constraint across listing, demos, moderation, analytics, and adoption workflows.
|
||||
|
||||
GDPR is central because even if demos are synthetic-only, the portal will process some personal data (accounts, feedback, logs) unless explicitly designed to avoid it. GDPR sets the baseline for lawful processing, transparency, data minimization, security, and DPIA requirements where processing creates high risks. I recommend “no-login demo mode” wherever possible to reduce GDPR surface, and “data protection by default” patterns for everything else.

The EU AI Act introduces obligations for AI systems depending on their use and risk category; for a govtech portal, the highest-risk scenario is tools used in public-sector decision workflows or affecting rights and access to services. I therefore recommend mandatory AI disclosures in metadata (model source/type, limitations, oversight expectations) and stronger badge criteria for any AI that touches eligibility, allocation, or enforcement decisions.

The DSA matters because the portal will host user-generated submissions and content; the regulation defines obligations for hosting services and online platforms, including notice-and-action, transparency, and constraints around how moderation decisions are handled and documented. I recommend implementing moderation workflows and transparency reporting from day one, even if the portal is not “very large,” because retrofitting those workflows later is costly and undermines trust.

The Interoperable Europe Act matters because it frames a Union-scale governance mechanism for public-sector interoperability and expects solutions, collaboration, and feedback mechanisms to exist within the Interoperable Europe ecosystem. Its implementing regulation for interoperability regulatory sandboxes is directly relevant to your pilot and sandbox approach: it treats sandboxes as places to experiment with innovative interoperability solutions and includes constraints about personal data processed in sandbox projects not being reused as operative data outside the project without proper legal basis. I recommend aligning your “Pilot-verified” badge and pilot-pack templates with these operational expectations so that administrations can reuse your documents if they later enter formal interoperability sandbox programs.

Accessibility is not optional for a citizen-facing portal. Directive (EU) 2016/2102 sets accessibility requirements for public sector websites and apps, and EN 301 549 provides testable requirements and methodologies, explicitly mapping requirements relevant to the directive. I recommend treating basic conformance checks as a release gate for the portal UI and as part of “Production-adopted” criteria for citizen-facing tools.

Romania-specific context is favorable for “anchor categories” and partnerships because ADR positions itself as operating essential digital platforms and the government ecosystem already has recognizable citizen touchpoints: ROePAS (single access point), ROeID (SSO), Ghișeul.ro (payments), and data.gov.ro (open data).
|
||||
|
||||
## Secure sandbox architecture and the CI build-scan-run-observe pipeline
|
||||
|
||||
Hosting runnable tools changes the threat model from “catalog integrity” to “multi-tenant untrusted code execution.” I therefore design the platform as a secure software supply-chain system plus a sandbox execution system, not a typical web app.
|
||||
|
||||
### Sandbox runtime comparison
|
||||
|
||||
| Runtime option | Isolation mechanism (what it actually protects) | Best fit for this portal | Primary operational risks |
|---|---|---|---|
| WASM (WASI/capability-based) | Each module executes in a sandbox and can’t escape without host APIs; capability model can restrict filesystem/network/time. | Default demo mode: calculators, form assistants, policy simulators, transformations, lightweight AI helpers | Capability misconfiguration; runtime vulnerabilities; “unsafe guest code” inside sandbox |
| gVisor | Application kernel that moves kernel interfaces into a per-sandbox layer to reduce container escape risk. | Mid-tier Linux compatibility without full VM overhead for containerized demos | Syscall compatibility, performance tuning, operational complexity |
| Kata Containers | Lightweight VMs that “feel like containers” but add hardware-virtualization isolation as a second layer of defense. | High-risk demos requiring near-standard Linux userland and stronger workload isolation | VM image management, perf footprint, cluster tuning |
| Firecracker microVM | MicroVMs combine VM isolation with speed/efficiency; designed for secure multi-tenant workloads. | Strong isolation for untrusted full-stack demos; good for “per-session disposable environments” | MicroVM orchestration and lifecycle complexity; image and kernel maintenance |
|
||||
|
||||
I treat WASM as the MVP default because it offers strong default sandbox semantics and a frictionless developer and citizen experience for many civic tool categories. citeturn9search0turn9search4 I treat Firecracker/Kata as necessary later stages for heavier workloads, because they target stronger isolation for multi-tenant execution, which becomes essential as the portal scales or hosts more complex demos. citeturn4search13turn4search6
|
||||
|
||||
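The tier choice in the table can be sketched as a small routing function. This is only an illustration under my own assumptions: the `DemoProfile` fields and the order of the checks are hypothetical and are not defined by any of the runtimes.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    WASM = "wasm"
    GVISOR = "gvisor"
    KATA = "kata"
    FIRECRACKER = "firecracker"


@dataclass(frozen=True)
class DemoProfile:
    # Hypothetical submission attributes (assumptions, not a standard schema).
    compiles_to_wasm: bool
    needs_linux_userland: bool
    full_stack: bool   # several cooperating services, databases, etc.
    high_risk: bool    # flag assigned by a reviewer


def select_tier(profile: DemoProfile) -> Tier:
    """Map a demo profile to the runtime tier suggested by the comparison table."""
    # WASM is the default whenever the demo does not need a Linux userland.
    if profile.compiles_to_wasm and not profile.needs_linux_userland:
        return Tier.WASM
    # Untrusted full-stack demos get a disposable microVM.
    if profile.full_stack:
        return Tier.FIRECRACKER
    # High-risk demos that need near-standard Linux go to Kata.
    if profile.high_risk:
        return Tier.KATA
    # Everything else is an ordinary containerized demo.
    return Tier.GVISOR
```

In the MVP only the first branch would be reachable, because the other tiers arrive in later roadmap stages.
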
### Supply-chain controls: SLSA, SBOM, Sigstore, SSDF

NIST SSDF is a primary reference for secure software development practices and explicitly targets reducing vulnerabilities through organizational preparation, protecting software, producing well-secured software, and responding to vulnerabilities. I use SSDF as the “policy backbone” for your platform’s CI rules and vulnerability processes.

I require SBOMs because the NTIA “minimum elements” approach frames SBOMs as formal records of software components and their relationships and defines a baseline of fields and operational practices. Two formats cover this need: SPDX is an international open standard (ISO/IEC 5962:2021), and CycloneDX is standardized as ECMA-424 and supports inventory information including ML models and other artifacts relevant to modern supply chains.

I require provenance because SLSA treats provenance as verifiable information describing how artifacts were produced, supporting stronger integrity guarantees as maturity increases.

I require signing and verification because Sigstore describes a keyless approach that binds ephemeral keys to identities via short-lived certificates and logs signing events in a transparency log, enabling verification and auditing at scale.

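As a rough illustration of the SBOM baseline, the sketch below checks a component record against the seven data fields named in the NTIA minimum elements. The flat dictionary layout and the key names are my own simplification (an assumption); real SPDX or CycloneDX documents nest this information differently and would need a format-specific parser.

```python
# Data fields from the NTIA "minimum elements", expressed as hypothetical
# snake_case keys of a flattened record (not an SPDX or CycloneDX schema).
NTIA_MINIMUM_FIELDS = (
    "supplier_name",
    "component_name",
    "component_version",
    "unique_identifiers",
    "dependency_relationships",
    "sbom_author",
    "timestamp",
)


def missing_minimum_fields(record: dict) -> list[str]:
    """Return the minimum-element fields that are absent or blank in the record."""
    missing = []
    for name in NTIA_MINIMUM_FIELDS:
        value = record.get(name)
        # An empty list is accepted: a leaf component may have no dependencies.
        if value is None or value == "":
            missing.append(name)
    return missing
```

A CI step could fail the build whenever this list is non-empty for any component.
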
### CI/build/scan/run/observe pipeline

The portal’s policy must be: *if it runs, it is built, attested, and signed.* This is consistent with SSDF’s emphasis on protected releases and vulnerability response, and it is operationally enabled by SBOM/provenance/signing tooling.

```mermaid
flowchart LR
A[Source Repo or Upload] --> B[Metadata Gate: publiccode.yml + extension lint]
B --> C[Build in Isolated CI Runner]
C --> D[Generate SBOM]
C --> E[Generate Provenance Attestation]
D --> F[Dependency + vuln scan]
E --> G[Policy checks: build integrity]
F --> H[Sign & attest artifacts]
G --> H
H --> I[Publish to OCI registry]
I --> J[Admission Control: verify signature+attestation]
J --> K[Sandbox Run: WASM/gVisor/Kata/Firecracker]
K --> L[Observe: logs/metrics/traces]
L --> M[Badge Evaluation + publish demo]
M --> N[Ongoing vuln intake + revocation path]
```
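The flowchart can also be read as a dependency graph. The sketch below encodes its edges and uses the standard-library `graphlib` module to derive a valid execution order and the set of stages that must precede a sandbox run. The stage names are my own shorthand for the diagram labels (an assumption, not identifiers used elsewhere in the plan).

```python
from graphlib import TopologicalSorter

# Edges copied from the flowchart, using shorthand stage names.
EDGES = [
    ("source", "metadata_gate"),
    ("metadata_gate", "build"),
    ("build", "sbom"),
    ("build", "provenance"),
    ("sbom", "vuln_scan"),
    ("provenance", "policy_checks"),
    ("vuln_scan", "sign"),
    ("policy_checks", "sign"),
    ("sign", "publish"),
    ("publish", "admission"),
    ("admission", "sandbox_run"),
    ("sandbox_run", "observe"),
    ("observe", "badge_eval"),
    ("badge_eval", "vuln_intake"),
]


def predecessors(edges):
    """Build the {stage: set of direct prerequisites} mapping graphlib expects."""
    deps = {}
    for src, dst in edges:
        deps.setdefault(src, set())
        deps.setdefault(dst, set()).add(src)
    return deps


def upstream(stage, deps):
    """Return every stage that must complete before `stage` may start."""
    seen, stack = set(), [stage]
    while stack:
        for parent in deps[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


DEPS = predecessors(EDGES)
ORDER = list(TopologicalSorter(DEPS).static_order())
```

`upstream("sandbox_run", DEPS)` contains the build, SBOM, provenance, signing, and admission stages, which is the graph form of "if it runs, it is built, attested, and signed."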
I treat the runtime layer as “default deny”: no outbound network unless explicitly allowlisted; read-only filesystems; no system credentials; time/memory quotas; and per-session destruction. WASM and sandboxed container runtimes are explicitly positioned as isolation technologies for untrusted code, so the design should continuously minimize the host and network attack surface reachable from a demo.

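One way to make the default-deny posture concrete is a profile object whose defaults are the restrictive ones, so that any relaxation has to be written out explicitly. The sketch below assumes hypothetical field names and quota ceilings; none of them come from a specific runtime's configuration format.

```python
from dataclasses import dataclass

# Hypothetical platform-wide ceilings (assumptions for illustration).
MAX_MEMORY_MB = 256
MAX_WALL_CLOCK_S = 300


@dataclass(frozen=True)
class SandboxProfile:
    egress_allowlist: frozenset = frozenset()  # empty set means no outbound network
    read_only_fs: bool = True
    mount_credentials: bool = False
    memory_mb: int = 128
    wall_clock_s: int = 60
    destroy_on_session_end: bool = True

    def allows_egress(self, host: str) -> bool:
        """Outbound traffic is permitted only to explicitly allowlisted hosts."""
        return host.lower() in self.egress_allowlist


def violations(profile: SandboxProfile) -> list[str]:
    """List every way a profile departs from the default-deny baseline."""
    problems = []
    if not profile.read_only_fs:
        problems.append("filesystem must be read-only")
    if profile.mount_credentials:
        problems.append("system credentials must not be mounted")
    if not profile.destroy_on_session_end:
        problems.append("sandbox must be destroyed per session")
    if profile.memory_mb > MAX_MEMORY_MB:
        problems.append("memory quota exceeds ceiling")
    if profile.wall_clock_s > MAX_WALL_CLOCK_S:
        problems.append("time quota exceeds ceiling")
    return problems
```

`SandboxProfile()` with no arguments is the fully locked-down profile; an allowlisted connector would be expressed as `SandboxProfile(egress_allowlist=frozenset({"data.gov.ro"}))`.
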
### Reference architecture

```mermaid
flowchart TB
subgraph Portal
UI[Citizen UI + Dev UI]
API["API + Auth (minimal)"]
DB[(Catalog DB)]
IDX[(Search Index)]
end

subgraph SupplyChain
CI[Isolated CI Builders]
REG[(OCI Registry)]
SBOM[(SBOM Store)]
ATTEST[(Attestation Store)]
SIG[Signature & Transparency Log]
end

subgraph Sandbox
ORCH[Sandbox Orchestrator]
WASM[WASM Runner]
CTRL[Policy Engine: egress/quotas]
HV[High Isolation Pool: gVisor/Kata/Firecracker]
OBS[Observability Stack]
end

UI --> API
API --> DB
API --> IDX

API --> CI
CI --> REG
CI --> SBOM
CI --> ATTEST
CI --> SIG

REG --> ORCH
ORCH --> CTRL
CTRL --> WASM
CTRL --> HV
ORCH --> OBS
API --> OBS
```
I keep this architecture vendor-neutral by specifying interfaces (OCI registry, attestations, SBOM formats) rather than cloud-specific services, which supports the no-lock-in assumption and aligns with EU reuse ecosystems focused on interoperability.

## Adoption pathway, sustainability, roadmap, and risk register

Catalogs become adoption engines only when they reduce evaluation workload for administrations. Italy provides a concrete model for how administrations evaluate and acquire software with a preference for reuse and open source, and Developers Italia provides guidance on publication and acquisition processes. I mirror that pattern with “pilot packs” and badge evidence.

### Pilot packs for administrations

I package each **Pilot-verified** candidate with:

- A deployment pack (IaC manifests, architecture diagram, minimum infrastructure).
- A security pack (SBOM + provenance + signature verification policy + scan results).
- A privacy pack (data categories + retention + DPIA template outcome if relevant).
- An evaluation pack (scope, metrics, rollback plan, support model).

SSDF’s structure provides a credible backbone for the security and vulnerability response portions of these packs.

In Romania, I prioritize pilots that integrate with already-recognized national primitives (SSO, payments, open data) because they reduce behavioral friction and demonstrate immediate value: ROeID, Ghișeul.ro, and data.gov.ro provide the citizen-understood baseline for those primitives.

### Sustainability and monetization

I keep discovery, listing, and “demo-safe” testing free to preserve credibility and community growth. I monetize operational burden and institutional requirements: dedicated environments, managed deployment support, security/compliance services, private pilot sandboxes, and enterprise connectors. This “charge for operations, not openness” posture remains aligned with public-sector open source strategies and avoids pay-to-win distortions that would undermine trust.

| Monetization option | Mostly-free compatibility | Typical buyer | What I deliver |
|---|---|---|---|
| Managed enterprise tenant | High | Agencies/municipalities | Dedicated portal instance, SSO, audit logs, backups |
| Private pilot sandbox | High | Agencies piloting with sensitive integration | Isolated runtime + allowlisted connectors + strict governance |
| Security/compliance service | Medium–high | Implementers and agencies | SBOM/provenance/signing setup, pentest coordination, DPIA support |
| Support marketplace | Medium | Builders and institutions | Paid support contracts around open projects |

### Phased roadmap

The Interoperable Europe ecosystem demonstrates that catalogs can scale, but runnable demo hosting requires additional controls; therefore, I stage runtime complexity behind trust maturity.

```mermaid
gantt
title GovTech Commons Portal Roadmap (Assumed Start: 2026-04-06)
dateFormat YYYY-MM-DD
axisFormat %b %Y

section MVP Foundation
Governance + schemas + policies :a1, 2026-04-08, 30d
Catalog + search + citizen tool pages :a2, after a1, 45d
WASM demo runner (demo-safe only) :a3, after a1, 45d

section Trust and supply chain
SBOM + signing + provenance baseline :b1, after a2, 45d
Badge ladder v1 (Listed + Demo-safe) :b2, after a2, 30d

section Pilot readiness
Pilot pack templates + first pilots :c1, after b1, 60d
Upgrade runner tier (gVisor/Kata/microVM) :c2, after c1, 60d

section Scale and federation
Federation/export feeds to EU patterns :d1, after c2, 60d
Production-adopted governance + audits :d2, after c2, 90d
```
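To sanity-check the roadmap, the task chain in the chart can be resolved into calendar dates. The sketch below assumes durations are plain calendar days with no weekend or holiday exclusions, and that a task's end is its start plus its duration; both are assumptions about how the chart is meant to be read.

```python
from datetime import date, timedelta

# (task id, fixed start date or id of the predecessor task, duration in days)
TASKS = [
    ("a1", date(2026, 4, 8), 30),
    ("a2", "a1", 45),
    ("a3", "a1", 45),
    ("b1", "a2", 45),
    ("b2", "a2", 30),
    ("c1", "b1", 60),
    ("c2", "c1", 60),
    ("d1", "c2", 60),
    ("d2", "c2", 90),
]


def schedule(tasks):
    """Resolve 'after X' references into (start, end) date pairs."""
    spans = {}
    for task_id, start, days in tasks:
        # A task either has a fixed start or begins when its predecessor ends.
        begin = start if isinstance(start, date) else spans[start][1]
        spans[task_id] = (begin, begin + timedelta(days=days))
    return spans


SPANS = schedule(TASKS)
mvp_days = (SPANS["a2"][1] - SPANS["a1"][0]).days  # 75 days, about 10.7 weeks
```

Under these assumptions the MVP Foundation section ends on 2026-06-22, which is 75 days after the first task starts and so falls inside an 8–12 week window, and the last task (`d2`) ends on 2027-03-04.
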
### Risk register with mitigations

I focus on risks that are unique to “runnable demos + citizens + public administration trust.”

| Risk | Why it matters | Mitigation (design constraint) |
|---|---|---|
| Untrusted code escape or host compromise | Runnable demos are attacker-controlled inputs | WASM-first; stronger isolation tiers; signed-only artifacts; default-deny egress; quotas; per-session destruction |
| Supply-chain compromise | Open source does not mean safe | SBOM + provenance + signing; admission control verifies signatures/attestations |
| GDPR exposure through telemetry | Logs and analytics can become personal data | No-login demos; minimize identifiers; DPIA gating; retention limits |
| Moderation/legal exposure under DSA | User-submitted content triggers platform duties | Notice-and-action workflow; decision logs; transparency posture |
| AI misuse in public services | AI outputs can affect rights | Mandatory AI disclosures; stricter badges for decision-affecting tools |
| Accessibility debt | Excludes citizens; harms public-sector credibility | Portal UI gates; EN 301 549-aware checks; accessibility as adoption criterion |
| “Popularity beats safety” dynamic | Hype can override evidence | Separate ranking: community signal vs trust score; restrict promo features |
| Project abandonment | Catalog fills with dead prototypes | Maintenance badge criteria; lifecycle dates; automated stale warnings |
| Vendor lock-in creep | Undermines the platform’s public-good posture | OCI artifacts, open schemas, export feeds; no proprietary runtime dependence |

### Notes on unspecified details

The exact procurement integration mechanism in Romania is unspecified; I therefore assume the portal will provide evidence packs and support pathways but will not initially function as a procurement platform. The currency for budget ranges is unspecified by request, so I keep costs as ranges without currency.

**File B (implementation plan + launch checklist + stack):** `implementation-plan-and-launch-checklist.md`

**Purpose (in first person):** I will implement a safe, OSS-first portal MVP in 8–12 weeks that can list projects immediately and run “demo-safe” tools without exposing citizen data or my infrastructure.

**Assumptions (explicit):** I assume EU/Romania focus, no vendor lock-in, “mostly free” public access, and that I already have baseline hardware and ops capacity; currency is unspecified.

**Recommended technical stack (vendor-neutral):**

- **Frontend:** Next.js (SSR + static generation) or equivalent; strong accessibility baseline aligned to EN 301 549 expectations.
- **Backend API:** FastAPI or Node (NestJS); Postgres (catalog), OpenSearch/Meilisearch (search).
- **Artifact format:** OCI artifacts (containers and/or WASM bundles) stored in an OCI registry.
- **CI:** GitHub Actions or GitLab CI with isolated self-hosted runners; policy: only signed artifacts can run.
- **SBOM:** SPDX and/or CycloneDX (store and display).
- **Provenance:** SLSA-style provenance attestations stored alongside artifacts.
- **Signing:** Sigstore (Cosign + transparency log behavior).
- **Sandbox orchestrator:** Kubernetes + policy engine; MVP runner is WASM, with a later tier for gVisor/Kata/Firecracker.
- **Observability:** OpenTelemetry + Prometheus + centralized logs with strict retention.

**Roles I will assign (responsibilities):**

- Product lead (PL), Tech lead (TL), Security lead (Sec), SRE/DevOps (SRE), UX/content (UX), Community/partnerships (Comms), Legal/privacy (Legal).

**MVP scope (8–12 weeks, in first person):**

- I will ship the catalog with strict metadata gates (publiccode.yml + extension).
- I will ship citizen-readable pages that expose the trust badge and “data used” in plain language.
- I will ship a WASM demo runner with default-deny egress and synthetic data.
- I will ship badge ladder v1: Listed + Demo-safe, with automatic criteria checks.
- I will ship DSA-aligned minimum moderation and reporting mechanics (report button, takedown workflow, logs).

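The "automatic criteria checks" for badge ladder v1 could look roughly like the sketch below. The ladder names come from this plan, but the individual criteria keys are hypothetical placeholders (assumptions); the real criteria would be set by the governance policies.

```python
from typing import Optional

# Ladder order used in this plan; only the first two levels are in MVP scope.
LADDER_V1 = ["Listed", "Demo-safe"]

# Hypothetical evidence keys per level (assumptions for illustration).
CRITERIA = {
    "Listed": ["metadata_valid"],
    "Demo-safe": [
        "artifact_signed",
        "sbom_present",
        "scan_passed",
        "egress_denied",
        "synthetic_data_only",
    ],
}


def highest_badge(evidence: dict) -> Optional[str]:
    """Return the highest badge whose criteria, and all lower ones, are met."""
    earned = None
    for level in LADDER_V1:
        if all(evidence.get(key, False) for key in CRITERIA[level]):
            earned = level
        else:
            break  # a higher badge cannot be earned without the lower one
    return earned
```

Because the loop stops at the first unmet level, a project with perfect supply-chain evidence but invalid metadata earns no badge at all, which keeps the ladder strictly cumulative.
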
**Launch checklist (complete, assigned, with effort and minimal budgets; currency unspecified):**

| Area | Checklist item (I will…) | Owner | Effort (person-months) | Minimal budget range |
|---|---|---|---:|---:|
| Governance & legal | publish Terms/Acceptable Use + demo disclaimers | Legal | 0.30 | 0–1k |
| Governance & legal | publish privacy notice + retention policy + DPIA template | Legal | 0.40 | 0–2k |
| Governance & legal | publish security policy + coordinated vulnerability disclosure | Sec | 0.30 | 0–1k |
| Governance & legal | implement DSA-style notice-and-action workflow + logging | Legal+Comms | 0.50 | 0–2k |
| Metadata | implement publiccode.yml validator + portal-extension schema v0.1 | TL | 0.60 | 0–1k |
| Metadata | create project templates (sample publiccode.yml + extension) | TL+UX | 0.30 | 0–1k |
| Catalog | implement catalog DB + search index + filters | TL | 0.70 | 0–2k |
| Catalog | build citizen tool page template (plain language, evidence, demo link) | UX | 0.50 | 0–2k |
| Demo sandbox | harden WASM runtime profile (no net, RO FS, quotas) | Sec+SRE | 0.80 | 0–3k |
| Demo sandbox | implement demo packaging + upload/attach flow | TL | 0.60 | 0–2k |
| Demo sandbox | implement sandbox admission control (signed-only runnable) | Sec | 0.60 | 0–2k |
| Supply chain | generate SBOM automatically on build | Sec | 0.50 | 0–2k |
| Supply chain | create provenance attestations (baseline) | Sec | 0.50 | 0–2k |
| Supply chain | implement Sigstore signing + verification | Sec+SRE | 0.70 | 0–3k |
| Scanning | implement SCA/SAST/secrets scanning + thresholds | Sec | 0.60 | 0–2k |
| Trust badges | implement badge rules engine + UI display | TL+Sec | 0.70 | 0–2k |
| Observability | deploy metrics/logs with retention rules | SRE | 0.60 | 0–2k |
| Accessibility | run accessibility audit aligned to EN 301 549 expectations | UX | 0.30 | 0–2k |
| Content seeding | seed 30–50 quality listings with 10 runnable demos | Comms+PL | 0.80 | 0–3k |
| Partnerships | secure 2–3 pilot institutions (letters/MoU) | Comms+PL | 0.60 | 0–5k |
| Pilot readiness | ship pilot pack templates (security/privacy/deploy/eval) | PL+Sec+Legal | 0.90 | 0–3k |
| Launch ops | run a pre-launch security review + emergency rollback plan | Sec+SRE | 0.50 | 0–5k |

**Minimal operating rule (I will enforce):** No demo runs unless it is buildable, scanned, attested, and signed, and unless it fits an explicit sandbox profile.

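The operating rule maps naturally onto a single admission decision. In the sketch below the boolean fields stand in for the results of real verifiers (build status, scanner verdict, provenance and signature verification); the field names and the profile registry are hypothetical, and the function only combines results that other tools would have to produce.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical registry of explicit sandbox profiles (assumption).
KNOWN_PROFILES = {"wasm-default-deny"}


@dataclass(frozen=True)
class ArtifactEvidence:
    build_succeeded: bool
    scan_passed: bool
    provenance_verified: bool
    signature_verified: bool
    sandbox_profile: Optional[str]  # None means no profile was declared


def admission_decision(evidence: ArtifactEvidence) -> tuple[bool, list[str]]:
    """Return (admit, reasons); the demo runs only when no reason is recorded."""
    reasons = []
    if not evidence.build_succeeded:
        reasons.append("not buildable")
    if not evidence.scan_passed:
        reasons.append("scan failed or missing")
    if not evidence.provenance_verified:
        reasons.append("provenance attestation not verified")
    if not evidence.signature_verified:
        reasons.append("signature not verified")
    if evidence.sandbox_profile not in KNOWN_PROFILES:
        reasons.append("no explicit sandbox profile")
    return (not reasons, reasons)
```

Returning the full list of reasons, rather than stopping at the first failure, gives submitters one actionable report per rejected demo.
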
**English multi-AI review prompt (single prompt for Gemini/Claude/GLM/GPT; compare, merge, consolidate):**

```text
You are an expert reviewer panel. Review the attached plan for an EU/Romania-first open-source govtech portal that lists and hosts runnable civic/AI demos with a trust badge ladder and hardened sandboxes.

Your output must be structured as:
1) Top 10 critical gaps (explain impact and urgency).
2) Security critique: sandbox isolation (WASM, gVisor, Kata, Firecracker), multi-tenancy threats, egress control, secret handling, artifact admission control.
3) Supply-chain critique: SSDF alignment, SLSA provenance, SBOM formats (SPDX/CycloneDX), Sigstore signing/verification, revocation and vulnerability response.
4) Compliance critique: GDPR, EU AI Act, DSA duties, Interoperable Europe Act + sandbox implementing rules, accessibility (Directive 2016/2102 + EN 301 549). Identify what is missing and propose concrete mitigations.
5) Product critique: IA, personas, citizen legibility; recommend simplifications for an 8–12 week MVP.
6) Feasibility: what to cut, what to keep, and what to sequence to ship safely.
7) Risk register: add 10 additional risks with mitigations.

Comparison rules:
- For each disagreement with the baseline, label it as (Correction / Enhancement / Alternative).
- Conclude with “Merged Plan Delta” containing:
  KEEP (items you agree with),
  CHANGE (replace with your wording),
  ADD (missing items),
  REMOVE (what to drop and why).

Constraints:
- Assume attackers will submit malicious demos.
- Default posture must be deny-by-default (network, filesystem, identity).
- No vendor lock-in.
- Prefer primary/official sources when making factual claims.
```

**Short Romanian prompt for other AIs (as requested):**

```text
Fă o versiune mai simplă și mai acționabilă a acestui plan, potrivită pentru un MVP de 8–12 săptămâni; evidențiază 6 pași concreți și 5 reguli de siguranță; returnează un "Merged Plan Delta" comparând cu planul original.
```