vreau-digital/chatGPT/session-summary-2026-05-11.md
Claude VM a6c03a091e initial: split from gov-agreg — vreau.digital standalone platform
Moved from gov-agreg/src/pages/achizitii/* to root (drop prefix).
- 22 pages migrated, 127 files total
- All internal links: /achizitii/X → /X (176 occurrences fixed)
- AchizitiiLayout subnav rewritten: /X paths, top-right link to vreaudigital.ro hub
- BaseLayout new (vreau.digital branding, OG tags, site URL)
- astro.config.mjs: site https://vreau.digital, server output (was static)
- docker-compose: port 5096 (vreaudigital is 5095), container vreau-digital
- deploy.sh: paths /opt/vreau-digital, log /var/log/vreau-digital-deploy.log

Backend shared with gov-agreg:
- PostgreSQL satra (same schemas: seap, firms, anaf, anre, ...)
- Photon, Martin tiles
- Infisical /vreaudigital path (DATABASE_URL etc. shared)

build: PASS (npx astro check 0 errors, npm run build 5s vite + 10s server)
2026-05-13 00:10:32 +03:00


Session 2026-05-11 — vreaudigital.ro

Extended session after the Phase 5 UI merge. It started as a single autonomous tick and grew into 15 consecutive productive cycles. Live SHA at the end: 7ca4aa4 (49 recipes, 17 systemd timers, 100% geocoding).

Cycle timeline

| Tick | Focus | Commits | Highlights |
|---|---|---|---|
| Phase 5 (pre-tick) | G1-G2-G3-G4-G5 sub-agents | 8 commits | 6 helper functions, 7 firma badges, 5 sections, 6 recipes, 3 investigative reports |
| Phase 5 merge | UI integration + commit cleanup | 2 commits | 57af3a6 + c1d90bf |
| Tick #1 | A1-A2-A3 sub-agents (fixes/geocoding/completions) + A4-A5 (browse UIs) + S1 (refresh strategy) | 6 commits | Geocoding 91→100%, ASF cleanup, ANRE electricians 0→73K, 2 new browse pages |
| Tick #1.5 | Disk cleanup + heartbeat monitoring | 4 commits | 89%→45% disk, heartbeat.sh + systemd timer (20 sources daily 07:00) |
| Tick #2 | 11 systemd timer pairs | 1 commit | Weekly + monthly timers for all scrape-*.sh wrappers |
| Tick #3 | Autoritate profile badges | 1 commit | 5 cross-source badges + getBugetarStatus helper |
| Tick #4 | Autoritate profile sections | 1 commit | 4 sections (ANAF/CNSC/Curtea Conturi/RegAS), parity with firma |
| Tick #5 | Bugetar UAT pattern match | 1 commit | +961 matches (58.3% → 63.4%), strip-parens insight |
| Tick #6 | Curteacont CUI backfill | 1 commit | 0% → 64.4% (+730 matches), prefix-bug data fix |
| Tick #7 | CNSC authority CUI backfill | 1 commit | 42% → 77.5% (+10,328 matches), biggest single backfill |
| Tick #8 | SEAP DA wrapper + timer (was missing!) | 1 commit | Daily 02:30, 4h timeout for ~7-month catch-up |
| Tick #9 | Firma bugetar badge + recipe refactor | 2 commits | autoritati-audited-repetitiv: 5s → <500ms |
| Tick #10 | Recipe dubla-alerta-cdc-cnsc | 1 commit | 50 entities, MUNICIPIUL CONSTANTA on top (93 signals) |
| Tick #11 | Recipe donatori-datornici (moral hazard) | 1 commit | 360 firms, B&B BUSINESS 1:28,184 ratio |
| Tick #12 | Recipe energie-anre-datornici | 1 commit | 875 operators, 3.14 bn RON aggregate debt |
| Tick #13 | Red-flags landing 6→13 cards + 3 KPI tiles | 1 commit | Surfacing for the new investigative recipes |
| Tick #14 | Recipe donatori-contestatori (political leverage) | 1 commit | 185 firms, SHERIFF GUARD 62 contestations vs 27K donation |
| Tick #15 | Audit + this doc | 1 commit | System health verified, summary written |

Final statistics

CUI matching coverage

| Source | Pre-session | Post-session | Delta |
|---|---|---|---|
| firms.entities geocoding | 91.3% | 100.00% | +346,675 |
| ASF CUI clean | 51% | 100% | +412 cleaned |
| cnsc.decizii authority | 42% | 77.5% | +10,328 |
| curteacont.rapoarte | 0% | 64.4% | +730 |
| bugetar.entitate | 58.3% | 63.4% | +961 |
| cnas.furnizori | 0% | 9% | +3,255 (dirty data residue) |

Total aggregated public data

17 schemas integrated cross-source via the CUI hub (firms.entities = 3.99M):

  • ~17.9M unique rows of public data (per G3 audit)
  • 75 active SEAP contracts now vs 8 months stale before (DA pipeline)
  • 49 recipes on /achizitii/retete (39 at the start)
  • 23 gotchas documented in memory

Recipes shipped (Phase 5 + autonomous run)

| Slug | Source pair | Yield | Tier |
|---|---|---|---|
| energie-fara-licenta | SEAP ANRE | red-flags | T3 |
| telco-fara-licenta | SEAP ANCOM | red-flags | T3 |
| autoritati-contestate-cnsc | CNSC × SEAP | 4,192 authorities | T2 |
| asiguratori-furnizori-stat | ASF × SEAP | 63 firms | T4 |
| stat-actionar-seap | AAAS × SEAP | red-flags | T3 |
| autoritati-audited-repetitiv | Curtea × SEAP | red-flags | T4 |
| autoritati-dubla-alerta-cdc-cnsc | Curtea × CNSC | 50 | T2 |
| donatori-politici-care-datoreaza-statului | AEP × ANAF | 360 | T2 |
| energie-licentiati-anre-datornici-anaf | ANRE × ANAF | 875 | T2 |
| donatori-politici-care-contesta-la-cnsc | AEP × CNSC | 185 | T2 |

Top killer findings (journalism-ready)

  1. B&B BUSINESS SOLUTIONS: 10K RON donated to parties vs 281.8M RON owed to ANAF (ratio 1:28,184)
  2. HIDROELECTRICA: 214M ANAF debt + 4 active ANRE licences (state-to-state circularity)
  3. MUNICIPIUL CONSTANTA: 3 Curtea de Conturi audits + 90 CNSC contestations = 93 converging signals
  4. SHERIFF GUARD PROTECTION: 62 CNSC contestations vs a 27K donation (uses the legal route as its main instrument)
  5. VICTOR CONSTRUCT: 670K donation + 23 contestations + active on SEAP (a political-legal combination)

Infrastructure delivered

17 systemd timers active

| Cadence | Timer | Next fire |
|---|---|---|
| Daily 02:00 | anaf-daily | Tue 02:02 |
| Daily 02:30 | da (NEW) | Tue 02:32 |
| Daily 04:00 | mvs | Tue 04:04 |
| Daily 07:00 | heartbeat (NEW) | Tue 07:02 |
| Weekly Sun 01:00 | anre | Sun 01:06 |
| Weekly Mon 01:00 | ancom | Mon 01:00 |
| Weekly Tue 01:00 | asf | Tue 01:07 |
| Weekly Wed 01:00 | aaas | Wed 01:05 |
| Weekly Thu 01:00 | curteacont | Thu 01:06 |
| Weekly Fri 01:00 | gnm | Fri 01:00 |
| Weekly Sat 01:00 | cnsc | Sat 01:03 |
| Weekly Tue 03:00 | onrc-weekly | Tue 03:03 |
| Monthly 1st 03:00 | regas | Jun 1 03:06 |
| Monthly 1st 03:30 | aep-donatii | Jun 1 03:30 |
| Monthly 1st 05:00 | cnas | Jun 1 05:06 |
| Monthly 15th 03:00 | apia-fermieri | May 15 03:02 |

Heartbeat monitoring

  • Probes 20 sources, posts to n8n satra-backup-alert webhook when STALE
  • Currently 19/20 OK, 1 STALE: ani.declaratii (known unimplemented)
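
The probe-and-alert behaviour described above can be sketched roughly as follows. This is a TypeScript illustration of the logic, not the contents of heartbeat.sh: the `SourceProbe` shape, the per-source `maxAgeHours` budget, every report field other than `service: "data-heartbeat"`, and the injected `send` callback (standing in for the webhook POST) are assumptions made for the example.

```typescript
// Sketch only: shapes and thresholds below are assumptions, not heartbeat.sh.
interface SourceProbe {
  name: string;             // e.g. "ani.declaratii"
  lastUpdated: Date | null; // null means the source was never ingested
  maxAgeHours: number;      // assumed per-source freshness budget
}

interface HeartbeatReport {
  service: "data-heartbeat";
  checkedAt: string;
  ok: number;
  stale: string[];
}

// A source is stale when it has no data or its newest data exceeds the budget.
function isStale(probe: SourceProbe, now: Date): boolean {
  if (probe.lastUpdated === null) return true;
  const ageHours = (now.getTime() - probe.lastUpdated.getTime()) / (3600 * 1000);
  return ageHours > probe.maxAgeHours;
}

// Summarise all probes into one report (count of OK sources, names of stale ones).
function buildReport(probes: SourceProbe[], now: Date): HeartbeatReport {
  const stale = probes.filter((p) => isStale(p, now)).map((p) => p.name);
  return {
    service: "data-heartbeat",
    checkedAt: now.toISOString(),
    ok: probes.length - stale.length,
    stale: stale,
  };
}

// Send only when something is stale; `send` is a hypothetical webhook wrapper.
function alertIfStale(report: HeartbeatReport, send: (body: string) => void): boolean {
  if (report.stale.length === 0) return false;
  send(JSON.stringify(report));
  return true;
}
```

Injecting `send` keeps the staleness rule testable without a live n8n endpoint, which may also help when validating the routing of the first alert.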

Disk

  • 89% → 45% (156 GB freed via docker builder prune -a -f + docker image prune -a -f)

Documents written

| Path | Author | Purpose |
|---|---|---|
| chatGPT/data-quality/freshness-audit-2026-05-10.md | G3 sub-agent | 17.9M row reconciliation + per-schema cadence |
| chatGPT/data-quality/geocoding-strategy-2026-05-11.md | A2 sub-agent | Fallback chain documentation |
| chatGPT/data-quality/refresh-cadence-strategy-2026-05-11.md | S1 sub-agent | Master cron schedule + 2captcha budget |
| chatGPT/journalism/killer-findings-2026-05-10.md | G4 sub-agent | 5 lead findings + 7 storylines |
| chatGPT/journalism/sectorial-deep-dive-2026-05-10.md | G5 sub-agent | ENERGIE/TELECOM/FINANCIAR analysis |
| services/seap-scraper/HANDOFF-aaas-ordin-278.md | A3 sub-agent | AAAS PDF backfill plan |
| services/seap-scraper/HANDOFF-asf-other-registers.md | A3 sub-agent | ASF pension/AIFM/UCITS plan |
| services/seap-scraper/HANDOFF-cnas-layout-b.md | A3 sub-agent | CNAS 9 PDFs layout-B parser plan |
| services/seap-scraper/systemd/README.md | tick #2 | Systemd unit install procedure |
| This doc | tick #15 | Session retrospective |

Reusable patterns discovered

1. Strip-parens + UAT-pattern (3-source proven)

ONRC stores communes and towns (comune/orașe) with a " (Primaria Y)" suffix. Stripping that suffix and comparing the normalized names yields an exact match. Used for:

  • bugetar (sql/039) → +961 matches in 1m 46s
  • curteacont (sql/040 + 041) → +730 matches in <2 min
  • cnsc (sql/042) → +10,328 matches in 1m 25s
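
A minimal sketch of the strip-parens idea, written in TypeScript for illustration (the backfills themselves live in the SQL files listed above). The exact normalization steps (diacritic folding, uppercasing, whitespace collapsing), the `OnrcEntity` shape, and the rule of discarding ambiguous names are assumptions, not a description of what sql/039 to sql/042 do.

```typescript
// Romanian diacritics folded to ASCII (comma-below and cedilla variants).
const DIACRITICS: { [ch: string]: string } = {
  "ă": "a", "â": "a", "î": "i", "ș": "s", "ş": "s", "ț": "t", "ţ": "t",
  "Ă": "A", "Â": "A", "Î": "I", "Ș": "S", "Ş": "S", "Ț": "T", "Ţ": "T",
};

// Drop a trailing parenthesised suffix such as " (Primaria Y)".
function stripParens(name: string): string {
  return name.replace(/\s*\([^()]*\)\s*$/, "");
}

// Assumed normalization: strip suffix, fold diacritics, uppercase, collapse spaces.
function normalizeName(name: string): string {
  return stripParens(name)
    .replace(/[ăâîșşțţĂÂÎȘŞȚŢ]/g, (ch) => DIACRITICS[ch] || ch)
    .toUpperCase()
    .replace(/\s+/g, " ")
    .trim();
}

// Hypothetical row shape for the ONRC side of the join.
interface OnrcEntity { cui: string; name: string; }

type CuiIndex = { [normalizedName: string]: string | null };

// Index entities by normalized name. A name seen twice is ambiguous and is
// stored as null so it never produces a match (an assumption of this sketch).
function buildIndex(entities: OnrcEntity[]): CuiIndex {
  const index: CuiIndex = Object.create(null);
  for (const e of entities) {
    const key = normalizeName(e.name);
    index[key] = key in index ? null : e.cui;
  }
  return index;
}

// Exact match on the normalized name; returns null when absent or ambiguous.
function matchCui(sourceName: string, index: CuiIndex): string | null {
  const hit = index[normalizeName(sourceName)];
  return hit === undefined ? null : hit;
}
```

Applying the same `normalizeName` to both sides is what turns a fuzzy-looking problem into an exact join, which is presumably why each backfill ran in under two minutes.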

2. Sub-agent isolation via dedicated helper files

G1 + G2 wrote separate profile-queries-utilities.ts + profile-queries-financial.ts to avoid merge conflicts. Pattern reusable for any parallel codegen task.

3. Cross-source RATIO mismatches surface real signal

  • B&B: 10K donation vs 281M debt → 1:28,184 ratio = lever-amount mismatch
  • SHERIFF GUARD: 27K donation vs 62 contestations → cheap-donation-buys-aggressive-juridical-strategy

Single-source counts are easily explained away as "high volume". Cross-source ratios force a specific narrative.
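
The ratio arithmetic behind the B&B example can be sketched as below. The `CrossSourceRow` shape and function names are hypothetical. With the rounded figures quoted in this document (10K RON donated, 281.8M RON owed) the ratio works out to 28,180; the 1:28,184 reported above presumably comes from the unrounded amounts.

```typescript
// Hypothetical row joining two sources (AEP donations and ANAF debt) on CUI.
interface CrossSourceRow {
  name: string;
  donatedRon: number; // total donated to parties
  owedRon: number;    // total owed to ANAF
}

// RON owed per 1 RON donated; null when there is no donation to compare against.
function debtPerDonatedRon(row: CrossSourceRow): number | null {
  if (row.donatedRon <= 0) return null;
  return row.owedRon / row.donatedRon;
}

// Rank rows so the largest mismatch between donation and debt comes first.
function rankByRatio(rows: CrossSourceRow[]): CrossSourceRow[] {
  return rows
    .filter((r) => r.donatedRon > 0)
    .sort((a, b) => b.owedRon / b.donatedRon - a.owedRon / a.donatedRon);
}
```

Sorting by the ratio rather than by either raw amount is the point of the pattern: a large debtor with a proportionate donation ranks below a small donor with a disproportionate debt.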

Known limitations / next-session candidates

Critical (DR/observability)

  • DB backup runs from root's crontab (NOT bulibasa's) — confirmed working but undocumented elsewhere
  • Heartbeat hits n8n webhook but n8n routing for service:"data-heartbeat" field not verified — first alert email needs validation

High-impact (3-15h each)

  • CNSC Stage 2 PDF parse → decision_type (admis/respins, i.e. upheld/rejected); unlocks the killer recipe "authorities with a high rate of lost contestations" (autorități cu rată mare contestații pierdute)
  • Curtea Conturi Stage 2 → findings_count + key amounts per audit
  • CNAS layout-B parser (9 remaining PDFs)
  • ASF pension funds + AIFM + UCITS register ingest

Medium-effort (4-8h)

  • TED full re-import (publication-date backfill — fix shipped tick #1)
  • normalize_company_name v2 for orthography (Cârlogani ↔ Cirlogani)
  • ANRE 92.3% residue (commercial firms — need different match strategy)
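
For the orthography item above (Cârlogani ↔ Cirlogani), one possible approach is a matching key that folds â and î to the same letter before folding the remaining diacritics. The sketch below is TypeScript for illustration only; `orthographyKey` is a hypothetical name and is not the existing `normalize_company_name` function, whose current behaviour is not shown in this document.

```typescript
// Assumption of this sketch: â and î are treated as the same vowel and both
// fold to "i"; other Romanian diacritics fold to their base letter.
const FOLD: { [ch: string]: string } = {
  "â": "i", "î": "i", "Â": "I", "Î": "I",
  "ă": "a", "Ă": "A",
  "ș": "s", "ş": "s", "Ș": "S", "Ş": "S",
  "ț": "t", "ţ": "t", "Ț": "T", "Ţ": "T",
};

// Build a comparison key. Intended for matching only, never for display,
// since it deliberately destroys the original spelling.
function orthographyKey(name: string): string {
  return name
    .replace(/[âîÂÎăĂșşȘŞțţȚŢ]/g, (ch) => FOLD[ch] || ch)
    .toUpperCase()
    .replace(/\s+/g, " ")
    .trim();
}
```

Under this key, "Cârlogani", "Cîrlogani" and "Cirlogani" all compare equal. A spelling where the diacritic was already stripped to a plain "a" ("Carlogani") would still not match and would need a separate rule.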

Speculative

  • 2captcha integration (~$60-100 one-shot for Bugetar Faza 2 + ANAF datornici quarterly refresh)
  • ANI parser MVP (1.3M PDFs, 15-day effort)