Claude VM a6c03a091e initial: split from gov-agreg — vreau.digital standalone platform
Moved from gov-agreg/src/pages/achizitii/* to root (drop prefix).
- 22 pages migrated, 127 files total
- All internal links: /achizitii/X → /X (176 occurrences fixed)
- AchizitiiLayout subnav rewritten: /X paths, top-right link to vreaudigital.ro hub
- BaseLayout new (vreau.digital branding, OG tags, site URL)
- astro.config.mjs: site https://vreau.digital, server output (was static)
- docker-compose: port 5096 (vreaudigital is 5095), container vreau-digital
- deploy.sh: paths /opt/vreau-digital, log /var/log/vreau-digital-deploy.log

Backend shared with gov-agreg:
- PostgreSQL satra (same schemas: seap, firms, anaf, anre, ...)
- Photon, Martin tiles
- Infisical /vreaudigital path (DATABASE_URL etc. shared)

build: PASS (npx astro check 0 errors, npm run build 5s vite + 10s server)
2026-05-13 00:10:32 +03:00

# Session 2026-05-11 — vreaudigital.ro
Extended session after the Phase 5 UI merge. It started as an autonomous tick and evolved into 15 consecutive productive cycles. Live SHA at the end: **`7ca4aa4`** (49 recipes, 17 systemd timers, 100% geocoding).
## Cycle timeline
| Tick | Focus | Commits | Highlights |
|---:|---|---|---|
| Phase 5 (pre-tick) | G1-G2-G3-G4-G5 sub-agents | 8 commits | 6 helper functions, 7 firma badges, 5 sections, 6 recipes, 3 investigative reports |
| Phase 5 merge | UI integration + commit cleanup | 2 commits | `57af3a6` + `c1d90bf` |
| Tick #1 | A1-A2-A3 sub-agents (fixes/geocoding/completions) + A4-A5 (browse UIs) + S1 (refresh strategy) | 6 commits | Geocoding 91→100%, ASF cleanup, ANRE electricians 0→73K, 2 new browse pages |
| Tick #1.5 | Disk cleanup + heartbeat monitoring | 4 commits | 89%→45% disk, heartbeat.sh + systemd timer (20 sources daily 07:00) |
| Tick #2 | 11 systemd timer pairs | 1 commit | Weekly + monthly timers for all scrape-*.sh wrappers |
| Tick #3 | Autoritate profile badges | 1 commit | 5 cross-source badges + getBugetarStatus helper |
| Tick #4 | Autoritate profile sections | 1 commit | 4 sections (ANAF/CNSC/Curtea Conturi/RegAS), parity with the firma profile |
| Tick #5 | Bugetar UAT pattern match | 1 commit | +961 matches (58.3% → 63.4%), strip-parens insight |
| Tick #6 | Curteacont CUI backfill | 1 commit | 0% → 64.4% (+730 matches), prefix-bug data fix |
| Tick #7 | CNSC authority CUI backfill | 1 commit | 42% → 77.5% (+10,328 matches) — biggest single backfill |
| Tick #8 | SEAP DA wrapper + timer (was missing!) | 1 commit | Daily 02:30, 4h timeout for ~7-month catch-up |
| Tick #9 | Firma bugetar badge + recipe refactor | 2 commits | autoritati-audited-repetitiv: 5s → <500ms |
| Tick #10 | Recipe dubla-alerta-cdc-cnsc | 1 commit | 50 entities, MUNICIPIUL CONSTANTA on top (93 signals) |
| Tick #11 | Recipe donatori-datornici (moral hazard) | 1 commit | 360 firms; B&B BUSINESS at a 1:28,184 ratio |
| Tick #12 | Recipe energie-anre-datornici | 1 commit | 875 operators; 3.14 billion RON aggregate debt |
| Tick #13 | Red-flags landing 6→13 cards + 3 KPI tiles | 1 commit | Surfacing for the new investigative recipes |
| Tick #14 | Recipe donatori-contestatori (political leverage) | 1 commit | 185 firms; SHERIFF GUARD: 62 complaints vs a 27K donation |
| Tick #15 | Audit + this doc | 1 commit | System health verified, summary written |
## Final data statistics
### CUI matching coverage
| Source | Pre-session | Post-session | Delta |
|---|---:|---:|---:|
| firms.entities geocoding | 91.3% | **100.00%** | +346,675 |
| ASF CUI clean | 51% | **100%** | +412 cleaned |
| cnsc.decizii authority | 42% | **77.5%** | +10,328 |
| curteacont.rapoarte | 0% | **64.4%** | +730 |
| bugetar.entitate | 58.3% | **63.4%** | +961 |
| cnas.furnizori | 0% | 9% | +3,255 (dirty data residue) |
### Total public data aggregated
17 schemas integrated cross-source via the CUI hub (firms.entities = 3.99M):
- **~17.9M rows** of unique public data (per the G3 audit)
- **75 SEAP contracts** active now vs 8 months stale before (DA pipeline)
- **49 recipes** on /achizitii/retete (39 at the start)
- **23 gotchas** documented in memory
## Recipes shipped (Phase 5 + autonomous run)
| Slug | Source pair | Yield | Tier |
|---|---|---:|---|
| `energie-fara-licenta` | SEAP × ANRE | red-flags | T3 |
| `telco-fara-licenta` | SEAP × ANCOM | red-flags | T3 |
| `autoritati-contestate-cnsc` | CNSC × SEAP | 4,192 authorities | T2 |
| `asiguratori-furnizori-stat` | ASF × SEAP | 63 firms | T4 |
| `stat-actionar-seap` | AAAS × SEAP | red-flags | T3 |
| `autoritati-audited-repetitiv` | Curtea × SEAP | red-flags | T4 |
| `autoritati-dubla-alerta-cdc-cnsc` | Curtea × CNSC | **50** | T2 |
| `donatori-politici-care-datoreaza-statului` | AEP × ANAF | **360** | T2 |
| `energie-licentiati-anre-datornici-anaf` | ANRE × ANAF | **875** | T2 |
| `donatori-politici-care-contesta-la-cnsc` | AEP × CNSC | **185** | T2 |
## Top killer findings (journalism-ready)
1. **B&B BUSINESS SOLUTIONS**: 10K RON donated to parties vs **281.8M RON owed to ANAF** (ratio 1:28,184)
2. **HIDROELECTRICA**: 214M ANAF debt + 4 active ANRE licences (state-to-state circularity)
3. **MUNICIPIUL CONSTANTA**: 3 Curtea Conturi audits + 90 CNSC complaints = 93 converging signals
4. **SHERIFF GUARD PROTECTION**: 62 CNSC complaints vs a 27K donation (uses the legal route as its main instrument)
5. **VICTOR CONSTRUCT**: 670K donation + 23 complaints + active on SEAP (a political-legal combination)
## Infrastructure delivered
### 17 systemd timers active
| Cadence | Timer | Next fire |
|---|---|---|
| Daily 02:00 | anaf-daily | Tue 02:02 |
| Daily 02:30 | **da (NEW)** | Tue 02:32 |
| Daily 04:00 | mvs | Tue 04:04 |
| Daily 07:00 | **heartbeat (NEW)** | Tue 07:02 |
| Weekly Sun 01:00 | anre | Sun 01:06 |
| Weekly Mon 01:00 | ancom | Mon 01:00 |
| Weekly Tue 01:00 | asf | Tue 01:07 |
| Weekly Wed 01:00 | aaas | Wed 01:05 |
| Weekly Thu 01:00 | curteacont | Thu 01:06 |
| Weekly Fri 01:00 | gnm | Fri 01:00 |
| Weekly Sat 01:00 | cnsc | Sat 01:03 |
| Weekly Tue 03:00 | onrc-weekly | Tue 03:03 |
| Monthly 1st 03:00 | regas | Jun 1 03:06 |
| Monthly 1st 03:30 | aep-donatii | Jun 1 03:30 |
| Monthly 1st 05:00 | cnas | Jun 1 05:06 |
| Monthly 15th 03:00 | apia-fermieri | May 15 03:02 |
### Heartbeat monitoring
- Probes 20 sources, posts to n8n satra-backup-alert webhook when STALE
- Currently 19/20 OK, 1 STALE: ani.declaratii (known unimplemented)
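
The probe-and-alert logic can be sketched as follows. This is a minimal Python illustration, not the contents of `heartbeat.sh`: the per-source age thresholds, the function names, and every payload field other than `service: "data-heartbeat"` are assumptions made for the example, and the actual POST to the n8n webhook is left out.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Optional


def classify(last_loaded: Optional[datetime], max_age: timedelta, now: datetime) -> str:
    """Return "STALE" when a source has never loaded or is older than max_age."""
    if last_loaded is None:
        return "STALE"
    return "STALE" if now - last_loaded > max_age else "OK"


def check_sources(sources: dict, now: datetime) -> dict:
    """Map each source name to "OK" or "STALE".

    `sources` maps name -> (last_loaded, max_age); both values are hypothetical inputs.
    """
    return {name: classify(last, max_age, now) for name, (last, max_age) in sources.items()}


def build_alert(statuses: dict) -> Optional[str]:
    """Build the JSON body for the webhook, or None when nothing is stale."""
    stale = sorted(name for name, status in statuses.items() if status == "STALE")
    if not stale:
        return None
    return json.dumps({
        "service": "data-heartbeat",  # field name taken from this document
        "stale": stale,               # hypothetical field
        "ok": len(statuses) - len(stale),
        "total": len(statuses),
    })
```

Treating a missing timestamp as STALE means that a source whose scraper is not implemented yet keeps showing up in the alert instead of being silently skipped.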
### Disk
- 89% → 45% (156 GB freed via `docker builder prune -a -f` + `docker image prune -a -f`)
## Documents written
| Path | Author | Purpose |
|---|---|---|
| `chatGPT/data-quality/freshness-audit-2026-05-10.md` | G3 sub-agent | 17.9M row reconciliation + per-schema cadence |
| `chatGPT/data-quality/geocoding-strategy-2026-05-11.md` | A2 sub-agent | Fallback chain documentation |
| `chatGPT/data-quality/refresh-cadence-strategy-2026-05-11.md` | S1 sub-agent | Master cron schedule + 2captcha budget |
| `chatGPT/journalism/killer-findings-2026-05-10.md` | G4 sub-agent | 5 lead findings + 7 storylines |
| `chatGPT/journalism/sectorial-deep-dive-2026-05-10.md` | G5 sub-agent | ENERGIE/TELECOM/FINANCIAR analysis |
| `services/seap-scraper/HANDOFF-aaas-ordin-278.md` | A3 sub-agent | AAAS PDF backfill plan |
| `services/seap-scraper/HANDOFF-asf-other-registers.md` | A3 sub-agent | ASF pension/AIFM/UCITS plan |
| `services/seap-scraper/HANDOFF-cnas-layout-b.md` | A3 sub-agent | CNAS 9 PDFs layout-B parser plan |
| `services/seap-scraper/systemd/README.md` | tick #2 | Systemd unit install procedure |
| **This doc** | tick #15 | Session retrospective |
## Reusable patterns discovered
### 1. Strip-parens + UAT-pattern (3-source proven)
ONRC stores communes and towns with a " (Primaria Y)" suffix. Stripping the suffix and comparing the normalized names yields an exact match. Used for:
- bugetar (sql/039) → +961 matches in 1m 46s
- curteacont (sql/040 + 041) → +730 matches in <2 min
- cnsc (sql/042) → +10,328 matches in 1m 25s
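
The idea behind the pattern can be sketched in Python. The real backfills live in the SQL files listed above, so this is only an illustration under assumptions: the function names, the diacritic folding step, and the rule of skipping ambiguous names are all hypothetical choices, not a transcription of `sql/039` to `sql/042`.

```python
import re
import unicodedata

# Trailing parenthesised suffix such as " (Primaria Y)".
_PAREN_SUFFIX = re.compile(r"\s*\([^()]*\)\s*$")


def normalize_uat_name(name: str) -> str:
    """Strip a trailing "(...)" suffix, fold diacritics and case, collapse whitespace."""
    stripped = _PAREN_SUFFIX.sub("", name)
    # NFKD splits letters from their combining marks so the marks can be dropped.
    decomposed = unicodedata.normalize("NFKD", stripped)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_marks.upper().split())


def match_by_name(onrc_rows, source_rows):
    """Return {source_id: cui} for names that hit exactly one normalized ONRC name.

    onrc_rows:   iterable of (cui, name)
    source_rows: iterable of (source_id, name)
    """
    index = {}
    for cui, name in onrc_rows:
        index.setdefault(normalize_uat_name(name), set()).add(cui)
    matches = {}
    for source_id, name in source_rows:
        cuis = index.get(normalize_uat_name(name), set())
        if len(cuis) == 1:  # skip ambiguous names rather than guess
            matches[source_id] = next(iter(cuis))
    return matches
```

Requiring a single candidate CUI per normalized name trades some coverage for precision, which seems appropriate when the matched CUI later feeds investigative recipes.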
### 2. Sub-agent isolation via dedicated helper files
G1 + G2 wrote separate `profile-queries-utilities.ts` + `profile-queries-financial.ts` to avoid merge conflicts. Pattern reusable for any parallel codegen task.
### 3. Cross-source RATIO mismatches surface real signal
- B&B: 10K donation vs 281M debt → 1:28,184 ratio = lever-amount mismatch
- SHERIFF GUARD: 27K donation vs 62 CNSC complaints → a cheap donation paired with an aggressive legal strategy
Single-source counts are easily explained away as "high volume". Cross-source ratios force a specific narrative.
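
The ratio arithmetic is simple enough to show directly. The sketch below is illustrative only: the function names and the `min_ratio` threshold are assumptions, and the input amounts are the rounded figures quoted in this document.

```python
def debt_to_donation_ratio(donation_ron: int, debt_ron: int) -> float:
    """Return how many RON of debt stand behind each donated RON."""
    if donation_ron <= 0:
        raise ValueError("donation must be positive")
    return debt_ron / donation_ron


def flag_mismatches(rows, min_ratio: float = 1000.0):
    """Keep (name, donation, debt) rows whose ratio reaches min_ratio, largest first.

    min_ratio is a hypothetical cut-off chosen for the example.
    """
    scored = [(name, debt_to_donation_ratio(don, debt)) for name, don, debt in rows]
    flagged = [item for item in scored if item[1] >= min_ratio]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Rounded amounts from this document: 10K RON donated, 281.8M RON owed.
print(debt_to_donation_ratio(10_000, 281_800_000))  # 28180.0
```

With the rounded amounts the result is about 1:28,180. The 1:28,184 figure quoted above presumably comes from the unrounded amounts.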
## Known limitations / next-session candidates
### Critical (DR/observability)
- DB backup runs from root's crontab (NOT bulibasa's) — confirmed working but undocumented elsewhere
- Heartbeat hits n8n webhook but n8n routing for `service:"data-heartbeat"` field not verified — first alert email needs validation
### High-impact (3-15h each)
- CNSC Stage 2 PDF parse → decision_type (admis/respins, i.e. upheld/rejected): unlocks the killer recipe "authorities with a high rate of lost complaints"
- Curtea Conturi Stage 2 → findings_count + key amounts per audit
- CNAS layout-B parser (9 remaining PDFs)
- ASF pension funds + AIFM + UCITS register ingest
### Medium-effort (4-8h)
- TED full re-import (publication-date backfill — fix shipped tick #1)
- normalize_company_name v2 for orthography (Cârlogani ↔ Cirlogani)
- ANRE 92.3% residue (commercial firms — need different match strategy)
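
For the orthography item, one possible direction is to fold the interchangeable â/î spellings into a single comparison key. The sketch below is hypothetical: `fold_orthography` is not the existing `normalize_company_name`, and the decision to map both â and î to `i` is an assumption made so that the Cârlogani ↔ Cirlogani example lines up.

```python
import unicodedata

# Map both spellings of the same vowel to "i" before stripping other diacritics.
_CIRCUMFLEX_TO_I = str.maketrans({"â": "i", "Â": "I", "î": "i", "Î": "I"})


def fold_orthography(name: str) -> str:
    """Return an upper-case comparison key that ignores â/î and other diacritics."""
    # NFC first, so that â and î are single code points the table can match.
    folded = unicodedata.normalize("NFC", name).translate(_CIRCUMFLEX_TO_I)
    decomposed = unicodedata.normalize("NFKD", folded)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).upper()
```

The key is only meant for comparing two names that were both folded the same way. It would not reconcile a variant where the diacritic was simply dropped (for example a plain `a` in place of â), which would need a separate rule.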
### Speculative
- 2captcha integration (~$60-100 one-shot for Bugetar Faza 2 + ANAF datornici quarterly refresh)
- ANI parser MVP (1.3M PDFs, 15-day effort)