initial: split from gov-agreg — vreau.digital standalone platform

Moved from gov-agreg/src/pages/achizitii/* to root (drop prefix).
- 22 pages migrated, 127 files total
- All internal links: /achizitii/X → /X (176 occurrences fixed)
- AchizitiiLayout subnav rewritten: /X paths, top-right link to vreaudigital.ro hub
- BaseLayout new (vreau.digital branding, OG tags, site URL)
- astro.config.mjs: site https://vreau.digital, server output (was static)
- docker-compose: port 5096 (vreaudigital is 5095), container vreau-digital
- deploy.sh: paths /opt/vreau-digital, log /var/log/vreau-digital-deploy.log

Backend shared with gov-agreg:
- PostgreSQL satra (same schemas: seap, firms, anaf, anre, ...)
- Photon, Martin tiles
- Infisical /vreaudigital path (DATABASE_URL etc. shared)

build: PASS (npx astro check 0 errors, npm run build 5s vite + 10s server)
Claude VM
2026-05-13 00:10:32 +03:00
commit a6c03a091e
352 changed files with 75295 additions and 0 deletions
@@ -0,0 +1,151 @@
# Session 2026-05-11 — vreaudigital.ro
Extended session after the Phase 5 UI merge. It started as an autonomous tick and grew into 15 consecutive productive cycles. Live SHA at the end: **`7ca4aa4`** (49 recipes, 17 systemd timers, 100% geocoding).
## Cycle timeline
| Tick | Focus | Commits | Highlights |
|---:|---|---|---|
| Phase 5 (pre-tick) | G1-G2-G3-G4-G5 sub-agents | 8 commits | 6 helper functions, 7 firm badges, 5 sections, 6 recipes, 3 investigative reports |
| Phase 5 merge | UI integration + commit cleanup | 2 commits | `57af3a6` + `c1d90bf` |
| Tick #1 | A1-A2-A3 sub-agents (fixes/geocoding/completions) + A4-A5 (browse UIs) + S1 (refresh strategy) | 6 commits | Geocoding 91→100%, ASF cleanup, ANRE electricians 0→73K, 2 new browse pages |
| Tick #1.5 | Disk cleanup + heartbeat monitoring | 4 commits | 89%→45% disk, heartbeat.sh + systemd timer (20 sources daily 07:00) |
| Tick #2 | 11 systemd timer pairs | 1 commit | Weekly + monthly timers for all scrape-*.sh wrappers |
| Tick #3 | Authority profile badges | 1 commit | 5 cross-source badges + getBugetarStatus helper |
| Tick #4 | Authority profile sections | 1 commit | 4 sections (ANAF/CNSC/Curtea Conturi/RegAS), parity with the firm profile |
| Tick #5 | Bugetar UAT pattern match | 1 commit | +961 matches (58.3% → 63.4%), strip-parens insight |
| Tick #6 | Curteacont CUI backfill | 1 commit | 0% → 64.4% (+730 matches), prefix-bug data fix |
| Tick #7 | CNSC authority CUI backfill | 1 commit | 42% → 77.5% (+10,328 matches) — biggest single backfill |
| Tick #8 | SEAP DA wrapper + timer (was missing!) | 1 commit | Daily 02:30, 4h timeout for ~7-month catch-up |
| Tick #9 | Firm bugetar badge + recipe refactor | 2 commits | autoritati-audited-repetitiv: 5s → <500ms |
| Tick #10 | Recipe dubla-alerta-cdc-cnsc | 1 commit | 50 entities, MUNICIPIUL CONSTANTA on top (93 signals) |
| Tick #11 | Recipe donatori-datornici (moral hazard) | 1 commit | 360 firms; B&B BUSINESS at a 1:28,184 ratio |
| Tick #12 | Recipe energie-anre-datornici | 1 commit | 875 operators; 3.14 bn RON aggregate debt |
| Tick #13 | Red-flags landing 6→13 cards + 3 KPI tiles | 1 commit | Surfacing for the new investigative recipes |
| Tick #14 | Recipe donatori-contestatori (political leverage) | 1 commit | 185 firms; SHERIFF GUARD: 62 complaints vs a 27K donation |
| Tick #15 | Audit + this doc | 1 commit | System health verified, summary written |
## Final statistics
### CUI matching coverage
| Source | Pre-session | Post-session | Delta |
|---|---:|---:|---:|
| firms.entities geocoding | 91.3% | **100.00%** | +346,675 |
| ASF CUI clean | 51% | **100%** | +412 cleaned |
| cnsc.decizii authority | 42% | **77.5%** | +10,328 |
| curteacont.rapoarte | 0% | **64.4%** | +730 |
| bugetar.entitate | 58.3% | **63.4%** | +961 |
| cnas.furnizori | 0% | 9% | +3,255 (dirty data residue) |
### Total aggregated public data
17 schemas integrated cross-source via the CUI hub (firms.entities = 3.99M):
- **~17.9M rows** of unique public data (per G3 audit)
- **75 active SEAP contracts** now, versus data that was 8 months stale before (DA pipeline)
- **49 recipes** on /achizitii/retete (up from 39 at the start)
- **23 gotchas** documented in memory
## Recipes shipped (Phase 5 + autonomous run)
| Slug | Source pair | Yield | Tier |
|---|---|---:|---|
| `energie-fara-licenta` | SEAP × ANRE | red-flags | T3 |
| `telco-fara-licenta` | SEAP × ANCOM | red-flags | T3 |
| `autoritati-contestate-cnsc` | CNSC × SEAP | 4,192 authorities | T2 |
| `asiguratori-furnizori-stat` | ASF × SEAP | 63 firms | T4 |
| `stat-actionar-seap` | AAAS × SEAP | red-flags | T3 |
| `autoritati-audited-repetitiv` | Curtea × SEAP | red-flags | T4 |
| `autoritati-dubla-alerta-cdc-cnsc` | Curtea × CNSC | **50** | T2 |
| `donatori-politici-care-datoreaza-statului` | AEP × ANAF | **360** | T2 |
| `energie-licentiati-anre-datornici-anaf` | ANRE × ANAF | **875** | T2 |
| `donatori-politici-care-contesta-la-cnsc` | AEP × CNSC | **185** | T2 |
## Top killer findings (journalism-ready)
1. **B&B BUSINESS SOLUTIONS**: 10K RON donated to parties vs **281.8M RON owed to ANAF** (ratio 1:28,184)
2. **HIDROELECTRICA**: 214M in ANAF debt + 4 active ANRE licences (circular, state owing the state)
3. **MUNICIPIUL CONSTANTA**: 3 Curtea Conturi audits + 90 CNSC complaints = 93 converging signals
4. **SHERIFF GUARD PROTECTION**: 62 CNSC complaints vs a 27K donation (uses the legal route as its main instrument)
5. **VICTOR CONSTRUCT**: 670K donation + 23 complaints + active on SEAP (a political-legal combination)
## Infrastructure delivered
### 17 systemd timers active
| Cadence | Timer | Next fire |
|---|---|---|
| Daily 02:00 | anaf-daily | Tue 02:02 |
| Daily 02:30 | **da (NEW)** | Tue 02:32 |
| Daily 04:00 | mvs | Tue 04:04 |
| Daily 07:00 | **heartbeat (NEW)** | Tue 07:02 |
| Weekly Sun 01:00 | anre | Sun 01:06 |
| Weekly Mon 01:00 | ancom | Mon 01:00 |
| Weekly Tue 01:00 | asf | Tue 01:07 |
| Weekly Wed 01:00 | aaas | Wed 01:05 |
| Weekly Thu 01:00 | curteacont | Thu 01:06 |
| Weekly Fri 01:00 | gnm | Fri 01:00 |
| Weekly Sat 01:00 | cnsc | Sat 01:03 |
| Weekly Tue 03:00 | onrc-weekly | Tue 03:03 |
| Monthly 1st 03:00 | regas | Jun 1 03:06 |
| Monthly 1st 03:30 | aep-donatii | Jun 1 03:30 |
| Monthly 1st 05:00 | cnas | Jun 1 05:06 |
| Monthly 15th 03:00 | apia-fermieri | May 15 03:02 |
### Heartbeat monitoring
- Probes 20 sources, posts to n8n satra-backup-alert webhook when STALE
- Currently 19/20 OK, 1 STALE: ani.declaratii (known unimplemented)
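The staleness check behind these bullets can be sketched as follows. This is a minimal illustration, not the contents of `heartbeat.sh`: the per-source age thresholds, the `seap.da` source name, and every payload field except `service: "data-heartbeat"` are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


def classify(last_update: datetime, max_age: timedelta, now: datetime) -> str:
    """Return "STALE" when the newest row is older than the allowed age."""
    return "STALE" if now - last_update > max_age else "OK"


def build_alert(statuses: Dict[str, str]) -> Optional[dict]:
    """Build a webhook payload, or None when every source is OK."""
    stale = sorted(name for name, status in statuses.items() if status == "STALE")
    if not stale:
        return None
    return {
        "service": "data-heartbeat",  # field name taken from this document
        "stale": stale,               # remaining fields are hypothetical
        "ok": len(statuses) - len(stale),
        "total": len(statuses),
    }


now = datetime(2026, 5, 12, 7, 0, tzinfo=timezone.utc)
# Hypothetical probe results: (last update, allowed age) per source.
probes = {
    "seap.da": (now - timedelta(hours=5), timedelta(days=2)),
    "cnsc.decizii": (now - timedelta(days=3), timedelta(days=8)),
    "ani.declaratii": (now - timedelta(days=400), timedelta(days=31)),
}
statuses = {name: classify(ts, age, now) for name, (ts, age) in probes.items()}
alert = build_alert(statuses)
```

Keeping classification separate from payload building means the thresholds can be tested without touching the webhook.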
### Disk
- 89% → 45% (156 GB freed via `docker builder prune -a -f` + `docker image prune -a -f`)
## Documents written
| Path | Author | Purpose |
|---|---|---|
| `chatGPT/data-quality/freshness-audit-2026-05-10.md` | G3 sub-agent | 17.9M row reconciliation + per-schema cadence |
| `chatGPT/data-quality/geocoding-strategy-2026-05-11.md` | A2 sub-agent | Fallback chain documentation |
| `chatGPT/data-quality/refresh-cadence-strategy-2026-05-11.md` | S1 sub-agent | Master cron schedule + 2captcha budget |
| `chatGPT/journalism/killer-findings-2026-05-10.md` | G4 sub-agent | 5 lead findings + 7 storylines |
| `chatGPT/journalism/sectorial-deep-dive-2026-05-10.md` | G5 sub-agent | ENERGIE/TELECOM/FINANCIAR analysis |
| `services/seap-scraper/HANDOFF-aaas-ordin-278.md` | A3 sub-agent | AAAS PDF backfill plan |
| `services/seap-scraper/HANDOFF-asf-other-registers.md` | A3 sub-agent | ASF pension/AIFM/UCITS plan |
| `services/seap-scraper/HANDOFF-cnas-layout-b.md` | A3 sub-agent | CNAS 9 PDFs layout-B parser plan |
| `services/seap-scraper/systemd/README.md` | tick #2 | Systemd unit install procedure |
| **This doc** | tick #15 | Session retrospective |
## Reusable patterns discovered
### 1. Strip-parens + UAT-pattern (3-source proven)
ONRC stores communes and towns (comune/orașe) with a " (Primaria Y)" suffix. Stripping that suffix and comparing the normalized names yields exact matches. Used for:
- bugetar (sql/039) → +961 matches in 1m 46s
- curteacont (sql/040 + 041) → +730 matches in <2 min
- cnsc (sql/042) → +10,328 matches in 1m 25s
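The backfills themselves were run as SQL scripts. The Python below is only a sketch of the same idea, under assumptions: the suffix is a single trailing parenthetical, normalization is uppercase plus whitespace collapse, and all names and CUI values are invented for the example.

```python
import re

# Trailing parenthetical such as " (Primaria Y)"; the exact suffix shape is an assumption.
_PAREN_SUFFIX = re.compile(r"\s*\([^()]*\)\s*$")


def match_key(name: str) -> str:
    """Strip one trailing parenthetical, then uppercase and collapse whitespace."""
    stripped = _PAREN_SUFFIX.sub("", name)
    return " ".join(stripped.upper().split())


def build_index(onrc_rows):
    """Map match_key -> CUI, dropping keys claimed by more than one CUI."""
    index, ambiguous = {}, set()
    for name, cui in onrc_rows:
        key = match_key(name)
        if key in index and index[key] != cui:
            ambiguous.add(key)
        index[key] = cui
    for key in ambiguous:
        del index[key]
    return index


def backfill(source_names, index):
    """Return {source name: CUI} for exact key matches only."""
    return {n: index[match_key(n)] for n in source_names if match_key(n) in index}


# Hypothetical rows, not real registry data.
onrc_rows = [
    ("COMUNA EXEMPLU (PRIMARIA EXEMPLU)", "1000001"),
    ("ORAS MODEL (PRIMARIA MODEL)", "1000002"),
]
index = build_index(onrc_rows)
matches = backfill(["Comuna  Exemplu", "Oras Model", "Comuna Necunoscuta"], index)
```

Dropping ambiguous keys rather than picking one CUI keeps the backfill limited to exact matches, at the cost of some coverage.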
### 2. Sub-agent isolation via dedicated helper files
G1 + G2 wrote separate `profile-queries-utilities.ts` + `profile-queries-financial.ts` to avoid merge conflicts. Pattern reusable for any parallel codegen task.
### 3. Cross-source RATIO mismatches surface real signal
- B&B: 10K donation vs 281M debt → 1:28,184 ratio = lever-amount mismatch
- SHERIFF GUARD: 27K donation vs 62 complaints → a cheap donation paired with an aggressive legal strategy
Single-source counts are easily explained away as "high volume". Cross-source ratios force a specific narrative.
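A ratio ranking of this kind can be sketched in a few lines. The record type, function names, and threshold are hypothetical, and the two "EXEMPLU" firms are invented. The B&B row uses the rounded figures quoted in this document, so it gives 28,180 rather than the 1:28,184 computed from the exact amounts, which are not shown here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Signal:
    name: str
    donated_ron: float  # amount from one source
    owed_ron: float     # amount from a second, independent source


def owed_per_donated(s: Signal) -> Optional[float]:
    """RON owed per RON donated; None when there is no donation to compare."""
    if s.donated_ron <= 0:
        return None
    return s.owed_ron / s.donated_ron


def rank_by_ratio(signals: List[Signal], min_ratio: float) -> List[Tuple[str, float]]:
    """Keep entities at or above min_ratio, highest ratio first."""
    scored = []
    for s in signals:
        ratio = owed_per_donated(s)
        if ratio is not None and ratio >= min_ratio:
            scored.append((s.name, ratio))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


signals = [
    Signal("B&B BUSINESS SOLUTIONS", 10_000, 281_800_000),  # rounded figures
    Signal("EXEMPLU ALFA SRL", 50_000, 2_000_000),          # invented
    Signal("EXEMPLU BETA SRL", 20_000, 30_000),             # invented
]
ranked = rank_by_ratio(signals, min_ratio=10)
```

Filtering on the ratio rather than on either raw amount is what separates a large but proportionate debtor from a mismatch worth investigating.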
## Known limitations / next-session candidates
### Critical (DR/observability)
- DB backup runs from root's crontab (NOT bulibasa's) — confirmed working but undocumented elsewhere
- Heartbeat hits n8n webhook but n8n routing for `service:"data-heartbeat"` field not verified — first alert email needs validation
### High-impact (3-15h each)
- CNSC Stage 2 PDF parse → decision_type (admis/respins, i.e. upheld/rejected): unlocks the killer recipe "authorities with a high rate of lost complaints"
- Curtea Conturi Stage 2 → findings_count + key amounts per audit
- CNAS layout-B parser (9 remaining PDFs)
- ASF pension funds + AIFM + UCITS register ingest
### Medium-effort (4-8h)
- TED full re-import (publication-date backfill — fix shipped tick #1)
- normalize_company_name v2 for orthography (Cârlogani ↔ Cirlogani)
- ANRE 92.3% residue (commercial firms — need different match strategy)
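For the orthography item above, one possible direction is to fold the two spellings of the same vowel (â and î) to a plain "i" before stripping the remaining diacritics. This is a sketch only: `fold_ro` is a hypothetical name, not the existing `normalize_company_name`, and the folding rule is an assumption about what v2 might do.

```python
import unicodedata


def fold_ro(name: str) -> str:
    """Fold Romanian orthographic variants into one comparison key."""
    # Compose first so "a" + combining circumflex is seen as the single letter "â".
    s = unicodedata.normalize("NFC", name).casefold()
    # â and î spell the same vowel; map both to "i" so old and new spellings agree.
    s = s.replace("â", "i").replace("î", "i")
    # Decompose and drop the remaining combining marks (ă -> a, ș -> s, ț -> t).
    s = unicodedata.normalize("NFD", s)
    s = "".join(ch for ch in s if unicodedata.category(ch) != "Mn")
    return " ".join(s.split())


key_new = fold_ro("Cârlogani")
key_old = fold_ro("Cirlogani")
```

The rule is deliberately narrow. A name typed with a bare "a" ("Carlogani") still produces a different key, so it would need a separate fallback.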
### Speculative
- 2captcha integration (~$60-100 one-shot for Bugetar Faza 2 + ANAF datornici quarterly refresh)
- ANI parser MVP (1.3M PDFs, 15-day effort)