a6c03a091e
Moved from gov-agreg/src/pages/achizitii/* to root (drop prefix).
- 22 pages migrated, 127 files total
- All internal links: /achizitii/X → /X (176 occurrences fixed)
- AchizitiiLayout subnav rewritten: /X paths, top-right link to vreaudigital.ro hub
- BaseLayout new (vreau.digital branding, OG tags, site URL)
- astro.config.mjs: site https://vreau.digital, server output (was static)
- docker-compose: port 5096 (vreaudigital is 5095), container vreau-digital
- deploy.sh: paths /opt/vreau-digital, log /var/log/vreau-digital-deploy.log

Backend shared with gov-agreg:
- PostgreSQL satra (same schemas: seap, firms, anaf, anre, ...)
- Photon, Martin tiles
- Infisical /vreaudigital path (DATABASE_URL etc. shared)

build: PASS (npx astro check 0 errors, npm run build 5s vite + 10s server)
8.5 KiB
Session 2026-05-11 — vreaudigital.ro
Extended session after the Phase 5 UI merge. It started as an autonomous tick and grew into 15 consecutive productive cycles. Live SHA at the end: 7ca4aa4 (49 recipes, 17 systemd timers, 100% geocoding).
Cycle timeline
| Tick | Focus | Commits | Highlights |
|---|---|---|---|
| Phase 5 (pre-tick) | G1-G2-G3-G4-G5 sub-agents | 8 commits | 6 helper functions, 7 firm badges, 5 sections, 6 recipes, 3 investigative reports |
| Phase 5 merge | UI integration + commit cleanup | 2 commits | 57af3a6 + c1d90bf |
| Tick #1 | A1-A2-A3 sub-agents (fixes/geocoding/completions) + A4-A5 (browse UIs) + S1 (refresh strategy) | 6 commits | Geocoding 91→100%, ASF cleanup, ANRE electricians 0→73K, 2 new browse pages |
| Tick #1.5 | Disk cleanup + heartbeat monitoring | 4 commits | 89%→45% disk, heartbeat.sh + systemd timer (20 sources daily 07:00) |
| Tick #2 | 11 systemd timer pairs | 1 commit | Weekly + monthly timers for all scrape-*.sh wrappers |
| Tick #3 | Authority profile badges | 1 commit | 5 cross-source badges + getBugetarStatus helper |
| Tick #4 | Authority profile sections | 1 commit | 4 sections (ANAF/CNSC/Curtea Conturi/RegAS), parity with the firm profile |
| Tick #5 | Bugetar UAT pattern match | 1 commit | +961 matches (58.3% → 63.4%), strip-parens insight |
| Tick #6 | Curteacont CUI backfill | 1 commit | 0% → 64.4% (+730 matches), prefix-bug data fix |
| Tick #7 | CNSC authority CUI backfill | 1 commit | 42% → 77.5% (+10,328 matches) — biggest single backfill |
| Tick #8 | SEAP DA wrapper + timer (was missing!) | 1 commit | Daily 02:30, 4h timeout for ~7-month catch-up |
| Tick #9 | Firm bugetar badge + recipe refactor | 2 commits | autoritati-audited-repetitiv: 5s → <500ms |
| Tick #10 | Recipe dubla-alerta-cdc-cnsc | 1 commit | 50 entities, MUNICIPIUL CONSTANTA on top (93 signals) |
| Tick #11 | Recipe donatori-datornici (moral hazard) | 1 commit | 360 firms; B&B BUSINESS at a 1:28,184 ratio |
| Tick #12 | Recipe energie-anre-datornici | 1 commit | 875 operators; 3.14 bn RON aggregate debt |
| Tick #13 | Red-flags landing 6→13 cards + 3 KPI tiles | 1 commit | Surfacing for the new investigative recipes |
| Tick #14 | Recipe donatori-contestatori (political leverage) | 1 commit | 185 firms; SHERIFF GUARD with 62 contestations vs. a 27K donation |
| Tick #15 | Audit + this doc | 1 commit | System health verified, summary written |
Final statistics
CUI matching coverage
| Source | Pre-session | Post-session | Delta |
|---|---|---|---|
| firms.entities geocoding | 91.3% | 100.00% | +346,675 |
| ASF CUI clean | 51% | 100% | +412 cleaned |
| cnsc.decizii authority | 42% | 77.5% | +10,328 |
| curteacont.rapoarte | 0% | 64.4% | +730 |
| bugetar.entitate | 58.3% | 63.4% | +961 |
| cnas.furnizori | 0% | 9% | +3,255 (dirty data residue) |
Total aggregated public data
17 schemas integrated cross-source via the CUI hub (firms.entities = 3.99M):
- ~17.9M unique rows of public data (per G3 audit)
- 75 SEAP contracts active now vs. 8 months stale before (DA pipeline)
- 49 recipes on /achizitii/retete (39 at the start)
- 23 gotchas documented in memory
Recipes shipped (Phase 5 + autonomous run)
| Slug | Source pair | Yield | Tier |
|---|---|---|---|
| `energie-fara-licenta` | SEAP ∖ ANRE | red-flags | T3 |
| `telco-fara-licenta` | SEAP ∖ ANCOM | red-flags | T3 |
| `autoritati-contestate-cnsc` | CNSC × SEAP | 4,192 authorities | T2 |
| `asiguratori-furnizori-stat` | ASF × SEAP | 63 firms | T4 |
| `stat-actionar-seap` | AAAS × SEAP | red-flags | T3 |
| `autoritati-audited-repetitiv` | Curtea × SEAP | red-flags | T4 |
| `autoritati-dubla-alerta-cdc-cnsc` | Curtea × CNSC | 50 | T2 |
| `donatori-politici-care-datoreaza-statului` | AEP × ANAF | 360 | T2 |
| `energie-licentiati-anre-datornici-anaf` | ANRE × ANAF | 875 | T2 |
| `donatori-politici-care-contesta-la-cnsc` | AEP × CNSC | 185 | T2 |
Top killer findings (journalism-ready)
- B&B BUSINESS SOLUTIONS: 10K RON donated to parties vs. 281.8M RON owed to ANAF (ratio 1:28,184)
- HIDROELECTRICA: 214M ANAF debt + 4 active ANRE licences (state-to-state circularity)
- MUNICIPIUL CONSTANTA: 3 Curtea de Conturi audits + 90 CNSC contestations = 93 converging signals
- SHERIFF GUARD PROTECTION: 62 CNSC contestations vs. a 27K donation (uses the legal route as its main instrument)
- VICTOR CONSTRUCT: 670K donation + 23 contestations + active on SEAP (political-legal combination)
Infrastructure delivered
17 systemd timers active
| Cadence | Timer | Next fire |
|---|---|---|
| Daily 02:00 | anaf-daily | Tue 02:02 |
| Daily 02:30 | da (NEW) | Tue 02:32 |
| Daily 04:00 | mvs | Tue 04:04 |
| Daily 07:00 | heartbeat (NEW) | Tue 07:02 |
| Weekly Sun 01:00 | anre | Sun 01:06 |
| Weekly Mon 01:00 | ancom | Mon 01:00 |
| Weekly Tue 01:00 | asf | Tue 01:07 |
| Weekly Wed 01:00 | aaas | Wed 01:05 |
| Weekly Thu 01:00 | curteacont | Thu 01:06 |
| Weekly Fri 01:00 | gnm | Fri 01:00 |
| Weekly Sat 01:00 | cnsc | Sat 01:03 |
| Weekly Tue 03:00 | onrc-weekly | Tue 03:03 |
| Monthly 1st 03:00 | regas | Jun 1 03:06 |
| Monthly 1st 03:30 | aep-donatii | Jun 1 03:30 |
| Monthly 1st 05:00 | cnas | Jun 1 05:06 |
| Monthly 15th 03:00 | apia-fermieri | May 15 03:02 |
Heartbeat monitoring
- Probes 20 sources, posts to n8n satra-backup-alert webhook when STALE
- Currently 19/20 OK, 1 STALE: ani.declaratii (known unimplemented)
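The staleness check described above can be sketched as follows. This is a minimal illustration, not the contents of heartbeat.sh (which is a shell script): the per-source age limits, the function names, and every payload field except `service: "data-heartbeat"` are assumptions made for the example, and the sketch only builds the payload rather than posting it.

```python
from datetime import datetime, timedelta, timezone


def classify_sources(last_seen, max_age, now):
    """Mark each source OK or STALE.

    last_seen: {source: datetime or None}  (None = never ingested)
    max_age:   {source: timedelta}         (hypothetical per-source limits)
    """
    status = {}
    for source, limit in max_age.items():
        seen = last_seen.get(source)
        fresh = seen is not None and (now - seen) <= limit
        status[source] = "OK" if fresh else "STALE"
    return status


def build_alert_payload(status):
    """Return a webhook payload when something is stale, else None.

    Only the "service" value comes from the notes; the other keys are
    illustrative.
    """
    stale = sorted(name for name, state in status.items() if state == "STALE")
    if not stale:
        return None
    return {
        "service": "data-heartbeat",
        "stale": stale,
        "ok_count": len(status) - len(stale),
        "total": len(status),
    }


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    status = classify_sources(
        {"anaf.daily": now - timedelta(hours=5), "ani.declaratii": None},
        {"anaf.daily": timedelta(days=2), "ani.declaratii": timedelta(days=30)},
        now,
    )
    print(build_alert_payload(status))
```

Returning `None` when nothing is stale keeps the "alert only when STALE" behaviour explicit at the call site.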
Disk
- 89% → 45% (156 GB freed via `docker builder prune -a -f` + `docker image prune -a -f`)
Documents written
| Path | Author | Purpose |
|---|---|---|
| `chatGPT/data-quality/freshness-audit-2026-05-10.md` | G3 sub-agent | 17.9M row reconciliation + per-schema cadence |
| `chatGPT/data-quality/geocoding-strategy-2026-05-11.md` | A2 sub-agent | Fallback chain documentation |
| `chatGPT/data-quality/refresh-cadence-strategy-2026-05-11.md` | S1 sub-agent | Master cron schedule + 2captcha budget |
| `chatGPT/journalism/killer-findings-2026-05-10.md` | G4 sub-agent | 5 lead findings + 7 storylines |
| `chatGPT/journalism/sectorial-deep-dive-2026-05-10.md` | G5 sub-agent | ENERGIE/TELECOM/FINANCIAR analysis |
| `services/seap-scraper/HANDOFF-aaas-ordin-278.md` | A3 sub-agent | AAAS PDF backfill plan |
| `services/seap-scraper/HANDOFF-asf-other-registers.md` | A3 sub-agent | ASF pension/AIFM/UCITS plan |
| `services/seap-scraper/HANDOFF-cnas-layout-b.md` | A3 sub-agent | CNAS 9 PDFs layout-B parser plan |
| `services/seap-scraper/systemd/README.md` | tick #2 | Systemd unit install procedure |
| This doc | tick #15 | Session retrospective |
Reusable patterns discovered
1. Strip-parens + UAT-pattern (3-source proven)
ONRC stores communes and towns (comune/orașe) with a " (Primaria Y)" suffix. Stripping the suffix and comparing the normalized names yields an exact match. Used for:
- bugetar (sql/039) → +961 matches in 1m 46s
- curteacont (sql/040 + 041) → +730 matches in <2 min
- cnsc (sql/042) → +10,328 matches in 1m 25s
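A minimal sketch of the strip-parens idea, written in Python for illustration only (the actual backfills were SQL migrations). The exact normalization rules here (diacritic folding, uppercasing, whitespace collapsing), the function names, and the sample CUI are assumptions, not the logic in sql/039 to sql/042.

```python
import re
import unicodedata

# Trailing parenthesised suffix such as " (Primaria Y)".
_PARENS_SUFFIX = re.compile(r"\s*\([^)]*\)\s*$")


def normalize_uat_name(raw):
    """Strip the trailing "(...)" suffix, fold diacritics, uppercase, collapse spaces."""
    without_suffix = _PARENS_SUFFIX.sub("", raw)
    decomposed = unicodedata.normalize("NFKD", without_suffix)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(folded.upper().split())


def match_by_name(onrc_rows, other_names):
    """Exact match on normalized names.

    onrc_rows:   iterable of (cui, name) pairs from the hub table
    other_names: names from the source being backfilled
    Returns {other_name: cui}; keys that map to more than one CUI are skipped.
    """
    index = {}
    ambiguous = set()
    for cui, name in onrc_rows:
        key = normalize_uat_name(name)
        if key in index and index[key] != cui:
            ambiguous.add(key)
        index[key] = cui

    matches = {}
    for name in other_names:
        key = normalize_uat_name(name)
        if key in index and key not in ambiguous:
            matches[name] = index[key]
    return matches


if __name__ == "__main__":
    # Hypothetical rows; the CUI is a placeholder, not real data.
    hub = [("0000001", "Orasul Exemplu (Primaria Exemplu)")]
    print(match_by_name(hub, ["ORAȘUL EXEMPLU"]))
```

Skipping ambiguous keys trades some coverage for precision, which seems the safer default when the match result is written back as a CUI.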
2. Sub-agent isolation via dedicated helper files
G1 + G2 wrote separate profile-queries-utilities.ts + profile-queries-financial.ts to avoid merge conflicts. Pattern reusable for any parallel codegen task.
3. Cross-source RATIO mismatches surface real signal
- B&B: 10K donation vs 281M debt → 1:28,184 ratio = lever-amount mismatch
- SHERIFF GUARD: 27K donation vs 62 contestations → cheap-donation-buys-aggressive-juridical-strategy
Single-source counts are easily explained away as "high volume" ("volume mare"). Cross-source ratios force a specific narrative.
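The ratio signal can be sketched as a small ranking step over joined rows. This is an illustrative sketch under stated assumptions: the `FirmSignal` type, the field names, the threshold, and the sample firms are all hypothetical, and the real recipes presumably compute this in SQL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirmSignal:
    """One firm after joining two sources on CUI (hypothetical shape)."""
    name: str
    donated_ron: float  # e.g. total donations from the AEP side
    debt_ron: float     # e.g. outstanding debt from the ANAF side


def debt_per_donated_ron(firm):
    """RON owed for every RON donated; undefined for zero donations."""
    if firm.donated_ron <= 0:
        raise ValueError("ratio is undefined without a positive donation")
    return firm.debt_ron / firm.donated_ron


def rank_by_ratio(firms, min_ratio=1.0):
    """Return (name, ratio) pairs at or above min_ratio, largest ratio first."""
    scored = [
        (debt_per_donated_ron(f), f.name)
        for f in firms
        if f.donated_ron > 0
    ]
    kept = [(ratio, name) for ratio, name in scored if ratio >= min_ratio]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [(name, ratio) for ratio, name in kept]


if __name__ == "__main__":
    sample = [
        FirmSignal("FIRM A", donated_ron=10_000, debt_ron=1_000_000),
        FirmSignal("FIRM B", donated_ron=50_000, debt_ron=100_000),
        FirmSignal("FIRM C", donated_ron=20_000, debt_ron=5_000),
    ]
    for name, ratio in rank_by_ratio(sample):
        print(f"{name}: 1:{ratio:,.0f}")
```

Ranking on the ratio rather than on either raw amount is what keeps large but proportionate firms out of the top of the list.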
Known limitations / next-session candidates
Critical (DR/observability)
- DB backup runs from root's crontab (NOT bulibasa's) — confirmed working but undocumented elsewhere
- Heartbeat hits n8n webhook but n8n routing for `service:"data-heartbeat"` field not verified; first alert email needs validation
High-impact (3-15h each)
- CNSC Stage 2 PDF parse → decision_type (admis/respins, i.e. upheld/rejected); unlocks the killer recipe "autorități cu rată mare contestații pierdute" (authorities with a high rate of lost contestations)
- Curtea Conturi Stage 2 → findings_count + key amounts per audit
- CNAS layout-B parser (9 remaining PDFs)
- ASF pension funds + AIFM + UCITS register ingest
Medium-effort (4-8h)
- TED full re-import (publication-date backfill — fix shipped tick #1)
- normalize_company_name v2 for orthography (Cârlogani ↔ Cirlogani)
- ANRE 92.3% residue (commercial firms — need different match strategy)
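For the orthography item above (Cârlogani ↔ Cirlogani), one possible direction is a comparison key that treats the old and new spellings of the same vowel (â/î) as "i" before folding the remaining diacritics. This is a hypothetical sketch, not the planned normalize_company_name v2; the function name and the rule set are assumptions.

```python
import unicodedata

# Map both spellings of the vowel to a plain "i" (assumed rule for this sketch).
_A_I_CIRCUMFLEX = str.maketrans({"â": "i", "Â": "I", "î": "i", "Î": "I"})


def orthography_key(name):
    """Build a comparison key that is insensitive to the â/î vs. i spelling."""
    # Compose first so that "a" + combining circumflex is seen as "â".
    composed = unicodedata.normalize("NFC", name)
    unified = composed.translate(_A_I_CIRCUMFLEX)
    # Fold the remaining diacritics (ă, ș, ț, ...) to their base letters.
    decomposed = unicodedata.normalize("NFKD", unified)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(folded.casefold().split())


if __name__ == "__main__":
    print(orthography_key("Cârlogani") == orthography_key("Cirlogani"))
```

The key is meant for matching only, never for display. A fully unaccented spelling such as "Carlogani" would still produce a different key under this rule, so that variant would need separate handling.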
Speculative
- 2captcha integration (~$60-100 one-shot for Bugetar Faza 2 + ANAF datornici quarterly refresh)
- ANI parser MVP (1.3M PDFs, 15-day effort)