initial: split from gov-agreg — vreau.digital standalone platform
Moved from gov-agreg/src/pages/achizitii/* to root (drop prefix).

- 22 pages migrated, 127 files total
- All internal links: /achizitii/X → /X (176 occurrences fixed)
- AchizitiiLayout subnav rewritten: /X paths, top-right link to vreaudigital.ro hub
- BaseLayout new (vreau.digital branding, OG tags, site URL)
- astro.config.mjs: site https://vreau.digital, server output (was static)
- docker-compose: port 5096 (vreaudigital is 5095), container vreau-digital
- deploy.sh: paths /opt/vreau-digital, log /var/log/vreau-digital-deploy.log

Backend shared with gov-agreg:

- PostgreSQL satra (same schemas: seap, firms, anaf, anre, ...)
- Photon, Martin tiles
- Infisical /vreaudigital path (DATABASE_URL etc. shared)

build: PASS (npx astro check 0 errors, npm run build 5s vite + 10s server)
@@ -0,0 +1,151 @@
# Session 2026-05-11: vreaudigital.ro

Extended session after the Phase 5 UI merge. It started as an autonomous tick and grew into 15 consecutive productive cycles. Live SHA at the end: **`7ca4aa4`** (49 recipes, 17 systemd timers, 100% geocoding).

## Cycle timeline

| Tick | Focus | Commits | Highlights |
|---:|---|---|---|
| Phase 5 (pre-tick) | G1-G2-G3-G4-G5 sub-agents | 8 commits | 6 helper functions, 7 firm badges, 5 sections, 6 recipes, 3 investigative reports |
| Phase 5 merge | UI integration + commit cleanup | 2 commits | `57af3a6` + `c1d90bf` |
| Tick #1 | A1-A2-A3 sub-agents (fixes/geocoding/completions) + A4-A5 (browse UIs) + S1 (refresh strategy) | 6 commits | Geocoding 91→100%, ASF cleanup, ANRE electricians 0→73K, 2 new browse pages |
| Tick #1.5 | Disk cleanup + heartbeat monitoring | 4 commits | 89%→45% disk, heartbeat.sh + systemd timer (20 sources, daily 07:00) |
| Tick #2 | 11 systemd timer pairs | 1 commit | Weekly + monthly timers for all scrape-*.sh wrappers |
| Tick #3 | Authority profile badges | 1 commit | 5 cross-source badges + getBugetarStatus helper |
| Tick #4 | Authority profile sections | 1 commit | 4 sections (ANAF/CNSC/Curtea Conturi/RegAS), parity with the firm profile |
| Tick #5 | Bugetar UAT pattern match | 1 commit | +961 matches (58.3% → 63.4%), strip-parens insight |
| Tick #6 | Curteacont CUI backfill | 1 commit | 0% → 64.4% (+730 matches), prefix-bug data fix |
| Tick #7 | CNSC authority CUI backfill | 1 commit | 42% → 77.5% (+10,328 matches), the biggest single backfill |
| Tick #8 | SEAP DA wrapper + timer (was missing!) | 1 commit | Daily 02:30, 4h timeout for ~7-month catch-up |
| Tick #9 | Firm bugetar badge + recipe refactor | 2 commits | `autoritati-audited-repetitiv`: 5s → <500ms |
| Tick #10 | Recipe dubla-alerta-cdc-cnsc | 1 commit | 50 entities, MUNICIPIUL CONSTANTA on top (93 signals) |
| Tick #11 | Recipe donatori-datornici (moral hazard) | 1 commit | 360 firms; B&B BUSINESS at a 1:28,184 ratio |
| Tick #12 | Recipe energie-anre-datornici | 1 commit | 875 operators; 3.14 bn RON aggregate debt |
| Tick #13 | Red-flags landing 6→13 cards + 3 KPI tiles | 1 commit | Surfacing for the new investigative recipes |
| Tick #14 | Recipe donatori-contestatori (political leverage) | 1 commit | 185 firms; SHERIFF GUARD: 62 complaints vs 27K donation |
| Tick #15 | Audit + this doc | 1 commit | System health verified, summary written |

## Final data statistics

### CUI matching coverage

| Source | Before session | After session | Delta |
|---|---:|---:|---:|
| firms.entities geocoding | 91.3% | **100.00%** | +346,675 |
| ASF CUI clean | 51% | **100%** | +412 cleaned |
| cnsc.decizii authority | 42% | **77.5%** | +10,328 |
| curteacont.rapoarte | 0% | **64.4%** | +730 |
| bugetar.entitate | 58.3% | **63.4%** | +961 |
| cnas.furnizori | 0% | 9% | +3,255 (dirty data residue) |

### Total aggregated public data

17 schemas integrated cross-source through the CUI hub (firms.entities = 3.99M):

- **~17.9M rows** of unique public data (per the G3 audit)
- **75 SEAP contracts** active now, versus 8 months stale before (DA pipeline)
- **49 recipes** on /achizitii/retete (39 at the start)
- **23 gotchas** documented in memory

## Recipes shipped (Phase 5 + autonomous run)

| Slug | Source pair | Yield | Tier |
|---|---|---:|---|
| `energie-fara-licenta` | SEAP ∖ ANRE | red-flags | T3 |
| `telco-fara-licenta` | SEAP ∖ ANCOM | red-flags | T3 |
| `autoritati-contestate-cnsc` | CNSC × SEAP | 4,192 authorities | T2 |
| `asiguratori-furnizori-stat` | ASF × SEAP | 63 firms | T4 |
| `stat-actionar-seap` | AAAS × SEAP | red-flags | T3 |
| `autoritati-audited-repetitiv` | Curtea × SEAP | red-flags | T4 |
| `autoritati-dubla-alerta-cdc-cnsc` | Curtea × CNSC | **50** | T2 |
| `donatori-politici-care-datoreaza-statului` | AEP × ANAF | **360** | T2 |
| `energie-licentiati-anre-datornici-anaf` | ANRE × ANAF | **875** | T2 |
| `donatori-politici-care-contesta-la-cnsc` | AEP × CNSC | **185** | T2 |

## Top killer findings (journalism-ready)

1. **B&B BUSINESS SOLUTIONS**: 10K RON donated to parties vs **281.8M RON owed to ANAF** (ratio 1:28,184)
2. **HIDROELECTRICA**: 214M ANAF debt + 4 active ANRE licences (state-to-state circularity)
3. **MUNICIPIUL CONSTANTA**: 3 Curtea de Conturi audits + 90 CNSC complaints = 93 converging signals
4. **SHERIFF GUARD PROTECTION**: 62 CNSC complaints vs a 27K donation (uses the legal route as its main instrument)
5. **VICTOR CONSTRUCT**: 670K donation + 23 complaints + active on SEAP (political-legal combination)

## Infrastructure delivered

### 17 systemd timers active

| Cadence | Timer | Next fire |
|---|---|---|
| Daily 02:00 | anaf-daily | Tue 02:02 |
| Daily 02:30 | **da (NEW)** | Tue 02:32 |
| Daily 04:00 | mvs | Tue 04:04 |
| Daily 07:00 | **heartbeat (NEW)** | Tue 07:02 |
| Weekly Sun 01:00 | anre | Sun 01:06 |
| Weekly Mon 01:00 | ancom | Mon 01:00 |
| Weekly Tue 01:00 | asf | Tue 01:07 |
| Weekly Wed 01:00 | aaas | Wed 01:05 |
| Weekly Thu 01:00 | curteacont | Thu 01:06 |
| Weekly Fri 01:00 | gnm | Fri 01:00 |
| Weekly Sat 01:00 | cnsc | Sat 01:03 |
| Weekly Tue 03:00 | onrc-weekly | Tue 03:03 |
| Monthly 1st 03:00 | regas | Jun 1 03:06 |
| Monthly 1st 03:30 | aep-donatii | Jun 1 03:30 |
| Monthly 1st 05:00 | cnas | Jun 1 05:06 |
| Monthly 15th 03:00 | apia-fermieri | May 15 03:02 |

### Heartbeat monitoring

- Probes 20 sources, posts to n8n satra-backup-alert webhook when STALE
- Currently 19/20 OK, 1 STALE: ani.declaratii (known unimplemented)

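
The staleness check described above can be sketched roughly as follows. This is not the shipped `heartbeat.sh`, which is a shell script this document does not reproduce; it is a minimal Python illustration under assumptions. The `Probe` type, the per-source `max_age` budget, the source name `anaf.daily`, and every payload field except `service: "data-heartbeat"` are hypothetical, and the actual HTTP POST to the n8n webhook is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Probe:
    source: str                      # e.g. "ani.declaratii"
    last_update: Optional[datetime]  # None means the source was never loaded
    max_age: timedelta               # hypothetical freshness budget per source


def stale_sources(probes, now):
    """Return the names of sources that are missing or older than max_age."""
    return [
        p.source
        for p in probes
        if p.last_update is None or now - p.last_update > p.max_age
    ]


def build_alert(probes, now):
    """Build a webhook payload, or None when every source is fresh."""
    stale = stale_sources(probes, now)
    if not stale:
        return None  # nothing to post
    return {
        "service": "data-heartbeat",  # field name taken from this document
        "ok": len(probes) - len(stale),  # hypothetical field
        "total": len(probes),            # hypothetical field
        "stale": stale,                  # hypothetical field
    }


if __name__ == "__main__":
    now = datetime(2026, 5, 11, 7, 0, tzinfo=timezone.utc)
    probes = [
        Probe("anaf.daily", now - timedelta(hours=5), timedelta(days=2)),
        Probe("ani.declaratii", None, timedelta(days=30)),
    ]
    print(build_alert(probes, now))
```

Returning `None` for the all-fresh case keeps the "post only when STALE" behaviour explicit at the call site.
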
### Disk

- 89% → 45% (156 GB freed via `docker builder prune -a -f` + `docker image prune -a -f`)

## Documents written

| Path | Author | Purpose |
|---|---|---|
| `chatGPT/data-quality/freshness-audit-2026-05-10.md` | G3 sub-agent | 17.9M row reconciliation + per-schema cadence |
| `chatGPT/data-quality/geocoding-strategy-2026-05-11.md` | A2 sub-agent | Fallback chain documentation |
| `chatGPT/data-quality/refresh-cadence-strategy-2026-05-11.md` | S1 sub-agent | Master cron schedule + 2captcha budget |
| `chatGPT/journalism/killer-findings-2026-05-10.md` | G4 sub-agent | 5 lead findings + 7 storylines |
| `chatGPT/journalism/sectorial-deep-dive-2026-05-10.md` | G5 sub-agent | ENERGIE/TELECOM/FINANCIAR analysis |
| `services/seap-scraper/HANDOFF-aaas-ordin-278.md` | A3 sub-agent | AAAS PDF backfill plan |
| `services/seap-scraper/HANDOFF-asf-other-registers.md` | A3 sub-agent | ASF pension/AIFM/UCITS plan |
| `services/seap-scraper/HANDOFF-cnas-layout-b.md` | A3 sub-agent | CNAS 9 PDFs layout-B parser plan |
| `services/seap-scraper/systemd/README.md` | tick #2 | Systemd unit install procedure |
| **This doc** | tick #15 | Session retrospective |

## Reusable patterns discovered

### 1. Strip-parens + UAT-pattern (3-source proven)

ONRC stores communes and towns with a " (Primaria Y)" suffix. Stripping the suffix and comparing the normalized names yields an exact match. Used for:

- bugetar (sql/039) → +961 matches in 1m 46s
- curteacont (sql/040 + 041) → +730 matches in <2 min
- cnsc (sql/042) → +10,328 matches in 1m 25s

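
The strip-parens idea above can be illustrated with a short Python sketch. The real backfills live in the SQL files listed above, which this document does not show, so everything here is an assumption: the function names (`strip_parens`, `normalize`, `match_key`, `build_index`), the normalization rules (drop diacritics, uppercase, collapse whitespace), and the sample rows with made-up CUIs.

```python
import re
import unicodedata

# Matches one trailing parenthesised group, e.g. " (Primaria Y)".
_PAREN_SUFFIX = re.compile(r"\s*\([^()]*\)\s*$")


def strip_parens(name: str) -> str:
    """Drop a trailing ' (...)' suffix if present."""
    return _PAREN_SUFFIX.sub("", name)


def normalize(name: str) -> str:
    """Assumed normalization: remove diacritics, uppercase, collapse spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_marks.upper().split())


def match_key(name: str) -> str:
    return normalize(strip_parens(name))


def build_index(onrc_rows):
    """Map match_key -> CUI, dropping keys that point at more than one CUI."""
    index, ambiguous = {}, set()
    for cui, name in onrc_rows:
        key = match_key(name)
        if key in index and index[key] != cui:
            ambiguous.add(key)
        index[key] = cui
    for key in ambiguous:
        del index[key]
    return index


if __name__ == "__main__":
    # Made-up rows for illustration only; the CUIs are not real.
    onrc = [("1234", "Comuna Cârlogani (Primaria Cârlogani)")]
    index = build_index(onrc)
    print(index.get(match_key("COMUNA  CARLOGANI")))
```

Dropping ambiguous keys trades a little coverage for exact matches only, which seems to fit the "exact match" claim above; a real backfill may resolve such collisions differently.
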
### 2. Sub-agent isolation via dedicated helper files

G1 and G2 wrote separate `profile-queries-utilities.ts` and `profile-queries-financial.ts` files to avoid merge conflicts. The pattern is reusable for any parallel codegen task.

### 3. Cross-source RATIO mismatches surface real signal

- B&B: 10K donation vs 281M debt → 1:28,184 ratio = lever-amount mismatch
- SHERIFF GUARD: 27K donation vs 62 complaints → a cheap donation buys an aggressive legal strategy

Single-source counts are explained away by "high volume". Cross-source ratios force a specific narrative.

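
The ratio arithmetic behind the B&B example can be checked with a few lines of Python. This is only a sketch: `ratio_to_one` is a hypothetical helper, and the inputs are the rounded figures quoted in this document (10K RON donated, 281.8M RON owed), so the result is close to, but not exactly, the 1:28,184 figure, which presumably comes from unrounded amounts.

```python
def ratio_to_one(small: float, large: float) -> float:
    """Return how many units of `large` correspond to one unit of `small`."""
    if small <= 0:
        raise ValueError("small must be positive")
    return large / small


if __name__ == "__main__":
    donated_ron = 10_000     # rounded figure from this document
    owed_ron = 281_800_000   # rounded figure from this document
    print(f"1:{ratio_to_one(donated_ron, owed_ron):,.0f}")  # prints 1:28,180
```
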
## Known limitations / next-session candidates

### Critical (DR/observability)

- DB backup runs from root's crontab (NOT bulibasa's); confirmed working but undocumented elsewhere
- The heartbeat hits the n8n webhook, but n8n routing on the `service:"data-heartbeat"` field is not verified; the first alert email needs validation

### High-impact (3-15h each)

- CNSC Stage 2 PDF parse → decision_type (admis/respins, i.e. upheld/rejected); unlocks the killer recipe "authorities with a high rate of lost complaints"
- Curtea Conturi Stage 2 → findings_count + key amounts per audit
- CNAS layout-B parser (9 remaining PDFs)
- ASF pension funds + AIFM + UCITS register ingest

### Medium-effort (4-8h)

- TED full re-import (publication-date backfill; fix shipped in tick #1)
- normalize_company_name v2 for orthography (Cârlogani ↔ Cirlogani)
- ANRE 92.3% residue (commercial firms; these need a different match strategy)

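
For the orthography item above, one possible direction is to unify the two spellings of the same vowel (â and î) before removing diacritics, so that Cârlogani and Cirlogani produce the same key. The sketch below is an assumption, not the existing `normalize_company_name`; `fold_orthography` is a hypothetical name, and the rule is deliberately lossy.

```python
import unicodedata

# Treat "â" and "î" as the same letter before diacritics are removed.
_A_CIRCUMFLEX_TO_I = str.maketrans({"â": "î", "Â": "Î"})


def fold_orthography(name: str) -> str:
    """Return a comparison key that ignores the â/î spelling difference."""
    unified = unicodedata.normalize("NFC", name).translate(_A_CIRCUMFLEX_TO_I)
    decomposed = unicodedata.normalize("NFD", unified)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return no_marks.casefold()


if __name__ == "__main__":
    print(fold_orthography("Cârlogani") == fold_orthography("Cirlogani"))
```

The trade-off is over-merging: any two names that differ only by "â" versus a plain "i" collapse to the same key, so a key like this is probably safer as a candidate generator than as a final match.
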
### Speculative

- 2captcha integration (~$60-100 one-shot for Bugetar Faza 2 + ANAF datornici quarterly refresh)
- ANI parser MVP (1.3M PDFs, 15-day effort)