Moved from gov-agreg/src/pages/achizitii/* to root (drop prefix).

- 22 pages migrated, 127 files total
- All internal links: /achizitii/X → /X (176 occurrences fixed)
- AchizitiiLayout subnav rewritten: /X paths, top-right link to vreaudigital.ro hub
- BaseLayout new (vreau.digital branding, OG tags, site URL)
- astro.config.mjs: site https://vreau.digital, server output (was static)
- docker-compose: port 5096 (vreaudigital is 5095), container vreau-digital
- deploy.sh: paths /opt/vreau-digital, log /var/log/vreau-digital-deploy.log

Backend shared with gov-agreg:
- PostgreSQL satra (same schemas: seap, firms, anaf, anre, ...)
- Photon, Martin tiles
- Infisical /vreaudigital path (DATABASE_URL etc. shared)

build: PASS (npx astro check 0 errors, npm run build 5s vite + 10s server)
CNAS Phase 2 — Layout B parser handoff
State at 2026-05-11 (after C4 partial fix):
- 14 PDFs were stuck at `parse_status='no_table'`.
- Commit `bfa0b69` relaxed the `nr_crt` regex from `\s{2,}` to `\s+` (guarded by a Romanian capital letter). This recovers ~3-5 of the 14 PDFs that use Layout A (numbered rows).
- The remaining ~9-11 PDFs use Layout B (judet-grouped, no row numbers) and need a separate parser path that this handoff describes.
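To illustrate the kind of relaxation described above, here is a minimal sketch. The actual `nr_crt` pattern from `bfa0b69` is not reproduced in this handoff, so both regexes and the sample lines are hypothetical stand-ins that only mirror the `\s{2,}` to `\s+` change and the capital-letter guard.

```typescript
// Hypothetical patterns: the real nr_crt regex changed in bfa0b69 is not shown
// in this handoff, so both expressions below are illustrative stand-ins.
const strictNrCrt = /^\s*(\d{1,4})\.?\s{2,}(\S.*)$/;
// Relaxed: a single space is enough, but only if a Romanian capital follows.
const relaxedNrCrt = /^\s*(\d{1,4})\.?\s+(?=[A-ZĂÂÎȘȚ])(\S.*)$/;

// Made-up Layout A lines: wide gap, single space, and an address fragment.
const layoutALines = [
  "3   Spitalul Judetean Exemplu",
  "12 SC Exemplu SRL",
  "2024 str. Exemplu 5",
];

for (const line of layoutALines) {
  console.log(line, strictNrCrt.test(line), relaxedNrCrt.test(line));
}
```

The guard is what keeps a line such as an address fragment starting with a number from being mistaken for a numbered row once a single space is accepted.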
Layout B specimens
Tested via pdftotext -layout:
| ID | URL | Tip | Rows visible |
|---|---|---|---|
| 1 | Lista-furnizori-testare-genetica-2024-2025_all.pdf | `testare_genetica` | ~15 |
| 2 | Lista-furnizori-tumori-solide-maligne-martie-2025.pdf | `oncologie` | ~15 |
| 14 | Valori-de-contract-furnizori-PNS-13.11.2024.pdf | `pns` | unknown |
| 15 | CAS-GORJ-Lista-furnizori-in-contract-PNS-01.01.2024.pdf | `pns` | small (single CAS) |
| 44 | Valori-de-contract-pentru-furnizorii-de-servicii-medicale-de-consultatiii-de-urgenta-… | `urgenta_transport` | unknown |
| 46 | FURNIZORI-SERVICII-ASISTENTA-MEDICALA-PRIMARA-ADMISI-IN-SESIUNEA-CONTRACTARE-NOV-2024-PENTRU-SITE-1.pdf | `medicina_familie` | unknown |
| 56 | Lista-furnizori-radioterapie-2024.pdf | `radioterapie` | small |
| 57 | Lista-furnizori-testare-hematologie-maligna-2024.pdf | `oncologie` | small |
| 58 | Lista-furnizori-tumori-solide-maligne-2024.pdf | `oncologie` | small |
Layout B shape (sample from testare_genetica)
BIHOR
SC Resident Laboratory SRL Oradea, Str.… email phone DA
CLUJ
Institutul Oncologic … Cluj-Napoca… email phone DA DA DA
Centrul Medical Unirea S.R.L Punct de lucru… email phone DA DA DA
BUCUREȘTI
Personal Genetics SRL București sector 1… email phone DA
Key signals:
- Single-word ALL-CAPS judet on its own line (left-aligned, ~4-12 chars).
- Provider rows are indented to a fixed column (~20 chars left margin).
- Multi-line addresses with continuation rows.
- Trailing DA/NU columns indicate which test panel or service the provider (furnizor) is contracted for. The column count varies by PDF type: sometimes 1 column, sometimes 7+.
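Two of these signals can be checked line by line. The sketch below is one possible way to do it, not existing code: `stripDiacritics`, `detectJudetHeader`, and `trailingFlags` are hypothetical helper names, and the header check tolerates any indentation, which is an assumption about the `pdftotext -layout` output.

```typescript
// Strip combining marks so BUCUREȘTI, BUCUREŞTI and BUCURESTI compare equal.
function stripDiacritics(s: string): string {
  return s.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Returns the normalized judet name if the line is a single ALL-CAPS word
// of 3-15 letters, else null. Indentation is ignored (assumption).
function detectJudetHeader(line: string): string | null {
  const candidate = stripDiacritics(line.trim());
  return /^[A-Z]{3,15}$/.test(candidate) ? candidate : null;
}

// Collects the run of DA/NU tokens at the end of a provider line, in order.
function trailingFlags(line: string): string[] {
  const tokens = line.trim().split(/\s+/);
  const flags: string[] = [];
  while (tokens.length > 0) {
    const last = tokens[tokens.length - 1];
    if (last !== "DA" && last !== "NU") break;
    flags.unshift(last);
    tokens.pop();
  }
  return flags;
}

console.log(detectJudetHeader("BUCUREŞTI"));
console.log(trailingFlags("Centrul Medical Unirea S.R.L  email  phone  DA DA DA"));
```

Because the header rule is single-word only, a multi-word judet name would not be recognised by this check and would need a wider pattern.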
Recommended approach (~3-5h)
- Add a 2nd parser `parseProviderTextJudetGrouped(text, hints)` invoked only when `parseProviderText` returns 0 rows AND `tip_serviciu IN ('oncologie','testare_genetica','radioterapie','pns','medicina_familie')`.
- State machine: track `currentJudet`; when a line matches `^\s+([A-ZĂÂÎȘȚ]{3,15})\s*$` (also accept variants like `BUCUREŞTI`/`BUCURESTI`), update `currentJudet`. When the next line is indented and non-empty, treat it as the start of a row.
- Row assembly: gather lines until next judet header, next blank-line block, or next provider name (heuristic: line starts with capital + doesn't start with `Str.`/`Mun.`/`sector`/`nr.`/ city name).
- Column extraction: split by `\s{3,}` like the existing parser, but know that col 0 = name, col 1 = address, col 2 = email, col 3 = phone, cols 4+ = DA/NU flags. Capture flags into a `specialitate` JSON field (would need a schema migration if we want to keep them structured) or collapse into a comma-separated text in `specialitate`.
- Judet override: when judet is detected from PDF body, override the filename-derived judet in `cnas.furnizori` per-row.
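The steps above could be sketched roughly as follows. This is a starting point under stated assumptions, not the final parser: the `ProviderRow` and `ParseHints` shapes, the `judetFromFilename` hint, and `shouldTryJudetGrouped` are hypothetical, the header regex uses `\s*` so that flush-left headers also match, and the city-name part of the row-start heuristic is not implemented. The sample addresses, emails, and phone numbers are made up.

```typescript
interface ProviderRow {
  judet: string | null;
  providerName: string;
  address: string;
  email: string;
  phone: string;
  flags: string[]; // trailing DA/NU columns, in PDF order
}

interface ParseHints {
  judetFromFilename?: string; // hypothetical fallback when the body has no header
}

const LAYOUT_B_TYPES = ["oncologie", "testare_genetica", "radioterapie", "pns", "medicina_familie"];
// \s* instead of \s+ so flush-left headers match too (assumption); cedilla variants included.
const JUDET_HEADER = /^\s*([A-ZĂÂÎȘȚŞŢ]{3,15})\s*$/;
const ADDRESS_PREFIX = /^(Str\.|Mun\.|sector\b|nr\.)/i;

// Gate: only fall back when the Layout A parser found nothing for a listed type.
function shouldTryJudetGrouped(layoutARowCount: number, tipServiciu: string): boolean {
  return layoutARowCount === 0 && LAYOUT_B_TYPES.indexOf(tipServiciu) !== -1;
}

// Heuristic row start: capitalised and not an address fragment. City names are NOT handled.
function looksLikeProviderStart(line: string): boolean {
  return /^[A-ZĂÂÎȘȚŞŢ]/.test(line) && !ADDRESS_PREFIX.test(line);
}

function parseProviderTextJudetGrouped(text: string, hints: ParseHints = {}): ProviderRow[] {
  const rows: ProviderRow[] = [];
  let currentJudet: string | null = null;
  let current: ProviderRow | null = null;

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") { current = null; continue; } // blank line closes the open row

    const header = JUDET_HEADER.exec(rawLine);
    if (header) { currentJudet = header[1] ?? null; current = null; continue; }

    if (!looksLikeProviderStart(line)) {
      // Continuation line: fold it into the address of the open row, if any.
      if (current !== null) current.address = `${current.address} ${line.replace(/\s+/g, " ")}`.trim();
      continue;
    }

    const cols = line.split(/\s{3,}/);
    current = {
      judet: currentJudet ?? hints.judetFromFilename ?? null, // body judet wins over filename
      providerName: cols[0] ?? "",
      address: cols[1] ?? "",
      email: cols[2] ?? "",
      phone: cols[3] ?? "",
      flags: cols.slice(4).join(" ").split(/\s+/).filter((t) => t === "DA" || t === "NU"),
    };
    rows.push(current);
  }
  return rows;
}

// Made-up sample in the Layout B shape (placeholder contact details).
const sampleText = [
  "BIHOR",
  "                    SC Resident Laboratory SRL   Oradea, Str. Exemplu 1   lab@example.ro   0259000000   DA",
  "CLUJ",
  "                    Institutul Oncologic   Cluj-Napoca,   office@example.ro   0264000000   DA   DA   NU",
  "                    Str. Exemplu 34-36",
  "                    Centrul Medical Unirea S.R.L   Punct de lucru   cm@example.ro   0264111111   DA   DA   DA",
].join("\n");

const parsedRows = parseProviderTextJudetGrouped(sampleText, { judetFromFilename: "GORJ" });
console.log(parsedRows.length, parsedRows.map((r) => r.judet));
```

Pushing each row as soon as it starts and keeping a reference to it keeps the state machine to two variables (`currentJudet` and `current`), at the cost of continuation lines being appended to the address only.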
Schema-change consideration
To preserve the DA/NU flag matrix, add a `specialitate_jsonb` column to
`cnas.furnizori` (or reuse the existing `specialitate` text column with a
serialized string like `"panel_1:DA,panel_2:DA,panel_3:NU"`). The existing
column suffices for v1 if we encode the flags as text.
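For the text-encoding option, a small round-trip helper might look like the sketch below. The positional `panel_N` keys follow the example string above; `encodeFlags` and `decodeFlags` are hypothetical names, and real column labels would have to come from each PDF's header row.

```typescript
type Flag = "DA" | "NU";

// Serialize positional flags as "panel_1:DA,panel_2:NU,...".
function encodeFlags(flags: Flag[]): string {
  return flags.map((flag, i) => `panel_${i + 1}:${flag}`).join(",");
}

// Parse the serialized form back into a key/value map; malformed pairs are skipped.
function decodeFlags(serialized: string): Record<string, Flag> {
  const result: Record<string, Flag> = {};
  for (const pair of serialized.split(",")) {
    const [key, value] = pair.split(":");
    if (key && (value === "DA" || value === "NU")) {
      result[key] = value;
    }
  }
  return result;
}

const encodedFlags = encodeFlags(["DA", "DA", "NU"]);
console.log(encodedFlags);
console.log(decodeFlags(encodedFlags));
```

Keeping the encoding reversible means a later migration to a structured column could be backfilled from the text values.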
Testing
Cache the 9-11 PDFs locally (`/tmp/cnas-pdfs/`) and run the parser against
them unit-style. For each PDF, the expected row count is roughly the number
of email-pattern hits (`@gmail|yahoo|ro|com`) in the body (15-50 per PDF on
average → estimated total: 200-500 additional providers).
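The email-count check could be automated along these lines. This is a sketch with assumptions: it uses a generic email regex instead of the `@gmail|yahoo|ro|com` shorthand, `countEmailHits` and `rowCountLooksPlausible` are hypothetical names, and the 20% tolerance is an arbitrary placeholder. The sample addresses are made up.

```typescript
// Generic email matcher (assumption: broader than the @gmail|yahoo|ro|com shorthand).
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Number of email-like tokens in the pdftotext output of one PDF.
function countEmailHits(pdfText: string): number {
  const hits = pdfText.match(EMAIL_RE);
  return hits === null ? 0 : hits.length;
}

// True when the parsed row count is within `tolerance` of the email count.
function rowCountLooksPlausible(parsedRowCount: number, pdfText: string, tolerance = 0.2): boolean {
  const expected = countEmailHits(pdfText);
  if (expected === 0) return parsedRowCount === 0;
  return Math.abs(parsedRowCount - expected) / expected <= tolerance;
}

const bodyText = [
  "SC A SRL   Oradea   a@example.ro   0259000000   DA",
  "SC B SRL   Cluj     b@example.com  0264000000   NU",
  "SC C SRL   Iasi     c@example.ro   0232000000   DA",
].join("\n");

console.log(countEmailHits(bodyText), rowCountLooksPlausible(3, bodyText));
```

A provider listing two email addresses would inflate the expected count, which is why the comparison uses a tolerance instead of strict equality.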
Defer reason
3-5h of work for an estimated 200-500 rows (~1% of the current `cnas.furnizori` size, which is 36k). Lower ROI than the WSP timezone fix (restores daily cron entirely) or ANRE electricieni (zero → ~101k rows).