CNAS Phase 2 — Layout B parser handoff

State at 2026-05-11 (after C4 partial fix):

  • 14 PDFs were stuck at parse_status='no_table'.
  • Commit bfa0b69 relaxed the nr_crt regex from \s{2,} to \s+ (guarded by a Romanian capital letter). This recovers ~3-5 of the 14 PDFs that use Layout A (numbered rows).
  • The remaining ~9-11 PDFs use Layout B (judet-grouped, no row numbers) and need a separate parser path that this handoff describes.
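
The effect of relaxing the separator can be illustrated with a minimal sketch. Both patterns below are hypothetical stand-ins (the actual nr_crt regex from bfa0b69 is not reproduced in this note); they only show why \s+ is safe when it is guarded by a capital letter.

```typescript
// Hypothetical stand-ins for the nr_crt pattern; the real parser regex may differ.
const CAPITAL = "[A-ZĂÂÎȘȚŞŢ]";
const strictNrCrt = new RegExp("^\\s*(\\d{1,4})\\.?\\s{2,}(?=" + CAPITAL + ")");
const relaxedNrCrt = new RegExp("^\\s*(\\d{1,4})\\.?\\s+(?=" + CAPITAL + ")");

// Returns the row number if the line looks like a numbered Layout A row, else null.
function rowNumber(line: string, pattern: RegExp): number | null {
  const m = pattern.exec(line);
  return m ? parseInt(m[1], 10) : null;
}
```

With these stand-ins, a row such as "  12 SC Medica SRL" (single space after the number) is rejected by the strict pattern and accepted by the relaxed one, while "  12 str. Unirii" is still rejected because the text after the number does not start with a capital letter.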

Layout B specimens

Tested via pdftotext -layout:

ID | URL (file name) | Tip | Rows visible
1 | Lista-furnizori-testare-genetica-2024-2025_all.pdf | testare_genetica | ~15
2 | Lista-furnizori-tumori-solide-maligne-martie-2025.pdf | oncologie | ~15
14 | Valori-de-contract-furnizori-PNS-13.11.2024.pdf | pns | unknown
15 | CAS-GORJ-Lista-furnizori-in-contract-PNS-01.01.2024.pdf | pns | small (single CAS)
44 | Valori-de-contract-pentru-furnizorii-de-servicii-medicale-de-consultatiii-de-urgenta-… | urgenta_transport | unknown
46 | FURNIZORI-SERVICII-ASISTENTA-MEDICALA-PRIMARA-ADMISI-IN-SESIUNEA-CONTRACTARE-NOV-2024-PENTRU-SITE-1.pdf | medicina_familie | unknown
56 | Lista-furnizori-radioterapie-2024.pdf | radioterapie | small
57 | Lista-furnizori-testare-hematologie-maligna-2024.pdf | oncologie | small
58 | Lista-furnizori-tumori-solide-maligne-2024.pdf | oncologie | small

Layout B shape (sample from testare_genetica)

    BIHOR
                    SC Resident Laboratory SRL   Oradea, Str.…       email   phone   DA
    CLUJ
                    Institutul Oncologic …       Cluj-Napoca…        email   phone   DA  DA  DA
                    Centrul Medical Unirea S.R.L Punct de lucru…     email   phone   DA  DA  DA
    BUCUREȘTI
                    Personal Genetics SRL        București sector 1… email   phone   DA

Key signals:

  • Single-word ALL-CAPS judet (county) name on its own line (left-aligned, ~4-12 chars).
  • Provider rows are indented to a fixed column (~20 chars left margin).
  • Addresses can span multiple lines (continuation rows).
  • Trailing DA/NU (yes/no) columns indicate which test panel / service the furnizor (provider) is contracted for; the count varies by PDF type, from 1 column to 7+.
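
The first signal can be checked in isolation. The sketch below is an assumption-laden illustration, not existing parser code: normalizeJudet and detectJudet are hypothetical helper names, and the pattern is deliberately permissive about leading whitespace.

```typescript
// Strip diacritics so BUCUREȘTI, BUCUREŞTI and BUCURESTI compare equal.
function normalizeJudet(raw: string): string {
  return raw.trim().toUpperCase().normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Assumed header shape: one ALL-CAPS word alone on the line (3-15 letters),
// accepting both comma-below (Ș, Ț) and cedilla (Ş, Ţ) letter forms.
const JUDET_LINE = /^\s*([A-ZĂÂÎȘȚŞŢ]{3,15})\s*$/;

// Returns the normalized judet name, or null if the line is not a header.
function detectJudet(line: string): string | null {
  const m = JUDET_LINE.exec(line);
  return m ? normalizeJudet(m[1]) : null;
}
```

A single-word pattern like this would not match multi-word or hyphenated county names such as SATU MARE or BISTRIȚA-NĂSĂUD; if those appear in the PDFs, the character class would need to admit spaces and hyphens.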

Proposed approach

  1. Add a second parser, parseProviderTextJudetGrouped(text, hints), invoked only when parseProviderText returns 0 rows AND tip_serviciu IN ('oncologie','testare_genetica','radioterapie','pns','medicina_familie').
  2. State machine: track currentJudet. When a line matches ^\s+([A-ZĂÂÎȘȚ]{3,15})\s*$, update currentJudet (also accept variants like BUCUREŞTI / BUCURESTI; the cedilla forms Ş/Ţ are not in the class as written). When the next line is indented and non-empty, treat it as the start of a row.
  3. Row assembly: gather lines until the next judet header, the next blank-line block, or the next provider name (heuristic: the line starts with a capital letter and does not start with Str. / Mun. / sector / nr. / a city name).
  4. Column extraction: split by \s{3,} like the existing parser, with fixed positions: col 0 = name, col 1 = address, col 2 = email, col 3 = phone, cols 4+ = DA/NU flags. Either capture the flags in a structured specialitate JSON field (requires a schema migration) or collapse them into comma-separated text in specialitate.
  5. Judet override: when a judet is detected in the PDF body, it overrides the filename-derived judet in cnas.furnizori, per row.
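
The state machine and column split could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the final parser: the hints argument is omitted, the ProviderRow shape is hypothetical, the header pattern is more permissive than the one in step 2, and continuation rows are detected only by having fewer than three columns.

```typescript
interface ProviderRow {
  judet: string;
  name: string;
  address: string;
  email: string;
  phone: string;
  flags: string[]; // trailing DA/NU values, in column order
}

// Permissive header pattern (assumption): optional indent, one ALL-CAPS word.
const JUDET_HEADER = /^\s*([A-ZĂÂÎȘȚŞŢ]{3,15})\s*$/;

function parseProviderTextJudetGrouped(text: string): ProviderRow[] {
  const rows: ProviderRow[] = [];
  let currentJudet = "";
  let current: ProviderRow | null = null;

  for (const line of text.split(/\r?\n/)) {
    if (line.trim() === "") {
      current = null; // a blank line closes the open row
      continue;
    }
    const header = JUDET_HEADER.exec(line);
    if (header) {
      currentJudet = header[1];
      current = null;
      continue;
    }
    if (currentJudet === "") continue; // ignore preamble before the first judet

    const cells = line.trim().split(/\s{3,}/);
    if (current !== null && cells.length < 3) {
      // Assumed continuation heuristic: too few columns to be a new provider.
      current.address += " " + cells.join(" ");
      continue;
    }
    current = {
      judet: currentJudet,
      name: cells[0] || "",
      address: cells[1] || "",
      email: cells[2] || "",
      phone: cells[3] || "",
      // Flags may be separated by only two spaces, so re-split on any whitespace.
      flags: cells.slice(4).join(" ").split(/\s+/).filter((f) => f === "DA" || f === "NU"),
    };
    rows.push(current);
  }
  return rows;
}
```

Positional column mapping is fragile when pdftotext leaves fewer than three spaces between name and address; locating the email cell by its @ and working outward from it may be more robust, at the cost of a slightly longer parser.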

Schema-change consideration

To preserve the DA/NU flag matrix, add a specialitate_jsonb column to cnas.furnizori (or reuse the existing specialitate text column with a serialized string like "panel_1:DA,panel_2:DA,panel_3:NU"). Existing column suffices for v1 if we encode as text.
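
For the text-encoding option, a round trip might look like the sketch below. The helper names encodeFlags and decodeFlags are hypothetical, and the panel_N labels are positional placeholders, since the real column headers differ per PDF type.

```typescript
// Serialize an ordered DA/NU flag row as "panel_1:DA,panel_2:NU,...".
function encodeFlags(flags: string[]): string {
  return flags.map((value, i) => "panel_" + (i + 1) + ":" + value).join(",");
}

// Parse the text form back into an ordered list of booleans (DA -> true).
function decodeFlags(encoded: string): boolean[] {
  if (encoded === "") return [];
  return encoded.split(",").map((pair) => pair.split(":")[1] === "DA");
}
```

Because the labels are positional, the encoded string is only comparable across rows of the same PDF type; a jsonb column keyed by the actual panel names would avoid that limitation.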

Testing

Cache the 9-11 PDFs locally (/tmp/cnas-pdfs/) and run the parser unit-style. For each PDF, the expected row count is roughly the number of email addresses in the body (hits for a pattern covering @gmail / @yahoo addresses and .ro / .com domains). At 15-50 hits per PDF on average, the estimated total is 200-500 additional providers.
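
The email-count baseline can be computed with a few lines. This is a rough sketch: estimateRowCount is a hypothetical helper, and the pattern is a loose approximation for counting rather than an address validator.

```typescript
// Loose email pattern, good enough for counting; not a full address validator.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Approximate the number of provider rows by counting email addresses in the text.
function estimateRowCount(pdfText: string): number {
  const hits = pdfText.match(EMAIL);
  return hits ? hits.length : 0;
}
```

A provider listing two addresses would be counted twice and one listing none would be missed, so the result is best treated as a sanity bound on the parser's row count rather than an exact target.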

Defer reason

3-5h of work for an estimated 200-500 rows (roughly 1% of the current cnas.furnizori size, which is 36k). Lower ROI than the WSP timezone fix (restores daily cron entirely) or ANRE electricieni (zero → ~101k rows).