Moved from gov-agreg/src/pages/achizitii/* to root (drop prefix). - 22 pages migrated, 127 files total - All internal links: /achizitii/X → /X (176 occurrences fixed) - AchizitiiLayout subnav rewritten: /X paths, top-right link to vreaudigital.ro hub - BaseLayout new (vreau.digital branding, OG tags, site URL) - astro.config.mjs: site https://vreau.digital, server output (was static) - docker-compose: port 5096 (vreaudigital is 5095), container vreau-digital - deploy.sh: paths /opt/vreau-digital, log /var/log/vreau-digital-deploy.log Backend shared with gov-agreg: - PostgreSQL satra (same schemas: seap, firms, anaf, anre, ...) - Photon, Martin tiles - Infisical /vreaudigital path (DATABASE_URL etc. shared) build: PASS (npx astro check 0 errors, npm run build 5s vite + 10s server)
10 KiB
CNAS — Casa Națională de Asigurări de Sănătate — Ingest Plan
Lista furnizorilor de servicii medicale aflați în relație contractuală cu CAS-urile județene.
v1 status (2026-05-10)
Schema applied: services/seap-scraper/sql/031_cnas.sql (3 tables + 1 MV)
Scraper: services/seap-scraper/src/scrape-cnas.ts
Wrapper: services/seap-scraper/cron/scrape-cnas.sh
First-pass yield: 36,183 rows / 12,392 distinct provider names from 46 PDFs successfully parsed (61 furnizor PDFs registered, 14 with non-tabular layout).
What v1 captures
The CNAS WordPress media library at cnas.ro/wp-content/uploads/ exposes ~70-90 furnizor-related PDFs (CAS Bihor, CAS Bacău, CAS Gorj, CAS Arad upload most heavily; rest of counties don't use this central library). Discoverable via cnas.ro/wp-json/wp/v2/media REST API (no auth, no rate limit).
Working categories with >100 rows extracted:
medicina_dentara— 361 rows from FURNIZORI-IN-CONTRACT-AMBULATORIU-DE-SPECIALITATE-MEDICINA-DENTARA-2024medicina_familie— 488 rows total (mostly CAS Bihor)dispozitive_medicale— 268 rowsfarmacie— 119 rowsambulatoriu_clinic— 99 rowsrecuperare_medicala— 61 rows- 4,300+ rows each from 7 historical 2022 "Nr-furnizori-testare" PDFs (national snapshots, ~10K distinct lines)
Investigation findings
The CNAS source ecosystem is mid-migration between 3 layers:
- NEW —
cas.cnas.ro/casXX(Angular SPA, 42 county sub-instances). Uses Blazor admin/api at/admin/api/{home-content,menu-items,provider-map,pharmacy-report,dental-report,…}. Routes viaX-Instance-KeyHTTP header. As of 2026-05, all data endpoints return[]or 500 — the migration hasn't loaded provider lists yet. Watch script (see Phase 2 below) recommended. - CENTRAL —
cnas.ro/wp-content/uploads/(WordPress media library). 4,180 files total, ~70 furnizor PDFs. THIS IS WHAT v1 INGESTS. Updated weekly-ish. - OLD —
www.cnas.ro/casXX/page/lista-furnizori-*.html(pre-migration WP). All 301-redirect to dead stubs oncnas.ro/casXX/. Effectively removed. Archived content recoverable via Wayback CDX (web.archive.org/cdx/search/cdx?url=cas.cnas.ro/casXX&matchType=domain).
Phase 2 — Improve parser (effort: 2-3h)
Parser misses ~25% of files due to non-tabular layouts. Fixes needed:
"no_table" failures (14 files)
These have valid data but unusual layouts:
| File | Issue | Approach |
|---|---|---|
Lista-furnizori-testare-genetica-2024-2025_all.pdf (4 pages) |
First column is "Casa de asigurări" (judet header), nr_crt is implicit | Per-page re-parse: detect judet headers (BIHOR, CLUJ), assign to all rows below until next header |
Lista-furnizori-tumori-solide-maligne-martie-2025.pdf (1 page) |
Same as above — judet-grouped | Same |
Lista-furnizori-radioterapie-2024.pdf |
Same | Same |
Lista-furnizori-testare-hematologie-maligna-2024.pdf |
Same | Same |
FURNIZORI-INGRIJIRI-PALIATIVE-INCEPAND-CU-01.07.2023-2.pdf |
Header row says "Bacau" — county is in header, not column. Plus row#1 leading on the right column | Detect "CAS \w+" or "JUDET" in header text; skip first 5 lines; rows start with bare number followed by [A-Z] |
FURNIZORI-MEDICINA-DENTARA-LA-29-11-2024.pdf |
Multi-column page layout (2 columns side-by-side) | Use pdftotext -table instead of -layout, OR split page mid-x via pdftotext -x ... -W ... |
FURNIZORI-stomato-in-contract-la-1-noiembrie-2024.pdf |
Same as above | Same |
Valori-de-contract-furnizori-PNS-13.11.2024.pdf |
"Valori" files have name + sum, not provider lists | Reclassify or skip via filename regex Valori- |
CAS-GORJ-Lista-furnizori-in-contract-PNS-01.01.2024.pdf |
PDF text is image-based (scanned) — pdftotext returns empty | Add OCR via tesseract: pdftotext if empty → tesseract -l ron |
2024_SITE_FURNIZORI-SERVICII-PARACLINICE-09.2024.xlsx |
XLSX format unsupported | Add xlsx parsing via xlsx npm package or gnumeric ssconvert to CSV |
Drop-in fixes that recover 80% of these in <1h:
- Reclassify
Valori-filenames asparse_status='not_provider_list'(skip). - Detect
LISTA FURNIZORILOR ... CASA ... DE SANATATE A JUDETULUI [A-Z]+header at top of page → set document.judet from header. - Add per-page judet detection for testare-genetica-style files.
- Handle 2-column-per-page layouts by running
pdftotext -W $((width/2))twice with different-x.
"other" tip cleanup (34K rows)
The 7 "Nr-furnizori-testare" 2022 PDFs were each parsed at ~4,300 lines each — many of those rows are duplicates of the same providers plus some garbage (e.g. name="SRL", empty sediu). These dominate the dataset. Two options:
Option A (recommended): Mark these documents as parse_status='superseded' since 2024-2025 lists cover the same providers. Cuts dataset to ~1,900 high-quality rows.
Option B: Deduplicate by name+email post-ingest into a cnas.furnizori_clean table.
Phase 3 — Per-county SPA harvest (effort: 4-6h, deferred)
Once cas.cnas.ro/casXX data goes live (no clear timeline; check monthly):
// poc-cas-cnas-watch.ts
for (const judet of ['casmb', 'cascj', 'casbn', /* 42 total */]) {
const r = await fetch(`https://cas.cnas.ro/admin/api/home-content`, {
headers: { 'X-Instance-Key': judet }
});
// Currently always returns: {"data":null,"message":"Sequence contains no elements.","isSucces":false}
// When this turns into a real payload, the SPA will have working endpoints.
}
Confirmed working endpoints (return JSON when populated):
admin/api/home-content(header:X-Instance-Key: <slug>)admin/api/menu-itemsadmin/api/get-content?slug=<page-slug>admin/api/get-pages/<slug>(page tree)public/api/provider-map,public/api/pharmacy-report,public/api/dental-report,public/api/paraclinic-report,public/api/recuperare-report(per-tip plurals — pagination via?skip=&take=)
Phase 4 — CUI matching (effort: 1-2h)
Mirror match-cui-anre.sh pattern. CNAS provider names are messy (CMI prefixes, doctor titles, abbreviated SRL etc.). Strategy:
// services/seap-scraper/src/match-cui-cnas.ts
// 1. UPDATE cnas.furnizori SET name_norm = firms.normalize_company_name(name)
// 2. Try exact match: WHERE firms.entities.name_norm = cnas.furnizori.name_norm
// 3. Try trgm fuzzy with judet constraint (when judet known)
// 4. Mark cui_match_method ('exact_norm' | 'trgm_judet' | 'trgm_unique' | 'unmatched')
Expected match rate: 50-70% for SRL/SA-form providers; 5-15% for CMI (cabinete medicale individuale, often unregistered firms).
Phase 5 — Cross-source recipes (drafted SQL)
Recipe 1: "Furnizori medicali CNAS care apar și ca furnizori SEAP la CPV 33.* / 85.*"
WITH cnas_cui AS (
SELECT DISTINCT cui FROM cnas.furnizori WHERE cui IS NOT NULL
),
seap_med AS (
SELECT DISTINCT a.supplier_cui AS cui, COUNT(*) AS nr_castiguri,
SUM(a.value_eur) AS total_eur
FROM seap.announcements a
WHERE (a.cpv_code LIKE '33%' OR a.cpv_code LIKE '85%')
AND a.supplier_cui IS NOT NULL
GROUP BY a.supplier_cui
)
SELECT c.cui, e.name, sm.nr_castiguri, sm.total_eur,
array_agg(DISTINCT cf.tip_serviciu) AS tipuri_cnas
FROM cnas_cui c
JOIN seap_med sm USING (cui)
JOIN firms.entities e ON e.cui = c.cui
JOIN cnas.furnizori cf USING (cui)
GROUP BY c.cui, e.name, sm.nr_castiguri, sm.total_eur
ORDER BY sm.total_eur DESC NULLS LAST
LIMIT 100;
Recipe 2: "Spitale CNAS care au datorii ANAF" — red flag
SELECT DISTINCT
cf.cui, e.name, cf.judet,
cf.tip_serviciu,
ad.sume_datorate_buget_general_consolidat AS datorii_total
FROM cnas.furnizori cf
JOIN firms.entities e ON e.cui = cf.cui
JOIN anaf_datornici.datornic ad ON ad.cui = cf.cui
WHERE cf.tip_serviciu IN ('spital','clinic','ambulatoriu_clinic')
AND ad.sume_datorate_buget_general_consolidat > 100000
ORDER BY datorii_total DESC;
Recipe 3: "Furnizori CNAS care primesc fonduri EU (POIM-Sănătate)" — EU-linked
SELECT DISTINCT
cf.cui, e.name, cf.tip_serviciu,
fp.titlu_proiect, fp.valoare_totala_eligibila
FROM cnas.furnizori cf
JOIN firms.entities e ON e.cui = cf.cui
JOIN fonduri.proiect_v2 fp ON fp.beneficiar_cui = cf.cui
WHERE fp.titlu_proiect ILIKE '%sanatate%' OR fp.programul_operational ILIKE '%POIM%'
ORDER BY fp.valoare_totala_eligibila DESC;
Recipe 4: "Spitale CNAS cu zero contracte SEAP" — anomaly
Hospitals contracted with state insurance but never appearing as SEAP suppliers/buyers:
SELECT cf.cui, e.name, cf.judet
FROM cnas.furnizori cf
JOIN firms.entities e ON e.cui = cf.cui
WHERE cf.tip_serviciu = 'spital'
AND NOT EXISTS (
SELECT 1 FROM seap.announcements a
WHERE a.supplier_cui = cf.cui OR a.buyer_cui = cf.cui
)
ORDER BY e.name;
Operational
# Smoke (5 docs, ~30s)
sudo LIMIT=5 /opt/vreaudigital/services/seap-scraper/cron/scrape-cnas.sh
# Full ingest (61 docs, ~3 min, idempotent)
sudo /opt/vreaudigital/services/seap-scraper/cron/scrape-cnas.sh
# Just refresh document catalog without re-parsing
sudo MODE=metadata-only /opt/vreaudigital/services/seap-scraper/cron/scrape-cnas.sh
# Re-parse existing pending/failed only
sudo MODE=parse-only /opt/vreaudigital/services/seap-scraper/cron/scrape-cnas.sh
# Cron suggested: weekly (CNAS uploads ~5-15 files/month)
# 0 5 * * 1 root /opt/vreaudigital/services/seap-scraper/cron/scrape-cnas.sh
Remaining county sites — handoff list
When cas.cnas.ro/casXX SPA goes live, all 42 sub-instances follow the same URL pattern:
casab Alba casdj Dolj casnt Neamt
casag Argeș casgj Gorj casot Olt
casar Arad casgl Galați casph Prahova
casbc Bacău casgr Giurgiu cassb Sibiu
casbh Bihor cashd Hunedoara cassj Sălaj
casbn Bistrița-N. cashr Harghita cassv Suceava
casbr Brăila casif Ilfov casts Teleorman ?
casbt Botoșani casil Ialomița castl Tulcea
casbv Brașov casis Iași castm Timiș
casbz Buzău casmb București castr Teleorman ?
cascj Cluj casmh Mehedinți casvl Vâlcea
cascl Călărași casmm Maramureș casvn Vrancea
cascs Caraș-Severin casms Mureș casvs Vaslui
casct Constanța cassam Satu Mare casaopsnaj (Apărare/Ord. publică)
cascv Covasna
casdb Dâmbovița
Total: 43 sub-sites including casaopsnaj. v1 ingests 0 of these directly (relies on central WP catalog only).