vreau-digital/services/seap-scraper/HANDOFF-asf-other-registers.md
Claude VM a6c03a091e initial: split from gov-agreg — vreau.digital standalone platform
Moved from gov-agreg/src/pages/achizitii/* to root (drop prefix).
- 22 pages migrated, 127 files total
- All internal links: /achizitii/X → /X (176 occurrences fixed)
- AchizitiiLayout subnav rewritten: /X paths, top-right link to vreaudigital.ro hub
- BaseLayout new (vreau.digital branding, OG tags, site URL)
- astro.config.mjs: site https://vreau.digital, server output (was static)
- docker-compose: port 5096 (vreaudigital is 5095), container vreau-digital
- deploy.sh: paths /opt/vreau-digital, log /var/log/vreau-digital-deploy.log

Backend shared with gov-agreg:
- PostgreSQL satra (same schemas: seap, firms, anaf, anre, ...)
- Photon, Martin tiles
- Infisical /vreaudigital path (DATABASE_URL etc. shared)

build: PASS (npx astro check 0 errors, npm run build 5s vite + 10s server)
2026-05-13 00:10:32 +03:00


# ASF other registers — handoff
State at 2026-05-11:
- `asf.entitati`: 849 entities (61 asigurator + 788 broker) — only the
`/scr/ra` insurance registry is ingested.
- ASF has additional registries (private pensions, capital markets,
secondary intermediaries, software providers, lecturers, etc.) at
separate pages — NOT exposed via the same `/scr/ra/cautare` JSON endpoint.
## Why deferred
Each register appears to use a different access pattern:
- `/scr/ra` (used by current scraper) — only insurance + brokers.
- Pension funds (Pilonul II/III) — no `/scr/` endpoint visible. Likely PDF
or static HTML on `asfromania.ro/ro/a/2365/...`.
- Capital markets entities — likely a different `/scr/...` path needs to
be discovered via browser-network-tab inspection.
Confirming each pattern requires interactive exploration (curl with a
realistic Referer and Cookie, or browser dev-tools). The high-level
landing pages alone do not reveal the underlying data endpoints.
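
To record what the exploration finds, each response can be classified by its content type and first bytes. The sketch below is a hypothetical helper (`classifyPayload` and `PayloadKind` are illustrative names, not part of the existing scraper) and only assumes the three shapes guessed at above: JSON, HTML, or PDF.

```typescript
// Hypothetical helper, not part of scrape-asf.ts.
type PayloadKind = "json" | "html" | "pdf" | "unknown";

// contentType: value of the Content-Type response header (or null if absent).
// bodyPrefix: the first few dozen characters of the response body.
function classifyPayload(contentType: string | null, bodyPrefix: string): PayloadKind {
  const ct = (contentType ?? "").toLowerCase();
  const head = bodyPrefix.trimStart().toLowerCase();

  // PDF files start with the "%PDF-" magic marker.
  if (ct.includes("application/pdf") || head.startsWith("%pdf-")) return "pdf";
  // Trust an explicit JSON content type first.
  if (ct.includes("json")) return "json";
  // Some servers label JSON as text/html, so sniff the body as well.
  if (head.startsWith("{") || head.startsWith("[")) return "json";
  if (ct.includes("html") || head.startsWith("<!doctype html") || head.startsWith("<html")) {
    return "html";
  }
  return "unknown";
}

console.log(classifyPayload("application/json; charset=utf-8", '{"rows":[]}'));
console.log(classifyPayload("text/html", "<!DOCTYPE html><html>"));
console.log(classifyPayload(null, "%PDF-1.7"));
```

The header and body prefix can be pasted in from curl output or the browser Network tab; the function itself makes no network calls.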
## Registries discovered (from `/ro/a/1544/registre-entitati-autorizate`)
### Insurance (Asigurări)
- `/scr/ra/cautare`: currently scraped (asigurator + broker).
- `/ro/a/2082/registrul-asigurătorilor-și-intermediarilor-din-see`:
  EEA insurers and intermediaries (likely overlap with main register).
- `/app.php/ro/a/1704/intermediari-secundari`: secondary intermediaries
  (post-2019).
- `/ro/a/1997/intermediari-secundari---persoane-fizice` (pre-2019).
- `/ro/a/1998/intermediari-secundari---persoane-juridice` (pre-2019).
- `/ro/a/1999/specialisti-constatare-daune`: damage assessors.
- `/ro/a/2068/registrul-furnizorilor-de-programe-(activi)`: software
  providers.
- `/ro/a/2067/registrul-lectorilor`: authorized lecturers.
### Capital Markets (Piață de capital)
- `/app.php/ro/a/1705/registrul-instrumentelor-si-investitiilor-financiare`
### Private Pensions (Pensii private)
- `/ro/a/2365/registrul-entitatilor-din-piata-pensiilor-private`: Pilonul
  II + III administrators (SAFI), pension funds, fund managers.
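
The list above can also be kept as a small typed catalogue so that discovery progress is tracked in one place. This is only a sketch: the `Registry` shape and the `area` and `status` fields are hypothetical bookkeeping, and the paths are copied verbatim from the list.

```typescript
// Hypothetical bookkeeping structure, not part of the existing scraper.
type Area = "insurance" | "capital_markets" | "pensions";
type Status = "scraped" | "todo";

interface Registry {
  path: string;   // path on the ASF site, as listed above
  area: Area;
  status: Status; // only /scr/ra/cautare is ingested so far
}

const registries: Registry[] = [
  { path: "/scr/ra/cautare", area: "insurance", status: "scraped" },
  { path: "/ro/a/2082/registrul-asigurătorilor-și-intermediarilor-din-see", area: "insurance", status: "todo" },
  { path: "/app.php/ro/a/1704/intermediari-secundari", area: "insurance", status: "todo" },
  { path: "/ro/a/1997/intermediari-secundari---persoane-fizice", area: "insurance", status: "todo" },
  { path: "/ro/a/1998/intermediari-secundari---persoane-juridice", area: "insurance", status: "todo" },
  { path: "/ro/a/1999/specialisti-constatare-daune", area: "insurance", status: "todo" },
  { path: "/ro/a/2068/registrul-furnizorilor-de-programe-(activi)", area: "insurance", status: "todo" },
  { path: "/ro/a/2067/registrul-lectorilor", area: "insurance", status: "todo" },
  { path: "/app.php/ro/a/1705/registrul-instrumentelor-si-investitiilor-financiare", area: "capital_markets", status: "todo" },
  { path: "/ro/a/2365/registrul-entitatilor-din-piata-pensiilor-private", area: "pensions", status: "todo" },
];

// Registries that still need endpoint discovery.
const todo = registries.filter((r) => r.status === "todo");
console.log(`${todo.length} of ${registries.length} registries still to explore`);
// prints "9 of 10 registries still to explore"
```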
## Recommended approach (~4-6h)
1. **Discovery phase (1h)**: open each not-yet-scraped registry URL in a
   browser and inspect the Network tab for the actual data endpoints.
   Most are likely Drupal/Symfony pages that serve embedded JSON or
   render an HTML table. Some may only offer a PDF download (needs
   OCR/parsing).
2. **Per-register scraper (1-2h each)**:
   - If it's a JSON endpoint similar to `/scr/ra/cautare`, clone the
     scrape-asf.ts pattern with a new `register_type` value
     (e.g., `pensie_administrator`, `intermediar_secundar`).
   - If it's an HTML table, parse with cheerio.
   - If it's a PDF, use pdftotext like CNAS.
3. **Schema**: `asf.entitati.register_type` is already a text column —
add new enum-like values without DDL.
4. **Volume estimate**:
   - Pension funds: ~10 administrators (SAFI/SIF), ~20 funds.
   - Capital markets: ~50-200 entities.
   - Secondary intermediaries: ~3,000-10,000 individuals + firms.
   - Lecturers: ~50.
   - **Total ~3,100-10,300 new entities** if all done (sum of the
     ranges above).
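
As a cross-check on the total, the per-register ranges can simply be summed. The snippet below is a minimal sketch; the entry names are illustrative labels, and only the four register groups with an estimate above are included.

```typescript
// Low/high entity-count estimates copied from the volume list above.
interface Estimate {
  name: string; // illustrative label, not a register_type value
  low: number;
  high: number;
}

const estimates: Estimate[] = [
  { name: "pension administrators", low: 10, high: 10 },
  { name: "pension funds", low: 20, high: 20 },
  { name: "capital markets", low: 50, high: 200 },
  { name: "secondary intermediaries", low: 3000, high: 10000 },
  { name: "lecturers", low: 50, high: 50 },
];

// Sum the lower and upper bounds independently.
function sumEstimates(items: Estimate[]): { low: number; high: number } {
  return items.reduce(
    (acc, e) => ({ low: acc.low + e.low, high: acc.high + e.high }),
    { low: 0, high: 0 },
  );
}

const total = sumEstimates(estimates);
console.log(`${total.low}-${total.high} new entities`);
// prints "3130-10280 new entities"
```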
## Defer reason
Multi-day discovery + per-register scraper development. The 2-3h
single-candidate budget cannot accommodate even one full register
implementation without first doing the discovery for all of them.
Recommended next sub-agent: pick **secondary intermediaries** (largest
volume → 3-10k entities) as the first target, since the data shape
should mirror existing broker entries.
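
Because `register_type` is a plain text column, nothing in the database stops a typo in a new value. A small guard on the TypeScript side can help; the sketch below is hypothetical and assumes that `asigurator` and `broker` are the values already stored, with the two new values taken from the examples above.

```typescript
// Assumption: "asigurator" and "broker" are the existing register_type values;
// the other two are the example values proposed in this document.
const REGISTER_TYPES = [
  "asigurator",
  "broker",
  "pensie_administrator",
  "intermediar_secundar",
] as const;

type RegisterType = (typeof REGISTER_TYPES)[number];

// Type guard: narrows an arbitrary string to a known register_type value.
function isRegisterType(value: string): value is RegisterType {
  return (REGISTER_TYPES as readonly string[]).indexOf(value) !== -1;
}

console.log(isRegisterType("intermediar_secundar")); // true
console.log(isRegisterType("intermediar-secundar")); // false (hyphen instead of underscore)
```

Adding a register then means appending one string to `REGISTER_TYPES`, which keeps the "no DDL" property while still catching misspelled values before insert.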