vreau-digital/services/seap-scraper/HANDOFF-asf-other-registers.md

ASF other registers — handoff

State at 2026-05-11:

  • asf.entitati: 849 entities (61 asigurator + 788 broker) — only the /scr/ra insurance registry is ingested.
  • ASF has additional registries (private pensions, capital markets, secondary intermediaries, software providers, lecturers, etc.) at separate pages — NOT exposed via the same /scr/ra/cautare JSON endpoint.

Why deferred

Each register appears to use a different access pattern:

  • /scr/ra (used by current scraper) — only insurance + brokers.
  • Pension funds (Pilonul II/III) — no /scr/ endpoint visible. Likely PDF or static HTML on asfromania.ro/ro/a/2365/....
  • Capital markets entities — likely a different /scr/... path needs to be discovered via browser-network-tab inspection.

Confirming each pattern requires interactive exploration (curl with a realistic Referer and Cookie, or browser dev tools). The actual data endpoints cannot be inferred from the top-level registry pages alone.
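
As a rough aid for that exploration, the sketch below classifies a response as JSON, PDF, or HTML (with or without a table). The names `classifyResponse` and `probe` are hypothetical and not part of the existing scraper. It assumes Node 18+ with the global `fetch`, sends only a Referer (a Cookie may also be needed, as noted above), and will not detect JSON embedded inside an HTML page, so it complements rather than replaces Network-tab inspection.

```typescript
// Hypothetical helper, not part of scrape-asf.ts: guess how a registry URL exposes its data.
type AccessPattern = "json" | "pdf" | "html-table" | "html-other" | "unknown";

function classifyResponse(contentType: string, body: string): AccessPattern {
  const type = contentType.toLowerCase();
  // Only the start of the body is needed for the JSON/PDF sniffing.
  const head = body.trimStart().slice(0, 2048).toLowerCase();

  if (type.includes("application/json") || head.startsWith("{") || head.startsWith("[")) {
    return "json";
  }
  if (type.includes("application/pdf") || head.startsWith("%pdf-")) {
    return "pdf";
  }
  if (type.includes("text/html") || head.includes("<html")) {
    // A <table> anywhere in the page suggests a cheerio-parsable listing.
    return body.toLowerCase().includes("<table") ? "html-table" : "html-other";
  }
  return "unknown";
}

// Assumption: a Referer alone is enough; add a Cookie header if the site rejects the request.
async function probe(url: string, referer: string): Promise<AccessPattern> {
  const res = await fetch(url, {
    headers: { Referer: referer, Accept: "application/json, text/html;q=0.9, */*;q=0.8" },
  });
  const contentType = res.headers.get("content-type") ?? "";
  // PDFs are binary, so skip reading the body as text and rely on the content type.
  const body = contentType.toLowerCase().includes("pdf") ? "" : await res.text();
  return classifyResponse(contentType, body);
}
```

`probe` performs a live request, so it is best run manually against one registry URL at a time. `classifyResponse` is pure and can be exercised offline on saved responses.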

Registries discovered (from /ro/a/1544/registre-entitati-autorizate)

Insurance (Asigurări)

  • /scr/ra/cautare — currently scraped (asigurator + broker).
  • /ro/a/2082/registrul-asigurătorilor-și-intermediarilor-din-see — EEA insurers and intermediaries (likely overlap with main register).
  • /app.php/ro/a/1704/intermediari-secundari — secondary intermediaries (post-2019).
  • /ro/a/1997/intermediari-secundari---persoane-fizice (secondary intermediaries, natural persons; pre-2019).
  • /ro/a/1998/intermediari-secundari---persoane-juridice (secondary intermediaries, legal persons; pre-2019).
  • /ro/a/1999/specialisti-constatare-daune — damage assessors.
  • /ro/a/2068/registrul-furnizorilor-de-programe-(activi) — software providers.
  • /ro/a/2067/registrul-lectorilor — authorized lecturers.

Capital Markets (Piață de capital)

  • /app.php/ro/a/1705/registrul-instrumentelor-si-investitiilor-financiare

Private Pensions (Pensii private)

  • /ro/a/2365/registrul-entitatilor-din-piata-pensiilor-private — Pilonul II + III administrators (SAFI), pension funds, fund managers.

Suggested approach

  1. Discovery phase (1h): open each registry URL above that is not yet scraped in a browser and inspect the Network tab for the actual data endpoints. Most are likely Drupal/Symfony pages that either embed JSON or render an HTML table. Some may only offer a PDF download (needs OCR/parsing).
  2. Per-register scraper (1-2h each):
    • If it's a JSON endpoint similar to /scr/ra/cautare, clone the scrape-asf.ts pattern with a new register_type value (e.g., pensie_administrator, intermediar_secundar).
    • If it's an HTML table, parse with cheerio.
    • If it's a PDF, use pdftotext like CNAS.
  3. Schema: asf.entitati.register_type is already a text column — add new enum-like values without DDL.
  4. Volume estimate:
    • Pension funds: ~10 administrators (SAFI/SIF), ~20 funds.
    • Capital markets: ~50-200 entities.
    • Secondary intermediaries: ~3,000-10,000 individuals + firms.
    • Lecturers: ~50.
    • Total ~3,100-10,300 new entities if all are done.
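
The plan above can be sketched as a small table of registers, each with a `register_type` value, a source format that stays unknown until discovery, and a volume range. This is an illustration under assumptions: `pensie_administrator` and `intermediar_secundar` come from the examples above, while `pensie_fond`, `piata_capital`, and `lector` are hypothetical names. The `RegisterPlan` type and both functions are likewise illustrative.

```typescript
type SourceFormat = "json" | "html-table" | "pdf";

interface RegisterPlan {
  registerType: string;        // value stored in asf.entitati.register_type (text column, no DDL needed)
  format: SourceFormat | null; // null until the discovery phase has confirmed it
  minEntities: number;
  maxEntities: number;
}

// Volume figures are the estimates from the list above; formats are deliberately left unset.
const plans: RegisterPlan[] = [
  { registerType: "pensie_administrator", format: null, minEntities: 10, maxEntities: 10 },
  { registerType: "pensie_fond", format: null, minEntities: 20, maxEntities: 20 },            // hypothetical name
  { registerType: "piata_capital", format: null, minEntities: 50, maxEntities: 200 },         // hypothetical name
  { registerType: "intermediar_secundar", format: null, minEntities: 3000, maxEntities: 10000 },
  { registerType: "lector", format: null, minEntities: 50, maxEntities: 50 },                 // hypothetical name
];

// Maps a discovered format to the parsing strategy named in step 2.
function parserFor(format: SourceFormat | null): string {
  switch (format) {
    case "json":
      return "clone scrape-asf.ts pattern";
    case "html-table":
      return "cheerio";
    case "pdf":
      return "pdftotext";
    default:
      return "discovery pending";
  }
}

// Sums the per-register ranges to get the overall volume estimate.
function totalRange(items: RegisterPlan[]): { min: number; max: number } {
  return items.reduce(
    (acc, p) => ({ min: acc.min + p.minEntities, max: acc.max + p.maxEntities }),
    { min: 0, max: 0 },
  );
}
```

Once discovery fills in `format` for a register, `parserFor` indicates which of the three scraper styles applies, and `totalRange(plans)` gives the combined estimate for whichever registers remain in scope.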

Defer reason

Multi-day discovery + per-register scraper development. The 2-3h single-candidate budget cannot accommodate even one full register implementation without first doing the discovery for all of them.

Recommended next sub-agent: pick secondary intermediaries (largest volume → 3-10k entities) as the first target, since the data shape should mirror existing broker entries.