initial: split from gov-agreg — vreau.digital standalone platform
Moved from gov-agreg/src/pages/achizitii/* to root (drop prefix).
- 22 pages migrated, 127 files total
- All internal links: /achizitii/X → /X (176 occurrences fixed)
- AchizitiiLayout subnav rewritten: /X paths, top-right link to vreaudigital.ro hub
- BaseLayout new (vreau.digital branding, OG tags, site URL)
- astro.config.mjs: site https://vreau.digital, server output (was static)
- docker-compose: port 5096 (vreaudigital is 5095), container vreau-digital
- deploy.sh: paths /opt/vreau-digital, log /var/log/vreau-digital-deploy.log

Backend shared with gov-agreg:
- PostgreSQL satra (same schemas: seap, firms, anaf, anre, ...)
- Photon, Martin tiles
- Infisical /vreaudigital path (DATABASE_URL etc. shared)

build: PASS (npx astro check 0 errors, npm run build 5s vite + 10s server)
@@ -0,0 +1,74 @@
# ASF other registers: handoff

State at 2026-05-11:
- `asf.entitati`: 849 entities (61 asigurator + 788 broker); only the
  `/scr/ra` insurance registry is ingested.
- ASF has additional registries (private pensions, capital markets,
  secondary intermediaries, software providers, lecturers, etc.) on
  separate pages, NOT exposed via the same `/scr/ra/cautare` JSON endpoint.

## Why deferred

Each register appears to use a different access pattern:
- `/scr/ra` (used by the current scraper): insurers and brokers only.
- Pension funds (Pilonul II/III): no `/scr/` endpoint visible. Likely PDF
  or static HTML on `asfromania.ro/ro/a/2365/...`.
- Capital markets entities: likely a different `/scr/...` path that needs to
  be discovered via browser network-tab inspection.

Confirmation requires interactive exploration (curl with a realistic
Referer + Cookie, or browser dev-tools). It cannot be done blindly from
the high-level web pages.

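The exploration step described above could be scripted roughly as follows. This is a minimal sketch, not the existing scraper: `buildProbe`, `classifyContentType`, and `probe` are hypothetical helper names, the `https://` scheme and the User-Agent string are assumptions, and the Cookie value would have to be copied from a real browser session. It only records which of the three access patterns (JSON, HTML, PDF) a URL answers with.

```typescript
// Assumption: the site is served over HTTPS at the host named in this note.
const ASF_BASE = "https://asfromania.ro";

interface ProbeRequest {
  url: string;
  headers: Record<string, string>;
}

// Build a request that resembles a browser arriving from the register index page.
function buildProbe(path: string, cookie?: string): ProbeRequest {
  const headers: Record<string, string> = {
    // Placeholder User-Agent; a real browser string may be required.
    "User-Agent": "Mozilla/5.0 (compatible; asf-probe)",
    Accept: "application/json, text/html;q=0.9, */*;q=0.8",
    Referer: `${ASF_BASE}/ro/a/1544/registre-entitati-autorizate`,
  };
  if (cookie) headers["Cookie"] = cookie;
  return { url: new URL(path, ASF_BASE).toString(), headers };
}

type AccessPattern = "json" | "html" | "pdf" | "unknown";

// Map a Content-Type header value onto the access patterns listed above.
function classifyContentType(contentType: string | null): AccessPattern {
  const ct = (contentType ?? "").toLowerCase();
  if (ct.includes("json")) return "json";
  if (ct.includes("pdf")) return "pdf";
  if (ct.includes("html")) return "html";
  return "unknown";
}

// Performs the actual request (uses the global fetch available in Node 18+).
async function probe(path: string, cookie?: string): Promise<AccessPattern> {
  const { url, headers } = buildProbe(path, cookie);
  const res = await fetch(url, { headers });
  return classifyContentType(res.headers.get("content-type"));
}
```

An `html` result does not rule out a hidden JSON endpoint loaded by the page's scripts, so the browser Network tab is still needed for those cases.
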
## Registries discovered (from `/ro/a/1544/registre-entitati-autorizate`)

### Insurance (Asigurări)
- ✅ `/scr/ra/cautare`: currently scraped (asigurator + broker).
- ❓ `/ro/a/2082/registrul-asigurătorilor-și-intermediarilor-din-see`:
  EEA insurers and intermediaries (likely overlap with main register).
- ❓ `/app.php/ro/a/1704/intermediari-secundari`: secondary intermediaries
  (post-2019).
- ❓ `/ro/a/1997/intermediari-secundari---persoane-fizice`: secondary
  intermediaries, natural persons (pre-2019).
- ❓ `/ro/a/1998/intermediari-secundari---persoane-juridice`: secondary
  intermediaries, legal entities (pre-2019).
- ❓ `/ro/a/1999/specialisti-constatare-daune`: damage assessors.
- ❓ `/ro/a/2068/registrul-furnizorilor-de-programe-(activi)`: software
  providers.
- ❓ `/ro/a/2067/registrul-lectorilor`: authorized lecturers.

### Capital Markets (Piață de capital)
- ❓ `/app.php/ro/a/1705/registrul-instrumentelor-si-investitiilor-financiare`

### Private Pensions (Pensii private)
- ❓ `/ro/a/2365/registrul-entitatilor-din-piata-pensiilor-private`: Pilonul
  II + III administrators (SAFI), pension funds, fund managers.

## Recommended approach (~4-6h)

1. **Discovery phase (1h)**: open each ❓ URL in a browser and inspect the
   Network tab for the actual data endpoints. Note: most are likely
   Drupal/Symfony pages serving embedded JSON or rendering an HTML table.
   Some may only offer a PDF download (needs OCR/parsing).
2. **Per-register scraper (1-2h each)**:
   - If it's a JSON endpoint similar to `/scr/ra/cautare`, clone the
     `scrape-asf.ts` pattern with a new `register_type` value
     (e.g., `pensie_administrator`, `intermediar_secundar`).
   - If it's an HTML table, parse with cheerio.
   - If it's a PDF, use pdftotext, as for CNAS.
3. **Schema**: `asf.entitati.register_type` is already a text column, so
   new enum-like values can be added without DDL.
4. **Volume estimate**:
   - Pension funds: ~10 administrators (SAFI/SIF), ~20 funds.
   - Capital markets: ~50-200 entities.
   - Secondary intermediaries: ~3,000-10,000 individuals + firms.
   - Lecturers: ~50.
   - **Total ~3,500-10,300 new entities** if all done.

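Because `register_type` is a plain text column, nothing in the database rejects a misspelled value, so a guard in the scraper may be worth having. The sketch below illustrates one way to do that. It is an assumption-laden example, not code from `scrape-asf.ts`: `REGISTER_TYPES`, `isRegisterType`, `tagRows`, and the `name` field are hypothetical, and the real `asf.entitati` columns are not described in this note.

```typescript
// Existing values (asigurator, broker) plus the two example values proposed above.
const REGISTER_TYPES = [
  "asigurator",
  "broker",
  "pensie_administrator",
  "intermediar_secundar",
] as const;

type RegisterType = (typeof REGISTER_TYPES)[number];

// Runtime check that a string is one of the known register_type values.
function isRegisterType(value: string): value is RegisterType {
  return (REGISTER_TYPES as readonly string[]).includes(value);
}

// Hypothetical minimal row shape; the real table has more columns.
interface EntityRow {
  register_type: RegisterType;
  name: string;
}

// Tag extracted entity names with a validated register_type, dropping blank names.
function tagRows(registerType: string, names: string[]): EntityRow[] {
  if (!isRegisterType(registerType)) {
    throw new Error(`unknown register_type: ${registerType}`);
  }
  const validated: RegisterType = registerType;
  return names
    .map((n) => n.trim())
    .filter((n) => n.length > 0)
    .map((name) => ({ register_type: validated, name }));
}
```

Whatever extractor a register ends up needing (JSON, cheerio, or pdftotext), its output could pass through a function like `tagRows` so that adding a register means adding one string to the list rather than touching the schema.
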
## Defer reason

Multi-day discovery + per-register scraper development. The 2-3h
single-candidate budget cannot accommodate even one full register
implementation without first doing the discovery for all of them.

Recommended next sub-agent: pick **secondary intermediaries** (largest
volume, ~3-10k entities) as the first target, since the data shape
should mirror existing broker entries.