initial: split from gov-agreg — vreau.digital standalone platform

Moved from gov-agreg/src/pages/achizitii/* to root (drop prefix).
- 22 pages migrated, 127 files total
- All internal links: /achizitii/X → /X (176 occurrences fixed)
- AchizitiiLayout subnav rewritten: /X paths, top-right link to vreaudigital.ro hub
- BaseLayout new (vreau.digital branding, OG tags, site URL)
- astro.config.mjs: site https://vreau.digital, server output (was static)
- docker-compose: port 5096 (vreaudigital is 5095), container vreau-digital
- deploy.sh: paths /opt/vreau-digital, log /var/log/vreau-digital-deploy.log

Backend shared with gov-agreg:
- PostgreSQL satra (same schemas: seap, firms, anaf, anre, ...)
- Photon, Martin tiles
- Infisical /vreaudigital path (DATABASE_URL etc. shared)

build: PASS (npx astro check 0 errors, npm run build 5s vite + 10s server)
This commit is contained in:
Claude VM
2026-05-13 00:10:32 +03:00
commit a6c03a091e
352 changed files with 75295 additions and 0 deletions
@@ -0,0 +1,62 @@
# Geocoding strategy — firms.entities
Data: 2026-05-11. Sub-agent A2.
## Final coverage
| Source | Rows | Accuracy | Notes |
|---|---:|---|---|
| `geonames_postal` | 2,128,990 | ~100m2km | Exact 5/6-digit RO postal match against geonames RO.zip (firms.postal_codes). |
| `photon` | 839,643 | ~50500m | Komoot Photon OSM geocoder, free-text `adr_full`. Earlier batch (services/seap-scraper/src/geocode-photon.ts). |
| `uat_centroid` | 670,657 | 530km | UAT polygon centroid match by locality+county. |
| `judet_centroid` | 346,675 | 30150km | Median of all postal codes within the judet. Filled the 2026-05-11 gap where `judet_fallback` was tagged but lat/lng never written. |
| `seap_siruta_centroid` | 4,681 | 530km | NEW stub rows for SEAP-only CUIs (not present in ONRC firme dataset) using SIRUTA → gis_uats centroid. |
| `seap_judet_centroid` | 2,497 | 30150km | NEW stub rows for SEAP-only CUIs with city/county data in seap.cui_location. |
| _unmapped_ | 2 | — | Two firms with literally zero address fields. Out of reach. |
**Total: 3,993,143 / 3,993,145 = 100.00 %.**
## Fallback chain (priority order)
For any new row entering firms.entities, apply in this order, stop at first hit:
1. **Postal-code exact match**`firms.postal_codes.postal_code = adr_cod_postal` (5/6 digit). Source = `geonames_postal`.
2. **Postal-code normalized** (strip non-digit), same lookup. (Adds ~9K to the bucket — already covered in current dataset.)
3. **Photon free-text** on `adr_full` (OSM geocoder, requires network — see geocode-photon.ts).
4. **UAT centroid** by `(adr_localitate, adr_judet)``firms.postal_codes` median of matching place_name + county_code, OR `public.gis_uats` polygon centroid.
5. **Judet centroid** — median of all `firms.postal_codes` rows for the normalized judet name (`upper(unaccent(replace(adr_judet,'MUNICIPIUL ','')))`). 42 distinct judet keys cover all of RO + București.
6. **SIRUTA centroid** — for SEAP-mentioned CUIs only, where firms.entities row didn't exist: `seap.announcements.{authority,supplier}_siruta``gis_uats.siruta` centroid (transformed 3844→4326).
7. **City+county from seap.cui_location** → judet centroid fallback (`seap_judet_centroid`).
## Authority / supplier coverage (downstream)
After backfill, JOIN-based coverage from SEAP:
| Bucket | Total distinct CUIs | Geocoded | Pct |
|---|---:|---:|---:|
| authority_cui | 14,617 | 14,119 | 96.6 % |
| supplier_cui | 65,675 | 64,793 | 98.7 % |
Residual: 498 authorities + 882 suppliers (~1,373 unique) — these CUIs appear nowhere with address data (no siruta, no city/county in seap.cui_location, no usable address in any announcement). Most are malformed CUI strings (commas, semicolons, trailing punctuation) — should be cleaned up at SEAP ingestion. Out of scope for geocoding.
## Cross-schema enrichment
- `aaas.firme` — 11 rows total, all 11 have geocoded parent in firms.entities via CUI. No action needed; UI agents JOIN.
- `anre.licente` — 27,275 rows with titular_cui populated, 11,043 distinct. All 11,043 CUIs match a geocoded firm. UI agents JOIN on `firms.entities.cui = anre.licente.titular_cui`.
- `seap.announcements``supplier_address`, `authority_address`, `supplier_siruta`, `authority_siruta` are populated. After this batch, almost every announcement can render on a map via firms.entities lookup.
## Geom integrity
- `firms.entities.geom` (geography 4326) is now 1:1 with lat/lng (12,735 prior mismatches fixed where judet_fallback had stale geom from an older run).
- 2 unmapped firms have NULL on both. PostGIS spatial indexes still valid.
## Forward maintenance
1. Anyone ingesting new firms (ANAF/ONRC weekly refresh) must apply the fallback chain in code before INSERT.
2. The seap_siruta_centroid and seap_judet_centroid stubs should be **upgraded** the moment an ANAF/ONRC record arrives for the same CUI — re-run the chain with the real `adr_full`.
3. If the SEAP CUI hygiene gets fixed (A1's domain), the 1,373 residual can be re-attempted.
4. `judet_centroid` (and the two seap variants) have only `geocode_score = 0.1` and `0.3`. UI clustering should down-weight or hide these at high zoom.
## Queries used
All idempotent UPDATEs filtered on `lat IS NULL`. Centroid sources read from `firms.postal_codes` and `public.gis_uats` (SRID 3844 → 4326). Saved in-line in the agent transcript; the strategy itself is the artifact.