a6c03a091e
Moved from gov-agreg/src/pages/achizitii/* to root (drop prefix). - 22 pages migrated, 127 files total - All internal links: /achizitii/X → /X (176 occurrences fixed) - AchizitiiLayout subnav rewritten: /X paths, top-right link to vreaudigital.ro hub - BaseLayout new (vreau.digital branding, OG tags, site URL) - astro.config.mjs: site https://vreau.digital, server output (was static) - docker-compose: port 5096 (vreaudigital is 5095), container vreau-digital - deploy.sh: paths /opt/vreau-digital, log /var/log/vreau-digital-deploy.log Backend shared with gov-agreg: - PostgreSQL satra (same schemas: seap, firms, anaf, anre, ...) - Photon, Martin tiles - Infisical /vreaudigital path (DATABASE_URL etc. shared) build: PASS (npx astro check 0 errors, npm run build 5s vite + 10s server)
4.3 KiB
4.3 KiB
Geocoding strategy — firms.entities
Data: 2026-05-11. Sub-agent A2.
Final coverage
| Source | Rows | Accuracy | Notes |
|---|---|---|---|
geonames_postal |
2,128,990 | ~100m–2km | Exact 5/6-digit RO postal match against geonames RO.zip (firms.postal_codes). |
photon |
839,643 | ~50–500m | Komoot Photon OSM geocoder, free-text adr_full. Earlier batch (services/seap-scraper/src/geocode-photon.ts). |
uat_centroid |
670,657 | 5–30km | UAT polygon centroid match by locality+county. |
judet_centroid |
346,675 | 30–150km | Median of all postal codes within the judet. Filled the 2026-05-11 gap where judet_fallback was tagged but lat/lng never written. |
seap_siruta_centroid |
4,681 | 5–30km | NEW stub rows for SEAP-only CUIs (not present in ONRC firme dataset) using SIRUTA → gis_uats centroid. |
seap_judet_centroid |
2,497 | 30–150km | NEW stub rows for SEAP-only CUIs with city/county data in seap.cui_location. |
| unmapped | 2 | — | Two firms with literally zero address fields. Out of reach. |
Total: 3,993,143 / 3,993,145 = 100.00 %.
Fallback chain (priority order)
For any new row entering firms.entities, apply in this order, stop at first hit:
- Postal-code exact match →
firms.postal_codes.postal_code = adr_cod_postal(5/6 digit). Source =geonames_postal. - Postal-code normalized (strip non-digit), same lookup. (Adds ~9K to the bucket — already covered in current dataset.)
- Photon free-text on
adr_full(OSM geocoder, requires network — see geocode-photon.ts). - UAT centroid by
(adr_localitate, adr_judet)→firms.postal_codesmedian of matching place_name + county_code, ORpublic.gis_uatspolygon centroid. - Judet centroid — median of all
firms.postal_codesrows for the normalized judet name (upper(unaccent(replace(adr_judet,'MUNICIPIUL ','')))). 42 distinct judet keys cover all of RO + București. - SIRUTA centroid — for SEAP-mentioned CUIs only, where firms.entities row didn't exist:
seap.announcements.{authority,supplier}_siruta→gis_uats.sirutacentroid (transformed 3844→4326). - City+county from seap.cui_location → judet centroid fallback (
seap_judet_centroid).
Authority / supplier coverage (downstream)
After backfill, JOIN-based coverage from SEAP:
| Bucket | Total distinct CUIs | Geocoded | Pct |
|---|---|---|---|
| authority_cui | 14,617 | 14,119 | 96.6 % |
| supplier_cui | 65,675 | 64,793 | 98.7 % |
Residual: 498 authorities + 882 suppliers (~1,373 unique) — these CUIs appear nowhere with address data (no siruta, no city/county in seap.cui_location, no usable address in any announcement). Most are malformed CUI strings (commas, semicolons, trailing punctuation) — should be cleaned up at SEAP ingestion. Out of scope for geocoding.
Cross-schema enrichment
aaas.firme— 11 rows total, all 11 have geocoded parent in firms.entities via CUI. No action needed; UI agents JOIN.anre.licente— 27,275 rows with titular_cui populated, 11,043 distinct. All 11,043 CUIs match a geocoded firm. UI agents JOIN onfirms.entities.cui = anre.licente.titular_cui.seap.announcements—supplier_address,authority_address,supplier_siruta,authority_sirutaare populated. After this batch, almost every announcement can render on a map via firms.entities lookup.
Geom integrity
firms.entities.geom(geography 4326) is now 1:1 with lat/lng (12,735 prior mismatches fixed where judet_fallback had stale geom from an older run).- 2 unmapped firms have NULL on both. PostGIS spatial indexes still valid.
Forward maintenance
- Anyone ingesting new firms (ANAF/ONRC weekly refresh) must apply the fallback chain in code before INSERT.
- The seap_siruta_centroid and seap_judet_centroid stubs should be upgraded the moment an ANAF/ONRC record arrives for the same CUI — re-run the chain with the real
adr_full. - If the SEAP CUI hygiene gets fixed (A1's domain), the 1,373 residual can be re-attempted.
judet_centroid(and the two seap variants) have onlygeocode_score = 0.1and0.3. UI clustering should down-weight or hide these at high zoom.
Queries used
All idempotent UPDATEs filtered on lat IS NULL. Centroid sources read from firms.postal_codes and public.gis_uats (SRID 3844 → 4326). Saved in-line in the agent transcript; the strategy itself is the artifact.