Plan 006 — ePay CF-Extract as a Multi-Tenant Service
Status: design (executable). Author: deep-dive 2026-06-04 + hardening 2026-06-05.
Prereq reading: plans 002/003 (architools thin-client cutover), project_epay_cf_roadmap_2026_06 memory.
Why this exists
The CF-extract capability (ANCPI ePay paid extracts + the free cf-intern copycf
circuit) will be offered beyond internal use — multi-tenant (ArchiTools, eterra-live,
Planhub, external paying customers). Today it runs as an in-process queue inside
ArchiTools (src/modules/parcel-sync/services/epay-*). That path is now hardened
(commit f49fdb1: cart hygiene, auth/IDOR gates, single-page fetch, parallel
downloads, recover-by-extractId) and is billing-safe and correct for the internal
tool — but it is the wrong shape for a service:
- Queue + ePay session are in-memory globals → die on redeploy mid-batch.
- One serial cart per process → no multi-tenant throughput.
- No catalog dedup on the paid path → the same parcel is paid for repeatedly.
- EPAY_ORDERING_VIA_GIS_AC=false, because gis-api POST /enrichment/cf inserts a pending row that nothing fulfills: the orchestrator has no ePay worker. Plan 003 Faza F ("endpoints already exist") is wrong; the fulfiller is the unwritten keystone.
This plan is the path from the hardened-internal state to a real service. Each phase is independently shippable; do them in order, validate, then flip the flag per-tenant.
Invariants carried over from the hardened internal path (do NOT regress)
These were learned the hard way (2026-06-04 incident, order 10009605). The new worker MUST preserve every one:
- Submit is timeout-resilient. A slow EditCartSubmit that ANCPI completes must never be marked failed. Resolve the order via findNewOrderId(previous, known), which never adopts a stale/known id. (SUBMIT_TIMEOUT_MS, today's fix.)
- Cart hygiene invariant. ePay has ONE global cart per account; EditCartSubmit checks out everything in it. After N adds, a clean cart reports exactly N items; any excess is an orphan from a crash → wipe + abort. Never submit a cart you didn't build.
- CF-number matching is authoritative; the index fallback is review, not completed.
- %PDF magic-byte check on every download (an expired session returns login HTML).
- Single-page order fetch via itemsPerPage (the 5/page default silently drops docs).
- Recover is idempotent (re-poll + re-download an already-paid order, no new charge).
Phase A — DB-backed fulfiller worker (eterra.cf-epay) — THE KEYSTONE
A pg-boss worker in gis-sync-orchestrator (next to enrichment-drainer, cron 1–2 min).
The CfExtract row IS the work item — no in-memory queue.
- Claim: SELECT … FROM gis_enrichment."CfExtract" WHERE status='pending' AND type='epay' [AND account-compatible] ORDER BY "createdAt" FOR UPDATE SKIP LOCKED LIMIT N; then UPDATE → status='claimed', claimedAt=now(). SKIP LOCKED → two instances never grab the same rows.
- State machine (each transition = one UPDATE = a precise resumable marker): pending → claimed → cart → submitted_unconfirmed → polling → downloading → completed | review | failed | cancelled. Extend gis-api's ExtractStatus enum (gis-api/src/routes/enrichment.ts:9) with claimed, submitted_unconfirmed, review.
- Crash recovery: a boot reaper requeues rows stuck in a transitional state past a heartbeat TTL. submitted_unconfirmed rows are instead resolved via the recover pattern (find the order at ANCPI, never re-charge). This structurally eliminates the in-memory-queue orphan class (criticals C2).
- Idempotent submit: before EditCartSubmit, persist on the claimed rows the account's current latest orderId + the intended nrCadastral set. On timeout/crash, resume re-runs findNewOrderId against that snapshot, so it never adopts a stale id.
- Port the hardened epay-client here (see Phase G, shared package).
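The claim and the state machine above can be sketched in TypeScript. This sketch makes several choices of its own:

- It assumes the statuses exactly as listed.
- It folds the two-step SELECT-then-UPDATE claim into a single UPDATE … WHERE id IN (SELECT … FOR UPDATE SKIP LOCKED) statement.
- `id` and `heartbeatAt` are hypothetical names.
- Which states may fall back to pending, or go to failed, is this sketch's reading of the invariants, not a settled design.

```typescript
// Sketch only: "id" and "heartbeatAt" are hypothetical column/field names.

/** Claim in one statement; $1 = batch size N. The plan's [AND account-compatible]
 *  filter is omitted. Column quoting assumes camelCase columns, as with "createdAt". */
export const CLAIM_SQL = `
UPDATE gis_enrichment."CfExtract"
   SET status = 'claimed', "claimedAt" = now()
 WHERE id IN (
       SELECT id FROM gis_enrichment."CfExtract"
        WHERE status = 'pending' AND type = 'epay'
        ORDER BY "createdAt"
        LIMIT $1
        FOR UPDATE SKIP LOCKED)
RETURNING *`;

export type ExtractStatus =
  | "pending" | "claimed" | "cart" | "submitted_unconfirmed" | "polling"
  | "downloading" | "completed" | "review" | "failed" | "cancelled";

// Allowed next states. Back to "pending" (reaper requeue) is only legal before
// anything reached ANCPI. Assumes the worker writes submitted_unconfirmed, with the
// findNewOrderId snapshot, immediately before calling EditCartSubmit.
const NEXT: Record<ExtractStatus, readonly ExtractStatus[]> = {
  pending: ["claimed", "cancelled"],
  claimed: ["cart", "pending", "failed", "cancelled"],
  cart: ["submitted_unconfirmed", "pending", "failed", "cancelled"],
  // Never straight to failed: ANCPI may already have completed (and charged) the order.
  submitted_unconfirmed: ["polling", "review"],
  polling: ["downloading", "review", "failed"],
  downloading: ["completed", "review", "failed"],
  completed: [],
  review: [],
  failed: [],
  cancelled: [],
};

export function canTransition(from: ExtractStatus, to: ExtractStatus): boolean {
  return NEXT[from].includes(to);
}

export type ReaperAction = "none" | "requeue" | "recover";

/** Boot-reaper decision for one row whose worker last heartbeated at `heartbeatAt`. */
export function reaperAction(
  status: ExtractStatus,
  heartbeatAt: Date,
  now: Date,
  ttlMs: number,
): ReaperAction {
  if (status === "pending" || NEXT[status].length === 0) return "none"; // not started, or terminal
  if (now.getTime() - heartbeatAt.getTime() < ttlMs) return "none"; // worker still alive
  // Nothing has reached ANCPI yet, so handing the row back to the queue costs nothing.
  if (status === "claimed" || status === "cart") return "requeue";
  // An order may exist (and be paid) at ANCPI: resolve it via recover, never resubmit.
  return "recover";
}
```

Guarding each UPDATE with canTransition (or an equivalent WHERE status = <from> clause) keeps a late, duplicate worker from moving a row backwards.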
Phase B — epay_accounts pool with one-batch-per-account lock
Mirror gis_meta.eterra_accounts (busc-infra migration 004): AES-256-GCM creds,
status active/blocked/retired, blocked_reason, credits_cached, optional hourly cap,
in_flight_batch_id.
- pickEpayAccount: FOR UPDATE SKIP LOCKED, but because ePay's cart is global per account, atomically set in_flight_batch_id (status busy) so no second batch can touch that account's cart. This is the structural fix for cart contamination (C1) in the pooled world.
- Refuse to claim a batch larger than the account's cached credits. ePay credits are a hard consumable (real money); unlike the soft eTerra quota, the credit cap is mandatory, not advisory.
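The following sketches pickEpayAccount. It assumes the column names listed above plus a hypothetical `id`. Preferring the account with the most cached credits is an arbitrary choice, and the optional hourly cap is left out. The SQL locks and marks one account in a single statement. The pure function models the same eligibility rule (active, no batch in flight, enough cached credits for the whole batch) so it can be tested without a database.

```typescript
// Sketch only: schema names beyond the plan's column list are hypothetical.
export interface EpayAccount {
  id: string;
  status: "active" | "busy" | "blocked" | "retired";
  creditsCached: number;
  inFlightBatchId: string | null;
}

/** $1 = batch id, $2 = credits the batch needs. Returns no row when nothing is eligible. */
export const PICK_ACCOUNT_SQL = `
UPDATE epay_accounts
   SET status = 'busy', in_flight_batch_id = $1
 WHERE id = (
       SELECT id FROM epay_accounts
        WHERE status = 'active'
          AND in_flight_batch_id IS NULL
          AND credits_cached >= $2
        ORDER BY credits_cached DESC
        LIMIT 1
        FOR UPDATE SKIP LOCKED)
RETURNING id`;

/** In-memory model of the same rule; never mutates the input pool. */
export function pickEpayAccount(
  accounts: readonly EpayAccount[],
  batchId: string,
  batchSize: number,
): EpayAccount | undefined {
  const eligible = accounts
    .filter((a) => a.status === "active" && a.inFlightBatchId === null && a.creditsCached >= batchSize)
    .sort((a, b) => b.creditsCached - a.creditsCached); // filter() copied, so sort is safe
  const chosen = eligible[0];
  // Refuse rather than split: a batch that could overrun the credits is never claimed.
  if (chosen === undefined) return undefined;
  return { ...chosen, status: "busy", inFlightBatchId: batchId };
}
```

Releasing the account (status back to active, in_flight_batch_id cleared) would belong in the same transaction that finalizes the batch; that step is not shown.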
Phase C — Catalog dedup (largest recurring economic win)
CfExtractCatalog is written only on the cf-intern path today; nothing writes it
when a paid ePay order completes → a paid extract by tenant A is never "fresh" for tenant
B, so the 30-day money-saver is structurally unrealized.
- On ePay completion, upsert CfExtractCatalog(nrCadastral, latestId, expiresAt=documentDate+30d, isFresh=true).
- POST /enrichment/cf/claim {nrCadastral}: on a catalog hit, create a B-owned row (type='catalog', status='completed', creditsUsed=0) pointing at the shared MinIO object (or a copy). This turns today's 409 catalog_hit (enrichment.ts:226) into instant, free fulfillment. RLS is unchanged (B reads B's row). One paid extract serves every tenant that needs that parcel within 30 days, at zero marginal ANCPI cost.
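The freshness rule and the catalog-hit branch of POST /enrichment/cf/claim can be sketched as pure functions. The sketch makes three assumptions:

- The 30-day window is counted in whole 24-hour days from documentDate.
- An entry is fresh only while now < expiresAt.
- The returned row's `ownerTenant` and `sourceExtractId` fields are hypothetical.

Only nrCadastral, latestId, expiresAt, isFresh, type, status and creditsUsed come from the plan.

```typescript
// Sketch only: persistence and the MinIO pointer are reduced to plain fields.
const DAY_MS = 24 * 60 * 60 * 1000;
export const CATALOG_TTL_DAYS = 30;

export interface CatalogEntry {
  nrCadastral: string;
  latestId: string; // id of the paid extract that produced the document
  expiresAt: Date;
  isFresh: boolean;
}

export interface CatalogClaimRow {
  ownerTenant: string; // hypothetical: the tenant RLS scopes this row to
  nrCadastral: string;
  type: "catalog";
  status: "completed";
  creditsUsed: 0;
  sourceExtractId: string; // hypothetical: points at the shared MinIO object's extract
}

/** Upsert payload written when a paid ePay order completes. */
export function catalogEntryFor(nrCadastral: string, extractId: string, documentDate: Date): CatalogEntry {
  return {
    nrCadastral,
    latestId: extractId,
    expiresAt: new Date(documentDate.getTime() + CATALOG_TTL_DAYS * DAY_MS),
    isFresh: true,
  };
}

/** Fresh hit → free row owned by the caller; otherwise undefined (fall through to a paid order). */
export function claimFromCatalog(
  entry: CatalogEntry | undefined,
  tenant: string,
  now: Date,
): CatalogClaimRow | undefined {
  if (!entry || !entry.isFresh || now.getTime() >= entry.expiresAt.getTime()) return undefined;
  return {
    ownerTenant: tenant,
    nrCadastral: entry.nrCadastral,
    type: "catalog",
    status: "completed",
    creditsUsed: 0,
    sourceExtractId: entry.latestId,
  };
}
```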
Phase D — Credential model (tenant-policy-driven)
Store the strategy per-tenant; don't pick one globally:
- Internal Beletage group → pooled company accounts (Infisical, encrypted in epay_accounts). Best batching + catalog sharing; per-credit attribution via audit.
- External paying tenants (eterra-live model) → dedicated per-tenant accounts so credits/billing stay clean.
- Record account_id + creditsUsed on every CfExtract for attribution regardless.
- All three apps converge as thin callers of POST /enrichment/cf (Authentik multi-issuer + tenant claim already in place, gis-api/src/lib/auth.ts). Reuse eterra-live crypto.ts (AES-256-GCM) + a 1-byte key-version prefix for rotation.
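The 1-byte key-version prefix can be sketched as a framing layer around whatever crypto.ts produces. The assumed layout is a version byte, then a 12-byte GCM IV, then the ciphertext followed by its 16-byte tag; it is not taken from eterra-live, and the encryption itself is left out. Decryption reads the first byte to pick the key. Credentials written under an old key stay readable while new writes use the current one.

```typescript
// Sketch only: assumed layout [version:1][iv:12][ciphertext||tag], not crypto.ts's format.
export const IV_LENGTH = 12; // common AES-GCM nonce size
export const TAG_LENGTH = 16; // full-length GCM authentication tag

export interface Sealed {
  keyVersion: number;
  iv: Uint8Array;
  ciphertextAndTag: Uint8Array;
}

export function frame(s: Sealed): Uint8Array {
  if (!Number.isInteger(s.keyVersion) || s.keyVersion < 0 || s.keyVersion > 255) {
    throw new RangeError("keyVersion must fit in one byte");
  }
  if (s.iv.length !== IV_LENGTH) throw new RangeError("bad IV length");
  const out = new Uint8Array(1 + IV_LENGTH + s.ciphertextAndTag.length);
  out[0] = s.keyVersion;
  out.set(s.iv, 1);
  out.set(s.ciphertextAndTag, 1 + IV_LENGTH);
  return out;
}

export function unframe(blob: Uint8Array): Sealed {
  // Anything shorter cannot even hold an empty ciphertext plus its tag.
  if (blob.length < 1 + IV_LENGTH + TAG_LENGTH) throw new RangeError("blob too short");
  return {
    keyVersion: blob[0],
    iv: blob.slice(1, 1 + IV_LENGTH),
    ciphertextAndTag: blob.slice(1 + IV_LENGTH),
  };
}

/** Select the decryption key by version; unknown versions fail loudly instead of trying every key. */
export function keyFor(version: number, keys: ReadonlyMap<number, Uint8Array>): Uint8Array {
  const key = keys.get(version);
  if (key === undefined) throw new Error(`no key for version ${version}`);
  return key;
}
```

Rotation then means adding a new key under the next version number and re-encrypting rows lazily. The old key is removed only once no stored blob still starts with its version byte.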
Phase E — gis-api gaps for async consumption
- Completion webhook/SSE, tenant-scoped + RLS-filtered (GET /enrichment/cf/events) → kills polling and the dead-Brevo dependency.
- Bulk-zip GET /enrichment/cf/zip?orderId= streaming from MinIO (port the V3 streaming-zip approach).
- ExtractStatus enum additions (see Phase A).
- List filters creditsUsed=0 / type='catalog' so the UI can label shared extracts.
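A tenant-scoped completion feed needs two small pieces. The first is a filter so a tenant only sees its own events; RLS does this in the database, and this is the in-process equivalent for events already fanned out. The second is the text/event-stream framing. The sketch below uses a hypothetical event shape and event name (`cf.<status>`). The framing follows the standard server-sent-events format: `id:`, `event:` and `data:` lines, with a blank line ending each event.

```typescript
// Sketch only: CfEvent and the "cf.<status>" event names are hypothetical.
export interface CfEvent {
  id: string; // assumed monotonically increasing, so clients can resume via Last-Event-ID
  tenant: string;
  extractId: string;
  status: "completed" | "review" | "failed";
  creditsUsed: number;
}

/** In-process tenant filter for events that were already fanned out. */
export function visibleTo(tenant: string, events: readonly CfEvent[]): CfEvent[] {
  return events.filter((e) => e.tenant === tenant);
}

/** One SSE frame. JSON.stringify without indentation emits no raw newlines,
 *  so a single data: line is enough. The tenant id is not echoed back. */
export function toSseFrame(e: CfEvent): string {
  const payload = { extractId: e.extractId, status: e.status, creditsUsed: e.creditsUsed };
  return `id: ${e.id}\nevent: cf.${e.status}\ndata: ${JSON.stringify(payload)}\n\n`;
}
```

Including creditsUsed in the payload lets the UI label shared (creditsUsed=0) extracts without a follow-up list call.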
Phase F — Reversible migration, per-tenant flip
- Phase 0 (now): EPAY_ORDERING_VIA_GIS_AC=false; the hardened legacy queue is the sole fulfiller. /api/ancpi/recover stays as the manual safety net.
- Phase 1: deploy worker + pool + catalog-write; seed epay_accounts with ONLY the Beletage account; flip the flag for claims.tenant === 'architools'.
- Phase 2: run both paths in parallel for a grace window; reconcile on orderId (no double charge).
- Phase 3: onboard external tenants with dedicated accounts; delete epay-queue/epay-client/epay-session-store + src/app/api/ancpi/* from ArchiTools.
The flag is the kill-switch throughout.
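The Phase 1 routing decision and the Phase 2 reconciliation are both small enough to pin down in code. The sketch assumes EPAY_ORDERING_VIA_GIS_AC has already been parsed into `flagOn`, and it expresses the per-tenant flip as a tenant allowlist. The allowlist is this sketch's own device; the plan only names claims.tenant === 'architools'. Reconciliation keys the fulfilled rows from both paths by ANCPI orderId, so an order tracked by both paths is counted once.

```typescript
// Sketch only: path labels and the allowlist are hypothetical.
export type OrderingPath = "gis-ac" | "legacy-queue";

export function orderingPathFor(
  tenant: string,
  flagOn: boolean,
  allowlist: ReadonlySet<string>,
): OrderingPath {
  // Kill-switch first: with the flag off, every tenant stays on the legacy queue.
  return flagOn && allowlist.has(tenant) ? "gis-ac" : "legacy-queue";
}

export interface FulfilledOrder {
  path: OrderingPath;
  orderId: string; // ANCPI order id
  creditsUsed: number;
}

export interface Reconciliation {
  orders: Map<string, FulfilledOrder>; // one record per ANCPI order
  onBothPaths: string[]; // orderIds tracked by both paths during the grace window
  totalCredits: number; // credits summed once per order
}

export function reconcileByOrderId(rows: readonly FulfilledOrder[]): Reconciliation {
  const orders = new Map<string, FulfilledOrder>();
  const onBothPaths = new Set<string>();
  for (const r of rows) {
    const seen = orders.get(r.orderId);
    if (seen === undefined) orders.set(r.orderId, r);
    else if (seen.path !== r.path) onBothPaths.add(r.orderId);
  }
  let totalCredits = 0;
  for (const o of orders.values()) totalCredits += o.creditsUsed;
  return { orders, onBothPaths: [...onBothPaths], totalCredits };
}
```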
Phase G — Shared epay-client package (do regardless of phase)
ArchiTools and eterra-live each have a near-identical epay-client.ts that has already
diverged dangerously: ArchiTools got today's fixes; eterra-live got the method-internal
ports (commit eterra-live d30128b) but lacks cart-hygiene + the per-page parser refactor.
Extract @beletage/epay-client (natural home: gis-sync-orchestrator, which owns the
account pool) so a fix lands once. Until then, any epay-client change MUST be mirrored to
both repos in the same change.
Known follow-ups not yet done
- eterra-live still lacks the cart-hygiene numberOfItems invariant (the single-order flow makes it lower-risk, but a crashed prior order can still orphan a cart row). It needs a route-level touch + testing on that product before shipping.
- BREVO_API_KEY returns 401 "Key not found" → ArchiTools email notifications are dead; the correct fix is the Phase E webhook, not patching Brevo. SMTP relay creds still work.
- ArchiTools auth-options.ts has pre-existing react-hooks/rules-of-hooks lint errors on the useGisAcFlag/useBasicPanelFlag session calls (tolerated by next build).