# Plan 006: ePay CF-Extract as a Multi-Tenant Service

**Status:** design (executable). **Author:** deep-dive 2026-06-04 + hardening 2026-06-05.
**Prereq reading:** plans 002/003 (architools thin-client cutover), `project_epay_cf_roadmap_2026_06` memory.

## Why this exists

The CF-extract capability (ANCPI ePay paid extracts + the free `cf-intern` copycf
circuit) will be offered beyond internal use: multi-tenant (ArchiTools, eterra-live,
Planhub, external paying customers). Today it runs as an **in-process queue inside
ArchiTools** (`src/modules/parcel-sync/services/epay-*`). That path is now hardened
(commit `f49fdb1`: cart hygiene, auth/IDOR gates, single-page fetch, parallel
downloads, recover-by-extractId) and is **billing-safe and correct for the internal
tool**, but it is the wrong *shape* for a service:

- Queue + ePay session are in-memory globals → die on redeploy mid-batch.
- One serial cart per process → no multi-tenant throughput.
- No catalog dedup on the paid path → the same parcel is paid for repeatedly.
- `EPAY_ORDERING_VIA_GIS_AC=false` because **gis-api `POST /enrichment/cf` inserts a
  pending row that nothing fulfills**: the orchestrator has no ePay worker. Plan 003
  Faza F ("endpoints already exist") is wrong; the fulfiller is the unwritten keystone.

This plan is the path from the hardened-internal state to a real service. Each phase
is independently shippable; do them in order, validate, then flip the flag per-tenant.

## Invariants carried over from the hardened internal path (do NOT regress)

These were learned the hard way (2026-06-04 incident, order 10009605). The new worker
MUST preserve every one:

1. **Submit is timeout-resilient.** A slow `EditCartSubmit` that ANCPI completes must
   never be marked failed. Resolve the order via `findNewOrderId(previous, known)`, which
   never adopts a stale/known id. (This is the `SUBMIT_TIMEOUT_MS` hardening fix.)
2. **Cart hygiene invariant.** ePay has ONE global cart per account; `EditCartSubmit`
   checks out everything in it. After N adds a clean cart reports exactly N items; any
   excess is an orphan from a crash → wipe + abort, never submit a cart you didn't build.
3. **CF-number matching is authoritative; index fallback is `review`, not `completed`.**
4. **`%PDF` magic-byte check** on every download (expired session returns login HTML).
5. **Single-page order fetch** via `itemsPerPage` (5/page default silently drops docs).
6. **Recover is idempotent** (re-poll + re-download an already-paid order, no new charge).
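
Invariants 1, 2 and 4 are small enough to express as pure functions. The block below is a minimal sketch, not the hardened client's code: `resolveNewOrderId`, `checkCart` and `looksLikePdf` are hypothetical names, the third parameter (order ids listed after the submit) is an assumption about what the client fetches, and order ids are assumed to be numeric and increasing. Treating zero or several candidates as "unresolved", and treating any cart count mismatch (not only an excess) as dirty, are conservative choices of this sketch.

```typescript
/** Invariant 1 sketch: adopt an order id only if it is unknown to us and newer than
 *  the pre-submit snapshot. Zero or several candidates → null (leave for review). */
function resolveNewOrderId(
  previousLatest: number | null,
  knownIds: ReadonlySet<number>,
  idsAfterSubmit: readonly number[],
): number | null {
  const fresh = idsAfterSubmit.filter(
    (id) => !knownIds.has(id) && (previousLatest === null || id > previousLatest),
  );
  return fresh.length === 1 ? (fresh[0] as number) : null;
}

type CartVerdict = "ok" | "wipe_and_abort";

/** Invariant 2 sketch: after N adds the cart must report exactly N items. */
function checkCart(expectedAdds: number, reportedItems: number): CartVerdict {
  return reportedItems === expectedAdds ? "ok" : "wipe_and_abort";
}

/** Invariant 4 sketch: a real PDF starts with the bytes "%PDF"; login HTML does not. */
function looksLikePdf(bytes: Uint8Array): boolean {
  const magic = [0x25, 0x50, 0x44, 0x46]; // "%PDF"
  return bytes.length >= magic.length && magic.every((b, i) => bytes[i] === b);
}
```

A caller would run `checkCart` before `EditCartSubmit` and `looksLikePdf` on every downloaded body before marking a row `completed`.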

## Phase A: DB-backed fulfiller worker (`eterra.cf-epay`), THE KEYSTONE

A pg-boss worker in `gis-sync-orchestrator` (next to `enrichment-drainer`, cron every 1–2 min).
**The CfExtract row IS the work item**: there is no in-memory queue.

- **Claim:** `SELECT … FROM gis_enrichment."CfExtract" WHERE status='pending' AND
  type='epay' [AND account-compatible] ORDER BY "createdAt" FOR UPDATE SKIP LOCKED LIMIT N;
  UPDATE → status='claimed', claimedAt=now()`. SKIP LOCKED → two instances never grab the
  same rows.
- **State machine** (each transition = one UPDATE = a precise resumable marker):
  `pending → claimed → cart → submitted_unconfirmed → polling → downloading → completed |
  review | failed | cancelled`. Extend gis-api's `ExtractStatus` enum
  (`gis-api/src/routes/enrichment.ts:9`) with `claimed`, `submitted_unconfirmed`, `review`.
- **Crash recovery:** a boot **reaper** requeues rows stuck in a transitional state past a
  heartbeat TTL. `submitted_unconfirmed` rows are resolved via the recover pattern (find
  the order at ANCPI, never re-charge). This structurally eliminates the in-memory-queue
  orphan class (critical C2).
- **Idempotent submit:** before `EditCartSubmit`, persist on the claimed rows the account's
  current latest orderId + the intended `nrCadastral` set. On timeout/crash, resume
  re-runs `findNewOrderId` against that snapshot and never adopts a stale id.
- Port the hardened `epay-client` here (see Phase G, the shared package).
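
The claim, state machine and reaper bullets above can be sketched together. This is an illustration under stated assumptions, not the worker's actual code: the `id` column, the quoted `"claimedAt"` column and the single-statement CTE form are guesses at the schema, the `[AND account-compatible]` filter is left out, the transition edges beyond the main chain are assumed, and sending stale `polling` / `downloading` rows to "recover" rather than "requeue" is a choice of this sketch (the plan only states it for `submitted_unconfirmed`).

```typescript
// Assumed schema details: "id" and "claimedAt" columns. $1 is the batch size N.
const CLAIM_SQL = `
WITH picked AS (
  SELECT id
  FROM gis_enrichment."CfExtract"
  WHERE status = 'pending' AND type = 'epay'
  ORDER BY "createdAt"
  LIMIT $1
  FOR UPDATE SKIP LOCKED
)
UPDATE gis_enrichment."CfExtract" AS e
SET status = 'claimed', "claimedAt" = now()
FROM picked
WHERE e.id = picked.id
RETURNING e.id`;

type ExtractStatus =
  | "pending" | "claimed" | "cart" | "submitted_unconfirmed" | "polling"
  | "downloading" | "completed" | "review" | "failed" | "cancelled";

// Allowed moves. The forward chain is from the plan; the back-edges to "pending"
// (reaper requeue) and the edges to "failed"/"cancelled" are assumptions.
const NEXT: Record<ExtractStatus, readonly ExtractStatus[]> = {
  pending: ["claimed", "cancelled"],
  claimed: ["cart", "pending", "failed"],
  cart: ["submitted_unconfirmed", "pending", "failed"],
  submitted_unconfirmed: ["polling", "failed"],
  polling: ["downloading", "failed"],
  downloading: ["completed", "review", "failed"],
  completed: [],
  review: [],
  failed: [],
  cancelled: [],
};

function canTransition(from: ExtractStatus, to: ExtractStatus): boolean {
  return NEXT[from].includes(to);
}

type ReaperAction = "leave" | "requeue" | "recover";

/** Decide what the boot reaper does with one row whose heartbeat may be stale. */
function reaperAction(
  status: ExtractStatus,
  heartbeatAt: Date,
  now: Date,
  ttlMs: number,
): ReaperAction {
  const settled: readonly ExtractStatus[] =
    ["pending", "completed", "review", "failed", "cancelled"];
  if (settled.includes(status)) return "leave";
  if (now.getTime() - heartbeatAt.getTime() <= ttlMs) return "leave";
  // Before submit nothing was charged, so the row can simply go back to pending.
  // From submit onward an order may exist at ANCPI: resolve it, never re-submit.
  return status === "claimed" || status === "cart" ? "requeue" : "recover";
}
```

Keeping the reaper decision a pure function of `(status, heartbeat, now)` makes the "never re-charge" rule unit-testable without a database.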

## Phase B: `epay_accounts` pool with one-batch-per-account lock

Mirror `gis_meta.eterra_accounts` (busc-infra migration 004): AES-256-GCM creds,
`status active/blocked/retired`, `blocked_reason`, `credits_cached`, optional hourly cap,
`in_flight_batch_id`.

- `pickEpayAccount`: `FOR UPDATE SKIP LOCKED`, but because ePay's cart is **global per
  account**, atomically set `in_flight_batch_id` (status `busy`) so no second batch can
  touch that account's cart. This is the structural fix for cart contamination (C1) in the
  pooled world.
- Refuse to claim a batch larger than the account's cached credits. ePay credits are a
  **hard consumable (real money)**: unlike the soft eTerra quota, the credit cap is
  mandatory, not advisory.
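
The selection rules are easier to review as an in-memory model. The sketch below is hypothetical (the real `pickEpayAccount` would do this inside one SQL transaction with `FOR UPDATE SKIP LOCKED`); the field names are camelCase stand-ins for the columns listed above, and the hourly cap is omitted.

```typescript
type AccountStatus = "active" | "blocked" | "retired";

interface EpayAccount {
  id: string;
  status: AccountStatus;
  creditsCached: number;
  inFlightBatchId: string | null;
}

/** Pick an account that is active, idle, and can afford the whole batch,
 *  then mark it in-flight so no second batch can touch its cart. */
function pickAccount(
  pool: EpayAccount[],
  batchId: string,
  batchSize: number,
): EpayAccount | null {
  const free = pool.find(
    (a) =>
      a.status === "active" &&
      a.inFlightBatchId === null &&
      a.creditsCached >= batchSize, // hard cap: credits are real money
  );
  if (!free) return null;
  free.inFlightBatchId = batchId;
  return free;
}

/** Release the account once the batch is done and account for spent credits. */
function releaseAccount(acc: EpayAccount, batchId: string, creditsSpent: number): void {
  if (acc.inFlightBatchId !== batchId) {
    throw new Error(`batch ${batchId} does not hold account ${acc.id}`);
  }
  acc.inFlightBatchId = null;
  acc.creditsCached -= creditsSpent;
}
```

Returning `null` instead of queuing on a busy account keeps the one-batch-per-account rule explicit: the caller leaves the rows `pending` and retries on the next cron tick.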

## Phase C: Catalog dedup (largest recurring economic win)

`CfExtractCatalog` is written **only** on the `cf-intern` path today; nothing writes it
when a paid ePay order completes → a paid extract by tenant A is never "fresh" for tenant
B, so the 30-day money-saver is structurally unrealized.

- On ePay completion, `upsert CfExtractCatalog(nrCadastral, latestId,
  expiresAt=documentDate+30d, isFresh=true)`.
- `POST /enrichment/cf/claim {nrCadastral}`: on a catalog hit, create a B-owned row
  `type='catalog', status='completed', creditsUsed=0` pointing at the shared MinIO object
  (or a copy). This turns today's 409 `catalog_hit` (`enrichment.ts:226`) into **instant,
  free fulfillment**. RLS unchanged (B reads B's row). One paid extract serves every tenant
  that needs that parcel within 30 days, at zero marginal ANCPI cost.
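
A minimal sketch of the two halves (catalog write on completion, catalog hit on claim) follows. It is illustrative only: the in-memory `Map` stands in for the `CfExtractCatalog` table, freshness is computed from `expiresAt` rather than read from a stored `isFresh` column, and `sourceExtractId` is a hypothetical field for the pointer to the shared object.

```typescript
const FRESH_DAYS = 30;
const DAY_MS = 24 * 60 * 60 * 1000;

interface CatalogEntry {
  nrCadastral: string;
  latestId: string;
  expiresAt: Date;
}

/** Entry to upsert when a paid ePay extract completes: expiresAt = documentDate + 30d. */
function catalogEntryFor(nrCadastral: string, extractId: string, documentDate: Date): CatalogEntry {
  return {
    nrCadastral,
    latestId: extractId,
    expiresAt: new Date(documentDate.getTime() + FRESH_DAYS * DAY_MS),
  };
}

function isFresh(entry: CatalogEntry, now: Date): boolean {
  return now.getTime() < entry.expiresAt.getTime();
}

interface CatalogRow {
  tenantId: string;
  nrCadastral: string;
  type: "catalog";
  status: "completed";
  creditsUsed: 0;
  sourceExtractId: string;
}

type ClaimResult = { kind: "catalog"; row: CatalogRow } | { kind: "order" };

/** On a fresh hit, hand the tenant its own zero-credit row; otherwise order normally. */
function claimExtract(
  catalog: ReadonlyMap<string, CatalogEntry>,
  tenantId: string,
  nrCadastral: string,
  now: Date,
): ClaimResult {
  const hit = catalog.get(nrCadastral);
  if (hit !== undefined && isFresh(hit, now)) {
    return {
      kind: "catalog",
      row: {
        tenantId,
        nrCadastral,
        type: "catalog",
        status: "completed",
        creditsUsed: 0,
        sourceExtractId: hit.latestId,
      },
    };
  }
  return { kind: "order" };
}
```

Because the tenant gets its own row rather than access to tenant A's, the existing row-level security policy does not need to change.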

## Phase D: Credential model (tenant-policy-driven)

Store the strategy per-tenant; don't pick one globally:

- **Internal Beletage group** → pooled company accounts (Infisical, encrypted in
  `epay_accounts`). Best batching + catalog sharing; per-credit attribution via audit.
- **External paying tenants** (eterra-live model) → dedicated per-tenant accounts so
  credits/billing stay clean.
- Record `account_id` + `creditsUsed` on every `CfExtract` for attribution regardless.
- All three apps converge as thin callers of `POST /enrichment/cf` (Authentik multi-issuer
  and tenant claim already in place, `gis-api/src/lib/auth.ts`). Reuse eterra-live
  `crypto.ts` (AES-256-GCM) plus a 1-byte key-version prefix for rotation.
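
The 1-byte key-version prefix can be sketched independently of the cipher itself. The byte layout below (version, then a 12-byte IV, then a 16-byte GCM tag, then ciphertext) is an assumption, as are all function names; the actual encryption stays in the existing `crypto.ts`, whose API is not shown in this plan.

```typescript
// Assumed layout: [keyVersion: 1 byte][iv: 12 bytes][auth tag: 16 bytes][ciphertext]
const IV_LEN = 12;
const TAG_LEN = 16;

interface Envelope {
  keyVersion: number;
  iv: Uint8Array;
  tag: Uint8Array;
  ciphertext: Uint8Array;
}

function packEnvelope(e: Envelope): Uint8Array {
  if (!Number.isInteger(e.keyVersion) || e.keyVersion < 0 || e.keyVersion > 255) {
    throw new RangeError("keyVersion must fit in one byte");
  }
  if (e.iv.length !== IV_LEN || e.tag.length !== TAG_LEN) {
    throw new RangeError("unexpected iv or tag length");
  }
  const out = new Uint8Array(1 + IV_LEN + TAG_LEN + e.ciphertext.length);
  out[0] = e.keyVersion;
  out.set(e.iv, 1);
  out.set(e.tag, 1 + IV_LEN);
  out.set(e.ciphertext, 1 + IV_LEN + TAG_LEN);
  return out;
}

function unpackEnvelope(bytes: Uint8Array): Envelope {
  if (bytes.length < 1 + IV_LEN + TAG_LEN) throw new RangeError("envelope too short");
  return {
    keyVersion: bytes[0] as number,
    iv: bytes.subarray(1, 1 + IV_LEN),
    tag: bytes.subarray(1 + IV_LEN, 1 + IV_LEN + TAG_LEN),
    ciphertext: bytes.subarray(1 + IV_LEN + TAG_LEN),
  };
}

/** Rotation: new writes use the current version; reads look the key up by prefix. */
function keyFor(version: number, keyring: ReadonlyMap<number, Uint8Array>): Uint8Array {
  const key = keyring.get(version);
  if (key === undefined) throw new Error(`no key for version ${version}`);
  return key;
}
```

With this framing, rotating a key means adding version N+1 to the keyring and re-encrypting rows lazily; rows still carrying version N remain readable until the old key is retired.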

## Phase E: gis-api gaps for async consumption

1. **Completion webhook/SSE**, tenant-scoped + RLS-filtered (`GET /enrichment/cf/events`)
   → kills polling and the dead-Brevo dependency.
2. **Bulk-zip** `GET /enrichment/cf/zip?orderId=` streaming from MinIO (port the V3
   streaming-zip approach).
3. `ExtractStatus` enum additions (see Phase A).
4. List filters `creditsUsed=0` / `type='catalog'` so the UI can label shared extracts.
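
For item 1, the wire format of a Server-Sent Events stream is simple enough to sketch. The event name `cf-extract`, the payload fields and the helper names are hypothetical; in the real endpoint the tenant filter would come from RLS in the database, and the application-level filter here only illustrates the scoping rule.

```typescript
interface CfEvent {
  tenantId: string;
  extractId: string;
  status: string;
}

/** One SSE frame: "id", "event" and "data" fields, terminated by a blank line. */
function toSseFrame(evt: CfEvent, id: number): string {
  // tenantId is deliberately not echoed back; the stream is already tenant-scoped.
  const data = JSON.stringify({ extractId: evt.extractId, status: evt.status });
  return `id: ${id}\nevent: cf-extract\ndata: ${data}\n\n`;
}

/** A subscriber only ever sees events for its own tenant. */
function eventsForTenant(events: readonly CfEvent[], tenantId: string): CfEvent[] {
  return events.filter((e) => e.tenantId === tenantId);
}
```

The `id` field lets a reconnecting client send `Last-Event-ID`, so a completion that fires during a dropped connection can be replayed instead of falling back to polling.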

## Phase F: Reversible migration, per-tenant flip

- **Phase 0 (now):** `EPAY_ORDERING_VIA_GIS_AC=false`, hardened legacy queue is the sole
  fulfiller. `/api/ancpi/recover` stays as the manual safety net.
- **Phase 1:** deploy worker + pool + catalog-write; seed `epay_accounts` with ONLY the
  Beletage account; flip the flag for `claims.tenant === 'architools'`.
- **Phase 2:** run both paths in parallel for a grace window; reconcile on orderId (no
  double charge).
- **Phase 3:** onboard external tenants with dedicated accounts; delete `epay-queue` /
  `epay-client` / `epay-session-store` + `src/app/api/ancpi/*` from ArchiTools. The flag is
  the kill-switch throughout.
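
The per-tenant flip with a global kill-switch reduces to one routing decision. This is a sketch under assumptions: the plan names only the `EPAY_ORDERING_VIA_GIS_AC` flag and the `claims.tenant` check, so the `FlagConfig` shape with a tenant allowlist is hypothetical.

```typescript
interface FlagConfig {
  /** Mirrors EPAY_ORDERING_VIA_GIS_AC; false routes everything to the legacy queue. */
  globalEnabled: boolean;
  /** Tenants that have been flipped to the new worker (assumed allowlist). */
  enabledTenants: ReadonlySet<string>;
}

type OrderingPath = "gis-api" | "legacy-queue";

function orderingPath(tenant: string, cfg: FlagConfig): OrderingPath {
  return cfg.globalEnabled && cfg.enabledTenants.has(tenant) ? "gis-api" : "legacy-queue";
}
```

For Phase 1 the allowlist would contain only `architools`; turning `globalEnabled` off reverts every tenant at once, which is what makes the migration reversible.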

## Phase G: Shared `epay-client` package (do regardless of phase)

ArchiTools and eterra-live each have a near-identical `epay-client.ts`, and the two copies
have **already diverged dangerously**: ArchiTools got the hardening fixes; eterra-live got
the method-internal ports (commit `eterra-live d30128b`) but lacks cart-hygiene + the
per-page parser refactor. Extract `@beletage/epay-client` (natural home:
`gis-sync-orchestrator`, which owns the account pool) so a fix lands once. Until then, any
epay-client change MUST be mirrored to both repos in the same change.

## Known follow-ups not yet done

- eterra-live still lacks the cart-hygiene `numberOfItems` invariant (single-order flow
  makes it lower-risk, but a crashed prior order can still orphan a cart row). Needs a
  route-level touch + testing on that product before shipping.
- `BREVO_API_KEY` returns 401 "Key not found" → ArchiTools email notifications are dead;
  the correct fix is the Phase E webhook, not patching Brevo. SMTP relay creds still work.
- ArchiTools `auth-options.ts` has pre-existing `react-hooks/rules-of-hooks` lint errors on
  the `useGisAcFlag`/`useBasicPanelFlag` session calls (tolerated by `next build`).