# Plan 006: ePay CF-Extract as a Multi-Tenant Service

**Status:** design (executable).

**Author:** deep-dive 2026-06-04 + hardening 2026-06-05.

**Prereq reading:** plans 002/003 (architools thin-client cutover), `project_epay_cf_roadmap_2026_06` memory.

## Why this exists

The CF-extract capability (ANCPI ePay paid extracts + the free `cf-intern` copycf circuit) will be offered beyond internal use as a multi-tenant service (ArchiTools, eterra-live, Planhub, external paying customers). Today it runs as an **in-process queue inside ArchiTools** (`src/modules/parcel-sync/services/epay-*`). That path is now hardened (commit `f49fdb1`: cart hygiene, auth/IDOR gates, single-page fetch, parallel downloads, recover-by-extractId) and is **billing-safe and correct for the internal tool**, but it is the wrong *shape* for a service:

- Queue + ePay session are in-memory globals → die on redeploy mid-batch.
- One serial cart per process → no multi-tenant throughput.
- No catalog dedup on the paid path → the same parcel is paid for repeatedly.
- `EPAY_ORDERING_VIA_GIS_AC=false` because **gis-api `POST /enrichment/cf` inserts a pending row that nothing fulfills**: the orchestrator has no ePay worker. Plan 003 Faza F ("endpoints already exist") is wrong: the fulfiller is the unwritten keystone.

This plan is the path from the hardened-internal state to a real service. Each phase is independently shippable; do them in order, validate, then flip the flag per-tenant.

## Invariants carried over from the hardened internal path (do NOT regress)

These were learned the hard way (2026-06-04 incident, order 10009605). The new worker MUST preserve every one:

1. **Submit is timeout-resilient.** A slow `EditCartSubmit` that ANCPI completes must never be marked failed. Resolve the order via `findNewOrderId(previous, known)`, which never adopts a stale/known id. (`SUBMIT_TIMEOUT_MS`, today's fix.)
2. **Cart hygiene invariant.** ePay has ONE global cart per account; `EditCartSubmit` checks out everything in it. After N adds a clean cart reports exactly N items. Any excess = orphan from a crash → wipe + abort, never submit a cart you didn't build.
3. **CF-number matching is authoritative; index fallback is `review`, not `completed`.**
4. **`%PDF` magic-byte check** on every download (expired session returns login HTML).
5. **Single-page order fetch** via `itemsPerPage` (5/page default silently drops docs).
6. **Recover is idempotent** (re-poll + re-download an already-paid order, no new charge).

## Phase A: DB-backed fulfiller worker (`eterra.cf-epay`), THE KEYSTONE

A pg-boss worker in `gis-sync-orchestrator` (next to `enrichment-drainer`, cron 1–2 min). **The CfExtract row IS the work item**: there is no in-memory queue.

- **Claim:** `SELECT … FROM gis_enrichment."CfExtract" WHERE status='pending' AND type='epay' [AND account-compatible] ORDER BY "createdAt" FOR UPDATE SKIP LOCKED LIMIT N; UPDATE → status='claimed', claimedAt=now()`. SKIP LOCKED → two instances never grab the same rows.
- **State machine** (each transition = one UPDATE = a precise resumable marker): `pending → claimed → cart → submitted_unconfirmed → polling → downloading → completed | review | failed | cancelled`. Extend gis-api's `ExtractStatus` enum (`gis-api/src/routes/enrichment.ts:9`) with `claimed`, `submitted_unconfirmed`, `review`.
- **Crash recovery:** a boot **reaper** requeues rows stuck in a transitional state past a heartbeat TTL. `submitted_unconfirmed` rows are resolved via the recover pattern (find the order at ANCPI, never re-charge). This structurally eliminates the in-memory-queue orphan class (criticals C2).
- **Idempotent submit:** before `EditCartSubmit`, persist on the claimed rows the account's current latest orderId + the intended `nrCadastral` set. On timeout/crash, resume re-runs `findNewOrderId` against that snapshot, so it never adopts a stale id.
- Port the hardened `epay-client` here (see Phase G, shared package).

## Phase B: `epay_accounts` pool with one-batch-per-account lock

Mirror `gis_meta.eterra_accounts` (busc-infra migration 004): AES-256-GCM creds, `status active/blocked/retired`, `blocked_reason`, `credits_cached`, optional hourly cap, `in_flight_batch_id`.

- `pickEpayAccount`: `FOR UPDATE SKIP LOCKED`, but because ePay's cart is **global per account**, atomically set `in_flight_batch_id` (status `busy`) so no second batch can touch that account's cart. This is the structural fix for cart contamination (C1) in the pooled world.
- Refuse to claim a batch larger than the account's cached credits. ePay credits are a **hard consumable (real money)**: unlike the soft eTerra quota, the credit cap is mandatory, not advisory.

## Phase C: Catalog dedup (largest recurring economic win)

`CfExtractCatalog` is written **only** on the `cf-intern` path today; nothing writes it when a paid ePay order completes → a paid extract by tenant A is never "fresh" for tenant B, so the 30-day money-saver is structurally unrealized.

- On ePay completion, `upsert CfExtractCatalog(nrCadastral, latestId, expiresAt=documentDate+30d, isFresh=true)`.
- `POST /enrichment/cf/claim {nrCadastral}`: on a catalog hit, create a B-owned row `type='catalog', status='completed', creditsUsed=0` pointing at the shared MinIO object (or a copy). This turns today's 409 `catalog_hit` (`enrichment.ts:226`) into **instant, free fulfillment**. RLS unchanged (B reads B's row).

One paid extract serves every tenant that needs that parcel within 30 days, at zero marginal ANCPI cost.

## Phase D: Credential model (tenant-policy-driven)

Store the strategy per-tenant; don't pick one globally:

- **Internal Beletage group** → pooled company accounts (Infisical, encrypted in `epay_accounts`). Best batching + catalog sharing; per-credit attribution via audit.
- **External paying tenants** (eterra-live model) → dedicated per-tenant accounts so credits/billing stay clean.
- Record `account_id` + `creditsUsed` on every `CfExtract` for attribution regardless.
- All three apps converge as thin callers of `POST /enrichment/cf` (Authentik multi-issuer + tenant claim already in place, `gis-api/src/lib/auth.ts`).

Reuse eterra-live `crypto.ts` (AES-256-GCM) + a 1-byte key-version prefix for rotation.

## Phase E: gis-api gaps for async consumption

1. **Completion webhook/SSE**, tenant-scoped + RLS-filtered (`GET /enrichment/cf/events`) → kills polling and the dead-Brevo dependency.
2. **Bulk-zip** `GET /enrichment/cf/zip?orderId=` streaming from MinIO (port the V3 streaming-zip approach).
3. `ExtractStatus` enum additions (see Phase A).
4. List filters `creditsUsed=0` / `type='catalog'` so the UI can label shared extracts.

## Phase F: Reversible migration, per-tenant flip

- **Phase 0 (now):** `EPAY_ORDERING_VIA_GIS_AC=false`, hardened legacy queue is the sole fulfiller. `/api/ancpi/recover` stays as the manual safety net.
- **Phase 1:** deploy worker + pool + catalog-write; seed `epay_accounts` with ONLY the Beletage account; flip the flag for `claims.tenant === 'architools'`.
- **Phase 2:** run both paths in parallel for a grace window; reconcile on orderId (no double charge).
- **Phase 3:** onboard external tenants with dedicated accounts; delete `epay-queue` / `epay-client` / `epay-session-store` + `src/app/api/ancpi/*` from ArchiTools.

The flag is the kill-switch throughout.

## Phase G: Shared `epay-client` package (do regardless of phase)

ArchiTools and eterra-live each have a near-identical `epay-client.ts` that has **already diverged dangerously**: ArchiTools got today's fixes; eterra-live got the method-internal ports (commit `eterra-live d30128b`) but lacks cart-hygiene + the per-page parser refactor. Extract `@beletage/epay-client` (natural home: `gis-sync-orchestrator`, which owns the account pool) so a fix lands once.
Until then, any epay-client change MUST be mirrored to both repos in the same change.

## Known follow-ups not yet done

- eterra-live still lacks the cart-hygiene `numberOfItems` invariant (single-order flow makes it lower-risk, but a crashed prior order can still orphan a cart row). Needs a route-level touch + testing on that product before shipping.
- `BREVO_API_KEY` returns 401 "Key not found" → ArchiTools email notifications are dead; the correct fix is the Phase E webhook, not patching Brevo. SMTP relay creds still work.
- ArchiTools `auth-options.ts` has pre-existing `react-hooks/rules-of-hooks` lint errors on the `useGisAcFlag`/`useBasicPanelFlag` session calls (tolerated by `next build`).
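
## Appendix: invariant guards (sketch)

Invariants 1, 2 and 4 from the carried-over list are small enough to express as pure functions, which may make them easier to unit-test once the client is ported into the worker. The block below is a minimal sketch under stated assumptions, not the code in `epay-client`: the names `resolveNewOrderId`, `cartVerdict` and `looksLikePdf` are hypothetical, order ids are assumed to be numeric strings that only increase, and a cart reporting fewer items than were added is assumed to abort just like one reporting more.

```typescript
// Sketch only: every name below is hypothetical, not taken from epay-client.

/**
 * Invariant 1: resolve the order created by a submit whose response was lost.
 * ASSUMPTION: order ids are numeric strings that only ever increase.
 * Returns null when zero or several candidates exist, so the caller can
 * route the batch to manual review instead of guessing.
 */
function resolveNewOrderId(
  visibleOrderIds: readonly string[],
  previousLatestId: string | null,
  knownIds: ReadonlySet<string>,
): string | null {
  const floor =
    previousLatestId === null ? Number.NEGATIVE_INFINITY : Number(previousLatestId);
  // Keep only ids strictly newer than the pre-submit snapshot and not yet
  // attached to any row; non-numeric ids become NaN and are filtered out.
  const candidates = visibleOrderIds.filter(
    (id) => Number(id) > floor && !knownIds.has(id),
  );
  const [only, ...rest] = candidates;
  return only !== undefined && rest.length === 0 ? only : null;
}

type CartVerdict = "submit" | "wipe_and_abort";

/**
 * Invariant 2: after N adds, a clean cart reports exactly N items.
 * ASSUMPTION: any mismatch (not only an excess) is treated as contamination.
 */
function cartVerdict(numberOfItems: number, itemsAdded: number): CartVerdict {
  return itemsAdded > 0 && numberOfItems === itemsAdded ? "submit" : "wipe_and_abort";
}

/** Invariant 4: a real extract starts with the bytes "%PDF". */
function looksLikePdf(bytes: Uint8Array): boolean {
  const magic = [0x25, 0x50, 0x44, 0x46]; // "%", "P", "D", "F"
  return bytes.length >= magic.length && magic.every((b, i) => bytes[i] === b);
}

// Usage: the snapshot persisted before EditCartSubmit drives the resume path.
const known = new Set<string>(["10009601"]);
const resolved = resolveNewOrderId(
  ["10009601", "10009604", "10009605"],
  "10009604",
  known,
);
console.log(resolved); // 10009605
```

Returning `null` on ambiguity is a deliberately conservative choice for this sketch: with real credits at stake, handing the batch to review seems safer than adopting the wrong order.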