ArchiTools/docs/plans/006-epay-cf-service-architecture.md
Claude VM 28c870fb12 harden(epay): cart-hygiene invariant uses confirmed cart count + add service architecture plan
- cartCount tracks actual cart rows (decrement only on confirmed delete) so a
  failed cleanup delete can't trigger a false dirty-cart abort.
- docs/plans/006: the multi-tenant CF-service architecture (DB-backed
  fulfiller, account pool, catalog dedup, per-tenant credential model,
  reversible flag flip) — the executable next phase. The Phase-F flag flip is
  gated on the orchestrator fulfiller existing (Plan 003 Faza F was wrong).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 00:06:06 +03:00

# Plan 006 — ePay CF-Extract as a Multi-Tenant Service
**Status:** design (executable). **Author:** deep-dive 2026-06-04 + hardening 2026-06-05.
**Prereq reading:** plans 002/003 (architools thin-client cutover), `project_epay_cf_roadmap_2026_06` memory.
## Why this exists
The CF-extract capability (ANCPI ePay paid extracts + the free `cf-intern` copycf
circuit) will be offered beyond internal use — multi-tenant (ArchiTools, eterra-live,
Planhub, external paying customers). Today it runs as an **in-process queue inside
ArchiTools** (`src/modules/parcel-sync/services/epay-*`). That path is now hardened
(commit `f49fdb1`: cart hygiene, auth/IDOR gates, single-page fetch, parallel
downloads, recover-by-extractId) and is **billing-safe and correct for the internal
tool** — but it is the wrong *shape* for a service:
- Queue + ePay session are in-memory globals → die on redeploy mid-batch.
- One serial cart per process → no multi-tenant throughput.
- No catalog dedup on the paid path → the same parcel is paid for repeatedly.
- `EPAY_ORDERING_VIA_GIS_AC=false` because **gis-api `POST /enrichment/cf` inserts a
pending row that nothing fulfills** — the orchestrator has no ePay worker. Plan 003
Faza F (Phase F, "endpoints already exist") is wrong: the fulfiller is the unwritten keystone.
This plan is the path from the hardened-internal state to a real service. Each phase
is independently shippable; do them in order, validate, then flip the flag per-tenant.
## Invariants carried over from the hardened internal path (do NOT regress)
These were learned the hard way (2026-06-04 incident, order 10009605). The new worker
MUST preserve every one:
1. **Submit is timeout-resilient.** A slow `EditCartSubmit` that ANCPI completes must
never be marked failed. Resolve the order via `findNewOrderId(previous, known)` which
never adopts a stale/known id. (`SUBMIT_TIMEOUT_MS`, today's fix.)
2. **Cart hygiene invariant.** ePay has ONE global cart per account; `EditCartSubmit`
checks out everything in it. After N adds a clean cart reports exactly N items — any
excess = orphan from a crash → wipe + abort, never submit a cart you didn't build.
3. **CF-number matching is authoritative; index fallback is `review`, not `completed`.**
4. **`%PDF` magic-byte check** on every download (expired session returns login HTML).
5. **Single-page order fetch** via `itemsPerPage` (5/page default silently drops docs).
6. **Recover is idempotent** (re-poll + re-download an already-paid order, no new charge).
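
As a rough illustration of invariants 1, 2 and 4, the checks can be written as small pure functions. This is only a sketch: the plan names `findNewOrderId(previous, known)` without giving its signature, so `pickNewOrderId`, `cartVerdict` and `looksLikePdf` are hypothetical names with assumed parameters, not the hardened client's actual API.

```typescript
type OrderId = string;

// Invariant 1 (assumed shape): after a timed-out submit, adopt only an order id
// that did not exist before the submit and that no other batch already owns.
function pickNewOrderId(
  currentIds: OrderId[],          // order ids the account lists right now
  previous: ReadonlySet<OrderId>, // ids snapshotted before EditCartSubmit
  known: ReadonlySet<OrderId>,    // ids already attached to other batches
): OrderId | null {
  const fresh = currentIds.filter((id) => !previous.has(id) && !known.has(id));
  const [only, ...rest] = fresh;
  // Zero or several candidates is ambiguous: return null rather than guess.
  return only !== undefined && rest.length === 0 ? only : null;
}

// Invariant 2: after N adds, a clean cart reports exactly N items.
// The plan names excess items; treating a shortfall the same way is an
// assumption of this sketch.
function cartVerdict(reportedItems: number, addedByUs: number): "ok" | "wipe_and_abort" {
  return reportedItems === addedByUs ? "ok" : "wipe_and_abort";
}

// Invariant 4: a PDF starts with the bytes "%PDF" (0x25 0x50 0x44 0x46);
// a login page returned by an expired session does not.
function looksLikePdf(body: Uint8Array): boolean {
  const magic = [0x25, 0x50, 0x44, 0x46];
  return body.length >= magic.length && magic.every((b, i) => body[i] === b);
}
```

Returning `null` on an ambiguous result keeps the decision with the recover path instead of letting the worker attach a paid order to the wrong rows.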
## Phase A — DB-backed fulfiller worker (`eterra.cf-epay`) — THE KEYSTONE
A pg-boss worker in `gis-sync-orchestrator` (next to `enrichment-drainer`, cron 12 min).
**The CfExtract row IS the work item** — no in-memory queue.
- **Claim:** `SELECT … FROM gis_enrichment."CfExtract" WHERE status='pending' AND
type='epay' [AND account-compatible] ORDER BY "createdAt" FOR UPDATE SKIP LOCKED LIMIT N;
UPDATE → status='claimed', claimedAt=now()`. SKIP LOCKED → two instances never grab the
same rows.
- **State machine** (each transition = one UPDATE = a precise resumable marker):
`pending → claimed → cart → submitted_unconfirmed → polling → downloading → completed |
review | failed | cancelled`. Extend gis-api's `ExtractStatus` enum
(`gis-api/src/routes/enrichment.ts:9`) with `claimed`, `submitted_unconfirmed`, `review`.
- **Crash recovery:** a boot **reaper** requeues rows stuck in a transitional state past a
heartbeat TTL. `submitted_unconfirmed` rows are resolved via the recover pattern (find
the order at ANCPI, never re-charge). This structurally eliminates the in-memory-queue
orphan class (critical C2).
- **Idempotent submit:** before `EditCartSubmit`, persist on the claimed rows the account's
current latest orderId + the intended `nrCadastral` set. On timeout/crash, resume
re-runs `findNewOrderId` against that snapshot — never adopts a stale id.
- Port the hardened `epay-client` here (see Phase G — shared package).
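
The state machine and the reaper rule above can be modelled without a database. The sketch below is illustrative only: `canTransition` and `reaperAction` are hypothetical names, it assumes `failed` and `cancelled` are reachable from any non-terminal state, and it assumes that the already-paid states `polling` and `downloading` take the same recover path the plan describes for `submitted_unconfirmed`.

```typescript
type ExtractStatus =
  | "pending" | "claimed" | "cart" | "submitted_unconfirmed"
  | "polling" | "downloading"
  | "completed" | "review" | "failed" | "cancelled";

// Non-terminal states in the order the worker walks them.
const HAPPY_PATH: ExtractStatus[] = [
  "pending", "claimed", "cart", "submitted_unconfirmed", "polling", "downloading",
];

// Forward moves made by the worker (the reaper's requeue is handled separately).
function canTransition(from: ExtractStatus, to: ExtractStatus): boolean {
  const i = HAPPY_PATH.indexOf(from);
  if (i === -1) return false; // terminal states never move
  if (to === "failed" || to === "cancelled") return true; // assumption: abortable anywhere
  if (i === HAPPY_PATH.length - 1) return to === "completed" || to === "review";
  return HAPPY_PATH[i + 1] === to;
}

type ReaperAction = "leave" | "requeue" | "recover";

// What the boot reaper does with a row whose heartbeat may have gone stale.
function reaperAction(
  status: ExtractStatus,
  heartbeatAt: Date,
  now: Date,
  ttlMs: number,
): ReaperAction {
  const i = HAPPY_PATH.indexOf(status);
  if (i <= 0) return "leave"; // pending (0) and terminal (-1) rows are not stuck
  if (now.getTime() - heartbeatAt.getTime() <= ttlMs) return "leave";
  // Before submit nothing was charged, so going back to pending is safe.
  // From submit onward the order may exist at ANCPI: recover, never re-charge.
  return i < HAPPY_PATH.indexOf("submitted_unconfirmed") ? "requeue" : "recover";
}
```

Keeping the pre-submit/post-submit split in one function makes the billing-critical rule (no requeue once a submit may have happened) easy to unit-test in isolation.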
## Phase B — `epay_accounts` pool with one-batch-per-account lock
Mirror `gis_meta.eterra_accounts` (busc-infra migration 004): AES-256-GCM creds,
`status active/blocked/retired`, `blocked_reason`, `credits_cached`, optional hourly cap,
`in_flight_batch_id`.
- `pickEpayAccount`: `FOR UPDATE SKIP LOCKED`, but because ePay's cart is **global per
account**, atomically set `in_flight_batch_id` (status `busy`) so no second batch can
touch that account's cart. This is the structural fix for cart contamination (C1) in the
pooled world.
- Refuse to claim a batch larger than the account's cached credits. ePay credits are a
**hard consumable (real money)** — unlike the soft eTerra quota, the credit cap is
mandatory, not advisory.
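
A small in-memory model of the pool rules above, for illustration. In the real worker the selection would run under `FOR UPDATE SKIP LOCKED`; here the lock is modelled by `in_flight_batch_id` alone, "busy" is taken to mean "has a non-null `in_flight_batch_id`", and `pickFreeAccount` / `releaseAccount` are hypothetical names.

```typescript
interface EpayAccount {
  id: string;
  status: "active" | "blocked" | "retired";
  credits_cached: number;
  in_flight_batch_id: string | null;
}

// Pick an account that is active, not mid-batch, and can afford the whole batch.
// Marks it busy before returning so no second batch can touch its global cart.
function pickFreeAccount(
  pool: EpayAccount[],
  batchId: string,
  batchSize: number,
): EpayAccount | null {
  const free = pool.find(
    (a) =>
      a.status === "active" &&
      a.in_flight_batch_id === null &&
      a.credits_cached >= batchSize, // hard cap: credits are real money
  );
  if (!free) return null;
  free.in_flight_batch_id = batchId;
  return free;
}

// Release only if this batch actually holds the account (assumed rule).
function releaseAccount(account: EpayAccount, batchId: string, creditsUsed: number): boolean {
  if (account.in_flight_batch_id !== batchId) return false;
  account.credits_cached -= creditsUsed;
  account.in_flight_batch_id = null;
  return true;
}
```

Returning `null` when no account qualifies leaves the rows `pending` for the next cron tick rather than splitting or overdrawing a batch.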
## Phase C — Catalog dedup (largest recurring economic win)
`CfExtractCatalog` is written **only** on the `cf-intern` path today; nothing writes it
when a paid ePay order completes → a paid extract by tenant A is never "fresh" for tenant
B, so the 30-day money-saver is structurally unrealized.
- On ePay completion, `upsert CfExtractCatalog(nrCadastral, latestId,
expiresAt=documentDate+30d, isFresh=true)`.
- `POST /enrichment/cf/claim {nrCadastral}`: on a catalog hit, create a B-owned row
`type='catalog', status='completed', creditsUsed=0` pointing at the shared MinIO object
(or a copy). This turns today's 409 `catalog_hit` (`enrichment.ts:226`) into **instant,
free fulfillment**. RLS unchanged (B reads B's row). One paid extract serves every tenant
that needs that parcel within 30 days, at zero marginal ANCPI cost.
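
The freshness arithmetic and the claim decision can be sketched as follows. This is a simplified in-memory model under stated assumptions: freshness is derived from `expiresAt` at read time rather than from a stored `isFresh` column, the catalog is a `Map`, and `catalogEntryFor`, `claimParcel` and `sourceExtractId` are hypothetical names.

```typescript
const FRESH_WINDOW_MS = 30 * 24 * 60 * 60 * 1000; // 30 days

interface CatalogEntry {
  nrCadastral: string;
  latestId: string; // id of the paid extract that produced the document
  expiresAt: Date;
}

// Written when a paid ePay order completes: expiresAt = documentDate + 30d.
function catalogEntryFor(nrCadastral: string, latestId: string, documentDate: Date): CatalogEntry {
  return { nrCadastral, latestId, expiresAt: new Date(documentDate.getTime() + FRESH_WINDOW_MS) };
}

type ClaimResult =
  | { kind: "catalog"; status: "completed"; creditsUsed: 0; sourceExtractId: string }
  | { kind: "epay"; status: "pending" };

// A fresh catalog hit is fulfilled instantly at zero credits;
// anything else falls through to a normal paid order.
function claimParcel(
  catalog: ReadonlyMap<string, CatalogEntry>,
  nrCadastral: string,
  now: Date,
): ClaimResult {
  const hit = catalog.get(nrCadastral);
  if (hit && hit.expiresAt.getTime() > now.getTime()) {
    return { kind: "catalog", status: "completed", creditsUsed: 0, sourceExtractId: hit.latestId };
  }
  return { kind: "epay", status: "pending" };
}
```

For example, a document dated 2026-06-01 stays claimable for free until 2026-07-01; a claim on 2026-07-02 becomes a new paid order.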
## Phase D — Credential model (tenant-policy-driven)
Store the strategy per-tenant; don't pick one globally:
- **Internal Beletage group** → pooled company accounts (Infisical, encrypted in
`epay_accounts`). Best batching + catalog sharing; per-credit attribution via audit.
- **External paying tenants** (eterra-live model) → dedicated per-tenant accounts so
credits/billing stay clean.
- Record `account_id` + `creditsUsed` on every `CfExtract` for attribution regardless.
- All three apps converge as thin callers of `POST /enrichment/cf` (Authentik multi-issuer
+ tenant claim already in place, `gis-api/src/lib/auth.ts`). Reuse eterra-live
`crypto.ts` (AES-256-GCM) + a 1-byte key-version prefix for rotation.
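
One possible shape for the key-version prefix, using Node's built-in `crypto` module. This is not the eterra-live `crypto.ts` implementation: the blob layout (1-byte version, 12-byte IV, 16-byte auth tag, ciphertext) and the names `KeyRing`, `encryptSecret` and `decryptSecret` are assumptions made for the sketch.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Assumed layout: [1-byte key version][12-byte IV][16-byte auth tag][ciphertext]
type KeyRing = Map<number, Buffer>; // version -> 32-byte AES-256 key

function encryptSecret(plaintext: string, version: number, keys: KeyRing): Buffer {
  if (!Number.isInteger(version) || version < 0 || version > 255) {
    throw new Error("key version must fit in one byte");
  }
  const key = keys.get(version);
  if (!key) throw new Error(`unknown key version ${version}`);
  const iv = randomBytes(12); // fresh IV per encryption
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const body = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([Buffer.from([version]), iv, cipher.getAuthTag(), body]);
}

function decryptSecret(blob: Buffer, keys: KeyRing): string {
  const version = blob[0];
  const key = version === undefined ? undefined : keys.get(version);
  if (!key) throw new Error("unknown or missing key version");
  const decipher = createDecipheriv("aes-256-gcm", key, blob.subarray(1, 13));
  decipher.setAuthTag(blob.subarray(13, 29));
  // final() throws if the ciphertext or tag was tampered with.
  return Buffer.concat([decipher.update(blob.subarray(29)), decipher.final()]).toString("utf8");
}
```

With this layout, rotation means adding a new version to the key ring and encrypting new rows with it; old rows stay readable until they are re-encrypted, because each blob says which key it needs.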
## Phase E — gis-api gaps for async consumption
1. **Completion webhook/SSE**, tenant-scoped + RLS-filtered (`GET /enrichment/cf/events`)
→ kills polling and the dead-Brevo dependency.
2. **Bulk-zip** `GET /enrichment/cf/zip?orderId=` streaming from MinIO (port the V3
streaming-zip approach).
3. `ExtractStatus` enum additions (see Phase A).
4. List filters `creditsUsed=0` / `type='catalog'` so the UI can label shared extracts.
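
For item 1, a minimal sketch of tenant scoping plus server-sent-event framing. The real filter would be enforced by RLS in the database; this application-level model only illustrates the idea. The event name `cf-extract`, the `CfEvent` shape and the function name `sseFramesFor` are assumptions.

```typescript
interface CfEvent {
  tenant: string;
  extractId: string;
  status: string;
  creditsUsed: number;
}

// Keep only the caller's events and render each as one SSE frame:
// an "event:" line, a "data:" line, then a blank line terminating the frame.
function sseFramesFor(tenant: string, events: CfEvent[]): string[] {
  return events
    .filter((e) => e.tenant === tenant)
    .map((e) => {
      // The tenant field is dropped from the payload; the stream is already scoped.
      const payload = { extractId: e.extractId, status: e.status, creditsUsed: e.creditsUsed };
      return `event: cf-extract\ndata: ${JSON.stringify(payload)}\n\n`;
    });
}
```

Including `creditsUsed` in the payload would let the UI label shared catalog extracts (item 4) without a second request.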
## Phase F — Reversible migration, per-tenant flip
- **Phase 0 (now):** `EPAY_ORDERING_VIA_GIS_AC=false`, hardened legacy queue is the sole
fulfiller. `/api/ancpi/recover` stays as the manual safety net.
- **Phase 1:** deploy worker + pool + catalog-write; seed `epay_accounts` with ONLY the
Beletage account; flip the flag for `claims.tenant === 'architools'`.
- **Phase 2:** run both paths in parallel for a grace window; reconcile on orderId (no double
charge).
- **Phase 3:** onboard external tenants with dedicated accounts; delete `epay-queue` /
`epay-client` / `epay-session-store` + `src/app/api/ancpi/*` from ArchiTools. The flag is
the kill-switch throughout.
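
The per-tenant flip can be reduced to one routing decision. A minimal sketch, assuming the global flag is combined with a set of already-flipped tenants; `chooseFulfiller`, `Fulfiller` and `flippedTenants` are hypothetical names, and how the set is stored is left open.

```typescript
interface Claims {
  tenant: string;
}

type Fulfiller = "legacy-queue" | "gis-worker";

// flagEnabled mirrors EPAY_ORDERING_VIA_GIS_AC; turning it off sends every
// tenant back to the hardened legacy queue, whatever the tenant set says.
function chooseFulfiller(
  flagEnabled: boolean,
  flippedTenants: ReadonlySet<string>,
  claims: Claims,
): Fulfiller {
  if (!flagEnabled) return "legacy-queue"; // kill-switch
  return flippedTenants.has(claims.tenant) ? "gis-worker" : "legacy-queue";
}
```

In Phase 1 the set would contain only `architools`; Phase 3 grows it tenant by tenant, and the legacy branch can be deleted once no tenant routes to it.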
## Phase G — Shared `epay-client` package (do regardless of phase)
ArchiTools and eterra-live each have a near-identical `epay-client.ts` that has **already
diverged dangerously**: ArchiTools got today's fixes; eterra-live got the method-internal
ports (commit `eterra-live d30128b`) but lacks cart-hygiene + the per-page parser refactor.
Extract `@beletage/epay-client` (natural home: `gis-sync-orchestrator`, which owns the
account pool) so a fix lands once. Until then, any epay-client change MUST be mirrored to
both repos in the same change.
## Known follow-ups not yet done
- eterra-live still lacks the cart-hygiene `numberOfItems` invariant (single-order flow
makes it lower-risk, but a crashed prior order can still orphan a cart row). Needs a
route-level touch + testing on that product before shipping.
- `BREVO_API_KEY` returns 401 "Key not found" → ArchiTools email notifications are dead;
the correct fix is the Phase E webhook, not patching Brevo. SMTP relay creds still work.
- ArchiTools `auth-options.ts` has pre-existing `react-hooks/rules-of-hooks` lint errors on
the `useGisAcFlag`/`useBasicPanelFlag` session calls (tolerated by `next build`).