| title: PayOps — Payment Operations Incident Response | |
| emoji: 💳 | |
| colorFrom: blue | |
| colorTo: green | |
| sdk: docker | |
| app_port: 7860 | |
| tags: | |
| - openenv | |
| - finance | |
| - fraud-detection | |
| - compliance | |
| - reinforcement-learning | |
| pinned: false | |
| fullWidth: false | |
| build_version: 2026-04-12-v6 | |
| # PayOps — Payment Operations Incident Response | |
| An **OpenEnv-compatible** reinforcement-learning environment where an AI agent | |
| acts as a Payment Operations analyst. The agent reviews financial transactions | |
| one by one and must decide the correct compliance action for each. | |
| --- | |
| ## Motivation | |
| Payment operations teams process thousands of transactions every day. A | |
| skilled analyst uses dozens of signals — risk scores, velocity, KYC status, | |
| flag patterns — to make fast, accurate decisions. This environment lets an AI | |
| agent learn and be evaluated on exactly this task, spanning clear-cut cases all | |
| the way to subtle adversarial patterns like model-score poisoning and | |
| Authorised Push Payment (APP) scams. | |
| --- | |
| ## Environment Description | |
| Each **episode** steps through all **30 transactions** (6 easy, 8 medium, 10 hard, 6 critical). | |
| For each transaction the agent observes a rich set of signals and chooses one | |
| of **10 possible actions** — 5 terminal decisions and 5 investigation sub-actions. | |
| A reward is returned immediately, and the next transaction is presented until | |
| the episode is complete. | |
| --- | |
| ## Action Space | |
| Terminal decisions (no budget cost) commit to a final outcome for the transaction. | |
| Investigation sub-actions (with budget cost) reveal more information and let the agent act again on the same transaction. | |
| | Action | Type | Description | Budget Cost | | |
| |-----------------|---------------|-------------|-------------| | |
| | `approve` | terminal | Mark transaction as legitimate; allow it through | — | | |
| | `reject` | terminal | Block the transaction outright | — | | |
| | `flag` | terminal | Soft hold; mark for manual review | — | | |
| | `escalate` | terminal | Route to senior compliance officer / fraud team | — | | |
| | `hold` | terminal | Temporary hold pending more information | — | | |
| | `inspect` | investigation | Pull additional signals (logs, KYC, velocity) — yields `inspection_notes` | 0.10 | | |
| | `request_docs` | investigation | Ask sender for supporting documents (invoice, contract) — yields `docs_notes` | 0.20 | | |
| | `verify_kyc` | investigation | Trigger an active KYC re-verification check — yields `kyc_notes` | 0.20 | | |
| | `contact_sender` | investigation | Contact the sender directly to confirm intent — yields `contact_notes` | 0.30 | | |
| | `file_sar` | investigation | File a Suspicious Activity Report to the regulator (required on AML/structuring tasks) | 0.10 | | |
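The split between free terminal decisions and paid investigation sub-actions can be sketched as a small lookup. This is only an illustration: the costs are copied from the table above and the starting budget of 5.0 comes from the `budget_remaining` field, but `apply_action` is a hypothetical helper, not a function from this repository, and whether the real environment lets the budget go negative is not assumed here.

```python
# Costs copied from the action table; the helper itself is hypothetical.
TERMINAL_ACTIONS = {"approve", "reject", "flag", "escalate", "hold"}
INVESTIGATION_COSTS = {
    "inspect": 0.10,
    "request_docs": 0.20,
    "verify_kyc": 0.20,
    "contact_sender": 0.30,
    "file_sar": 0.10,
}


def apply_action(budget_remaining: float, action: str) -> tuple[float, bool]:
    """Return (new_budget, is_terminal) after taking `action`."""
    if action in TERMINAL_ACTIONS:
        # Terminal decisions cost nothing and close the transaction.
        return budget_remaining, True
    if action in INVESTIGATION_COSTS:
        # Investigation sub-actions spend budget; the agent acts again afterwards.
        return round(budget_remaining - INVESTIGATION_COSTS[action], 2), False
    raise ValueError(f"unknown action: {action!r}")


budget = 5.0  # documented starting budget
for action in ("inspect", "verify_kyc", "approve"):
    budget, is_terminal = apply_action(budget, action)
print(budget, is_terminal)  # 4.7 True
```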
| --- | |
| ## Observation Space | |
| | Field | Type | Description | | |
| |------------------------|-------------------|-------------| | |
| | `transaction_id` | `str` | Unique transaction identifier | | |
| | `amount` | `float` | Transaction amount in the stated currency | | |
| | `currency` | `str` | ISO-4217 currency code | | |
| | `sender` | `str` | Sender identifier (email / account / alias) | | |
| | `receiver` | `str` | Receiver identifier | | |
| | `transaction_type` | `str` | transfer \| payment \| withdrawal \| refund \| internal \| loan_repayment \| payroll | | |
| | `status` | `str` | pending \| approved \| rejected \| flagged \| escalated \| held \| inspected \| docs_requested \| kyc_triggered \| sender_contacted \| sar_filed | | |
| | `risk_score` | `float [0,1]` | Composite ML risk score | | |
| | `ml_confidence` | `float [0,1]` | Model's self-reported confidence in `risk_score` — low value signals possible model poisoning | | |
| | `flags` | `List[str]` | Active risk flags (e.g. `high_value`, `unknown_sender`, `velocity_breach`) | | |
| | `velocity_1h` | `int?` | Transactions from sender in the past hour | | |
| | `velocity_24h` | `int?` | Transactions from sender in the past 24 hours | | |
| | `avg_transaction_amount`| `float?` | Sender's historical average transaction amount | | |
| | `account_age_days` | `int?` | Age of the sender account in days | | |
| | `country_risk` | `str?` | low \| medium \| high \| sanctioned | | |
| | `kyc_status` | `str?` | verified \| pending \| failed \| none \| expired | | |
| | `kyc_expiry_days` | `int?` | Days until KYC expires (negative = already expired) | | |
| | `previous_violations` | `int?` | Prior compliance violations for this sender | | |
| | `previous_sars` | `int?` | Suspicious Activity Reports previously filed for this sender | | |
| | `counterparty_risk` | `str?` | clean \| unknown \| watchlist \| blacklist | | |
| | `chain_step` | `int` | Current step in a multi-hop investigation chain (1 = initial presentation) | | |
| | `chain_total` | `int` | Total investigation steps for this task (1 = single-step) | | |
| | `chain_context` | `str?` | Accumulated summary of findings from earlier chain steps | | |
| | `steps_remaining` | `int?` | Investigation sub-steps remaining before a terminal decision is required | | |
| | `action_cost` | `float` | Budget cost incurred by the last action | | |
| | `budget_remaining` | `float` | Remaining investigation budget (starts at 5.0; decreases with each investigation action) | | |
| | `inspection_notes` | `str?` | Additional details revealed after an `inspect` action | | |
| | `docs_notes` | `str?` | Document review findings after a `request_docs` action | | |
| | `kyc_notes` | `str?` | KYC re-verification outcome after a `verify_kyc` action | | |
| | `contact_notes` | `str?` | Outcome after a `contact_sender` action | | |
| | `investigation_hints` | `List[str]` | Sub-actions recommended for this task (e.g. `inspect`, `verify_kyc`). Using them before the terminal decision earns bonus reward. Empty = no specific investigation required. | | |
| | `recent_decisions` | `List[dict]` | Last ≤3 completed decisions in this episode (for pattern context) | | |
| | `network_graph` | `dict?` | Mule-chain / correspondent-bank relationship graph where present | | |
| | `task_id` | `str` | Identifier of the active task | | |
| | `task_difficulty` | `str` | easy \| medium \| hard \| critical | | |
| | `step_in_episode` | `int` | Steps elapsed in this episode | | |
| | `reward` | `float` | Reward from the last action | | |
| | `reward_breakdown` | `dict` | Itemised reward components: base, confidence_bonus, cost_penalty, etc. | | |
| | `cumulative_reward` | `float` | Total reward accumulated so far in this episode | | |
| | `done` | `bool` | Whether the episode has ended | | |
| | `info` | `dict` | Diagnostic info (event, correct action, etc.) | | |
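As one illustration of how these fields can be combined, the sketch below checks for the pattern described for `ml_confidence`: a low `risk_score` that the model itself is unsure about, or that is contradicted by several active flags. The function name and all three thresholds are hypothetical choices for the example, not values used by the environment or its baseline.

```python
def looks_like_model_poisoning(obs, low_score=0.3, low_confidence=0.5, min_flags=2):
    """Heuristic: a low risk score that other signals contradict.

    All thresholds are illustrative assumptions, not environment constants.
    """
    score_is_low = obs["risk_score"] <= low_score
    model_unsure = obs["ml_confidence"] <= low_confidence
    many_flags = len(obs.get("flags", [])) >= min_flags
    return score_is_low and (model_unsure or many_flags)


# A made-up observation fragment using only documented field names.
suspicious = {
    "risk_score": 0.18,
    "ml_confidence": 0.35,
    "flags": ["high_value", "unknown_sender", "velocity_breach"],
}
clean = {"risk_score": 0.05, "ml_confidence": 0.95, "flags": []}

print(looks_like_model_poisoning(suspicious))  # True
print(looks_like_model_poisoning(clean))       # False
```

A check like this would only be a trigger for an `inspect` or `escalate` decision; it does not replace the per-task reasoning the harder tasks require.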
| --- | |
| ## Task Descriptions | |
| ### Easy (6 tasks — clear signals) | |
| | ID | Description | Correct Action | | |
| |----------|-------------|----------------| | |
| | EASY-001 | Low-value domestic transfer between verified users; no flags | `approve` | | |
| | EASY-002 | Textbook fraud: unknown sender, offshore, sanctioned country, risk=0.97 | `reject` | | |
| | EASY-003 | Standard refund to verified customer; tiny amount, no flags | `approve` | | |
| | EASY-004 | ATM withdrawal burst — 15 withdrawals in 58 minutes | `flag` | | |
| | EASY-005 | Scheduled monthly mortgage repayment; regular amount, verified borrower | `approve` | | |
| | EASY-006 | Suspected duplicate payment: same sender/receiver/amount submitted twice in 4 minutes | `flag` | | |
| ### Medium (8 tasks — ambiguous, multi-signal reasoning required) | |
| | ID | Description | Correct Action | | |
| |---------|-------------|----------------| | |
| | MED-001 | Large B2B wire, verified CFO, cross-border to medium-risk jurisdiction | `escalate` | | |
| | MED-002 | Internal treasury transfer; large amount, KYC pending renewal | `hold` | | |
| | MED-003 | Recurring subscription 3× higher than historical average | `flag` | | |
| | MED-004 | Payment to licensed crypto exchange from verified personal account | `flag` | | |
| | MED-005 | Expired KYC on high-frequency corporate payroll account; KYC lapsed 12 days ago | `hold` | | |
| | MED-006 | Real estate advance payment; large first-time transfer to new receiver but signed contract exists | `escalate` | | |
| | MED-007 | Supplier emails to say bank details have changed; first payment to new account matches large invoice (BEC indicator) | `hold` | | |
| | MED-008 | Buy Now Pay Later high-value purchase; new account, thin credit file, elevated risk signals | `flag` | | |
| ### Hard (10 tasks — adversarial / edge-case) | |
| | ID | Description | Correct Action | | |
| |----------|-------------|----------------| | |
| HARD-001 | Fraud model poisoning: risk_score=0.18 but manual signals point clearly to fraud | `escalate` | |
| | HARD-002 | APP (Authorised Push Payment) scam: victim sending willingly to mule account | `reject` | | |
| | HARD-003 | Structuring / smurfing: just-below-CTR-threshold payments, same UBO | `reject` | | |
| | HARD-004 | Legitimate FX correspondent banking settlement — looks alarming, is not | `approve` | | |
| | HARD-005 | Insider threat: employee initiating transfers to personal family accounts | `escalate` | | |
| | HARD-006 | Ghost account: dormant 5 years, suddenly received 20 inbound transfers this week | `flag` | | |
| | HARD-007 | SIM-swap attack: phone ported 6 hours ago; account now requesting large crypto withdrawal to new address | `reject` | | |
| | HARD-008 | Romance scam / pig butchering: 4th escalating transfer to overseas 'romantic partner' met online | `reject` | | |
| | HARD-009 | Synthetic identity fraud: new business account with AI-generated-looking perfect profile | `escalate` | | |
| | HARD-010 | Payroll diversion: HR system breach rerouted employee salary to newly added account | `reject` | | |
| ### Critical (6 tasks — regulatory + multi-step investigation chains) | |
| | ID | Description | Correct Action | | |
| |----------|-------------|----------------| | |
| | CRIT-001 | Multi-step chain: large PE wire to new counterparty; inspect then request docs before deciding (chain of 3) | `approve` | | |
| | CRIT-002 | Fraud ring: coordinated small payments from 3 related accounts aggregating above reporting threshold; SAR required | `reject` | | |
| | CRIT-003 | Trade-based money laundering: over-invoiced international trade payment (4× market price) | `escalate` | | |
| | CRIT-004 | Compromised corporate account: geo-impossible login (NY → Lagos in 8 min); confirmed account takeover | `reject` | | |
| | CRIT-005 | OFAC sanctions evasion: large USD payment routed through UAE shell chain; UBO is on SDN list (chain of 3) | `reject` | | |
| | CRIT-006 | Correspondent banking: partner bank added to FinCEN 311 Special Measures list; in-flight payments must be escalated | `escalate` | | |
| --- | |
| ## Reward Design | |
| | Outcome | Reward | | |
| |---------|--------| | |
| | Correct action | **+1.0** | | |
| | Partial-credit adjacent action (per-task) | **+0.2 – +0.6** | | |
| | `inspect` (information seeking, first time) | **+0.15** | | |
| | `approve` when correct is `reject` / `escalate` | **−1.0** | | |
| | `approve` when correct is `flag` / `hold` | **−0.5** | | |
| | `reject` when correct is `approve` | **−0.5** | | |
| | Any other wrong action | **−0.25** | | |
| The **episode score** (0–1) is: `max(0, total_reward) / max_possible_reward`. | |
| A score ≥ 0.5 is considered a passing episode. | |
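The table and the episode-score formula above can be read as the following sketch. It is not the code in `grader.py`: `base_reward` and `episode_score` are hypothetical names, the per-task partial-credit map is passed in by the caller, and checking partial credit before the penalty rows is an assumption about precedence.

```python
def base_reward(action, correct, partial_credit=None):
    """Reward for a terminal action, following the table row by row."""
    if action == correct:
        return 1.0
    # Per-task adjacent actions earn +0.2 to +0.6 (values supplied per task).
    if partial_credit and action in partial_credit:
        return partial_credit[action]
    if action == "approve" and correct in ("reject", "escalate"):
        return -1.0
    if action == "approve" and correct in ("flag", "hold"):
        return -0.5
    if action == "reject" and correct == "approve":
        return -0.5
    return -0.25


def episode_score(total_reward, max_possible_reward):
    """max(0, total_reward) / max_possible_reward, as stated above."""
    return max(0.0, total_reward) / max_possible_reward


print(base_reward("approve", "reject"))               # -1.0
print(base_reward("flag", "escalate", {"flag": 0.4}))  # 0.4
print(round(episode_score(17.0, 28.2), 4))            # 0.6028
```

The last line uses the total and maximum reported for the LLM baseline later in this README, and reproduces its normalised score.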
| --- | |
| ## API Endpoints | |
| | Method | Path | Description | | |
| |--------|------|-------------| | |
| | `POST` | `/reset` | Reset environment, return first observation | | |
| | `POST` | `/step` | Execute an action | | |
| | `GET` | `/state` | Current internal environment state | | |
| | `GET` | `/schema` | JSON schemas for action / observation / state | | |
| | `GET` | `/tasks` | Full task list with metadata | | |
| | `GET` | `/grader` | Grade the current episode | | |
| | `POST` | `/baseline` | Run rule-based baseline and return scores | | |
| | `GET` | `/health` | Health check | | |
| | `WS` | `/ws` | WebSocket persistent session | | |
| Interactive API docs: `http://localhost:8000/docs` | |
| --- | |
| ## Setup & Running | |
| ### Local (Python) | |
| ```bash | |
| # 1. Install dependencies | |
| pip install -r requirements.txt | |
| # 2. Start the server (from the parent directory of payops_env) | |
| PYTHONPATH=$(pwd) uvicorn payops_env.server.app:app --host 0.0.0.0 --port 8000 | |
| # 3. Verify | |
| curl http://localhost:8000/health | |
| ``` | |
| ### Run the baseline agent | |
| ```bash | |
| # Via the API endpoint (no extra script needed) | |
| curl -s -X POST http://localhost:8000/baseline | python3 -m json.tool | |
| ``` | |
| ### Docker | |
| ```bash | |
| # Build | |
| docker build -t payops-env . | |
| # Run locally on port 8000 | |
| docker run -p 8000:7860 -e PORT=7860 payops-env | |
| # Verify | |
| curl http://localhost:8000/health | |
| ``` | |
| ### HuggingFace Space | |
| The `Dockerfile` exposes port **7860** (HF Spaces default). Push the repo to | |
| a HF Space with Docker runtime — no additional configuration required. | |
| --- | |
| ## Example Agent Interaction | |
| ```python | |
| import httpx | |
| base = "http://localhost:8000" | |
| # Reset | |
| obs = httpx.post(f"{base}/reset").json() | |
| print(obs["transaction_id"], obs["risk_score"], obs["flags"]) | |
| # Step | |
| while not obs["done"]: | |
|     # ... agent decides action_type ... | |
|     obs = httpx.post(f"{base}/step", json={ | |
|         "action_type": "approve", | |
|         "transaction_id": obs["transaction_id"], | |
|     }).json() | |
|     print(f"reward={obs['reward']:+.2f} done={obs['done']}") | |
| # Grade | |
| score = httpx.get(f"{base}/grader").json() | |
| print(f"Episode score: {score['normalised_score']:.4f}") | |
| ``` | |
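The loop above always approves. A slightly richer agent might work through `investigation_hints` before committing to a terminal decision. The sketch below shows one way to do that using only documented observation fields; `next_action` is a hypothetical helper, the caller is assumed to track which sub-actions it has already used on the current transaction, and the terminal action still has to come from the agent's own reasoning.

```python
def next_action(obs, used, terminal_action):
    """Pick the next hinted sub-action, or fall back to the terminal decision.

    `used` is the set of sub-actions already taken on this transaction;
    tracking it client-side is an assumption of this sketch.
    """
    steps = obs.get("steps_remaining")
    if steps is not None and steps <= 0:
        return terminal_action  # a terminal decision is now required
    for hint in obs.get("investigation_hints", []):
        if hint not in used:
            return hint
    return terminal_action


# Made-up observation fragment for illustration.
obs = {
    "transaction_id": "TX-EXAMPLE",
    "investigation_hints": ["inspect", "verify_kyc"],
    "steps_remaining": 3,
}
used = set()
plan = []
for _ in range(3):
    action = next_action(obs, used, terminal_action="escalate")
    plan.append(action)
    used.add(action)
print(plan)  # ['inspect', 'verify_kyc', 'escalate']
```

Each chosen action would then be sent as `{"action_type": action, "transaction_id": obs["transaction_id"]}` to `POST /step`, exactly as in the loop above.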
| --- | |
| ## Baseline Results | |
| ### Rule-based baseline (`POST /baseline`) | |
| The rule-based baseline uses a deterministic priority-ordered policy in `scripts_util.py`. | |
| | Metric | Rule-based baseline (v2, 30 tasks) | | |
| |--------|------------------------------------| | |
| | Normalised score | 0.68–0.76 | | |
| | Passed (≥ 0.5) | Yes | | |
| | Strong at | Easy tasks, clear velocity/flag patterns | | |
| | Weak at | Hard adversarial tasks (HARD-001 model-poisoning, HARD-004 FX settlement) | | |
| | Critical coverage | Partial — misses some SAR filing requirements | | |
| Scores vary slightly per run due to per-episode parameter jitter. | |
| Run `POST /baseline` to reproduce. | |
| ### LLM baseline (`inference.py` — `llama-3.1-8b-instant` via Groq) | |
| Run locally against seed 42 (reproducible) with investigation sub-actions enabled. The results below cover 20 tasks rather than the current 30. | |
| | Metric | llama-3.1-8b-instant (Groq) | | |
| |--------|-----------------------------| | |
| | Normalised score | **0.6028** | | |
| | Total reward | 17.000 / 28.200 max | | |
| | Tasks correct | 6 / 20 (30%) | | |
| | Budget spent | 5.50 / 5.00 | | |
| | Budget penalty | 0.05 | | |
| | Episode steps | 57 (incl. investigation sub-actions) | | |
| | Duration | ~290 s | | |
| | Passed (≥ 0.5) | **YES ✓** | | |
| | Seed | 42 (fixed — deterministic across re-runs) | | |
| **Per-task decisions:** | |
| | Task | LLM Action | Correct Action | Weighted Reward | | |
| |------|-----------|----------------|----------------| | |
| | EASY-001 | `approve` | `approve` | +1.000 ✓ | | |
| | EASY-002 | `flag` | `reject` | −0.250 ✗ (flag no longer partial credit) | | |
| | EASY-003 | `approve` | `approve` | +1.000 ✓ | | |
| | EASY-004 | `flag` | `flag` | +1.000 ✓ | | |
| | MED-001 | `flag` | `escalate` | +0.900 (partial + investigation bonus) | | |
| | MED-002 | `flag` | `hold` | +0.540 (partial + investigation bonus) | | |
| | MED-003 | `flag` | `flag` | +1.200 ✓ | | |
| | MED-004 | `flag` | `flag` | +1.200 ✓ | | |
| | MED-005 | `flag` | `hold` | +0.660 (partial + investigation bonus) | | |
| | MED-006 | `flag` | `escalate` | +0.600 (partial + investigation bonus) | | |
| | HARD-001 | `flag` | `escalate` | +1.275 (partial + investigation bonus) | | |
| | HARD-002 | `flag` | `reject` | +0.525 (partial + investigation bonus) | | |
| | HARD-003 | `flag` | `reject` | +0.675 (partial + investigation bonus) | | |
| | HARD-004 | `flag` | `approve` | +0.825 (partial + investigation bonus) | | |
| | HARD-005 | `flag` | `escalate` | +0.825 (partial + investigation bonus) | | |
| | HARD-006 | `flag` | `flag` | +2.025 ✓ (+ investigation bonus) | | |
| | CRIT-001 | `flag` | `approve` | +1.100 (partial + investigation bonus) | | |
| | CRIT-002 | `flag` | `reject` | +0.900 (partial + investigation bonus) | | |
| | CRIT-003 | `flag` | `escalate` | +1.300 (partial + investigation bonus) | | |
| | CRIT-004 | `flag` | `reject` | −0.250 ✗ | | |
| **Observations:** The model used investigation sub-actions (`inspect`, `verify_kyc`, `contact_sender`) before terminal decisions, earning investigation bonuses that lifted its score above what a naive always-flag policy would earn. Easy cases with clear evidence now penalise lazy `flag` decisions (e.g. EASY-002). Agents that also choose the correct terminal action after a proper investigation can exceed 0.90. | |
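The headline numbers can be cross-checked against the per-task table. The sketch below sums the weighted rewards, subtracts the reported budget penalty, and divides by a maximum computed as one point per task times the difficulty weights (easy 1.0, medium 1.2, hard 1.5, critical 2.0). That definition of the maximum is an assumption; it happens to reproduce the reported 28.200, but the grader's exact formula is not shown in this README.

```python
# Weighted rewards copied from the per-task table above.
weighted_rewards = {
    "EASY-001": 1.000, "EASY-002": -0.250, "EASY-003": 1.000, "EASY-004": 1.000,
    "MED-001": 0.900, "MED-002": 0.540, "MED-003": 1.200, "MED-004": 1.200,
    "MED-005": 0.660, "MED-006": 0.600,
    "HARD-001": 1.275, "HARD-002": 0.525, "HARD-003": 0.675, "HARD-004": 0.825,
    "HARD-005": 0.825, "HARD-006": 2.025,
    "CRIT-001": 1.100, "CRIT-002": 0.900, "CRIT-003": 1.300, "CRIT-004": -0.250,
}
BUDGET_PENALTY = 0.05  # from the metrics table
WEIGHT_BY_PREFIX = {"EASY": 1.0, "MED": 1.2, "HARD": 1.5, "CRIT": 2.0}

total = sum(weighted_rewards.values()) - BUDGET_PENALTY
# Assumed maximum: +1.0 per task, scaled by its difficulty weight.
max_possible = sum(WEIGHT_BY_PREFIX[task.split("-")[0]] for task in weighted_rewards)

print(round(total, 3))                 # 17.0
print(round(max_possible, 3))          # 28.2
print(round(total / max_possible, 4))  # 0.6028
```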
| To reproduce exactly (seed=42 is the default): | |
| ```bash | |
| export OPENAI_API_KEY="gsk_..." # your Groq API key | |
| export API_BASE_URL="https://api.groq.com/openai/v1" | |
| export MODEL_NAME="llama-3.1-8b-instant" | |
| export PAYOPS_BASE_URL="https://padmapriyagosakan-payops-env.hf.space" | |
| # INFERENCE_SEED=42 # default; set to "random" for a fresh episode | |
| PYTHONPATH=$(pwd) python payops_env/inference.py | |
| ``` | |
| For Groq setup instructions see the **Running inference with Groq** section below. | |
| --- | |
| ## Running inference with Groq (recommended — free) | |
| [Groq](https://console.groq.com) provides a completely free API with no monthly credit cap and no installation required. It uses the same OpenAI-compatible interface that `inference.py` already targets. | |
| ### Prerequisites | |
| 1. **Create a free Groq account** — go to [console.groq.com](https://console.groq.com) and sign up (Google / GitHub login available) | |
| 2. **Generate an API key** — click **API Keys → Create API Key**, copy the key (starts with `gsk_`) | |
| 3. **Install the Python dependency** (already in `requirements.txt`): | |
| ```bash | |
| pip install openai | |
| ``` | |
| ### Run inference | |
| ```bash | |
| cd /path/to/project-root # project root (parent of payops_env/) | |
| export OPENAI_API_KEY="gsk_..." # your Groq API key | |
| export API_BASE_URL="https://api.groq.com/openai/v1" | |
| export MODEL_NAME="llama-3.1-8b-instant" | |
| export PAYOPS_BASE_URL="https://padmapriyagosakan-payops-env.hf.space" | |
| PYTHONPATH=$(pwd) python payops_env/inference.py | |
| ``` | |
| > **Why Groq?** | |
| > - Free tier: 14,400 requests/day, 500,000 tokens/minute — a 20-task episode uses ~30 calls | |
| > - No monthly credit pool that runs out mid-run (unlike the HF free tier) | |
| > - No installation or model download (unlike Ollama) | |
| > - `temperature=0.0` is already set in `inference.py` so results are reproducible | |
| > - Inference speed: ~750 tok/s, so LLM latency is a small part of the run (the seed-42 episode reported above took ~290 s end to end) | |
| ### Alternative free models on Groq | |
| | Model | Notes | | |
| |-------|-------| | |
| | `llama-3.1-8b-instant` | Fastest, good reasoning | | |
| | `llama-3.3-70b-versatile` | Best quality on hard tasks; same free tier | | |
| | `mixtral-8x7b-32768` | Large context window | | |
| | `gemma2-9b-it` | Google Gemma 2 | | |
| ### Alternative: Ollama (fully local, no internet required for LLM calls) | |
| If you prefer to run the model entirely on your machine: | |
| ```bash | |
| # 1. Install | |
| brew install ollama | |
| # 2. Pull a model (choose based on available RAM) | |
| ollama pull qwen2.5:3b # ~2 GB – 8 GB RAM | |
| ollama pull qwen2.5:7b # ~4.7 GB – 16 GB RAM | |
| # 3. Start the server (keep running in a separate terminal) | |
| ollama serve | |
| # 4. Run inference | |
| export OPENAI_API_KEY=ollama | |
| export API_BASE_URL="http://localhost:11434/v1" | |
| export MODEL_NAME="qwen2.5:3b" | |
| export PAYOPS_BASE_URL="https://padmapriyagosakan-payops-env.hf.space" | |
| PYTHONPATH=$(pwd) python payops_env/inference.py | |
| ``` | |
| --- | |
| ## Project Structure | |
| ``` | |
| payops_env/ | |
| ├── models.py # PayOpsAction, PayOpsObservation, PayOpsState (Pydantic) | |
| ├── environment.py # PayOpsEnvironment — reset_async / step_async / state | |
| ├── tasks.py # 30 tasks (EASY×6, MED×8, HARD×10, CRIT×6) with ground-truth labels | |
| ├── grader.py # Partial-credit reward function + episode grader | |
| ├── scripts_util.py # Baseline runner helper (used by /baseline endpoint) | |
| ├── server/ | |
| │ └── app.py # FastAPI server with all required endpoints | |
| ├── inference.py # Competition inference script (OpenAI client, root-level) | |
| ├── validate.py # Pre-submission checklist validator | |
| ├── openenv.yaml # OpenEnv manifest v2.0.0 | |
| ├── Dockerfile # Docker / HuggingFace Space container (port 7860) | |
| ├── requirements.txt # Python dependencies | |
| └── README.md # This file | |
| ``` | |
| --- | |
| ## Evaluation Criteria Alignment | |
| | Criterion | Implementation | | |
| |-----------|---------------| | |
| | Real-world utility | Payment fraud and compliance triage — deployed daily by fintech ops teams worldwide | | |
| | Task & grader quality | 30 tasks across 4 difficulty tiers (easy→critical); partial-credit grader; clear pass/fail | | |
| | Environment design | 30-field observation space; 10-action space (5 terminal + 5 investigation); budget mechanic; episode state tracking | | |
| | Code quality & spec compliance | Pydantic v2 models; async API; all 11 required endpoints; openenv.yaml v2; Dockerfile; validate.py | | |
| | Creativity & novelty | Adversarial model-poisoning task; APP scam; AML structuring with SAR requirement; PEP detection | | |
| --- | |
| ## Reward Design (v2 — Trajectory-Based) | |
| Rewards are dense across the full trajectory, not just on the final decision: | |
| | Component | Value | Condition | | |
| |-----------|-------|-----------| | |
| | Correct terminal action | **+1.0** | per task (difficulty-weighted in episode score) | | |
| | Investigation sub-action | **+0.15** | per eligible sub-action, first use only | | |
| | Flag identification | **+0.20** | agent used `inspect` AND key diagnostic flags present | | |
| | Confidence bonus | +0.10 | confidence ≥ 0.8 AND correct | | |
| | Confidence penalty | −0.10 | confidence ≥ 0.8 AND wrong | | |
| | Regulatory SAR bonus | +0.20 | `file_sar` before terminal on a regulatory task | | |
| | Duplicate investigation | −0.05 | same sub-action used twice on same task | | |
| | Approve a fraud/sanctioned | **−1.00** | worst mistake | | |
| Difficulty weights: easy×1.0, medium×1.2, hard×1.5, critical×2.0 | |
| Episode score is **strictly clamped to `[0.0, 1.0]`**. Passing threshold: **0.5**. | |
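To see how the components and difficulty weights might combine, the sketch below multiplies the sum of a task's reward components by its difficulty weight. This is an assumption about the grader rather than its actual code, and `weighted_reward` is a hypothetical name. It is at least consistent with the +2.025 reported for HARD-006 in the LLM baseline table: a correct terminal action, one investigation bonus, and the flag-identification bonus on a hard task.

```python
DIFFICULTY_WEIGHT = {"easy": 1.0, "medium": 1.2, "hard": 1.5, "critical": 2.0}


def weighted_reward(difficulty, components):
    """Sum the per-task components, then scale by the difficulty weight.

    Applying the weight to the whole sum is an assumption of this sketch.
    """
    return DIFFICULTY_WEIGHT[difficulty] * sum(components.values())


hard_006 = {
    "correct_terminal": 1.0,
    "investigation": 0.15,        # first use of an eligible sub-action
    "flag_identification": 0.20,  # used inspect with key flags present
}
print(round(weighted_reward("hard", hard_006), 3))                      # 2.025
print(round(weighted_reward("medium", {"correct_terminal": 1.0}), 3))  # 1.2
```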
| ### Per-Episode Parameter Jitter | |
| Each `POST /reset` generates a unique `episode_seed` and applies small random perturbations to prevent agent overfitting: | |
| | Field | Jitter | | |
| |-------|--------| | |
| | `amount` | × Uniform(0.85, 1.20) | | |
| | `risk_score` | + Gauss(0, 0.03), clamped [0,1] | | |
| | `velocity_1h` | + Randint(−3, +3), min 0 | | |
| | `velocity_24h` | + Randint(−3, +3), min 0 | | |
| The `correct_action` and all ground-truth labels are **never changed** — only the observable values the agent uses to make decisions. | |
| The `episode_seed` is returned by `GET /health` and `GET /state` for reproducibility. | |
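The jitter table can be sketched with Python's `random` module. This is an illustration under stated assumptions, not the environment's implementation: `jitter_task` is a hypothetical function, the task is a plain dict, and seeding a local `random.Random` with the episode seed is just one way to make the perturbation reproducible.

```python
import random


def jitter_task(task, episode_seed):
    """Return a perturbed copy of `task`; ground-truth labels are left untouched."""
    rng = random.Random(episode_seed)
    out = dict(task)
    out["amount"] = task["amount"] * rng.uniform(0.85, 1.20)
    # Gaussian noise on the risk score, clamped to [0, 1].
    out["risk_score"] = min(1.0, max(0.0, task["risk_score"] + rng.gauss(0, 0.03)))
    for key in ("velocity_1h", "velocity_24h"):
        if task.get(key) is not None:  # both fields are optional
            out[key] = max(0, task[key] + rng.randint(-3, 3))
    return out


# Made-up task values for illustration.
task = {
    "amount": 1000.0,
    "risk_score": 0.97,
    "velocity_1h": 2,
    "velocity_24h": 15,
    "correct_action": "reject",
}
a = jitter_task(task, episode_seed=42)
b = jitter_task(task, episode_seed=42)
print(a == b)               # True: the same seed gives the same perturbation
print(a["correct_action"])  # reject
```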
| ### Network Graph | |
| Selected tasks include a `network_graph` field in the observation exposing mule-chain / correspondent-banking relationships (e.g. victim → mule → offshore). This gives agents richer context for complex fraud patterns. | |