payops_env

payops_env / README.md

padmapriyagosakan

Fix grader import path: use root-level graders module instead of server.graders

220acb1 3 months ago

	---
	title: PayOps — Payment Operations Incident Response
	emoji: 💳
	colorFrom: blue
	colorTo: green
	sdk: docker
	app_port: 7860
	tags:
	- openenv
	- finance
	- fraud-detection
	- compliance
	- reinforcement-learning
	pinned: false
	fullWidth: false
	build_version: 2026-04-12-v6
	---

	# PayOps — Payment Operations Incident Response

	An OpenEnv-compatible reinforcement-learning environment where an AI agent
	acts as a Payment Operations analyst. The agent reviews financial transactions
	one by one and must decide the correct compliance action for each.

	---

	## Motivation

	Payment operations teams process thousands of transactions every day. A
	skilled analyst uses dozens of signals — risk scores, velocity, KYC status,
	flag patterns — to make fast, accurate decisions. This environment lets an AI
	agent learn and be evaluated on exactly this task, spanning clear-cut cases all
	the way to subtle adversarial patterns like model-score poisoning and
	Authorised Push Payment (APP) scams.

	---

	## Environment Description

	Each episode steps through all 30 transactions (6 easy, 8 medium, 10 hard, 6 critical).
	For each transaction the agent observes a rich set of signals and chooses one
	of 10 possible actions — 5 terminal decisions and 5 investigation sub-actions.
	A reward is returned immediately, and the next transaction is presented until
	the episode is complete.

	---

	## Action Space

	Terminal decisions (no budget cost) commit to a final outcome for the transaction.
	Investigation sub-actions (with budget cost) reveal more information and let the agent act again on the same transaction.

	\| Action \| Type \| Description \| Budget Cost \|
	\|-----------------\|---------------\|-------------\|-------------\|
	\| `approve` \| terminal \| Mark transaction as legitimate; allow it through \| — \|
	\| `reject` \| terminal \| Block the transaction outright \| — \|
	\| `flag` \| terminal \| Soft hold; mark for manual review \| — \|
	\| `escalate` \| terminal \| Route to senior compliance officer / fraud team \| — \|
	\| `hold` \| terminal \| Temporary hold pending more information \| — \|
	\| `inspect` \| investigation \| Pull additional signals (logs, KYC, velocity) — yields `inspection_notes` \| 0.10 \|
	\| `request_docs` \| investigation \| Ask sender for supporting documents (invoice, contract) — yields `docs_notes` \| 0.20 \|
	\| `verify_kyc` \| investigation \| Trigger an active KYC re-verification check — yields `kyc_notes` \| 0.20 \|
	\| `contact_sender` \| investigation \| Contact the sender directly to confirm intent — yields `contact_notes` \| 0.30 \|
	\| `file_sar` \| investigation \| File a Suspicious Activity Report to the regulator (required on AML/structuring tasks) \| 0.10 \|
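The cost column above can be turned into a small lookup for planning an investigation before committing to it. This is a minimal sketch: the costs are copied from the table, while the helper names `action_cost` and `plan_cost` are hypothetical and not part of the repository.

```python
# Budget costs copied from the action table; terminal actions cost nothing.
INVESTIGATION_COSTS = {
    "inspect": 0.10,
    "request_docs": 0.20,
    "verify_kyc": 0.20,
    "contact_sender": 0.30,
    "file_sar": 0.10,
}
TERMINAL_ACTIONS = {"approve", "reject", "flag", "escalate", "hold"}


def action_cost(action_type: str) -> float:
    """Return the budget cost of one action (hypothetical helper)."""
    if action_type in TERMINAL_ACTIONS:
        return 0.0
    if action_type in INVESTIGATION_COSTS:
        return INVESTIGATION_COSTS[action_type]
    raise ValueError(f"unknown action: {action_type!r}")


def plan_cost(actions: list[str]) -> float:
    """Total budget a sequence of actions would consume, rounded to cents."""
    return round(sum(action_cost(a) for a in actions), 2)


# Example: inspect, request documents, then approve.
print(plan_cost(["inspect", "request_docs", "approve"]))  # 0.3
```

An agent could compare `plan_cost(...)` with the `budget_remaining` field of the observation before choosing an investigation sub-action.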

	---

	## Observation Space

	\| Field \| Type \| Description \|
	\|------------------------\|-------------------\|-------------\|
	\| `transaction_id` \| `str` \| Unique transaction identifier \|
	\| `amount` \| `float` \| Transaction amount in the stated currency \|
	\| `currency` \| `str` \| ISO-4217 currency code \|
	\| `sender` \| `str` \| Sender identifier (email / account / alias) \|
	\| `receiver` \| `str` \| Receiver identifier \|
	\| `transaction_type` \| `str` \| transfer \\| payment \\| withdrawal \\| refund \\| internal \\| loan_repayment \\| payroll \|
	\| `status` \| `str` \| pending \\| approved \\| rejected \\| flagged \\| escalated \\| held \\| inspected \\| docs_requested \\| kyc_triggered \\| sender_contacted \\| sar_filed \|
	\| `risk_score` \| `float [0,1]` \| Composite ML risk score \|
	\| `ml_confidence` \| `float [0,1]` \| Model's self-reported confidence in `risk_score` — low value signals possible model poisoning \|
	\| `flags` \| `List[str]` \| Active risk flags (e.g. `high_value`, `unknown_sender`, `velocity_breach`) \|
	\| `velocity_1h` \| `int?` \| Transactions from sender in the past hour \|
	\| `velocity_24h` \| `int?` \| Transactions from sender in the past 24 hours \|
	\| `avg_transaction_amount`\| `float?` \| Sender's historical average transaction amount \|
	\| `account_age_days` \| `int?` \| Age of the sender account in days \|
	\| `country_risk` \| `str?` \| low \\| medium \\| high \\| sanctioned \|
	\| `kyc_status` \| `str?` \| verified \\| pending \\| failed \\| none \\| expired \|
	\| `kyc_expiry_days` \| `int?` \| Days until KYC expires (negative = already expired) \|
	\| `previous_violations` \| `int?` \| Prior compliance violations for this sender \|
	\| `previous_sars` \| `int?` \| Suspicious Activity Reports previously filed for this sender \|
	\| `counterparty_risk` \| `str?` \| clean \\| unknown \\| watchlist \\| blacklist \|
	\| `chain_step` \| `int` \| Current step in a multi-hop investigation chain (1 = initial presentation) \|
	\| `chain_total` \| `int` \| Total investigation steps for this task (1 = single-step) \|
	\| `chain_context` \| `str?` \| Accumulated summary of findings from earlier chain steps \|
	\| `steps_remaining` \| `int?` \| Investigation sub-steps remaining before a terminal decision is required \|
	\| `action_cost` \| `float` \| Budget cost incurred by the last action \|
	\| `budget_remaining` \| `float` \| Remaining investigation budget (starts at 5.0; decreases with each investigation action) \|
	\| `inspection_notes` \| `str?` \| Additional details revealed after an `inspect` action \|
	\| `docs_notes` \| `str?` \| Document review findings after a `request_docs` action \|
	\| `kyc_notes` \| `str?` \| KYC re-verification outcome after a `verify_kyc` action \|
	\| `contact_notes` \| `str?` \| Outcome after a `contact_sender` action \|
	\| `investigation_hints` \| `List[str]` \| Sub-actions recommended for this task (e.g. `inspect`, `verify_kyc`). Using them before the terminal decision earns bonus reward. Empty = no specific investigation required. \|
	\| `recent_decisions` \| `List[dict]` \| Last ≤3 completed decisions in this episode (for pattern context) \|
	\| `network_graph` \| `dict?` \| Mule-chain / correspondent-bank relationship graph where present \|
	\| `task_id` \| `str` \| Identifier of the active task \|
	\| `task_difficulty` \| `str` \| easy \\| medium \\| hard \\| critical \|
	\| `step_in_episode` \| `int` \| Steps elapsed in this episode \|
	\| `reward` \| `float` \| Reward from the last action \|
	\| `reward_breakdown` \| `dict` \| Itemised reward components: base, confidence_bonus, cost_penalty, etc. \|
	\| `cumulative_reward` \| `float` \| Total reward accumulated so far in this episode \|
	\| `done` \| `bool` \| Whether the episode has ended \|
	\| `info` \| `dict` \| Diagnostic info (event, correct action, etc.) \|
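Many of the fields above are optional (marked `?`), so client code should tolerate missing values. The sketch below assumes the observation arrives as a plain JSON dictionary keyed by the field names in the table; the helper names are hypothetical.

```python
from typing import Any, Mapping, Optional


def kyc_is_expired(obs: Mapping[str, Any]) -> bool:
    """True when KYC is marked expired or kyc_expiry_days is negative."""
    days = obs.get("kyc_expiry_days")
    return obs.get("kyc_status") == "expired" or (days is not None and days < 0)


def amount_ratio(obs: Mapping[str, Any]) -> Optional[float]:
    """amount divided by the sender's historical average, or None if unknown."""
    avg = obs.get("avg_transaction_amount")
    if not avg:  # covers both a missing value and a zero average
        return None
    return obs["amount"] / avg


# Illustrative observation fragment (values invented for the example).
obs = {"amount": 300.0, "avg_transaction_amount": 100.0,
       "kyc_status": "verified", "kyc_expiry_days": -12}
print(kyc_is_expired(obs))  # True
print(amount_ratio(obs))    # 3.0
```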

	---

	## Task Descriptions

	### Easy (6 tasks — clear signals)

	\| ID \| Description \| Correct Action \|
	\|----------\|-------------\|----------------\|
	\| EASY-001 \| Low-value domestic transfer between verified users; no flags \| `approve` \|
	\| EASY-002 \| Textbook fraud: unknown sender, offshore, sanctioned country, risk=0.97 \| `reject` \|
	\| EASY-003 \| Standard refund to verified customer; tiny amount, no flags \| `approve` \|
	\| EASY-004 \| ATM withdrawal burst — 15 withdrawals in 58 minutes \| `flag` \|
	\| EASY-005 \| Scheduled monthly mortgage repayment; regular amount, verified borrower \| `approve` \|
	\| EASY-006 \| Suspected duplicate payment: same sender/receiver/amount submitted twice in 4 minutes \| `flag` \|

	### Medium (8 tasks — ambiguous, multi-signal reasoning required)

	\| ID \| Description \| Correct Action \|
	\|---------\|-------------\|----------------\|
	\| MED-001 \| Large B2B wire, verified CFO, cross-border to medium-risk jurisdiction \| `escalate` \|
	\| MED-002 \| Internal treasury transfer; large amount, KYC pending renewal \| `hold` \|
	\| MED-003 \| Recurring subscription 3× higher than historical average \| `flag` \|
	\| MED-004 \| Payment to licensed crypto exchange from verified personal account \| `flag` \|
	\| MED-005 \| Expired KYC on high-frequency corporate payroll account; KYC lapsed 12 days ago \| `hold` \|
	\| MED-006 \| Real estate advance payment; large first-time transfer to new receiver but signed contract exists \| `escalate` \|
	\| MED-007 \| Supplier emails to say bank details have changed; first payment to new account matches large invoice (BEC indicator) \| `hold` \|
	\| MED-008 \| Buy Now Pay Later high-value purchase; new account, thin credit file, elevated risk signals \| `flag` \|

	### Hard (10 tasks — adversarial / edge-case)

	\| ID \| Description \| Correct Action \|
	\|----------\|-------------\|----------------\|
	\| HARD-001 \| Fraud model poisoning: risk_score=0.18 but manual signals scream escalate \| `escalate` \|
	\| HARD-002 \| APP (Authorised Push Payment) scam: victim sending willingly to mule account \| `reject` \|
	\| HARD-003 \| Structuring / smurfing: just-below-CTR-threshold payments, same UBO \| `reject` \|
	\| HARD-004 \| Legitimate FX correspondent banking settlement — looks alarming, is not \| `approve` \|
	\| HARD-005 \| Insider threat: employee initiating transfers to personal family accounts \| `escalate` \|
	\| HARD-006 \| Ghost account: dormant 5 years, suddenly received 20 inbound transfers this week \| `flag` \|
	\| HARD-007 \| SIM-swap attack: phone ported 6 hours ago; account now requesting large crypto withdrawal to new address \| `reject` \|
	\| HARD-008 \| Romance scam / pig butchering: 4th escalating transfer to overseas 'romantic partner' met online \| `reject` \|
	\| HARD-009 \| Synthetic identity fraud: new business account with AI-generated-looking perfect profile \| `escalate` \|
	\| HARD-010 \| Payroll diversion: HR system breach rerouted employee salary to newly added account \| `reject` \|

	### Critical (6 tasks — regulatory + multi-step investigation chains)

	\| ID \| Description \| Correct Action \|
	\|----------\|-------------\|----------------\|
	\| CRIT-001 \| Multi-step chain: large PE wire to new counterparty; inspect then request docs before deciding (chain of 3) \| `approve` \|
	\| CRIT-002 \| Fraud ring: coordinated small payments from 3 related accounts aggregating above reporting threshold; SAR required \| `reject` \|
	\| CRIT-003 \| Trade-based money laundering: over-invoiced international trade payment (4× market price) \| `escalate` \|
	\| CRIT-004 \| Compromised corporate account: geo-impossible login (NY → Lagos in 8 min); confirmed account takeover \| `reject` \|
	\| CRIT-005 \| OFAC sanctions evasion: large USD payment routed through UAE shell chain; UBO is on SDN list (chain of 3) \| `reject` \|
	\| CRIT-006 \| Correspondent banking: partner bank added to FinCEN 311 Special Measures list; in-flight payments must be escalated \| `escalate` \|

	---

	## Reward Design

	\| Outcome \| Reward \|
	\|---------\|--------\|
	\| Correct action \| +1.0 \|
	\| Partial-credit adjacent action (per-task) \| +0.2 – +0.6 \|
	\| `inspect` (information seeking, first time) \| +0.15 \|
	\| `approve` when correct is `reject` / `escalate` \| −1.0 \|
	\| `approve` when correct is `flag` / `hold` \| −0.5 \|
	\| `reject` when correct is `approve` \| −0.5 \|
	\| Any other wrong action \| −0.25 \|

	The episode score (0–1) is: `max(0, total_reward) / max_possible_reward`.
	A score ≥ 0.5 is considered a passing episode.
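The table and score formula above can be sketched in a few lines. This is an illustration only, not the code in `grader.py`: per-task partial credit and the `inspect` bonus are left out because the table does not give enough detail to reproduce them, and the function names are hypothetical.

```python
SEVERE = {"reject", "escalate"}  # approving these is the worst mistake
SOFT = {"flag", "hold"}


def base_reward(action: str, correct: str) -> float:
    """Reward for a terminal action, following the table row by row."""
    if action == correct:
        return 1.0
    if action == "approve" and correct in SEVERE:
        return -1.0
    if action == "approve" and correct in SOFT:
        return -0.5
    if action == "reject" and correct == "approve":
        return -0.5
    return -0.25  # any other wrong action


def episode_score(total_reward: float, max_possible_reward: float) -> float:
    """max(0, total_reward) / max_possible_reward, as stated above."""
    return max(0.0, total_reward) / max_possible_reward


print(base_reward("approve", "reject"))   # -1.0
print(episode_score(5.0, 10.0) >= 0.5)    # True, a passing episode
```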

	---

	## API Endpoints

	\| Method \| Path \| Description \|
	\|--------\|------\|-------------\|
	\| `POST` \| `/reset` \| Reset environment, return first observation \|
	\| `POST` \| `/step` \| Execute an action \|
	\| `GET` \| `/state` \| Current internal environment state \|
	\| `GET` \| `/schema` \| JSON schemas for action / observation / state \|
	\| `GET` \| `/tasks` \| Full task list with metadata \|
	\| `GET` \| `/grader` \| Grade the current episode \|
	\| `POST` \| `/baseline` \| Run rule-based baseline and return scores \|
	\| `GET` \| `/health` \| Health check \|
	\| `WS` \| `/ws` \| WebSocket persistent session \|

	Interactive API docs: `http://localhost:8000/docs`

	---

	## Setup & Running

	### Local (Python)

	```bash
	# 1. Install dependencies
	pip install -r requirements.txt

	# 2. Start the server (from the parent directory of payops_env)
	PYTHONPATH=$(pwd) uvicorn payops_env.server.app:app --host 0.0.0.0 --port 8000

	# 3. Verify
	curl http://localhost:8000/health
	```

	### Run the baseline agent

	```bash
	# Via the API endpoint (no extra script needed)
	curl -s -X POST http://localhost:8000/baseline | python3 -m json.tool
	```

	### Docker

	```bash
	# Build
	docker build -t payops-env .

	# Run locally on port 8000
	docker run -p 8000:7860 -e PORT=7860 payops-env

	# Verify
	curl http://localhost:8000/health
	```

	### HuggingFace Space

	The `Dockerfile` exposes port 7860 (HF Spaces default). Push the repo to
	an HF Space with the Docker runtime; no additional configuration is required.

	---

	## Example Agent Interaction

	```python
	import httpx

	base = "http://localhost:8000"

	# Reset
	obs = httpx.post(f"{base}/reset").json()
	print(obs["transaction_id"], obs["risk_score"], obs["flags"])

	# Step through the episode until the environment reports done
	while not obs["done"]:
	    # ... agent decides action_type ...
	    obs = httpx.post(f"{base}/step", json={
	        "action_type": "approve",
	        "transaction_id": obs["transaction_id"],
	    }).json()
	    print(f"reward={obs['reward']:+.2f} done={obs['done']}")

	# Grade
	score = httpx.get(f"{base}/grader").json()
	print(f"Episode score: {score['normalised_score']:.4f}")
	```
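The loop above hardcodes `approve`; in practice the `action_type` would come from a policy. The function below is a toy sketch of such a policy. Its rules and the 0.8 threshold are assumptions chosen for illustration, and it is not the rule-based baseline shipped in `scripts_util.py`.

```python
def choose_action(obs: dict) -> str:
    """Toy policy with hypothetical rules; returns a terminal action name."""
    # Hard stops: sanctioned jurisdiction or blacklisted counterparty.
    if obs.get("country_risk") == "sanctioned" or obs.get("counterparty_risk") == "blacklist":
        return "reject"
    # KYC not in good standing: pause until it is resolved.
    if obs.get("kyc_status") in {"expired", "pending"}:
        return "hold"
    # High model risk score (0.8 is an assumed threshold).
    if obs.get("risk_score", 0.0) >= 0.8:
        return "escalate"
    # Any remaining active flag goes to manual review.
    if obs.get("flags"):
        return "flag"
    return "approve"


print(choose_action({"risk_score": 0.05, "flags": []}))  # approve
```

Inside the loop, `"action_type": "approve"` would then become `"action_type": choose_action(obs)`.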

	---

	## Baseline Results

	### Rule-based baseline (`POST /baseline`)

	The rule-based baseline uses a deterministic priority-ordered policy in `scripts_util.py`.

	\| Metric \| Rule-based baseline (v2, 30 tasks) \|
	\|--------\|------------------------------------\|
	\| Normalised score \| 0.68–0.76 \|
	\| Passed (≥ 0.5) \| Yes \|
	\| Strong at \| Easy tasks, clear velocity/flag patterns \|
	\| Weak at \| Hard adversarial tasks (HARD-001 model-poisoning, HARD-004 FX settlement) \|
	\| Critical coverage \| Partial — misses some SAR filing requirements \|

	Scores vary slightly per run due to per-episode parameter jitter.

	Run `POST /baseline` to reproduce.

	### LLM baseline (`inference.py` — `llama-3.1-8b-instant` via Groq)

	Run locally against seed 42 (reproducible) with investigation sub-actions enabled. This run covers 20 tasks (see the per-task table), not the full 30-task set.

	\| Metric \| llama-3.1-8b-instant (Groq) \|
	\|--------\|-----------------------------\|
	\| Normalised score \| 0.6028 \|
	\| Total reward \| 17.000 / 28.200 max \|
	\| Tasks correct \| 6 / 20 (30%) \|
	\| Budget spent \| 5.50 / 5.00 \|
	\| Budget penalty \| 0.05 \|
	\| Episode steps \| 57 (incl. investigation sub-actions) \|
	\| Duration \| ~290 s \|
	\| Passed (≥ 0.5) \| YES ✓ \|
	\| Seed \| 42 (fixed — deterministic across re-runs) \|

	Per-task decisions:

	\| Task \| LLM Action \| Correct Action \| Weighted Reward \|
	\|------\|-----------\|----------------\|----------------\|
	\| EASY-001 \| `approve` \| `approve` \| +1.000 ✓ \|
	\| EASY-002 \| `flag` \| `reject` \| −0.250 ✗ (flag no longer partial credit) \|
	\| EASY-003 \| `approve` \| `approve` \| +1.000 ✓ \|
	\| EASY-004 \| `flag` \| `flag` \| +1.000 ✓ \|
	\| MED-001 \| `flag` \| `escalate` \| +0.900 (partial + investigation bonus) \|
	\| MED-002 \| `flag` \| `hold` \| +0.540 (partial + investigation bonus) \|
	\| MED-003 \| `flag` \| `flag` \| +1.200 ✓ \|
	\| MED-004 \| `flag` \| `flag` \| +1.200 ✓ \|
	\| MED-005 \| `flag` \| `hold` \| +0.660 (partial + investigation bonus) \|
	\| MED-006 \| `flag` \| `escalate` \| +0.600 (partial + investigation bonus) \|
	\| HARD-001 \| `flag` \| `escalate` \| +1.275 (partial + investigation bonus) \|
	\| HARD-002 \| `flag` \| `reject` \| +0.525 (partial + investigation bonus) \|
	\| HARD-003 \| `flag` \| `reject` \| +0.675 (partial + investigation bonus) \|
	\| HARD-004 \| `flag` \| `approve` \| +0.825 (partial + investigation bonus) \|
	\| HARD-005 \| `flag` \| `escalate` \| +0.825 (partial + investigation bonus) \|
	\| HARD-006 \| `flag` \| `flag` \| +2.025 ✓ (+ investigation bonus) \|
	\| CRIT-001 \| `flag` \| `approve` \| +1.100 (partial + investigation bonus) \|
	\| CRIT-002 \| `flag` \| `reject` \| +0.900 (partial + investigation bonus) \|
	\| CRIT-003 \| `flag` \| `escalate` \| +1.300 (partial + investigation bonus) \|
	\| CRIT-004 \| `flag` \| `reject` \| −0.250 ✗ \|

	Observations: The model used investigation sub-actions (`inspect`, `verify_kyc`, `contact_sender`) before terminal decisions, earning investigation bonuses that lifted its score above what a naive always-flag policy would earn. Easy cases with clear evidence now penalise lazy `flag` decisions (e.g. EASY-002). Agents that pair proper investigation with the correct terminal action can exceed 0.90.
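The headline numbers can be cross-checked against the per-task table. The sketch below sums the 20 weighted rewards and subtracts the 0.05 budget penalty; treating the penalty as a simple subtraction from the total is an assumption, although it reproduces the reported figures.

```python
# Weighted rewards copied from the per-task table, in table order.
weighted_rewards = [
    1.000, -0.250, 1.000, 1.000,                  # EASY-001..004
    0.900, 0.540, 1.200, 1.200, 0.660, 0.600,     # MED-001..006
    1.275, 0.525, 0.675, 0.825, 0.825, 2.025,     # HARD-001..006
    1.100, 0.900, 1.300, -0.250,                  # CRIT-001..004
]
budget_penalty = 0.05   # from the metrics table
max_reward = 28.2       # from the metrics table

total = sum(weighted_rewards) - budget_penalty
score = total / max_reward

print(f"{total:.3f}")  # 17.000
print(f"{score:.4f}")  # 0.6028
```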

	To reproduce exactly (seed=42 is the default):

	```bash
	export OPENAI_API_KEY="gsk_..." # your Groq API key
	export API_BASE_URL="https://api.groq.com/openai/v1"
	export MODEL_NAME="llama-3.1-8b-instant"
	export PAYOPS_BASE_URL="https://padmapriyagosakan-payops-env.hf.space"
	# INFERENCE_SEED=42 # default; set to "random" for a fresh episode
	PYTHONPATH=$(pwd) python payops_env/inference.py
	```

	For Groq setup instructions, see "Running inference with Groq" below.

	---

	## Running inference with Groq (recommended — free)

	[Groq](https://console.groq.com) provides a completely free API with no monthly credit cap and no installation required. It uses the same OpenAI-compatible interface that `inference.py` already targets.

	### Prerequisites

	1. Create a free Groq account — go to [console.groq.com](https://console.groq.com) and sign up (Google / GitHub login available)

	2. Generate an API key — click API Keys → Create API Key, copy the key (starts with `gsk_`)

	3. Install the Python dependency (already in `requirements.txt`):
	```bash
	pip install openai
	```

	### Run inference

	```bash
	cd /path/to/payops_env # project root (parent of payops_env/)

	export OPENAI_API_KEY="gsk_..." # your Groq API key
	export API_BASE_URL="https://api.groq.com/openai/v1"
	export MODEL_NAME="llama-3.1-8b-instant"
	export PAYOPS_BASE_URL="https://padmapriyagosakan-payops-env.hf.space"

	PYTHONPATH=$(pwd) python payops_env/inference.py
	```

	> Why Groq?
	> - Free tier: 14,400 requests/day, 500,000 tokens/minute — a 20-task episode uses ~30 calls
	> - No monthly credit pool that runs out mid-run (unlike the HF free tier)
	> - No installation or model download (unlike Ollama)
	> - `temperature=0.0` is already set in `inference.py` so results are reproducible
	> - Inference speed: ~750 tok/s → full episode completes in under 30 seconds

	### Alternative free models on Groq

	\| Model \| Notes \|
	\|-------\|-------\|
	\| `llama-3.1-8b-instant` \| Fastest, good reasoning \|
	\| `llama-3.3-70b-versatile` \| Best quality on hard tasks; same free tier \|
	\| `mixtral-8x7b-32768` \| Large context window \|
	\| `gemma2-9b-it` \| Google Gemma 2 \|

	### Alternative: Ollama (fully local, no internet required for LLM calls)

	If you prefer to run the model entirely on your machine:

	```bash
	# 1. Install
	brew install ollama

	# 2. Pull a model (choose based on available RAM)
	ollama pull qwen2.5:3b # ~2 GB download, for machines with 8 GB RAM
	ollama pull qwen2.5:7b # ~4.7 GB download, for machines with 16 GB RAM

	# 3. Start the server (keep running in a separate terminal)
	ollama serve

	# 4. Run inference
	export OPENAI_API_KEY=ollama
	export API_BASE_URL="http://localhost:11434/v1"
	export MODEL_NAME="qwen2.5:3b"
	export PAYOPS_BASE_URL="https://padmapriyagosakan-payops-env.hf.space"
	PYTHONPATH=$(pwd) python payops_env/inference.py
	```

	---

	## Project Structure

	```
	payops_env/
	├── models.py # PayOpsAction, PayOpsObservation, PayOpsState (Pydantic)
	├── environment.py # PayOpsEnvironment — reset_async / step_async / state
	├── tasks.py # 30 tasks (EASY×6, MED×8, HARD×10, CRIT×6) with ground-truth labels
	├── grader.py # Partial-credit reward function + episode grader
	├── scripts_util.py # Baseline runner helper (used by /baseline endpoint)
	├── server/
	│   └── app.py # FastAPI server with all required endpoints
	├── inference.py # Competition inference script (OpenAI client, root-level)
	├── validate.py # Pre-submission checklist validator
	├── openenv.yaml # OpenEnv manifest v2.0.0
	├── Dockerfile # Docker / HuggingFace Space container (port 7860)
	├── requirements.txt # Python dependencies
	└── README.md # This file
	```

	---

	## Evaluation Criteria Alignment

	\| Criterion \| Implementation \|
	\|-----------\|---------------\|
	\| Real-world utility \| Payment fraud and compliance triage, performed daily by fintech ops teams worldwide \|
	\| Task & grader quality \| 30 tasks across 4 difficulty tiers (easy→critical); partial-credit grader; clear pass/fail \|
	\| Environment design \| 41-field observation space; 10-action space (5 terminal + 5 investigation); budget mechanic; episode state tracking \|
	\| Code quality & spec compliance \| Pydantic v2 models; async API; all 11 required endpoints; openenv.yaml v2; Dockerfile; validate.py \|
	\| Creativity & novelty \| Adversarial model-poisoning task; APP scam; AML structuring with SAR requirement; PEP detection \|


	---

	## Reward Design (v2 — Trajectory-Based)

	Rewards are dense across the full trajectory, not just on the final decision:

	\| Component \| Value \| Condition \|
	\|-----------\|-------\|-----------\|
	\| Correct terminal action \| +1.0 \| per task (difficulty-weighted in episode score) \|
	\| Investigation sub-action \| +0.15 \| per eligible sub-action, first use only \|
	\| Flag identification \| +0.20 \| agent used `inspect` AND key diagnostic flags present \|
	\| Confidence bonus \| +0.10 \| confidence ≥ 0.8 AND correct \|
	\| Confidence penalty \| −0.10 \| confidence ≥ 0.8 AND wrong \|
	\| Regulatory SAR bonus \| +0.20 \| `file_sar` before terminal on a regulatory task \|
	\| Duplicate investigation \| −0.05 \| same sub-action used twice on same task \|
	\| Approve a fraud/sanctioned \| −1.00 \| worst mistake \|

	Difficulty weights: easy×1.0, medium×1.2, hard×1.5, critical×2.0.

	Episode score is strictly clamped to `[0.0, 1.0]`. Passing threshold: 0.5.
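As a rough sketch of how the components and difficulty weights might combine, the snippet below sums a task's reward components and multiplies by the tier weight. That the weight multiplies the summed components is an assumption rather than something the table states; it is at least consistent with the +1.200 shown for correct medium tasks in the LLM baseline table. The function names are hypothetical.

```python
DIFFICULTY_WEIGHTS = {"easy": 1.0, "medium": 1.2, "hard": 1.5, "critical": 2.0}


def weighted_reward(components: dict[str, float], difficulty: str) -> float:
    """Sum the itemised components, then apply the tier weight (assumed order)."""
    return sum(components.values()) * DIFFICULTY_WEIGHTS[difficulty]


def clamp_score(raw: float) -> float:
    """Clamp an episode score to [0.0, 1.0]."""
    return min(1.0, max(0.0, raw))


# A correct terminal action on a medium task, with no bonuses.
print(round(weighted_reward({"base": 1.0}, "medium"), 3))  # 1.2

# A correct action plus a high-confidence bonus on a critical task.
print(round(weighted_reward({"base": 1.0, "confidence_bonus": 0.10}, "critical"), 3))  # 2.2
```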

	### Per-Episode Parameter Jitter

	Each `POST /reset` generates a unique `episode_seed` and applies small random perturbations to prevent agent overfitting:

	\| Field \| Jitter \|
	\|-------\|--------\|
	\| `amount` \| × Uniform(0.85, 1.20) \|
	\| `risk_score` \| + Gauss(0, 0.03), clamped [0,1] \|
	\| `velocity_1h` \| + Randint(−3, +3), min 0 \|
	\| `velocity_24h` \| + Randint(−3, +3), min 0 \|

	The `correct_action` and all ground-truth labels are never changed — only the observable values the agent uses to make decisions.

	The `episode_seed` is returned by `GET /health` and `GET /state` for reproducibility.
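The jitter table can be sketched with the standard `random` module. This is an illustration under assumptions: the function name, the use of `random.Random(episode_seed)`, and the order in which fields are perturbed are all hypothetical, so it will not reproduce the environment's actual values for a given seed.

```python
import random


def jitter_observation(task: dict, episode_seed: int) -> dict:
    """Return a perturbed copy of a task's observable fields (hypothetical helper)."""
    rng = random.Random(episode_seed)  # seeded so the same seed gives the same episode
    out = dict(task)
    out["amount"] = task["amount"] * rng.uniform(0.85, 1.20)
    noisy_risk = task["risk_score"] + rng.gauss(0.0, 0.03)
    out["risk_score"] = min(1.0, max(0.0, noisy_risk))  # clamp to [0, 1]
    for key in ("velocity_1h", "velocity_24h"):
        if task.get(key) is not None:  # velocity fields are optional
            out[key] = max(0, task[key] + rng.randint(-3, 3))
    return out


task = {"amount": 1000.0, "risk_score": 0.97, "velocity_1h": 2,
        "velocity_24h": 5, "correct_action": "reject"}
a = jitter_observation(task, episode_seed=42)
b = jitter_observation(task, episode_seed=42)
print(a == b)                 # True: same seed, same perturbation
print(a["correct_action"])    # reject: ground truth is untouched
```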

	### Network Graph

	Selected tasks include a `network_graph` field in the observation exposing mule-chain / correspondent-banking relationships (e.g. victim → mule → offshore). This gives agents richer context for complex fraud patterns.