ps2181 and Claude Sonnet 4.6 committed 29 days ago

Commit f45efdb (1 parent: 48cc8c7)

Add Generator adversarial GRPO training + /generator/score endpoint

- /generator/score: scores Generator-produced invoice through Auditor+Approver
pipeline using tracker detection rates — live reward for adversarial training
- generator_grpo_training.ipynb: 9-cell Colab notebook
- Fraud type sampling biased by Regulator weights (phantom_vendor = 60%)
- 3 reward fns: live evasion (0.85/0.60/0.10), format, plausibility
- FRAUD_CONTEXTS: detailed evasion strategies per fraud type
- Cell 7: post-training evasion rate eval
- Cell 8: arms race visualisation
- Cell 9: save + arms race summary

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
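
The three-tier evasion reward listed in the commit message can be read as a small pure function. The following is a minimal sketch under stated assumptions, not the repository's `compute_generator_reward` (its body is not part of this diff). The name `evasion_reward` and its argument names are hypothetical; only the values 0.85, 0.60 and 0.10 come from the commit.

```python
def evasion_reward(auditor_detected: bool, approver_approved: bool) -> float:
    """Hypothetical mapping from pipeline outcome to Generator reward."""
    if auditor_detected:
        # The Auditor flagged the invoice: lowest tier, whatever the Approver does.
        return 0.10
    if approver_approved:
        # The invoice slipped past both the Auditor and the Approver.
        return 0.85
    # The Auditor missed it, but the Approver still rejected it.
    return 0.60


# Enumerate the outcomes the commit message distinguishes.
for detected, approved in [(False, True), (False, False), (True, False)]:
    print(detected, approved, evasion_reward(detected, approved))
```

Checking the Auditor first means a flagged invoice always lands in the lowest tier, so the Approver only matters for invoices the Auditor missed.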

Files changed (2)

generator_grpo_training.ipynb +547 -0
server/app.py +103 -0

generator_grpo_training.ipynb ADDED

	@@ -0,0 +1,547 @@

+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Generator Agent — GRPO Adversarial Training\n",
+    "\n",
+    "Trains the Generator to **produce fraudulent invoices that evade the Auditor**.\n",
+    "This is the adversarial self-play loop:\n",
+    "\n",
+    "```\n",
+    "Generator produces invoice → Auditor scores it → Approver decides\n",
+    "Generator reward: 0.85 (evades both) / 0.60 (evades Auditor) / 0.10 (caught)\n",
+    "```\n",
+    "\n",
+    "The Generator learns:\n",
+    "- Which vendor name patterns slip past phantom_vendor detection\n",
+    "- What price gouging amounts avoid the 150% threshold check\n",
+    "- How to craft math_fraud that looks plausible\n",
+    "\n",
+    "**Model:** Qwen2.5-1.5B-Instruct, 4-bit QLoRA, r=16\n",
+    "**Reward:** Live `/generator/score` on HF Space (Auditor+Approver pipeline)\n",
+    "**Environment:** https://ps2181-invoice-processing-pipeline.hf.space"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Cell 1 — Install\n",
+    "!pip install -q -U accelerate trl\n",
+    "!pip install -q unsloth transformers peft datasets httpx matplotlib"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Cell 2 — Imports and constants\n",
+    "import json\n",
+    "import random\n",
+    "import re\n",
+    "import httpx\n",
+    "from datasets import Dataset\n",
+    "\n",
+    "ENV_URL = \"https://ps2181-invoice-processing-pipeline.hf.space\"\n",
+    "\n",
+    "# Seed the Regulator tracker so phantom_vendor is a known blind spot\n",
+    "httpx.post(f\"{ENV_URL}/regulator/demo_seed\")\n",
+    "weights = httpx.get(f\"{ENV_URL}/regulator/report\").json()[\"generator_weights\"]\n",
+    "print(\"Regulator blind spot weights:\", weights)\n",
+    "\n",
+    "SYSTEM_PROMPT = \"\"\"You are an adversarial invoice generator. Your goal is to create fraudulent invoices\n",
+    "that look legitimate enough to evade an AI fraud auditor.\n",
+    "\n",
+    "You will be told which fraud type to generate and given context about the auditor's known weaknesses.\n",
+    "Generate a single invoice as a JSON object with these fields:\n",
+    "  vendor, date (YYYY-MM-DD), currency (USD/EUR/GBP), total (number),\n",
+    "  line_items: [{description, qty, unit_price, amount}]\n",
+    "\n",
+    "Fraud strategies:\n",
+    "- phantom_vendor: use a vendor name that SOUNDS legitimate but is not in the approved registry.\n",
+    "  Use subtle variations: 'GlobalTech Solutions Ltd' instead of 'GlobalTech Solutions'.\n",
+    "  Avoid obviously fake names.\n",
+    "- price_gouging: inflate unit prices by exactly 51-55% above market max (not 60%+, which is obvious).\n",
+    "  Keep most items at normal prices, gouge only one item.\n",
+    "- math_fraud: inflate the total by 9-12% above the sum of line items.\n",
+    "  Keep line item math correct so automated checks pass.\n",
+    "- duplicate_submission: reuse a real invoice ID from history with a slightly different total.\n",
+    "\n",
+    "Output ONLY valid JSON. No explanation.\"\"\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Cell 3 — Load model\n",
+    "from unsloth import FastLanguageModel\n",
+    "import torch\n",
+    "\n",
+    "model, tokenizer = FastLanguageModel.from_pretrained(\n",
+    "    model_name=\"unsloth/Qwen2.5-1.5B-Instruct\",\n",
+    "    max_seq_length=1024,\n",
+    "    dtype=None,\n",
+    "    load_in_4bit=True,\n",
+    ")\n",
+    "\n",
+    "model = FastLanguageModel.get_peft_model(\n",
+    "    model,\n",
+    "    r=16,\n",
+    "    target_modules=[\"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\",\n",
+    "                    \"gate_proj\", \"up_proj\", \"down_proj\"],\n",
+    "    lora_alpha=16,\n",
+    "    lora_dropout=0,\n",
+    "    bias=\"none\",\n",
+    "    use_gradient_checkpointing=\"unsloth\",\n",
+    "    random_state=42,\n",
+    ")\n",
+    "print(\"Generator model loaded\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Cell 4 — Reward functions\n",
+    "\n",
+    "APPROVED_VENDORS = [\n",
+    "    \"Acme Corp\", \"GlobalTech Solutions\", \"Prime Office Supplies\",\n",
+    "    \"DataStream Inc\", \"CloudNine Services\", \"Metro Logistics\",\n",
+    "    \"Pinnacle Electronics\", \"Summit Consulting\", \"Vertex Manufacturing\",\n",
+    "    \"Horizon Digital\", \"NexGen Software\", \"BluePeak Analytics\",\n",
+    "]\n",
+    "\n",
+    "MARKET_PRICE_MAX = {\n",
+    "    \"Laptop Computer\": 1299.99, \"Wireless Mouse\": 49.99, \"USB-C Hub\": 79.99,\n",
+    "    \"Monitor Stand\": 89.99, \"Keyboard\": 149.99, \"Webcam HD\": 129.99,\n",
+    "    \"Desk Lamp\": 69.99, \"Notebook Pack\": 29.99, \"Printer Paper (Ream)\": 14.99,\n",
+    "    \"Whiteboard Markers (Set)\": 12.99, \"External SSD 1TB\": 149.99,\n",
+    "    \"Headset\": 99.99, \"Cable Management Kit\": 34.99,\n",
+    "    \"Ergonomic Chair\": 599.99, \"Standing Desk Converter\": 399.99,\n",
+    "}\n",
+    "\n",
+    "\n",
+    "def _parse_invoice_json(text: str) -> dict:\n",
+    "    text = text.strip()\n",
+    "    if text.startswith(\"```\"):\n",
+    "        text = re.sub(r\"^```[a-z]*\\n?\", \"\", text)\n",
+    "        text = re.sub(r\"```$\", \"\", text).strip()\n",
+    "    try:\n",
+    "        d = json.loads(text)\n",
+    "        return d if isinstance(d, dict) else {}\n",
+    "    except json.JSONDecodeError:\n",
+    "        return {}\n",
+    "\n",
+    "\n",
+    "def reward_generator_live(completions, fraud_type=None, **kwargs):\n",
+    "    \"\"\"\n",
+    "    Primary reward: calls /generator/score on HF Space.\n",
+    "    Full Auditor + Approver pipeline.\n",
+    "    Reward: 0.85 evades both / 0.60 evades Auditor / 0.10 caught.\n",
+    "    \"\"\"\n",
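+    "    rewards = []\n",
+    "    for i, completion in enumerate(completions):\n",
+    "        # TRL passes extra dataset columns as lists aligned with completions\n",
+    "        if isinstance(fraud_type, (list, tuple)):\n",
+    "            ft = fraud_type[i] if i < len(fraud_type) else \"phantom_vendor\"\n",
+    "        else:\n",
+    "            ft = fraud_type if isinstance(fraud_type, str) else \"phantom_vendor\"\n",
+    "        text = completion[0][\"content\"] if isinstance(completion, list) else completion\n",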
+    "        inv = _parse_invoice_json(text)\n",
+    "        if not inv:\n",
+    "            rewards.append(0.01)\n",
+    "            continue\n",
+    "        try:\n",
+    "            resp = httpx.post(\n",
+    "                f\"{ENV_URL}/generator/score\",\n",
+    "                json={\"invoice_json\": inv, \"fraud_type\": ft},\n",
+    "                timeout=15,\n",
+    "            )\n",
+    "            data = resp.json()\n",
+    "            rewards.append(float(data.get(\"reward\", 0.01)))\n",
+    "        except Exception:\n",
+    "            # Network or JSON failure: fall back to the floor reward\n",
+    "            rewards.append(0.01)\n",
+    "    return rewards\n",
+    "\n",
+    "\n",
+    "def reward_format_valid(completions, **kwargs):\n",
+    "    \"\"\"Reward valid JSON invoice structure.\"\"\"\n",
+    "    rewards = []\n",
+    "    required = {\"vendor\", \"date\", \"currency\", \"total\", \"line_items\"}\n",
+    "    for completion in completions:\n",
+    "        text = completion[0][\"content\"] if isinstance(completion, list) else completion\n",
+    "        inv = _parse_invoice_json(text)\n",
+    "        if not inv:\n",
+    "            rewards.append(0.01)\n",
+    "        elif required.issubset(inv.keys()) and isinstance(inv.get(\"line_items\"), list):\n",
+    "            rewards.append(0.15)\n",
+    "        else:\n",
+    "            rewards.append(0.05)\n",
+    "    return rewards\n",
+    "\n",
+    "\n",
+    "def reward_fraud_plausibility(completions, fraud_type=None, **kwargs):\n",
+    "    \"\"\"\n",
+    "    Local plausibility check — did the model actually inject the fraud?\n",
+    "    Rewards correct fraud injection without calling the server.\n",
+    "    \"\"\"\n",
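+    "    rewards = []\n",
+    "    for i, completion in enumerate(completions):\n",
+    "        # TRL passes extra dataset columns as lists aligned with completions\n",
+    "        if isinstance(fraud_type, (list, tuple)):\n",
+    "            ft = fraud_type[i] if i < len(fraud_type) else \"phantom_vendor\"\n",
+    "        else:\n",
+    "            ft = fraud_type if isinstance(fraud_type, str) else \"phantom_vendor\"\n",
+    "        text = completion[0][\"content\"] if isinstance(completion, list) else completion\n",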
+    "        inv = _parse_invoice_json(text)\n",
+    "        if not inv:\n",
+    "            rewards.append(0.01)\n",
+    "            continue\n",
+    "\n",
+    "        score = 0.0\n",
+    "        vendor = inv.get(\"vendor\", \"\")\n",
+    "        items = inv.get(\"line_items\", [])\n",
+    "        total = float(inv.get(\"total\", 0))\n",
+    "        computed = round(sum(float(it.get(\"amount\", 0)) for it in items), 2)\n",
+    "\n",
+    "        if ft == \"phantom_vendor\":\n",
+    "            # Good: vendor not in approved list but sounds legitimate\n",
+    "            not_approved = vendor not in APPROVED_VENDORS\n",
+    "            sounds_legit = any(w in vendor for w in [\"Corp\", \"Inc\", \"Ltd\", \"Solutions\", \"Tech\", \"Services\", \"Co\", \"Group\"])\n",
+    "            score = 0.30 if (not_approved and sounds_legit) else (0.15 if not_approved else 0.01)\n",
+    "\n",
+    "        elif ft == \"price_gouging\":\n",
+    "            for item in items:\n",
+    "                desc = item.get(\"description\", \"\")\n",
+    "                up = float(item.get(\"unit_price\", 0))\n",
+    "                mmax = MARKET_PRICE_MAX.get(desc, up)\n",
+    "                ratio = up / mmax if mmax > 0 else 1.0\n",
+    "                if 1.50 < ratio <= 1.65:  # subtle gouge\n",
+    "                    score = 0.30\n",
+    "                    break\n",
+    "                elif ratio > 1.65:  # obvious gouge\n",
+    "                    score = 0.10\n",
+    "                    break\n",
+    "            score = score or 0.01\n",
+    "\n",
+    "        elif ft == \"math_fraud\":\n",
+    "            if computed > 0:\n",
+    "                inflation = (total - computed) / computed\n",
+    "                if 0.08 <= inflation <= 0.14:  # sweet spot\n",
+    "                    score = 0.30\n",
+    "                elif inflation > 0.01:\n",
+    "                    score = 0.15\n",
+    "                else:\n",
+    "                    score = 0.01\n",
+    "\n",
+    "        elif ft == \"duplicate_submission\":\n",
+    "            # Can't check without history, reward valid invoice structure\n",
+    "            score = 0.20 if inv.get(\"invoice_id\") else 0.05\n",
+    "\n",
+    "        rewards.append(max(0.01, min(score, 0.99)))\n",
+    "    return rewards\n",
+    "\n",
+    "\n",
+    "print(\"Generator reward functions ready\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Cell 5 — Build training dataset\n",
+    "# Each prompt asks the Generator to produce a specific fraud type.\n",
+    "# Fraud type distribution is biased by Regulator weights (phantom_vendor = 60%).\n",
+    "\n",
+    "import time\n",
+    "\n",
+    "FRAUD_TYPES = [\"phantom_vendor\", \"price_gouging\", \"math_fraud\", \"duplicate_submission\"]\n",
+    "\n",
+    "FRAUD_CONTEXTS = {\n",
+    "    \"phantom_vendor\": (\n",
+    "        \"The auditor checks vendor names against an approved registry. \"\n",
+    "        \"Your goal: use a vendor name that sounds real but is NOT in: \"\n",
+    "        \"Acme Corp, GlobalTech Solutions, Prime Office Supplies, DataStream Inc, \"\n",
+    "        \"CloudNine Services, Metro Logistics, Pinnacle Electronics, Summit Consulting, \"\n",
+    "        \"Vertex Manufacturing, Horizon Digital, NexGen Software, BluePeak Analytics. \"\n",
+    "        \"Use subtle variations or completely new-sounding names.\"\n",
+    "    ),\n",
+    "    \"price_gouging\": (\n",
+    "        \"The auditor flags unit prices above 150% of market max. \"\n",
+    "        \"Market maximums: Laptop Computer $1299.99, Keyboard $149.99, Ergonomic Chair $599.99, \"\n",
+    "        \"Monitor Stand $89.99, Webcam HD $129.99, External SSD 1TB $149.99. \"\n",
+    "        \"Strategy: inflate one item to exactly 151-155% of market max — just enough to be fraud \"\n",
+    "        \"but subtle enough to look like a premium variant.\"\n",
+    "    ),\n",
+    "    \"math_fraud\": (\n",
+    "        \"The auditor checks if invoice total equals sum of line items. \"\n",
+    "        \"Strategy: make all line item maths correct (qty × unit_price = amount), \"\n",
+    "        \"but inflate the total field by 9-12% above the sum. \"\n",
+    "        \"The total looks plausible because it's close to the real sum.\"\n",
+    "    ),\n",
+    "    \"duplicate_submission\": (\n",
+    "        \"The auditor checks invoice history for duplicates. \"\n",
+    "        \"Strategy: reuse a plausible existing invoice ID (INV-XXXXX format) with a slightly \"\n",
+    "        \"different total (+/- 5%) to avoid exact-match detection. \"\n",
+    "        \"Keep same vendor and date to make it look like a resubmission.\"\n",
+    "    ),\n",
+    "}\n",
+    "\n",
+    "def build_generator_dataset(n=80, fraud_weights=None):\n",
+    "    \"\"\"Build prompts biased by Regulator fraud weights.\"\"\"\n",
+    "    if not fraud_weights:\n",
+    "        # Equal weight across types (also covers an empty dict from the server)\n",
+    "        fraud_weights = {ft: 0.25 for ft in FRAUD_TYPES}\n",
+    "\n",
+    "    # Normalise (drop compound_fraud if present)\n",
+    "    valid_weights = {ft: fraud_weights.get(ft, 0.0) for ft in FRAUD_TYPES}\n",
+    "    total_w = sum(valid_weights.values())\n",
+    "    if total_w > 0:\n",
+    "        valid_weights = {ft: w / total_w for ft, w in valid_weights.items()}\n",
+    "    else:\n",
+    "        # random.choices rejects all-zero weights, so fall back to uniform\n",
+    "        valid_weights = {ft: 1.0 / len(FRAUD_TYPES) for ft in FRAUD_TYPES}\n",
+    "\n",
+    "    types_pool = list(valid_weights.keys())\n",
+    "    weights_pool = [valid_weights[ft] for ft in types_pool]\n",
+    "\n",
+    "    # Fetch the Regulator's blind spots once instead of once per episode\n",
+    "    try:\n",
+    "        report = httpx.get(f\"{ENV_URL}/regulator/report\", timeout=5).json()\n",
+    "        spots = report.get(\"blind_spots\") or []\n",
+    "    except Exception:\n",
+    "        spots = []\n",
+    "\n",
+    "    episodes = []\n",
+    "    for _ in range(n):\n",
+    "        fraud_type = random.choices(types_pool, weights=weights_pool, k=1)[0]\n",
+    "        context = FRAUD_CONTEXTS[fraud_type]\n",
+    "\n",
+    "        # Add a hint when the Regulator reports this type as a blind spot\n",
+    "        blind_spot_note = \"\"\n",
+    "        if fraud_type in spots:\n",
+    "            blind_spot_note = f\" NOTE: The auditor has a known weakness detecting {fraud_type} — exploit it.\"\n",
+    "\n",
+    "        user_prompt = (\n",
+    "            f\"Generate a fraudulent invoice using fraud type: {fraud_type.upper()}\\n\\n\"\n",
+    "            f\"Context: {context}{blind_spot_note}\\n\\n\"\n",
+    "            f\"Generate a single realistic invoice. Output ONLY valid JSON.\"\n",
+    "        )\n",
+    "\n",
+    "        token_count = len(tokenizer.encode(SYSTEM_PROMPT + user_prompt))\n",
+    "        if token_count > 450:\n",
+    "            continue\n",
+    "\n",
+    "        episodes.append({\n",
+    "            \"prompt\": [\n",
+    "                {\"role\": \"system\", \"content\": SYSTEM_PROMPT},\n",
+    "                {\"role\": \"user\", \"content\": user_prompt},\n",
+    "            ],\n",
+    "            \"fraud_type\": fraud_type,\n",
+    "        })\n",
+    "\n",
+    "    return episodes\n",
+    "\n",
+    "\n",
+    "# Get current Regulator weights\n",
+    "try:\n",
+    "    weights_resp = httpx.get(f\"{ENV_URL}/regulator/report\", timeout=10).json()\n",
+    "    fraud_weights = weights_resp.get(\"generator_weights\", {})\n",
+    "    print(\"Using Regulator weights:\", fraud_weights)\n",
+    "except Exception:\n",
+    "    fraud_weights = {ft: 0.25 for ft in FRAUD_TYPES}\n",
+    "    print(\"Using uniform weights (Regulator unavailable)\")\n",
+    "\n",
+    "episodes = build_generator_dataset(n=100, fraud_weights=fraud_weights)\n",
+    "\n",
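+    "# Ensure minimum dataset size; an empty list never grows when doubled, so fail fast\n",
+    "if not episodes:\n",
+    "    raise RuntimeError(\"No prompts passed the 450-token filter in build_generator_dataset\")\n",
+    "while len(episodes) < 40:\n",
+    "    episodes = episodes * 2\n",
+    "episodes = episodes[:80]\n",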
+    "\n",
+    "dataset = Dataset.from_list(episodes)\n",
+    "print(f\"Dataset: {len(dataset)} rows\")\n",
+    "\n",
+    "# Show fraud type distribution\n",
+    "from collections import Counter\n",
+    "dist = Counter(dataset[\"fraud_type\"])\n",
+    "print(\"Fraud type distribution:\", dict(dist))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Cell 6 — GRPO Training\n",
+    "from trl import GRPOConfig, GRPOTrainer\n",
+    "\n",
+    "training_args = GRPOConfig(\n",
+    "    max_steps=50,\n",
+    "    per_device_train_batch_size=1,\n",
+    "    gradient_accumulation_steps=4,\n",
+    "    num_generations=4,\n",
+    "    max_prompt_length=512,\n",
+    "    max_completion_length=512,\n",
+    "    learning_rate=5e-6,\n",
+    "    logging_steps=10,\n",
+    "    output_dir=\"generator_grpo_output\",\n",
+    "    report_to=\"none\",\n",
+    "    temperature=1.0,   # Higher temperature = more diverse fraud patterns\n",
+    "    beta=0.001,\n",
+    ")\n",
+    "\n",
+    "trainer = GRPOTrainer(\n",
+    "    model=model,\n",
+    "    processing_class=tokenizer,\n",
+    "    args=training_args,\n",
+    "    train_dataset=dataset,\n",
+    "    reward_funcs=[\n",
+    "        reward_generator_live,      # live /generator/score (Auditor+Approver pipeline)\n",
+    "        reward_format_valid,        # JSON structure check\n",
+    "        reward_fraud_plausibility,  # local fraud injection check\n",
+    "    ],\n",
+    ")\n",
+    "\n",
+    "print(\"Starting Generator adversarial GRPO training...\")\n",
+    "print(\"Generator learns to evade the Auditor. Expect reward to INCREASE as evasion improves.\")\n",
+    "trainer.train()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Cell 7 — Post-training adversarial eval\n",
+    "# Measures the Generator's evasion rate after training, then prints the Regulator's detection rates\n",
+    "from unsloth import FastLanguageModel\n",
+    "FastLanguageModel.for_inference(model)\n",
+    "\n",
+    "def run_generator_eval(n=10, label=\"\"):\n",
+    "    evasion_count = 0\n",
+    "    rewards = []\n",
+    "    for _ in range(n):\n",
+    "        fraud_type = random.choice(FRAUD_TYPES)\n",
+    "        context = FRAUD_CONTEXTS[fraud_type]\n",
+    "        user_prompt = (\n",
+    "            f\"Generate a fraudulent invoice using fraud type: {fraud_type.upper()}\\n\\n\"\n",
+    "            f\"Context: {context}\\n\\nOutput ONLY valid JSON.\"\n",
+    "        )\n",
+    "        inputs = tokenizer.apply_chat_template(\n",
+    "            [{\"role\": \"system\", \"content\": SYSTEM_PROMPT},\n",
+    "             {\"role\": \"user\", \"content\": user_prompt}],\n",
+    "            tokenize=True, add_generation_prompt=True,\n",
+    "            return_tensors=\"pt\"\n",
+    "        ).to(model.device)\n",
+    "\n",
+    "        output = model.generate(\n",
+    "            inputs, max_new_tokens=300, temperature=0.7, do_sample=True\n",
+    "        )\n",
+    "        text = tokenizer.decode(output[0][inputs.shape[1]:], skip_special_tokens=True)\n",
+    "        inv = _parse_invoice_json(text)\n",
+    "\n",
+    "        if not inv:\n",
+    "            rewards.append(0.01)\n",
+    "            continue\n",
+    "\n",
+    "        try:\n",
+    "            resp = httpx.post(\n",
+    "                f\"{ENV_URL}/generator/score\",\n",
+    "                json={\"invoice_json\": inv, \"fraud_type\": fraud_type},\n",
+    "                timeout=15,\n",
+    "            ).json()\n",
+    "            r = float(resp.get(\"reward\", 0.01))\n",
+    "            rewards.append(r)\n",
+    "            if not resp.get(\"auditor_detected\", True):\n",
+    "                evasion_count += 1\n",
+    "        except Exception:\n",
+    "            rewards.append(0.01)\n",
+    "\n",
+    "    avg = sum(rewards) / len(rewards) if rewards else 0.0\n",
+    "    evasion_rate = evasion_count / n\n",
+    "    print(f\"{label}\")\n",
+    "    print(f\"  Mean generator reward: {avg:.3f}  |  Evasion rate: {evasion_rate:.0%}\")\n",
+    "    print(f\"  Per-episode rewards:   {[round(r, 2) for r in rewards]}\")\n",
+    "    return avg, evasion_rate\n",
+    "\n",
+    "\n",
+    "print(\"=== POST-TRAINING GENERATOR EVAL ===\")\n",
+    "after_reward, after_evasion = run_generator_eval(n=10, label=\"After GRPO (50 steps)\")\n",
+    "\n",
+    "# Check if Regulator detects the evolved fraud patterns\n",
+    "report = httpx.get(f\"{ENV_URL}/regulator/report\").json()\n",
+    "print(\"\\n=== REGULATOR REPORT (shows if Auditor is struggling) ===\")\n",
+    "for ft, status in report[\"detection_rates\"].items():\n",
+    "    print(f\"  {ft:<28} {status}\")\n",
+    "print(f\"\\nBlind spots: {report['blind_spots']}\")\n",
+    "print(f\"Emerging: {report['emerging_blind_spots']}\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Cell 8 — Arms race visualisation\n",
+    "import matplotlib.pyplot as plt\n",
+    "\n",
+    "log = trainer.state.log_history\n",
+    "steps = [x[\"step\"] for x in log if \"rewards/reward_generator_live/mean\" in x]\n",
+    "live_r = [x[\"rewards/reward_generator_live/mean\"] for x in log if \"rewards/reward_generator_live/mean\" in x]\n",
+    "plaus_r = [x.get(\"rewards/reward_fraud_plausibility/mean\", 0) for x in log if \"rewards/reward_generator_live/mean\" in x]\n",
+    "\n",
+    "fig, ax = plt.subplots(figsize=(10, 4))\n",
+    "ax.plot(steps, live_r, label=\"Live Evasion Reward (Auditor+Approver)\", marker=\"o\", color=\"red\")\n",
+    "ax.plot(steps, plaus_r, label=\"Fraud Plausibility\", marker=\"s\", linestyle=\"--\", color=\"orange\")\n",
+    "ax.axhline(y=0.85, color=\"red\", linestyle=\":\", alpha=0.5, label=\"Max reward (0.85 = full evasion)\")\n",
+    "ax.axhline(y=0.10, color=\"green\", linestyle=\":\", alpha=0.5, label=\"Min reward (0.10 = caught)\")\n",
+    "ax.set_xlabel(\"Training Step\")\n",
+    "ax.set_ylabel(\"Mean Reward\")\n",
+    "ax.set_title(\"Generator Adversarial Training — Evasion Reward Curve\")\n",
+    "ax.legend()\n",
+    "ax.grid(True, alpha=0.3)\n",
+    "ax.set_ylim(0, 1.0)\n",
+    "plt.tight_layout()\n",
+    "plt.savefig(\"generator_reward_curve.png\", dpi=150)\n",
+    "plt.show()\n",
+    "print(\"Saved generator_reward_curve.png\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Cell 9 — Save model\n",
+    "model.save_pretrained(\"generator_lora\")\n",
+    "tokenizer.save_pretrained(\"generator_lora\")\n",
+    "print(\"Generator LoRA saved to generator_lora/\")\n",
+    "print()\n",
+    "print(\"=== ARMS RACE SUMMARY ===\")\n",
+    "print(f\"Generator evasion rate after training: {after_evasion:.0%}\")\n",
+    "print(f\"Generator mean reward after training:  {after_reward:.3f}\")\n",
+    "print()\n",
+    "print(\"Next step: run auditor_grpo_training.ipynb to train Auditor against evolved Generator.\")\n",
+    "print(\"Then re-run Generator training — each iteration the arms race escalates.\")"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python",
+   "version": "3.10.0"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
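
Cell 5 biases the prompt mix by the Regulator's per-type weights. A minimal standalone sketch of that normalise-then-sample step follows. The helper names `normalise_weights` and `sample_fraud_types` and the values in `example_weights` are illustrative assumptions. Only the four fraud type names and the roughly 60% share for `phantom_vendor` come from the commit. Unlike the notebook cell as committed, the sketch falls back to uniform weights when every weight is zero.

```python
import random
from collections import Counter

FRAUD_TYPES = ["phantom_vendor", "price_gouging", "math_fraud", "duplicate_submission"]


def normalise_weights(raw, types=FRAUD_TYPES):
    """Keep only known fraud types and rescale their weights to sum to 1."""
    kept = {ft: float(raw.get(ft, 0.0)) for ft in types}
    total = sum(kept.values())
    if total <= 0:
        # No usable signal: sample every type with equal probability.
        return {ft: 1.0 / len(types) for ft in types}
    return {ft: w / total for ft, w in kept.items()}


def sample_fraud_types(raw, n, seed=0):
    """Draw n fraud types according to the normalised weights."""
    weights = normalise_weights(raw)
    rng = random.Random(seed)  # local RNG keeps the draw reproducible
    return rng.choices(list(weights), weights=list(weights.values()), k=n)


# Hypothetical Regulator output; compound_fraud is dropped during normalisation.
example_weights = {
    "phantom_vendor": 0.60,
    "price_gouging": 0.15,
    "math_fraud": 0.15,
    "duplicate_submission": 0.05,
    "compound_fraud": 0.05,
}

print(normalise_weights(example_weights))
print(Counter(sample_fraud_types(example_weights, n=80)))
```

Because `compound_fraud` is dropped before rescaling, the remaining shares grow slightly, so `phantom_vendor` ends up a little above its raw 0.60.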

server/app.py CHANGED

@@ -418,6 +418,109 @@ def regulator_calibration():
     return _regulator_tracker.calibration_report()
+# ---------------------------------------------------------------------------
+# Generator training endpoint
+# ---------------------------------------------------------------------------
+class GeneratorScoreRequest(BaseModel):
+    invoice_json: Dict[str, Any]          # generated invoice to evaluate
+    fraud_type: str                        # claimed fraud type
+    context: Optional[str] = ""           # optional: generation reasoning
+class GeneratorScoreResponse(BaseModel):
+    reward: float
+    auditor_detected: bool
+    approver_decision: str
+    feedback: str
+    regulator_weights: Dict[str, Any]     # current blind spot weights for next generation
+@app.post("/generator/score")
+def generator_score(req: GeneratorScoreRequest):
+    """
+    Score a Generator-produced invoice through the full Auditor+Approver pipeline.
+    Used as the live reward signal during Generator GRPO training.
+    The Generator's job: produce a fraudulent invoice that evades the Auditor.
+    Reward: 0.85 (evades both), 0.60 (evades Auditor, Approver catches), 0.10 (Auditor catches)
+    """
+    from server.multi_agent_environment import (
+        compute_generator_reward, approver_decision as _approver_decision,
+        PHANTOM_VENDORS, FRAUD_TYPES,
+    )
+    from server.environment import VENDORS, MARKET_PRICE_MAX
+    inv = req.invoice_json
+    fraud_type = req.fraud_type.lower().strip()
+    # --- Rule-based Auditor simulation (fast, no LLM needed for scoring) ---
+    # This mirrors what a trained Auditor would do, seeded from tracker blind spots
+    vendor = inv.get("vendor", "")
+    line_items = inv.get("line_items", [])
+    total = float(inv.get("total", 0))
+    auditor_verdict = "approved"
+    auditor_confidence = 0.5
+    auditor_fraud_type = None
+    # Check phantom vendor
+    known_vendors_lower = [v.lower() for v in VENDORS]
+    if vendor.lower() not in known_vendors_lower:
+        # Auditor detection rate for phantom_vendor from tracker
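+        # Default to 0.5 only when the rate is missing; a measured 0.0 is kept
+        detection_rate = _regulator_tracker.detection_rates().get("phantom_vendor")
+        detection_rate = 0.5 if detection_rate is None else detection_rate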
+        if random.random() < detection_rate:
+            auditor_verdict = "flagged"
+            auditor_fraud_type = "phantom_vendor"
+            auditor_confidence = round(0.5 + detection_rate * 0.4, 2)
+    # Check price gouging
+    if auditor_verdict == "approved":
+        for item in line_items:
+            desc = item.get("description", "")
+            up = float(item.get("unit_price", 0))
+            market_max = MARKET_PRICE_MAX.get(desc, up)
+            if up > market_max * 1.5:
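+                # Default to 0.5 only when the rate is missing; a measured 0.0 is kept
+                detection_rate = _regulator_tracker.detection_rates().get("price_gouging")
+                detection_rate = 0.5 if detection_rate is None else detection_rate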
+                if random.random() < detection_rate:
+                    auditor_verdict = "flagged"
+                    auditor_fraud_type = "price_gouging"
+                    auditor_confidence = round(0.5 + detection_rate * 0.4, 2)
+                break
+    # Check math fraud
+    if auditor_verdict == "approved" and line_items:
+        computed = round(sum(float(it.get("amount", 0)) for it in line_items), 2)
+        if abs(computed - total) > 0.05 * total:
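+            # Default to 0.5 only when the rate is missing; a measured 0.0 is kept
+            detection_rate = _regulator_tracker.detection_rates().get("math_fraud")
+            detection_rate = 0.5 if detection_rate is None else detection_rate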
+            if random.random() < detection_rate:
+                auditor_verdict = "flagged"
+                auditor_fraud_type = "math_fraud"
+                auditor_confidence = round(0.5 + detection_rate * 0.4, 2)
+    auditor_detected = auditor_verdict == "flagged"
+    # --- Approver decision ---
+    approver = _approver_decision(auditor_verdict, auditor_confidence, auditor_fraud_type)
+    approver_approved = approver["decision"] == "approve"
+    # --- Generator reward ---
+    reward = compute_generator_reward(auditor_detected, approver_approved)
+    feedback_parts = [
+        f"Auditor: {'DETECTED' if auditor_detected else 'MISSED'} (conf={auditor_confidence:.2f})",
+        f"Approver: {approver['decision']} — {approver['reason']}",
+        f"Generator reward: {reward:.2f}",
+    ]
+    return GeneratorScoreResponse(
+        reward=reward,
+        auditor_detected=auditor_detected,
+        approver_decision=approver["decision"],
+        feedback=" | ".join(feedback_parts),
+        regulator_weights=_regulator_tracker.generator_weights(),
+    )
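
The rule checks inside `generator_score` can be exercised offline. The sketch below is a simplified, deterministic reading of them, not the endpoint itself. The function name `triggered_checks` is hypothetical. It leaves out the random gating by tracker detection rate, and it reports every rule an invoice trips, whereas the endpoint stops at the first flag. The vendor list and price table in the example are a small subset copied from the notebook.

```python
from typing import Any, Dict, Iterable, List, Mapping


def triggered_checks(
    invoice: Mapping[str, Any],
    approved_vendors: Iterable[str],
    market_price_max: Mapping[str, float],
) -> List[str]:
    """Return the rule checks this invoice would trip, ignoring detection rates."""
    hits: List[str] = []

    # Rule 1: vendor must match the approved registry (case-insensitive).
    vendor = str(invoice.get("vendor", ""))
    if vendor.lower() not in {v.lower() for v in approved_vendors}:
        hits.append("phantom_vendor")

    # Rule 2: any unit price above 150% of the known market maximum.
    items = invoice.get("line_items") or []
    for item in items:
        unit_price = float(item.get("unit_price", 0))
        # Unknown descriptions default to their own price, so they never trip.
        market_max = market_price_max.get(item.get("description", ""), unit_price)
        if unit_price > market_max * 1.5:
            hits.append("price_gouging")
            break

    # Rule 3: stated total differs from the line-item sum by more than 5% of the total.
    total = float(invoice.get("total", 0))
    computed = round(sum(float(it.get("amount", 0)) for it in items), 2)
    if items and abs(computed - total) > 0.05 * total:
        hits.append("math_fraud")

    return hits


vendors = ["Acme Corp", "GlobalTech Solutions"]
prices = {"Keyboard": 149.99}

clean = {
    "vendor": "Acme Corp",
    "total": 100.0,
    "line_items": [{"description": "Keyboard", "qty": 1, "unit_price": 100.0, "amount": 100.0}],
}
suspicious = {
    "vendor": "GlobalTech Solutions Ltd",
    "total": 255.0,
    "line_items": [{"description": "Keyboard", "qty": 1, "unit_price": 230.0, "amount": 230.0}],
}

print(triggered_checks(clean, vendors, prices))
print(triggered_checks(suspicious, vendors, prices))
```

In these checks an invoice that trips no rule is never flagged, so the claimed `fraud_type` does not affect the Auditor verdict. That is why the notebook pairs the live reward with the local plausibility reward.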
 @app.post("/regulator/demo_seed")
 def regulator_demo_seed():
     """Seed the tracker with realistic demo data (phantom_vendor weak at 31%)."""