Spaces: DevikaJ2005/fraudshield-1


DevikaJ2005 committed on Apr 26

Commit

ccd0934

1 Parent(s): 4049a11

Add public-data curriculum and harden LLM agent


Files changed (3)

README.md +39 -5
llm_agent_openai.py +105 -25
notebooks/fraudshield_trl_colab.ipynb +214 -71

README.md CHANGED

@@ -122,6 +122,31 @@ Run the heuristic or configured agent:
 python inference.py
 ```
 Run the OpenEnv API server:
 ```bash
@@ -164,9 +189,12 @@ It is designed to:
 1. install `openenv-core`, `trl`, `unsloth`, `transformers`, `datasets`, and `peft`
 2. clone the repo and install FraudShield
-3. wrap `FraudShieldEnvironment` for GRPO-style training
-4. combine environment reward with a JSON-format reward
-5. train easy -> medium -> hard
 6. save:
    - `reward_curve.png`
    - `loss_curve.png`
@@ -175,7 +203,7 @@ It is designed to:
    - heuristic via `python inference.py`
    - trained model via `LOCAL_MODEL_PATH=... python inference.py`
-At the moment the notebook is wired for this flow, but the real training run still depends on available compute credits.
 ## Results
@@ -186,7 +214,7 @@ Current heuristic baseline, measured with `python inference.py`:
 - Hard: `0.7425`
 - Final: `0.6942`
-This baseline is intentionally rule-based and not trained. It is strong on easy, weaker on medium, and still imperfect on hard, which leaves headroom for the training run.
 Once training is completed, this section should include:
@@ -195,6 +223,12 @@ Once training is completed, this section should include:
 - trained-vs-heuristic comparison table
 - one short qualitative trace comparison
 ## Live Links
 - Hugging Face Space: `https://huggingface.co/spaces/DevikaJ2005/fraudshield-1`

 python inference.py
 ```
+FraudShield supports three agent modes:
+- `heuristic` by default when no model credentials are set
+- `llm_local` when `LOCAL_MODEL_PATH` points to a trained Hugging Face / PEFT checkpoint
+- `llm_remote` when an API-compatible model is configured
+For an open-source setup that needs no paid model, the recommended options are:
+### Option 1: Use your locally trained model
+```bash
+LOCAL_MODEL_PATH=trained_policy python inference.py
+```
+### Option 2: Use a Hugging Face hosted open-source model
+```bash
+HF_TOKEN=your_token_here \
+MODEL_NAME=Qwen/Qwen2.5-1.5B-Instruct \
+API_BASE_URL=https://router.huggingface.co/v1 \
+python inference.py
+```
+If `HF_TOKEN` is present and `API_BASE_URL` is not set, FraudShield defaults to the Hugging Face router automatically.
 Run the OpenEnv API server:
 ```bash
 1. install `openenv-core`, `trl`, `unsloth`, `transformers`, `datasets`, and `peft`
 2. clone the repo and install FraudShield
+3. load a public fraud curriculum dataset from Hugging Face
+4. build a second-stage training set from real FraudShield rollouts
+5. run two-stage fine-tuning with Unsloth LoRA and TRL `SFTTrainer`
+   - stage 1: public fraud-data adaptation
+   - stage 2: FraudShield policy adaptation
+5. save a reusable local policy checkpoint
 6. save:
    - `reward_curve.png`
    - `loss_curve.png`
    - heuristic via `python inference.py`
    - trained model via `LOCAL_MODEL_PATH=... python inference.py`
+The notebook is designed for Colab + GPU execution and does not require a paid proprietary LLM. The current public curriculum source is `Phoenix21/mock_fraud-detection-dataset`, which gives the model broader fraud-signal exposure before it is adapted to FraudShield actions.
 ## Results
 - Hard: `0.7425`
 - Final: `0.6942`
+This baseline is intentionally rule-based and not trained. It is strong on easy, weaker on medium, and still imperfect on hard, which leaves headroom for a trained policy that can learn broader fraud patterns from public data and then adapt them to FraudShield.
 Once training is completed, this section should include:
 - trained-vs-heuristic comparison table
 - one short qualitative trace comparison
+The preferred final comparison covers three agents:
+- heuristic baseline
+- base open-source LLM or hosted HF model
+- fine-tuned local policy checkpoint
 ## Live Links
 - Hugging Face Space: `https://huggingface.co/spaces/DevikaJ2005/fraudshield-1`
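
The three agent modes described in the updated README can be sketched as a small selection function. This is a minimal illustration, not the code in `inference.py`: the function name `select_agent_mode` and the precedence (local checkpoint first, then remote, then heuristic) are assumptions. Only the mode names, the environment variable names, and the Hugging Face router default come from the README text.

```python
from typing import Mapping, Optional, Tuple

# Router URL quoted in the README's "Option 2" example.
HF_ROUTER_URL = "https://router.huggingface.co/v1"


def select_agent_mode(env: Mapping[str, str]) -> Tuple[str, Optional[str]]:
    """Return (mode, api_base_url) for a set of environment variables.

    Hypothetical helper: the precedence below is an assumption.
    """
    # Assumption: a local checkpoint takes priority over remote credentials.
    if env.get("LOCAL_MODEL_PATH"):
        return "llm_local", None

    api_base = env.get("API_BASE_URL")
    if env.get("HF_TOKEN"):
        # Stated in the README: with HF_TOKEN set and no API_BASE_URL,
        # the Hugging Face router is used.
        return "llm_remote", api_base or HF_ROUTER_URL
    if api_base:
        # Assumption: an explicit base URL alone is enough to go remote.
        return "llm_remote", api_base

    # No model credentials at all: fall back to the rule-based agent.
    return "heuristic", None


if __name__ == "__main__":
    print(select_agent_mode({}))
    print(select_agent_mode({"LOCAL_MODEL_PATH": "trained_policy"}))
    print(select_agent_mode({"HF_TOKEN": "your_token_here"}))
```

Passing a plain mapping instead of reading `os.environ` directly keeps the selection logic easy to test; a caller would pass `os.environ` in practice.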

llm_agent_openai.py CHANGED

@@ -4,6 +4,7 @@ from __future__ import annotations
 import json
 import logging
 from pathlib import Path
 from typing import Any, Dict, Optional
@@ -16,6 +17,38 @@ except ImportError:  # pragma: no cover
 logger = logging.getLogger(__name__)
 class LLMFraudDetectionAgent:
     """OpenAI-compatible LLM agent with heuristic fallback."""
@@ -61,6 +94,7 @@ class LLMFraudDetectionAgent:
             return self._fallback(observation, exc)
     def _build_messages(self, observation) -> list[Dict[str, str]]:
         observation_payload = {
             "case_id": observation.case_id,
             "task_name": observation.task_name.value,
@@ -73,21 +107,15 @@ class LLMFraudDetectionAgent:
             "remaining_sla": observation.remaining_sla,
             "note_required": observation.note_required,
             "allowed_public_actions": [action.value for action in observation.allowed_actions],
-            "available_investigation_aliases": [
-                "merchant_profile",
-                "customer_profile",
-                "network_graph",
-                "device_intel",
-                "payment_trace",
-                "fulfillment_review",
-                "policy_review",
-            ],
             "app_context": observation.app_context,
         }
         system_prompt = (
             "You are a fraud analyst operating inside a simulated investigation workflow. "
             "Only use the visible evidence shown to you. Choose either one investigation alias or one final "
-            "decision. Respond with JSON only using this schema: "
             '{"action_type":"investigate|decide","investigation_target":"string|null",'
             '"decision":"fraud|legitimate|null","confidence":0.0,"reasoning":"one sentence"}'
         )
@@ -101,7 +129,8 @@ class LLMFraudDetectionAgent:
         reasoning = self._normalize_reasoning(payload.get("reasoning"))
         if action_type == "investigate":
             investigation_target = str(payload.get("investigation_target", "")).strip().lower()
-            mapped_action = self._map_investigation_alias(investigation_target)
             return FraudCheckAction(
                 case_id=observation.case_id,
                 action_type=mapped_action,
@@ -128,20 +157,71 @@ class LLMFraudDetectionAgent:
         raise ValueError(f"Unsupported action_type from model: {action_type!r}")
-    def _map_investigation_alias(self, alias: str) -> ActionTypeEnum:
-        mapping = {
-            "merchant_profile": ActionTypeEnum.FETCH_MERCHANT_PROFILE,
-            "customer_profile": ActionTypeEnum.FETCH_CUSTOMER_PROFILE,
-            "network_graph": ActionTypeEnum.FETCH_NETWORK_GRAPH,
-            "device_intel": ActionTypeEnum.FETCH_NETWORK_GRAPH,
-            "payment_trace": ActionTypeEnum.REVIEW_TRANSACTION,
-            "fulfillment_review": ActionTypeEnum.REVIEW_TRANSACTION,
-            "policy_review": ActionTypeEnum.CHECK_POLICY,
-            "trust_notes": ActionTypeEnum.CHECK_POLICY,
-        }
-        if alias not in mapping:
-            raise ValueError(f"Unsupported investigation_target from model: {alias!r}")
-        return mapping[alias]
     def _map_decision_to_resolution(self, decision: str, confidence: float, observation) -> ResolutionEnum:
         if decision not in {"fraud", "legitimate"}:

 import json
 import logging
+import re
 from pathlib import Path
 from typing import Any, Dict, Optional
 logger = logging.getLogger(__name__)
+ACTION_ALIAS_TO_ENUM = {
+    "merchant_profile": ActionTypeEnum.FETCH_MERCHANT_PROFILE,
+    "fetch_merchant_profile": ActionTypeEnum.FETCH_MERCHANT_PROFILE,
+    "customer_profile": ActionTypeEnum.FETCH_CUSTOMER_PROFILE,
+    "fetch_customer_profile": ActionTypeEnum.FETCH_CUSTOMER_PROFILE,
+    "network_graph": ActionTypeEnum.FETCH_NETWORK_GRAPH,
+    "fetch_network_graph": ActionTypeEnum.FETCH_NETWORK_GRAPH,
+    "device_intel": ActionTypeEnum.FETCH_NETWORK_GRAPH,
+    "payment_trace": ActionTypeEnum.REVIEW_TRANSACTION,
+    "fulfillment_review": ActionTypeEnum.REVIEW_TRANSACTION,
+    "review_transaction": ActionTypeEnum.REVIEW_TRANSACTION,
+    "policy_review": ActionTypeEnum.CHECK_POLICY,
+    "check_policy": ActionTypeEnum.CHECK_POLICY,
+    "trust_notes": ActionTypeEnum.CHECK_POLICY,
+}
+ACTION_ENUM_TO_ALIAS = {
+    ActionTypeEnum.REVIEW_TRANSACTION: "payment_trace",
+    ActionTypeEnum.FETCH_CUSTOMER_PROFILE: "customer_profile",
+    ActionTypeEnum.FETCH_MERCHANT_PROFILE: "merchant_profile",
+    ActionTypeEnum.FETCH_NETWORK_GRAPH: "network_graph",
+    ActionTypeEnum.CHECK_POLICY: "policy_review",
+}
+ACTION_ENUM_TO_EVIDENCE_KEY = {
+    ActionTypeEnum.REVIEW_TRANSACTION: "transaction_review",
+    ActionTypeEnum.FETCH_CUSTOMER_PROFILE: "customer_profile",
+    ActionTypeEnum.FETCH_MERCHANT_PROFILE: "merchant_profile",
+    ActionTypeEnum.FETCH_NETWORK_GRAPH: "network_graph",
+    ActionTypeEnum.CHECK_POLICY: "policy_guide",
+}
 class LLMFraudDetectionAgent:
     """OpenAI-compatible LLM agent with heuristic fallback."""
             return self._fallback(observation, exc)
     def _build_messages(self, observation) -> list[Dict[str, str]]:
+        available_aliases = self._available_investigation_aliases(observation)
         observation_payload = {
             "case_id": observation.case_id,
             "task_name": observation.task_name.value,
             "remaining_sla": observation.remaining_sla,
             "note_required": observation.note_required,
             "allowed_public_actions": [action.value for action in observation.allowed_actions],
+            "available_investigation_aliases": available_aliases,
             "app_context": observation.app_context,
         }
         system_prompt = (
             "You are a fraud analyst operating inside a simulated investigation workflow. "
             "Only use the visible evidence shown to you. Choose either one investigation alias or one final "
+            "decision. For investigation_target, you must return exactly one alias from "
+            f"{available_aliases}. Never return placeholders, array expressions, or prose such as "
+            "'available_investigations[0]'. Respond with JSON only using this schema: "
             '{"action_type":"investigate|decide","investigation_target":"string|null",'
             '"decision":"fraud|legitimate|null","confidence":0.0,"reasoning":"one sentence"}'
         )
         reasoning = self._normalize_reasoning(payload.get("reasoning"))
         if action_type == "investigate":
             investigation_target = str(payload.get("investigation_target", "")).strip().lower()
+            mapped_action = self._map_investigation_alias(investigation_target, observation)
+            mapped_action = self._stabilize_investigation_choice(mapped_action, observation)
             return FraudCheckAction(
                 case_id=observation.case_id,
                 action_type=mapped_action,
         raise ValueError(f"Unsupported action_type from model: {action_type!r}")
+    def _map_investigation_alias(self, alias: str, observation) -> ActionTypeEnum:
+        normalized = alias.strip().lower()
+        if normalized in ACTION_ALIAS_TO_ENUM:
+            return ACTION_ALIAS_TO_ENUM[normalized]
+        placeholder_match = re.fullmatch(r"available_investigations\[(\d+)\]", normalized)
+        if placeholder_match:
+            index = int(placeholder_match.group(1))
+            available = self._available_investigation_aliases(observation)
+            if 0 <= index < len(available):
+                return ACTION_ALIAS_TO_ENUM[available[index]]
+        compact = re.sub(r"[^a-z_]", "", normalized.replace("-", "_").replace(" ", "_"))
+        for key, value in ACTION_ALIAS_TO_ENUM.items():
+            if compact == key:
+                return value
+        for key, value in ACTION_ALIAS_TO_ENUM.items():
+            if compact and compact in key:
+                return value
+        available = self._available_investigation_aliases(observation)
+        if len(available) == 1:
+            return ACTION_ALIAS_TO_ENUM[available[0]]
+        raise ValueError(f"Unsupported investigation_target from model: {alias!r}")
+    def _available_investigation_aliases(self, observation) -> list[str]:
+        context_aliases = observation.app_context.get("available_investigations")
+        aliases: list[str] = []
+        if isinstance(context_aliases, list):
+            for alias in context_aliases:
+                normalized = str(alias).strip().lower()
+                if normalized in ACTION_ALIAS_TO_ENUM:
+                    canonical = ACTION_ENUM_TO_ALIAS[ACTION_ALIAS_TO_ENUM[normalized]]
+                    if canonical not in aliases:
+                        aliases.append(canonical)
+        if aliases:
+            return aliases
+        fallback_aliases: list[str] = []
+        for action in observation.allowed_actions:
+            if action in ACTION_ENUM_TO_ALIAS:
+                alias = ACTION_ENUM_TO_ALIAS[action]
+                if alias not in fallback_aliases:
+                    fallback_aliases.append(alias)
+        return fallback_aliases
+    def _stabilize_investigation_choice(self, action_type: ActionTypeEnum, observation) -> ActionTypeEnum:
+        evidence_key = ACTION_ENUM_TO_EVIDENCE_KEY.get(action_type)
+        if evidence_key and evidence_key not in observation.revealed_evidence:
+            return action_type
+        alternatives = []
+        for alias in self._available_investigation_aliases(observation):
+            candidate = ACTION_ALIAS_TO_ENUM[alias]
+            candidate_key = ACTION_ENUM_TO_EVIDENCE_KEY.get(candidate)
+            if candidate_key and candidate_key not in observation.revealed_evidence:
+                alternatives.append(candidate)
+        if alternatives:
+            return alternatives[0]
+        raise ValueError(
+            f"Investigation {action_type.value!r} is already revealed and no unseen investigations remain."
+        )
     def _map_decision_to_resolution(self, decision: str, confidence: float, observation) -> ResolutionEnum:
         if decision not in {"fraud", "legitimate"}:
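
The fallback chain in the new `_map_investigation_alias` can be tried in isolation with the sketch below. It is a standalone approximation under stated assumptions: plain strings stand in for `ActionTypeEnum` members, `resolve_alias` is a hypothetical name, and `available` is assumed to contain only known aliases. One deliberate difference from the diff: the substring step here accepts a match only when it points to a single action, whereas the committed code returns the first key that contains the text.

```python
import re
from typing import List

# Stand-in for ACTION_ALIAS_TO_ENUM; strings replace ActionTypeEnum members.
ALIAS_TO_ACTION = {
    "merchant_profile": "fetch_merchant_profile",
    "customer_profile": "fetch_customer_profile",
    "network_graph": "fetch_network_graph",
    "device_intel": "fetch_network_graph",
    "payment_trace": "review_transaction",
    "fulfillment_review": "review_transaction",
    "policy_review": "check_policy",
    "trust_notes": "check_policy",
}


def resolve_alias(raw: str, available: List[str]) -> str:
    """Map a model-produced investigation target to an action name."""
    normalized = raw.strip().lower()

    # 1. Exact alias match.
    if normalized in ALIAS_TO_ACTION:
        return ALIAS_TO_ACTION[normalized]

    # 2. Placeholder such as "available_investigations[0]".
    match = re.fullmatch(r"available_investigations\[(\d+)\]", normalized)
    if match and int(match.group(1)) < len(available):
        return ALIAS_TO_ACTION[available[int(match.group(1))]]

    # 3. Normalize separators and drop stray characters, then retry.
    compact = re.sub(r"[^a-z_]", "", normalized.replace("-", "_").replace(" ", "_"))
    if compact in ALIAS_TO_ACTION:
        return ALIAS_TO_ACTION[compact]

    # 4. Substring match, accepted only if every hit maps to the same action
    #    (stricter than the first-match rule in the diff).
    hits = {action for key, action in ALIAS_TO_ACTION.items() if compact and compact in key}
    if len(hits) == 1:
        return hits.pop()

    # 5. If only one investigation is available, take it.
    if len(available) == 1:
        return ALIAS_TO_ACTION[available[0]]

    raise ValueError(f"Unsupported investigation_target from model: {raw!r}")


if __name__ == "__main__":
    options = ["payment_trace", "policy_review"]
    print(resolve_alias(" Merchant-Profile ", options))
    print(resolve_alias("available_investigations[1]", options))
```

Requiring a unique substring hit means an ambiguous target such as `profile` raises instead of silently choosing the merchant profile, which lets the heuristic fallback handle it.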

notebooks/fraudshield_trl_colab.ipynb CHANGED

@@ -6,9 +6,12 @@
       "source": [
         "# FraudShield Colab Training Notebook\n",
         "\n",
-        "This notebook uses **Unsloth + TRL** to fine-tune a small instruction model to imitate strong investigation trajectories in FraudShield.\n",
         "\n",
-        "It is designed for **reliable Colab execution** first: install dependencies, build a training set from real environment rollouts, fine-tune a LoRA policy, evaluate heuristic vs trained policy, and save the expected training artifacts.\n"
       ]
     },
     {
@@ -18,7 +21,7 @@
       "outputs": [],
       "source": [
         "%pip uninstall -y unsloth unsloth_zoo trl transformers tokenizers\n",
-        "%pip install -q openenv-core datasets peft accelerate sentencepiece\n",
         "%pip install -q \"transformers==4.51.3\" \"trl==0.19.1\"\n",
         "%pip install -q \"unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git\"\n",
         "\n",
@@ -74,10 +77,12 @@
       "source": [
         "import json\n",
         "import os\n",
         "import subprocess\n",
         "from datetime import datetime\n",
         "\n",
-        "from datasets import Dataset\n",
         "\n",
         "from fraudshield_env import FraudShieldEnvironment\n",
         "from llm_agent import SnapshotCalibratedFraudDetectionAgent\n",
@@ -86,6 +91,17 @@
         "assert env.load_data(), 'FraudShield snapshot failed to load.'\n",
         "print('FraudShield loaded:', env.data_loader.get_bundle_summary())\n",
         "\n",
         "def serialize_observation(observation):\n",
         "    return json.dumps(\n",
         "        {\n",
@@ -102,74 +118,166 @@
         "            'case_summary': observation.case_summary.model_dump(mode='json'),\n",
         "            'app_context': observation.app_context,\n",
         "        },\n",
-        "        ensure_ascii=True,\n",
-        "        indent=2,\n",
         "    )\n",
         "\n",
         "def prompt_from_observation(observation):\n",
         "    return (\n",
-        "        'You are a fraud analyst working in a simulated investigation workflow.\\n'\n",
-        "        'Choose the next best action based only on the visible observation.\\n'\n",
-        "        'Respond with JSON only using keys action_type, investigation_target, decision, confidence, reasoning.\\n'\n",
-        "        'Use action_type investigate or decide.\\n\\n'\n",
-        "        'Observation:\\n'\n",
-        "        f\"{serialize_observation(observation)}\\n\"\n",
         "    )\n",
         "\n",
-        "def action_to_target_json(action):\n",
-        "    payload = {\n",
-        "        'action_type': 'investigate',\n",
         "        'investigation_target': None,\n",
-        "        'decision': None,\n",
-        "        'confidence': 0.5,\n",
-        "        'reasoning': action.reasoning or '',\n",
         "    }\n",
-        "    if action.action_type.value == 'fetch_customer_profile':\n",
-        "        payload['investigation_target'] = 'customer_profile'\n",
-        "    elif action.action_type.value == 'fetch_merchant_profile':\n",
-        "        payload['investigation_target'] = 'merchant_profile'\n",
-        "    elif action.action_type.value == 'fetch_network_graph':\n",
-        "        payload['investigation_target'] = 'network_graph'\n",
-        "    elif action.action_type.value == 'check_policy':\n",
-        "        payload['investigation_target'] = 'policy_review'\n",
-        "    elif action.action_type.value == 'review_transaction':\n",
-        "        payload['investigation_target'] = 'payment_trace'\n",
-        "    elif action.action_type.value == 'add_case_note':\n",
-        "        payload['investigation_target'] = 'trust_notes'\n",
-        "        payload['reasoning'] = action.note_text or payload['reasoning']\n",
-        "    elif action.action_type.value == 'resolve_case':\n",
-        "        payload['action_type'] = 'decide'\n",
-        "        payload['investigation_target'] = None\n",
-        "        if action.resolution.value in {'approve', 'request_docs'}:\n",
-        "            payload['decision'] = 'legitimate'\n",
-        "            payload['confidence'] = 0.8 if action.resolution.value == 'approve' else 0.6\n",
-        "        else:\n",
-        "            payload['decision'] = 'fraud'\n",
-        "            payload['confidence'] = 0.9 if action.resolution.value in {'block', 'escalate'} else 0.6\n",
-        "    return json.dumps(payload, ensure_ascii=True)\n",
-        "\n",
-        "def build_training_dataset(per_task_episodes=18):\n",
         "    agent = SnapshotCalibratedFraudDetectionAgent()\n",
-        "    records = []\n",
         "    for task_name in ('easy', 'medium', 'hard'):\n",
-        "        for episode_idx in range(per_task_episodes):\n",
-        "            rollout_env = FraudShieldEnvironment(data_path='data', seed=42 + episode_idx)\n",
-        "            rollout_env.load_data()\n",
-        "            observation = rollout_env.reset(task_name).observation\n",
-        "            while not rollout_env.is_done:\n",
         "                action = agent.decide(observation)\n",
-        "                records.append({\n",
         "                    'task_name': task_name,\n",
-        "                    'prompt': prompt_from_observation(observation),\n",
-        "                    'target': action_to_target_json(action),\n",
-        "                    'text': prompt_from_observation(observation) + action_to_target_json(action),\n",
         "                })\n",
-        "                observation = rollout_env.step(action).observation\n",
-        "    return Dataset.from_list(records)\n",
         "\n",
-        "train_dataset = build_training_dataset(per_task_episodes=18)\n",
-        "print(train_dataset)\n",
-        "print(train_dataset[0]['text'][:1200])\n"
       ]
     },
     {
@@ -209,30 +317,56 @@
         "from transformers import TrainingArguments\n",
         "from trl import SFTTrainer\n",
         "\n",
-        "training_args = TrainingArguments(\n",
-        "    output_dir='fraudshield-sft-run',\n",
-        "    num_train_epochs=3,\n",
         "    per_device_train_batch_size=2,\n",
         "    gradient_accumulation_steps=4,\n",
         "    learning_rate=2e-4,\n",
         "    logging_steps=1,\n",
         "    save_strategy='epoch',\n",
         "    report_to='none',\n",
         "    fp16=not torch.cuda.is_bf16_supported(),\n",
         "    bf16=torch.cuda.is_bf16_supported(),\n",
-        "    max_steps=-1,\n",
         "    warmup_ratio=0.05,\n",
         "    lr_scheduler_type='cosine',\n",
         ")\n",
         "\n",
         "trainer = SFTTrainer(\n",
-        "    model=model,\n",
         "    tokenizer=tokenizer,\n",
-        "    train_dataset=train_dataset,\n",
         "    dataset_text_field='text',\n",
         "    max_seq_length=MAX_SEQ_LENGTH,\n",
         "    packing=False,\n",
-        "    args=training_args,\n",
         ")\n",
         "\n",
         "trainer.train()\n",
@@ -261,7 +395,8 @@
         "        check=True,\n",
         "    )\n",
         "    with open('fraudshield_baseline_results.json', 'r', encoding='utf-8') as handle:\n",
-        "        return json.load(handle), completed.stdout\n",
         "\n",
         "baseline_results, baseline_stdout = run_inference()\n",
         "trained_results, trained_stdout = run_inference({'LOCAL_MODEL_PATH': OUTPUT_DIR})\n",
@@ -275,8 +410,10 @@
         "        'delta': trained_results[task_name]['score'] - baseline_results[task_name]['score'],\n",
         "    })\n",
         "\n",
-        "print('Heuristic baseline stdout:\\n', baseline_stdout)\n",
-        "print('Trained model stdout:\\n', trained_stdout)\n",
         "print(json.dumps(comparison_rows, indent=2))\n"
       ]
     },
@@ -316,8 +453,9 @@
         "summary = {\n",
         "    'status': 'completed',\n",
         "    'updated_at': datetime.utcnow().isoformat() + 'Z',\n",
-        "    'trainer': 'TRL SFTTrainer with Unsloth LoRA',\n",
         "    'base_model': MODEL_NAME,\n",
         "    'local_model_path': OUTPUT_DIR,\n",
         "    'baseline': {\n",
         "        'easy': baseline_results['easy']['score'],\n",
@@ -343,7 +481,8 @@
         "with open('training_summary.json', 'w', encoding='utf-8') as handle:\n",
         "    json.dump(summary, handle, indent=2)\n",
         "\n",
-        "print(json.dumps(summary, indent=2))\n"
       ]
     }
   ],
@@ -356,8 +495,12 @@
     "language_info": {
       "name": "python",
       "version": "3.12"
     }
   },
   "nbformat": 4,
   "nbformat_minor": 5
-}
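
The updated notebook's `action_to_payload` helper turns each final routing into a JSON training target. The sketch below isolates that rule so it can be checked by hand. The function name `resolution_to_target` and the two set constants are hypothetical; the decision and confidence rules, the fallback reasoning string, and the compact JSON separators are taken from the new notebook cell.

```python
import json
from typing import Dict, Union

# Resolutions the notebook treats as a fraud decision.
FRAUD_RESOLUTIONS = {"block", "hold", "escalate"}
# Resolutions the notebook treats as high-confidence outcomes.
HIGH_CONFIDENCE_RESOLUTIONS = {"approve", "block"}


def resolution_to_target(resolution: str) -> Dict[str, Union[str, float, None]]:
    """Build the 'decide' payload for a final routing (hypothetical helper)."""
    decision = "fraud" if resolution in FRAUD_RESOLUTIONS else "legitimate"
    confidence = 0.9 if resolution in HIGH_CONFIDENCE_RESOLUTIONS else 0.7
    return {
        "action_type": "decide",
        "investigation_target": None,
        "decision": decision,
        "confidence": confidence,
        "reasoning": f"Final routing is {resolution}.",
    }


def to_training_suffix(resolution: str) -> str:
    """Serialize the payload the way the notebook appends it to a prompt."""
    return json.dumps(resolution_to_target(resolution), separators=(",", ":"))


if __name__ == "__main__":
    for name in ("approve", "block", "hold", "request_docs"):
        print(name, "->", to_training_suffix(name))
```

Any resolution outside the fraud set, including `request_docs`, is labelled `legitimate` at confidence 0.7 under this rule.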

       "source": [
         "# FraudShield Colab Training Notebook\n",
         "\n",
+        "This notebook trains an **open-source LLM policy** for FraudShield using a two-stage curriculum:\n",
         "\n",
+        "1. **Public fraud-data adaptation** from a Hugging Face dataset\n",
+        "2. **FraudShield policy adaptation** from environment-compatible action traces\n",
+        "\n",
+        "The goal is to learn more than a static heuristic by giving the model broader fraud signals first, then teaching it how to act inside the FraudShield workflow.\n"
       ]
     },
     {
       "outputs": [],
       "source": [
         "%pip uninstall -y unsloth unsloth_zoo trl transformers tokenizers\n",
+        "%pip install -q openenv-core datasets peft accelerate sentencepiece matplotlib pandas\n",
         "%pip install -q \"transformers==4.51.3\" \"trl==0.19.1\"\n",
         "%pip install -q \"unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git\"\n",
         "\n",
       "source": [
         "import json\n",
         "import os\n",
+        "import random\n",
         "import subprocess\n",
         "from datetime import datetime\n",
         "\n",
+        "import pandas as pd\n",
+        "from datasets import Dataset, load_dataset\n",
         "\n",
         "from fraudshield_env import FraudShieldEnvironment\n",
         "from llm_agent import SnapshotCalibratedFraudDetectionAgent\n",
         "assert env.load_data(), 'FraudShield snapshot failed to load.'\n",
         "print('FraudShield loaded:', env.data_loader.get_bundle_summary())\n",
         "\n",
+        "random.seed(42)\n",
+        "\n",
+        "CANONICAL_ALIASES = [\n",
+        "    'merchant_profile',\n",
+        "    'customer_profile',\n",
+        "    'network_graph',\n",
+        "    'payment_trace',\n",
+        "    'policy_review',\n",
+        "]\n",
+        "\n",
+        "\n",
         "def serialize_observation(observation):\n",
         "    return json.dumps(\n",
         "        {\n",
         "            'case_summary': observation.case_summary.model_dump(mode='json'),\n",
         "            'app_context': observation.app_context,\n",
         "        },\n",
+        "        sort_keys=True,\n",
         "    )\n",
         "\n",
+        "\n",
         "def prompt_from_observation(observation):\n",
+        "    available = observation.app_context.get('available_investigations', CANONICAL_ALIASES)\n",
         "    return (\n",
+        "        'You are a fraud analyst operating in a simulated investigation workflow. '\n",
+        "        'Only use visible evidence. Return JSON only.\\n\\n'\n",
+        "        f'Visible observation:\\n{serialize_observation(observation)}\\n\\n'\n",
+        "        f'Valid investigation aliases: {available}.\\n'\n",
+        "        'Respond with JSON using this schema: '\n",
+        "        '{\"action_type\":\"investigate|decide\",\"investigation_target\":\"alias_or_null\",\"decision\":\"fraud|legitimate|null\",\"confidence\":0.0,\"reasoning\":\"one sentence\"}'\n",
         "    )\n",
         "\n",
+        "\n",
+        "def action_to_payload(action):\n",
+        "    action_name = action.action_type.value\n",
+        "    if action_name == 'fetch_merchant_profile':\n",
+        "        return {'action_type': 'investigate', 'investigation_target': 'merchant_profile', 'decision': None, 'confidence': None, 'reasoning': action.reasoning or 'Review seller risk signals before routing.'}\n",
+        "    if action_name == 'fetch_customer_profile':\n",
+        "        return {'action_type': 'investigate', 'investigation_target': 'customer_profile', 'decision': None, 'confidence': None, 'reasoning': action.reasoning or 'Review buyer risk signals before routing.'}\n",
+        "    if action_name == 'fetch_network_graph':\n",
+        "        return {'action_type': 'investigate', 'investigation_target': 'network_graph', 'decision': None, 'confidence': None, 'reasoning': action.reasoning or 'Check linked network risk before routing.'}\n",
+        "    if action_name == 'review_transaction':\n",
+        "        return {'action_type': 'investigate', 'investigation_target': 'payment_trace', 'decision': None, 'confidence': None, 'reasoning': action.reasoning or 'Inspect payment and fulfillment details first.'}\n",
+        "    if action_name == 'check_policy':\n",
+        "        return {'action_type': 'investigate', 'investigation_target': 'policy_review', 'decision': None, 'confidence': None, 'reasoning': action.reasoning or 'Check routing policy before a final decision.'}\n",
+        "    if action_name == 'add_case_note':\n",
+        "        return {'action_type': 'decide', 'investigation_target': None, 'decision': 'fraud', 'confidence': 0.55, 'reasoning': action.note_text or 'Document the case before final routing.'}\n",
+        "\n",
+        "    decision = 'fraud' if action.resolution.value in {'block', 'hold', 'escalate'} else 'legitimate'\n",
+        "    confidence = 0.9 if action.resolution.value in {'approve', 'block'} else 0.7\n",
+        "    return {\n",
+        "        'action_type': 'decide',\n",
         "        'investigation_target': None,\n",
+        "        'decision': decision,\n",
+        "        'confidence': confidence,\n",
+        "        'reasoning': action.reasoning or f'Final routing is {action.resolution.value}.',\n",
         "    }\n",
+        "\n",
+        "\n",
+        "def build_fraudshield_rollout_dataset(per_task_episodes=24):\n",
         "    agent = SnapshotCalibratedFraudDetectionAgent()\n",
+        "    rows = []\n",
         "    for task_name in ('easy', 'medium', 'hard'):\n",
+        "        for _ in range(per_task_episodes):\n",
+        "            reset_result = env.reset(task=task_name)\n",
+        "            observation = reset_result.observation\n",
+        "            done = False\n",
+        "            while not done:\n",
         "                action = agent.decide(observation)\n",
+        "                payload = action_to_payload(action)\n",
+        "                rows.append({\n",
+        "                    'text': prompt_from_observation(observation) + '\\n' + json.dumps(payload, separators=(',', ':')),\n",
+        "                    'source': 'fraudshield_rollout',\n",
         "                    'task_name': task_name,\n",
         "                })\n",
+        "                step_result = env.step(action)\n",
+        "                observation = step_result.observation\n",
+        "                done = step_result.done\n",
+        "    return Dataset.from_pandas(pd.DataFrame(rows), preserve_index=False)\n",
+        "\n",
+        "\n",
+        "def public_row_to_training_example(row):\n",
+        "    amount = float(row.get('amount', 0.0) or 0.0)\n",
+        "    transaction_type = str(row.get('transaction_type', row.get('type', 'purchase')))\n",
+        "    location = str(row.get('location', 'unknown'))\n",
+        "    merchant = str(row.get('merchant', row.get('nameDest', 'unknown_merchant')))\n",
+        "    device = str(row.get('device', 'unknown_device'))\n",
+        "    payment_method = str(row.get('payment_method', row.get('transaction_type', 'card')))\n",
+        "    timestamp = str(row.get('timestamp', row.get('step', 'unknown_time')))\n",
+        "    is_fraud = int(row.get('is_fraud', row.get('isFraud', row.get('Class', 0))) or 0)\n",
+        "\n",
+        "    high_amount = amount >= 1500\n",
+        "    risky_type = transaction_type.lower() in {'transfer', 'cash_out', 'wire', 'crypto', 'gift_card'}\n",
+        "    risky_location = any(token in location.lower() for token in ['proxy', 'unknown', 'foreign', 'vpn'])\n",
+        "    risky_device = any(token in device.lower() for token in ['emulator', 'root', 'shared', 'new'])\n",
+        "\n",
+        "    available = ['merchant_profile', 'customer_profile', 'network_graph', 'payment_trace', 'policy_review']\n",
+        "    visible_observation = {\n",
+        "        'case_id': f\"public_case_{abs(hash(str(row.get('transaction_id', merchant)))) % 100000}\",\n",
+        "        'task_name': 'medium',\n",
+        "        'current_screen': 'Queue',\n",
+        "        'visible_panels': ['triage_summary'],\n",
+        "        'revealed_evidence': {},\n",
+        "        'linked_case_ids': [],\n",
+        "        'remaining_steps': 6,\n",
+        "        'remaining_sla': 5,\n",
+        "        'note_required': False,\n",
+        "        'allowed_actions': ['review_transaction', 'fetch_customer_profile', 'fetch_merchant_profile', 'fetch_network_graph', 'check_policy', 'resolve_case'],\n",
+        "        'case_summary': {\n",
+        "            'amount_usd': round(amount, 2),\n",
+        "            'queue_reason': f'{transaction_type} transaction flagged for manual review',\n",
+        "            'visible_risk_band': 'review',\n",
+        "            'merchant_region': 'masked',\n",
+        "        },\n",
+        "        'app_context': {\n",
+        "            'item_category': transaction_type,\n",
+        "            'timestamp': timestamp,\n",
+        "            'available_investigations': available,\n",
+        "            'public_signals': {\n",
+        "                'merchant': merchant,\n",
+        "                'device': device,\n",
+        "                'payment_method': payment_method,\n",
+        "                'location': location,\n",
+        "            },\n",
+        "        },\n",
+        "    }\n",
+        "\n",
+        "    if is_fraud and (risky_type or high_amount):\n",
+        "        payload = {\n",
+        "            'action_type': 'investigate',\n",
+        "            'investigation_target': 'network_graph' if risky_device or risky_location else 'payment_trace',\n",
+        "            'decision': None,\n",
+        "            'confidence': None,\n",
+        "            'reasoning': 'The visible signals are suspicious, so gather network or payment evidence before routing.',\n",
+        "        }\n",
+        "    elif not is_fraud and amount < 200 and not risky_type:\n",
+        "        payload = {\n",
+        "            'action_type': 'decide',\n",
+        "            'investigation_target': None,\n",
+        "            'decision': 'legitimate',\n",
+        "            'confidence': 0.82,\n",
+        "            'reasoning': 'The visible transaction looks low risk and can be cleared with high confidence.',\n",
+        "        }\n",
+        "    else:\n",
+        "        payload = {\n",
+        "            'action_type': 'investigate',\n",
+        "            'investigation_target': 'merchant_profile' if high_amount else 'customer_profile',\n",
+        "            'decision': None,\n",
+        "            'confidence': None,\n",
+        "            'reasoning': 'The transaction is ambiguous, so inspect merchant or customer history before routing.',\n",
+        "        }\n",
+        "\n",
+        "    prompt = (\n",
+        "        'You are a fraud analyst learning how to investigate suspicious payments. '\n",
+        "        'Use visible triage signals to choose the next best FraudShield action. Return JSON only.\\n\\n'\n",
+        "        f'Visible observation:\\n{json.dumps(visible_observation, sort_keys=True)}\\n\\n'\n",
+        "        f'Valid investigation aliases: {available}.\\n'\n",
+        "        'Respond with JSON using this schema: '\n",
+        "        '{\"action_type\":\"investigate|decide\",\"investigation_target\":\"alias_or_null\",\"decision\":\"fraud|legitimate|null\",\"confidence\":0.0,\"reasoning\":\"one sentence\"}'\n",
+        "    )\n",
+        "    return {'text': prompt + '\\n' + json.dumps(payload, separators=(',', ':')), 'source': 'public_fraud_data', 'task_name': 'curriculum'}\n",
+        "\n",
         "\n",
+        "def build_public_curriculum(max_rows=2500):\n",
+        "    dataset_name = 'Phoenix21/mock_fraud-detection-dataset'\n",
+        "    public_ds = load_dataset(dataset_name, split='train')\n",
+        "    rows = [public_row_to_training_example(row) for row in public_ds.shuffle(seed=42).select(range(min(max_rows, len(public_ds))))]\n",
+        "    print('Loaded public curriculum rows from', dataset_name, 'count=', len(rows))\n",
+        "    return Dataset.from_pandas(pd.DataFrame(rows), preserve_index=False)\n",
+        "\n",
+        "\n",
+        "public_dataset = build_public_curriculum(max_rows=2500)\n",
+        "fraudshield_dataset = build_fraudshield_rollout_dataset(per_task_episodes=24)\n",
+        "print(public_dataset)\n",
+        "print(fraudshield_dataset)\n",
+        "print(public_dataset[0]['text'][:900])\n",
+        "print(fraudshield_dataset[0]['text'][:900])\n"
       ]
     },
     {
         "from transformers import TrainingArguments\n",
         "from trl import SFTTrainer\n",
         "\n",
+        "stage1_args = TrainingArguments(\n",
+        "    output_dir='fraudshield-sft-stage1',\n",
+        "    num_train_epochs=1,\n",
         "    per_device_train_batch_size=2,\n",
         "    gradient_accumulation_steps=4,\n",
         "    learning_rate=2e-4,\n",
+        "    logging_steps=5,\n",
+        "    save_strategy='no',\n",
+        "    report_to='none',\n",
+        "    fp16=not torch.cuda.is_bf16_supported(),\n",
+        "    bf16=torch.cuda.is_bf16_supported(),\n",
+        "    warmup_ratio=0.03,\n",
+        "    lr_scheduler_type='cosine',\n",
+        ")\n",
+        "\n",
+        "stage1_trainer = SFTTrainer(\n",
+        "    model=model,\n",
+        "    tokenizer=tokenizer,\n",
+        "    train_dataset=public_dataset,\n",
+        "    dataset_text_field='text',\n",
+        "    max_seq_length=MAX_SEQ_LENGTH,\n",
+        "    packing=False,\n",
+        "    args=stage1_args,\n",
+        ")\n",
+        "\n",
+        "stage1_trainer.train()\n",
+        "\n",
+        "stage2_args = TrainingArguments(\n",
+        "    output_dir='fraudshield-sft-stage2',\n",
+        "    num_train_epochs=2,\n",
+        "    per_device_train_batch_size=2,\n",
+        "    gradient_accumulation_steps=4,\n",
+        "    learning_rate=1e-4,\n",
         "    logging_steps=1,\n",
         "    save_strategy='epoch',\n",
         "    report_to='none',\n",
         "    fp16=not torch.cuda.is_bf16_supported(),\n",
         "    bf16=torch.cuda.is_bf16_supported(),\n",
         "    warmup_ratio=0.05,\n",
         "    lr_scheduler_type='cosine',\n",
         ")\n",
         "\n",
         "trainer = SFTTrainer(\n",
+        "    model=stage1_trainer.model,\n",
         "    tokenizer=tokenizer,\n",
+        "    train_dataset=fraudshield_dataset,\n",
         "    dataset_text_field='text',\n",
         "    max_seq_length=MAX_SEQ_LENGTH,\n",
         "    packing=False,\n",
+        "    args=stage2_args,\n",
         ")\n",
         "\n",
         "trainer.train()\n",
         "        check=True,\n",
         "    )\n",
         "    with open('fraudshield_baseline_results.json', 'r', encoding='utf-8') as handle:\n",
+        "        results = json.load(handle)\n",
+        "    return results, completed.stdout\n",
         "\n",
         "baseline_results, baseline_stdout = run_inference()\n",
         "trained_results, trained_stdout = run_inference({'LOCAL_MODEL_PATH': OUTPUT_DIR})\n",
         "        'delta': trained_results[task_name]['score'] - baseline_results[task_name]['score'],\n",
         "    })\n",
         "\n",
+        "print('Heuristic baseline stdout:\\n', baseline_stdout)\n",
+        "print('Trained model stdout:\\n', trained_stdout)\n",
         "print(json.dumps(comparison_rows, indent=2))\n"
       ]
     },
         "summary = {\n",
         "    'status': 'completed',\n",
         "    'updated_at': datetime.utcnow().isoformat() + 'Z',\n",
+        "    'trainer': 'Two-stage TRL SFTTrainer with Unsloth LoRA',\n",
         "    'base_model': MODEL_NAME,\n",
+        "    'public_curriculum_dataset': 'Phoenix21/mock_fraud-detection-dataset',\n",
         "    'local_model_path': OUTPUT_DIR,\n",
         "    'baseline': {\n",
         "        'easy': baseline_results['easy']['score'],\n",
         "with open('training_summary.json', 'w', encoding='utf-8') as handle:\n",
         "    json.dump(summary, handle, indent=2)\n",
         "\n",
+        "print(json.dumps(summary, indent=2))\n",
+        "print('Artifacts saved: reward_curve.png, loss_curve.png, training_summary.json')\n"
       ]
     }
   ],
     "language_info": {
       "name": "python",
       "version": "3.12"
+    },
+    "colab": {
+      "name": "fraudshield_trl_colab.ipynb",
+      "provenance": []
     }
   },
   "nbformat": 4,
   "nbformat_minor": 5
+}
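
Note on `case_id` in `public_row_to_training_example`: the ID is derived from Python's built-in `hash()`, which is salted per process for strings unless `PYTHONHASHSEED` is fixed, so the same row can receive a different `case_id` on each notebook run. A minimal sketch of a reproducible alternative follows, assuming a CRC32 checksum is acceptable for this purpose. The helper name `stable_case_id` is hypothetical and not part of the repository.

```python
import zlib


def stable_case_id(row, merchant="unknown_merchant"):
    """Hypothetical helper: build a case ID that is identical across runs.

    Mirrors the original key choice (transaction_id, falling back to the
    merchant name) but hashes it with CRC32 instead of the salted hash().
    """
    key = str(row.get("transaction_id", merchant))
    checksum = zlib.crc32(key.encode("utf-8"))  # unsigned 32-bit integer
    return f"public_case_{checksum % 100000}"


if __name__ == "__main__":
    # Same input always produces the same ID, in this process or any other.
    print(stable_case_id({"transaction_id": "tx-001"}))
    print(stable_case_id({}, merchant="acme_store"))
```

As with the original expression, the modulo keeps IDs short, so distinct rows can still collide. That is harmless for prompt text but makes the ID unsuitable as a unique key.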