fraudshield-1 / README.md
DevikaJ2005's picture
Add training-first RL architecture with tracking
ce9edc2
---
title: FraudShield
emoji: "🛡️"
colorFrom: blue
colorTo: indigo
sdk: docker
python_version: "3.12"
pinned: false
license: mit
---
# FraudShield
FraudShield is a partial-observability OpenEnv environment for simulated fraud investigation and workflow-aware routing.
## Training-First Architecture
FraudShield now includes a modular LLM + RL training stack alongside the OpenEnv runtime:
- `environment.py`: text-first wrapper for multi-step rollouts
- `reward.py`: decomposed numeric reward with measurable subscores
- `train.py`: Colab-friendly QLoRA training pipeline
- `evaluate.py`: fixed-task evaluation and comparison plots
- `config.py`: experiment, model, environment, and reward configuration
- `utils.py`: seeding, JSON handling, logging helpers, and moving averages
- `configs/colab_qlora_grpo.json`: default Colab experiment config
This layer is designed so you can generate rollouts, score model behavior with decomposed rewards, save checkpoints, resume runs, and compare before/after performance in a repeatable way.
Experimental tracking is enabled by default through TensorBoard logs under `artifacts/rl_runs/.../tb_logs`, and the training pipeline also writes plot artifacts such as `loss_vs_steps.png` and `reward_vs_steps.png`. If you want hosted tracking, set `report_to=["wandb"]` or `["tensorboard","wandb"]` in the experiment config before the run.
## What This Is
FraudShield is an RL-ready simulation, not a live fraud platform. An agent receives a limited triage view of a case, chooses investigation actions to reveal hidden evidence, and then routes the case with one of the supported final resolutions.
The environment is built for OpenEnv evaluation and training. It keeps the runtime fully offline by using the frozen snapshot in `data/fraudshield_cases.json`.
## Why It Matters For Theme 3.1
Theme 3.1 is about professional tasks, tool use, and world modeling under partial observability. FraudShield fits that directly:
- the agent starts with incomplete information
- useful evidence appears only after the right action is taken
- the environment rewards workflow quality, not just final correctness
- harder tasks require multi-step investigation and linked-case reasoning
This makes it a better fit for training decision-making agents than a one-shot fraud classifier.
## Lightweight Explorer UI
FraudShield now includes a small browser explorer at `/` so you can inspect the environment without sending raw API requests by hand. The explorer lets you:
- reset an easy, medium, or hard episode
- click investigation and resolution actions one step at a time
- inspect the live observation and full environment state
- run the current heuristic baseline as a walkthrough before RL training
This UI is intentionally lightweight. It is there to make the environment easier to understand, not to turn FraudShield into a fake production product.
## Environment Design
### Action Space
FraudShield keeps a fixed typed action space:
- `review_transaction`: open the operational transaction trace for the active case
- `fetch_customer_profile`: reveal buyer age, dispute history, and repeat-buyer status
- `fetch_merchant_profile`: reveal seller age, rating, reviews, and chargeback rate
- `fetch_network_graph`: reveal shared-device activity, prior flags, cluster risk, linked cards, and linked case IDs when present
- `check_policy`: reveal routing policy guidance
- `add_case_note`: write the required audit note before final closure
- `resolve_case`: submit one final resolution
Supported final resolutions:
- `approve`
- `block`
- `hold`
- `request_docs`
- `escalate`
### Observation Space
The public observation model stays the same, but the reset-time contents are intentionally sparse.
At reset, the agent only sees:
- `case_id`
- `task_name`
- `remaining_steps`
- `episode_step`
- `case_summary.amount_usd`
- a short triage summary in `case_summary.queue_reason`
- coarse context in `app_context`:
- `item_category`
- `timestamp`
- `investigation_budget_remaining`
- `available_investigations`
- the currently valid public actions in `allowed_actions`
Hidden details do **not** appear until the matching action is taken. In particular, seller profile, buyer profile, network risk, payment method, shipping behavior, and linked-case structure are progressively revealed through `revealed_evidence`.
### Reward Design
FraudShield keeps the existing correctness-driven terminal structure and adds workflow-shaped rewards:
- `+0.05` for a first-time useful fetch
- `+0.08` for `review_transaction` on cases with hidden high-risk payment or fulfillment facts
- `+0.08` for `fetch_network_graph` on cases with high hidden cluster risk
- `-0.05` for redundant repeated fetches
- `-0.03` for fetches after the case fetch budget is exhausted
- `-0.10` for resolving a medium or hard case with no fetch-based evidence
- `+0.15` terminal bonus for correct medium or hard routing when at least one investigation was used
The grader in `graders.py` is unchanged. Final task scores still depend on resolution accuracy, evidence coverage, policy compliance, workflow completion, efficiency, and linked-case consistency.
### Task Difficulty
FraudShield has three graded tasks:
| Task | Design goal | What makes it hard |
| --- | --- | --- |
| Easy | obvious routing with minimal investigation | strong visible cues, 1 fetch budget |
| Medium | mixed-signal routing | at least 1 investigation needed, 2 evidence points typically matter |
| Hard | linked-case reasoning | misleading triage, hidden linkage, 3 fetch budget, graph evidence usually required |
## How To Run Locally
Install the package:
```bash
pip install -e .
```
Run the heuristic or configured agent:
```bash
python inference.py
```
FraudShield supports three agent modes:
- `heuristic` by default when no model credentials are set
- `llm_local` when `LOCAL_MODEL_PATH` points to a trained Hugging Face / PEFT checkpoint
- `llm_remote` when an API-compatible model is configured
For a no-paid-model open-source setup, the recommended options are:
### Option 1: Use your locally trained model
```bash
LOCAL_MODEL_PATH=trained_policy python inference.py
```
### Option 2: Use a Hugging Face hosted open-source model
```bash
HF_TOKEN=your_token_here \
MODEL_NAME=Qwen/Qwen2.5-1.5B-Instruct \
API_BASE_URL=https://router.huggingface.co/v1 \
python inference.py
```
If `HF_TOKEN` is present and `API_BASE_URL` is not set, FraudShield defaults to the Hugging Face router automatically.
Run the OpenEnv API server:
```bash
python -m server.app
```
Then open the lightweight explorer:
- `http://127.0.0.1:7860/`
Important endpoints:
- `GET /health`
- `POST /reset?task=easy|medium|hard`
- `POST /step`
- `GET /state`
- `GET /info`
- `GET /tasks`
- `GET /metadata`
- `GET /schema`
- `POST /mcp`
- `GET /docs`
Validation:
```bash
python validate_api.py
python -m openenv.cli validate .
docker build -t fraudshield .
docker run -p 7860:7860 fraudshield
```
## How To Run The Training Notebook
The Colab notebook lives at:
- `notebooks/fraudshield_trl_colab.ipynb`
It is designed to:
1. install `openenv-core`, `trl`, `unsloth`, `transformers`, `datasets`, and `peft`
2. clone the repo and install FraudShield
3. load a public fraud curriculum dataset from Hugging Face
4. build a second-stage training set from real FraudShield rollouts
5. run two-stage fine-tuning with Unsloth LoRA and TRL `SFTTrainer`
- stage 1: public fraud-data adaptation
- stage 2: FraudShield policy adaptation
5. save a reusable local policy checkpoint
6. save:
- `reward_curve.png`
- `loss_curve.png`
- `training_summary.json`
7. evaluate:
- heuristic via `python inference.py`
- trained model via `LOCAL_MODEL_PATH=... python inference.py`
The notebook is designed for Colab + GPU execution and does not require a paid proprietary LLM. The current public curriculum source is `Phoenix21/mock_fraud-detection-dataset`, which gives the model broader fraud-signal exposure before it is adapted to FraudShield actions.
## Results
Current heuristic baseline, measured with `python inference.py`:
- Easy: `0.9900`
- Medium: `0.3500`
- Hard: `0.7425`
- Final: `0.6942`
This baseline is intentionally rule-based and not trained. It is strong on easy, weaker on medium, and still imperfect on hard, which leaves headroom for a trained policy that can learn broader fraud patterns from public data and then adapt them to FraudShield.
Once training is completed, this section should include:
- reward curve image
- loss curve image
- trained-vs-heuristic comparison table
- one short qualitative trace comparison
The preferred final story is:
- heuristic baseline
- base open-source LLM or hosted HF model
- fine-tuned local policy checkpoint
## Live Links
- Hugging Face Space: `https://huggingface.co/spaces/DevikaJ2005/fraudshield-1`
- Code repository: `https://github.com/DevikaJ2005/Fraudshield`
- Colab notebook: `https://colab.research.google.com/github/DevikaJ2005/Fraudshield/blob/main/notebooks/fraudshield_trl_colab.ipynb`
- Blog draft: `HF_BLOG_DRAFT.md`
The Space root can double as a quick explorer UI for judges before they open the API docs.
For final submission, make sure the README links:
- the public HF Space
- the public GitHub repo
- the public Colab notebook
- the final Hugging Face blog post or video/slides link
- the committed reward/loss plot images
## Simulation vs Production
FraudShield is a simulation for training and evaluation.
What it does:
- models partial observability
- enforces investigation budgets
- exposes hidden evidence only through actions
- grades routing behavior in a reproducible way
What it does **not** do:
- connect to live financial systems
- process real customer data
- move money or block real payments
- provide production security, auth, or compliance guarantees
A production fraud platform would still need real data pipelines, authentication, authorization, monitoring, compliance controls, and human-review operations beyond this environment.