title: FraudShield
emoji: 🛡️
colorFrom: blue
colorTo: indigo
sdk: docker
python_version: '3.12'
pinned: false
license: mit
FraudShield
FraudShield is a partial-observability OpenEnv environment for simulated fraud investigation and workflow-aware routing.
Training-First Architecture
FraudShield now includes a modular LLM + RL training stack alongside the OpenEnv runtime:
- environment.py: text-first wrapper for multi-step rollouts
- reward.py: decomposed numeric reward with measurable subscores
- train.py: Colab-friendly QLoRA training pipeline
- evaluate.py: fixed-task evaluation and comparison plots
- config.py: experiment, model, environment, and reward configuration
- utils.py: seeding, JSON handling, logging helpers, and moving averages
- configs/colab_qlora_grpo.json: default Colab experiment config
This layer is designed so you can generate rollouts, score model behavior with decomposed rewards, save checkpoints, resume runs, and compare before/after performance in a repeatable way.
Experiment tracking is enabled by default through TensorBoard logs under artifacts/rl_runs/.../tb_logs, and the training pipeline also writes plot artifacts such as loss_vs_steps.png and reward_vs_steps.png. For hosted tracking, set report_to=["wandb"] or ["tensorboard","wandb"] in the experiment config before the run.
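As a rough illustration of the tracking switch described above, the sketch below updates an experiment config in memory. It assumes report_to is a top-level key; the real layout of configs/colab_qlora_grpo.json may nest it differently, and with_trackers is a hypothetical helper, not part of FraudShield.

```python
import copy
import json


def with_trackers(config: dict, trackers: list[str]) -> dict:
    """Return a copy of `config` with report_to replaced.

    Assumption: report_to lives at the top level of the config.
    """
    updated = copy.deepcopy(config)
    updated["report_to"] = list(trackers)
    return updated


# Stand-in for the parsed contents of an experiment config file.
base_config = {"report_to": ["tensorboard"]}
hosted_config = with_trackers(base_config, ["tensorboard", "wandb"])
print(json.dumps(hosted_config, indent=2))
```

Working on a copy keeps the default config untouched, so a TensorBoard-only run and a hosted-tracking run can be compared from the same starting point.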
What This Is
FraudShield is an RL-ready simulation, not a live fraud platform. An agent receives a limited triage view of a case, chooses investigation actions to reveal hidden evidence, and then routes the case with one of the supported final resolutions.
The environment is built for OpenEnv evaluation and training. It keeps the runtime fully offline by using the frozen snapshot in data/fraudshield_cases.json.
Why It Matters For Theme 3.1
Theme 3.1 is about professional tasks, tool use, and world modeling under partial observability. FraudShield fits that directly:
- the agent starts with incomplete information
- useful evidence appears only after the right action is taken
- the environment rewards workflow quality, not just final correctness
- harder tasks require multi-step investigation and linked-case reasoning
This makes it a better fit for training decision-making agents than a one-shot fraud classifier.
Lightweight Explorer UI
FraudShield now includes a small browser explorer at / so you can inspect the environment without sending raw API requests by hand. The explorer lets you:
- reset an easy, medium, or hard episode
- click investigation and resolution actions one step at a time
- inspect the live observation and full environment state
- run the current heuristic baseline as a walkthrough before RL training
This UI is intentionally lightweight. It is there to make the environment easier to understand, not to turn FraudShield into a fake production product.
Environment Design
Action Space
FraudShield keeps a fixed typed action space:
- review_transaction: open the operational transaction trace for the active case
- fetch_customer_profile: reveal buyer age, dispute history, and repeat-buyer status
- fetch_merchant_profile: reveal seller age, rating, reviews, and chargeback rate
- fetch_network_graph: reveal shared-device activity, prior flags, cluster risk, linked cards, and linked case IDs when present
- check_policy: reveal routing policy guidance
- add_case_note: write the required audit note before final closure
- resolve_case: submit one final resolution
Supported final resolutions:
- approve
- block
- hold
- request_docs
- escalate
Observation Space
The public observation model stays the same, but the reset-time contents are intentionally sparse.
At reset, the agent only sees:
- case_id
- task_name
- remaining_steps
- episode_step
- case_summary.amount_usd
- a short triage summary in case_summary.queue_reason
- coarse context in app_context:
  - item_category
  - timestamp
  - investigation_budget_remaining
  - available_investigations
- the currently valid public actions in allowed_actions
Hidden details do not appear until the matching action is taken. In particular, seller profile, buyer profile, network risk, payment method, shipping behavior, and linked-case structure are progressively revealed through revealed_evidence.
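The progressive-reveal idea can be sketched with a toy case object. This is not FraudShield's real model: ToyCase, its take method, and the evidence values are made up, and only the action names and the revealed_evidence field name are taken from this README.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCase:
    """Illustrative stand-in for one case under partial observability."""

    # Evidence keyed by the action that reveals it; invisible to the agent.
    hidden: dict[str, dict]
    # What the agent has uncovered so far; empty at reset.
    revealed_evidence: dict[str, dict] = field(default_factory=dict)

    def take(self, action: str) -> dict[str, dict]:
        """Reveal the evidence tied to `action`, then return what is visible."""
        if action in self.hidden:
            self.revealed_evidence[action] = self.hidden[action]
        return self.revealed_evidence


case = ToyCase(
    hidden={
        "fetch_merchant_profile": {"chargeback_rate": 0.12},  # made-up value
        "fetch_network_graph": {"cluster_risk": "high"},      # made-up value
    }
)
print(case.revealed_evidence)                # nothing is visible at reset
print(case.take("fetch_merchant_profile"))   # only the merchant evidence appears
```

The network-graph evidence stays hidden until fetch_network_graph is taken, which is the property that makes the investigation order matter.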
Reward Design
FraudShield keeps the existing correctness-driven terminal structure and adds workflow-shaped rewards:
- +0.05 for a first-time useful fetch
- +0.08 for review_transaction on cases with hidden high-risk payment or fulfillment facts
- +0.08 for fetch_network_graph on cases with high hidden cluster risk
- -0.05 for redundant repeated fetches
- -0.03 for fetches after the case fetch budget is exhausted
- -0.10 for resolving a medium or hard case with no fetch-based evidence
- +0.15 terminal bonus for correct medium or hard routing when at least one investigation was used
The grader in graders.py is unchanged. Final task scores still depend on resolution accuracy, evidence coverage, policy compliance, workflow completion, efficiency, and linked-case consistency.
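To get a feel for how the workflow-shaped terms accumulate over an episode, here is a minimal sketch that sums them from a lookup table. The numeric values are the ones listed above; the event names, the shaping_total helper, and the assumption that the terms simply add are illustrative only, and the correctness-driven terminal reward is left out.

```python
# Shaping values from the list above, keyed by hypothetical event names.
SHAPING = {
    "first_useful_fetch": 0.05,
    "high_risk_review_transaction": 0.08,
    "high_risk_network_graph": 0.08,
    "redundant_fetch": -0.05,
    "fetch_over_budget": -0.03,
    "resolve_without_evidence": -0.10,
    "correct_routing_after_investigation": 0.15,
}


def shaping_total(events: list[str]) -> float:
    """Sum the shaping terms for a sequence of events (assumes they add)."""
    return round(sum(SHAPING[event] for event in events), 4)


# A hard-case episode: one useful graph fetch on a high-risk cluster,
# one wasted repeat, then a correct final routing.
episode = [
    "first_useful_fetch",
    "high_risk_network_graph",
    "redundant_fetch",
    "correct_routing_after_investigation",
]
print(shaping_total(episode))
```

Under these assumptions the wasted repeat cancels the first-fetch bonus, so most of the shaping signal in this episode comes from fetching the right evidence and routing correctly.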
Task Difficulty
FraudShield has three graded tasks:
| Task | Design goal | What makes it hard |
|---|---|---|
| Easy | obvious routing with minimal investigation | strong visible cues, 1 fetch budget |
| Medium | mixed-signal routing | at least 1 investigation needed, 2 evidence points typically matter |
| Hard | linked-case reasoning | misleading triage, hidden linkage, 3 fetch budget, graph evidence usually required |
How To Run Locally
Install the package:
pip install -e .
Run the heuristic or configured agent:
python inference.py
FraudShield supports three agent modes:
- heuristic by default when no model credentials are set
- llm_local when LOCAL_MODEL_PATH points to a trained Hugging Face / PEFT checkpoint
- llm_remote when an API-compatible model is configured
For a setup that uses only open-source models and no paid API, the recommended options are:
Option 1: Use your locally trained model
LOCAL_MODEL_PATH=trained_policy python inference.py
Option 2: Use a Hugging Face hosted open-source model
HF_TOKEN=your_token_here \
MODEL_NAME=Qwen/Qwen2.5-1.5B-Instruct \
API_BASE_URL=https://router.huggingface.co/v1 \
python inference.py
If HF_TOKEN is present and API_BASE_URL is not set, FraudShield defaults to the Hugging Face router automatically.
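The mode selection described above can be approximated as a small function over environment variables. This is a sketch, not the logic in inference.py: select_agent is a hypothetical name, giving LOCAL_MODEL_PATH priority over remote settings is an assumption, and so is treating HF_TOKEN or API_BASE_URL as the signal that a remote model is configured.

```python
HF_ROUTER_URL = "https://router.huggingface.co/v1"


def select_agent(env: dict[str, str]) -> dict[str, str]:
    """Pick an agent mode from environment-style settings (assumed precedence)."""
    if env.get("LOCAL_MODEL_PATH"):
        return {"mode": "llm_local", "model_path": env["LOCAL_MODEL_PATH"]}
    if env.get("HF_TOKEN") or env.get("API_BASE_URL"):
        # Fall back to the Hugging Face router when no base URL is given.
        base_url = env.get("API_BASE_URL") or HF_ROUTER_URL
        return {"mode": "llm_remote", "api_base_url": base_url}
    return {"mode": "heuristic"}


print(select_agent({}))
print(select_agent({"LOCAL_MODEL_PATH": "trained_policy"}))
print(select_agent({"HF_TOKEN": "your_token_here"}))
```

Passing a plain dict instead of reading os.environ directly keeps the function easy to test; in a real script you would call select_agent(dict(os.environ)).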
Run the OpenEnv API server:
python -m server.app
Then open the lightweight explorer:
http://127.0.0.1:7860/
Important endpoints:
- GET /health
- POST /reset?task=easy|medium|hard
- POST /step
- GET /state
- GET /info
- GET /tasks
- GET /metadata
- GET /schema
- POST /mcp
- GET /docs
Validation:
python validate_api.py
python -m openenv.cli validate .
docker build -t fraudshield .
docker run -p 7860:7860 fraudshield
How To Run The Training Notebook
The Colab notebook lives at:
notebooks/fraudshield_trl_colab.ipynb
It is designed to:
- install openenv-core, trl, unsloth, transformers, datasets, and peft
- clone the repo and install FraudShield
- load a public fraud curriculum dataset from Hugging Face
- build a second-stage training set from real FraudShield rollouts
- run two-stage fine-tuning with Unsloth LoRA and TRL SFTTrainer
  - stage 1: public fraud-data adaptation
  - stage 2: FraudShield policy adaptation
- save a reusable local policy checkpoint
- save:
  - reward_curve.png
  - loss_curve.png
  - training_summary.json
- evaluate:
  - heuristic via python inference.py
  - trained model via LOCAL_MODEL_PATH=... python inference.py
The notebook is designed for Colab + GPU execution and does not require a paid proprietary LLM. The current public curriculum source is Phoenix21/mock_fraud-detection-dataset, which gives the model broader fraud-signal exposure before it is adapted to FraudShield actions.
Results
Current heuristic baseline, measured with python inference.py:
- Easy: 0.9900
- Medium: 0.3500
- Hard: 0.7425
- Final: 0.6942
This baseline is intentionally rule-based and not trained. It is strong on easy, weaker on medium, and still imperfect on hard, which leaves headroom for a trained policy that can learn broader fraud patterns from public data and then adapt them to FraudShield.
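The reported Final value matches the unweighted mean of the three task scores. That inference.py computes it this way is an inference from the numbers, not something stated here; the check below only shows that the arithmetic lines up.

```python
from statistics import mean

# Heuristic baseline scores as reported above.
task_scores = {"easy": 0.9900, "medium": 0.3500, "hard": 0.7425}

# Unweighted mean across tasks, rounded to four decimals like the report.
final_score = round(mean(task_scores.values()), 4)
print(final_score)  # 0.6942
```

If the final score is indeed an equal-weight average, the medium task is where a trained policy has the most room to raise it.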
Once training is completed, this section should include:
- reward curve image
- loss curve image
- trained-vs-heuristic comparison table
- one short qualitative trace comparison
The preferred final story is:
- heuristic baseline
- base open-source LLM or hosted HF model
- fine-tuned local policy checkpoint
Live Links
- Hugging Face Space: https://huggingface.co/spaces/DevikaJ2005/fraudshield-1
- Code repository: https://github.com/DevikaJ2005/Fraudshield
- Colab notebook: https://colab.research.google.com/github/DevikaJ2005/Fraudshield/blob/main/notebooks/fraudshield_trl_colab.ipynb
- Blog draft: HF_BLOG_DRAFT.md
The Space root can double as a quick explorer UI for judges before they open the API docs.
For final submission, make sure the README links:
- the public HF Space
- the public GitHub repo
- the public Colab notebook
- the final Hugging Face blog post or video/slides link
- the committed reward/loss plot images
Simulation vs Production
FraudShield is a simulation for training and evaluation.
What it does:
- models partial observability
- enforces investigation budgets
- exposes hidden evidence only through actions
- grades routing behavior in a reproducible way
What it does not do:
- connect to live financial systems
- process real customer data
- move money or block real payments
- provide production security, auth, or compliance guarantees
A production fraud platform would still need real data pipelines, authentication, authorization, monitoring, compliance controls, and human-review operations beyond this environment.