
title: FraudShield
emoji: 🛡️
colorFrom: blue
colorTo: indigo
sdk: docker
python_version: '3.12'
pinned: false
license: mit

FraudShield

FraudShield is a partial-observability OpenEnv environment for simulated fraud investigation and workflow-aware routing.

Training-First Architecture

FraudShield now includes a modular LLM + RL training stack alongside the OpenEnv runtime:

environment.py: text-first wrapper for multi-step rollouts
reward.py: decomposed numeric reward with measurable subscores
train.py: Colab-friendly QLoRA training pipeline
evaluate.py: fixed-task evaluation and comparison plots
config.py: experiment, model, environment, and reward configuration
utils.py: seeding, JSON handling, logging helpers, and moving averages
configs/colab_qlora_grpo.json: default Colab experiment config

This layer is designed so you can generate rollouts, score model behavior with decomposed rewards, save checkpoints, resume runs, and compare before/after performance in a repeatable way.

Experiment tracking is enabled by default through TensorBoard logs under artifacts/rl_runs/.../tb_logs. The training pipeline also writes plot artifacts such as loss_vs_steps.png and reward_vs_steps.png. For hosted tracking, set report_to=["wandb"] or report_to=["tensorboard","wandb"] in the experiment config before the run.
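
As a rough sketch of the report_to switch, the snippet below updates a loaded experiment config in memory. It assumes report_to is a top-level key of the JSON config, which this README does not confirm, and set_report_to is a hypothetical helper rather than part of FraudShield.

```python
import copy
import json

# Tracker names mentioned in this README; other backends are not covered here.
KNOWN_TRACKERS = {"tensorboard", "wandb"}


def set_report_to(config: dict, trackers: list[str]) -> dict:
    """Return a copy of config with report_to set to the given trackers.

    Assumption: report_to is a top-level key of the experiment config.
    """
    unknown = [name for name in trackers if name not in KNOWN_TRACKERS]
    if unknown:
        raise ValueError(f"unsupported tracker(s): {unknown}")
    updated = copy.deepcopy(config)  # leave the caller's config untouched
    updated["report_to"] = list(trackers)
    return updated


# Stand-in for the parsed contents of configs/colab_qlora_grpo.json.
example_config = {"report_to": ["tensorboard"]}
hosted = set_report_to(example_config, ["tensorboard", "wandb"])
print(json.dumps(hosted))
```

Returning a copy keeps the default config reusable across runs; the real config may nest this key differently.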

What This Is

FraudShield is an RL-ready simulation, not a live fraud platform. An agent receives a limited triage view of a case, chooses investigation actions to reveal hidden evidence, and then routes the case with one of the supported final resolutions.

The environment is built for OpenEnv evaluation and training. It keeps the runtime fully offline by using the frozen snapshot in data/fraudshield_cases.json.

Why It Matters For Theme 3.1

Theme 3.1 is about professional tasks, tool use, and world modeling under partial observability. FraudShield fits that directly:

the agent starts with incomplete information
useful evidence appears only after the right action is taken
the environment rewards workflow quality, not just final correctness
harder tasks require multi-step investigation and linked-case reasoning

This makes it a better fit for training decision-making agents than a one-shot fraud classifier.

Lightweight Explorer UI

FraudShield now includes a small browser explorer at / so you can inspect the environment without sending raw API requests by hand. The explorer lets you:

reset an easy, medium, or hard episode
click investigation and resolution actions one step at a time
inspect the live observation and full environment state
run the current heuristic baseline as a walkthrough before RL training

This UI is intentionally lightweight. It exists to make the environment easier to understand, not to present FraudShield as a production product.

Environment Design

Action Space

FraudShield keeps a fixed typed action space:

review_transaction: open the operational transaction trace for the active case
fetch_customer_profile: reveal buyer age, dispute history, and repeat-buyer status
fetch_merchant_profile: reveal seller age, rating, reviews, and chargeback rate
fetch_network_graph: reveal shared-device activity, prior flags, cluster risk, linked cards, and linked case IDs when present
check_policy: reveal routing policy guidance
add_case_note: write the required audit note before final closure
resolve_case: submit one final resolution

Supported final resolutions:

approve
block
hold
request_docs
escalate
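
One way to picture the fixed typed action space is as a pair of enums plus a small validated container. The action and resolution names below come from the lists above; the Action class, its field names, and its validation rule are illustrative assumptions, not FraudShield's actual models.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ActionType(str, Enum):
    REVIEW_TRANSACTION = "review_transaction"
    FETCH_CUSTOMER_PROFILE = "fetch_customer_profile"
    FETCH_MERCHANT_PROFILE = "fetch_merchant_profile"
    FETCH_NETWORK_GRAPH = "fetch_network_graph"
    CHECK_POLICY = "check_policy"
    ADD_CASE_NOTE = "add_case_note"
    RESOLVE_CASE = "resolve_case"


class Resolution(str, Enum):
    APPROVE = "approve"
    BLOCK = "block"
    HOLD = "hold"
    REQUEST_DOCS = "request_docs"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class Action:
    """Hypothetical action container; field names are illustrative only."""

    action_type: ActionType
    resolution: Optional[Resolution] = None
    note: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumed rule: a resolution is required on resolve_case and
        # rejected on every other action type.
        is_resolve = self.action_type is ActionType.RESOLVE_CASE
        if is_resolve != (self.resolution is not None):
            raise ValueError("resolution must be set only for resolve_case")


final = Action(ActionType.RESOLVE_CASE, resolution=Resolution.ESCALATE)
print(final.action_type.value, final.resolution.value)
```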

Observation Space

The public observation model stays the same, but the reset-time contents are intentionally sparse.

At reset, the agent only sees:

case_id
task_name
remaining_steps
episode_step
case_summary.amount_usd
a short triage summary in case_summary.queue_reason
coarse context in app_context:
- item_category
- timestamp
- investigation_budget_remaining
- available_investigations
the currently valid public actions in allowed_actions

Hidden details do not appear until the matching action is taken. In particular, seller profile, buyer profile, network risk, payment method, shipping behavior, and linked-case structure are progressively revealed through revealed_evidence.
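
To make the sparse reset-time view concrete, here is a hand-written example of what such an observation might look like. Only the field names come from this README; every value is made up, and placing revealed_evidence at the top level is an assumption.

```python
# Illustrative reset-time observation. All values are invented.
reset_observation = {
    "case_id": "case-0001",
    "task_name": "hard",
    "remaining_steps": 8,
    "episode_step": 0,
    "case_summary": {
        "amount_usd": 249.99,
        "queue_reason": "High-value order flagged for manual triage",
    },
    "app_context": {
        "item_category": "electronics",
        "timestamp": "2024-01-01T12:00:00Z",
        "investigation_budget_remaining": 3,
        "available_investigations": [
            "fetch_customer_profile",
            "fetch_merchant_profile",
            "fetch_network_graph",
        ],
    },
    "allowed_actions": ["review_transaction", "check_policy"],
    # Assumed location: nothing has been revealed before the first action.
    "revealed_evidence": {},
}

# Top-level fields the README lists as visible at reset.
RESET_KEYS = {
    "case_id",
    "task_name",
    "remaining_steps",
    "episode_step",
    "case_summary",
    "app_context",
    "allowed_actions",
}


def missing_reset_fields(observation: dict) -> set[str]:
    """Return the documented reset-time fields absent from an observation."""
    return RESET_KEYS - set(observation)


print(missing_reset_fields(reset_observation))
```

A check like this can serve as a quick sanity test when wrapping the environment, though GET /schema remains the authoritative description of the observation model.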

Reward Design

FraudShield keeps the existing correctness-driven terminal structure and adds workflow-shaped rewards:

+0.05 for a first-time useful fetch
+0.08 for review_transaction on cases with hidden high-risk payment or fulfillment facts
+0.08 for fetch_network_graph on cases with high hidden cluster risk
-0.05 for redundant repeated fetches
-0.03 for fetches after the case fetch budget is exhausted
-0.10 for resolving a medium or hard case with no fetch-based evidence
+0.15 terminal bonus for correct medium or hard routing when at least one investigation was used

The grader in graders.py is unchanged. Final task scores still depend on resolution accuracy, evidence coverage, policy compliance, workflow completion, efficiency, and linked-case consistency.
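
The shaping terms listed above can be sketched as a lookup table summed over whichever conditions hold on a step. This is a simplified illustration: it assumes the terms are additive and that the caller has already decided which conditions apply. StepFacts and shaping_reward are hypothetical names, and the real logic in reward.py may differ.

```python
from dataclasses import dataclass

# Values copied from the list above; the keys are illustrative names.
SHAPING_TERMS = {
    "first_time_useful_fetch": 0.05,
    "review_hit_hidden_risk": 0.08,
    "graph_hit_cluster_risk": 0.08,
    "redundant_fetch": -0.05,
    "fetch_over_budget": -0.03,
    "resolved_without_evidence": -0.10,
    "correct_routing_with_investigation": 0.15,
}


@dataclass
class StepFacts:
    """Hypothetical per-step flags; the environment would derive these itself."""

    first_time_useful_fetch: bool = False
    review_hit_hidden_risk: bool = False
    graph_hit_cluster_risk: bool = False
    redundant_fetch: bool = False
    fetch_over_budget: bool = False
    resolved_without_evidence: bool = False  # medium or hard cases only
    correct_routing_with_investigation: bool = False  # medium or hard cases only


def shaping_reward(facts: StepFacts) -> float:
    """Sum the shaping terms whose flags are set (assumed additive)."""
    total = sum(
        (value for name, value in SHAPING_TERMS.items() if getattr(facts, name)),
        0.0,
    )
    return round(total, 2)


step = StepFacts(first_time_useful_fetch=True, graph_hit_cluster_risk=True)
print(shaping_reward(step))
```

Keeping the values in one table makes it easy to log each subscore separately, which matches the decomposed-reward idea described earlier in this README.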

Task Difficulty

FraudShield has three graded tasks:

Task	Design goal	Key characteristics
Easy	obvious routing with minimal investigation	strong visible cues, fetch budget of 1
Medium	mixed-signal routing	at least 1 investigation needed, 2 evidence points typically matter
Hard	linked-case reasoning	misleading triage, hidden linkage, fetch budget of 3, graph evidence usually required

How To Run Locally

Install the package:

pip install -e .

Run the heuristic or configured agent:

python inference.py

FraudShield supports three agent modes:

heuristic by default when no model credentials are set
llm_local when LOCAL_MODEL_PATH points to a trained Hugging Face / PEFT checkpoint
llm_remote when an API-compatible model is configured

For an open-source setup that does not need a paid model, the recommended options are:

Option 1: Use your locally trained model

LOCAL_MODEL_PATH=trained_policy python inference.py

Option 2: Use a Hugging Face hosted open-source model

HF_TOKEN=your_token_here \
MODEL_NAME=Qwen/Qwen2.5-1.5B-Instruct \
API_BASE_URL=https://router.huggingface.co/v1 \
python inference.py

If HF_TOKEN is present and API_BASE_URL is not set, FraudShield defaults to the Hugging Face router automatically.
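
The mode selection described above can be sketched as a small function over environment variables. The variable names and the router URL come from this README. Two points are assumptions: that LOCAL_MODEL_PATH takes precedence over remote settings, and that a remote model counts as configured whenever HF_TOKEN or API_BASE_URL is set. select_agent_mode is a hypothetical helper, not the code in inference.py.

```python
from typing import Mapping, Optional

HF_ROUTER_URL = "https://router.huggingface.co/v1"


def select_agent_mode(env: Mapping[str, str]) -> tuple[str, Optional[str]]:
    """Return (mode, api_base_url) for a mapping of environment variables."""
    # Assumed precedence: a local checkpoint wins over remote settings.
    if env.get("LOCAL_MODEL_PATH"):
        return "llm_local", None

    api_base = env.get("API_BASE_URL")
    # Documented default: HF_TOKEN without API_BASE_URL uses the HF router.
    if env.get("HF_TOKEN") and not api_base:
        api_base = HF_ROUTER_URL
    if api_base:
        return "llm_remote", api_base

    # No model credentials at all: fall back to the rule-based agent.
    return "heuristic", None


print(select_agent_mode({}))
print(select_agent_mode({"HF_TOKEN": "your_token_here"}))
print(select_agent_mode({"LOCAL_MODEL_PATH": "trained_policy"}))
```

Passing a plain mapping instead of reading os.environ directly keeps the function easy to test; in a real run you would call it with os.environ.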

Run the OpenEnv API server:

python -m server.app

Then open the lightweight explorer:

http://127.0.0.1:7860/

Important endpoints:

GET /health
POST /reset?task=easy|medium|hard
POST /step
GET /state
GET /info
GET /tasks
GET /metadata
GET /schema
POST /mcp
GET /docs
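
A minimal standard-library client for the reset and step endpoints might look like the sketch below. The base URL and the task query parameter come from this README. The /step request body, the empty JSON body sent to /reset, and the assumption that both endpoints return JSON are guesses; GET /schema and GET /docs describe the real shapes.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "http://127.0.0.1:7860"
TASKS = {"easy", "medium", "hard"}


def reset_url(task: str, base_url: str = BASE_URL) -> str:
    """Build the POST /reset URL for one of the three graded tasks."""
    if task not in TASKS:
        raise ValueError(f"task must be one of {sorted(TASKS)}")
    return f"{base_url}/reset?" + urllib.parse.urlencode({"task": task})


def post_json(url: str, payload: Optional[dict] = None) -> dict:
    """POST a JSON body (empty object by default) and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload or {}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def start_episode(task: str = "easy") -> dict:
    """Reset an episode and take one step. Needs a running server."""
    post_json(reset_url(task))
    # Hypothetical body: the real action schema may use different keys.
    return post_json(f"{BASE_URL}/step", {"action_type": "check_policy"})


print(reset_url("hard"))
```

start_episode is defined but not called here, since it only works once python -m server.app is running.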

Validation:

python validate_api.py
python -m openenv.cli validate .
docker build -t fraudshield .
docker run -p 7860:7860 fraudshield

How To Run The Training Notebook

The Colab notebook lives at:

notebooks/fraudshield_trl_colab.ipynb

It is designed to:

install openenv-core, trl, unsloth, transformers, datasets, and peft
clone the repo and install FraudShield
load a public fraud curriculum dataset from Hugging Face
build a second-stage training set from real FraudShield rollouts
run two-stage fine-tuning with Unsloth LoRA and TRL SFTTrainer
- stage 1: public fraud-data adaptation
- stage 2: FraudShield policy adaptation
save a reusable local policy checkpoint
save:
- reward_curve.png
- loss_curve.png
- training_summary.json
evaluate:
- heuristic via python inference.py
- trained model via LOCAL_MODEL_PATH=... python inference.py

The notebook is designed for Colab + GPU execution and does not require a paid proprietary LLM. The current public curriculum source is Phoenix21/mock_fraud-detection-dataset, which gives the model broader fraud-signal exposure before it is adapted to FraudShield actions.

Results

Current heuristic baseline, measured with python inference.py:

Easy: 0.9900
Medium: 0.3500
Hard: 0.7425
Final: 0.6942

This baseline is intentionally rule-based and not trained. It is strong on easy (0.9900), weakest on medium (0.3500), and still imperfect on hard (0.7425), which leaves headroom for a trained policy that can learn broader fraud patterns from public data and then adapt them to FraudShield.
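
The reported Final value is consistent with an unweighted mean of the three task scores. Whether inference.py actually aggregates this way is an assumption; the check below only confirms that the arithmetic matches.

```python
from statistics import mean

# Heuristic baseline scores reported above.
task_scores = {"easy": 0.9900, "medium": 0.3500, "hard": 0.7425}

# Unweighted mean across tasks (assumed aggregation rule).
final_score = mean(list(task_scores.values()))
print(f"{final_score:.4f}")  # prints 0.6942
```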

Once training is completed, this section should include:

reward curve image
loss curve image
trained-vs-heuristic comparison table
one short qualitative trace comparison

The preferred final comparison covers three agents:

heuristic baseline
base open-source LLM or hosted HF model
fine-tuned local policy checkpoint

Live Links

Hugging Face Space: https://huggingface.co/spaces/DevikaJ2005/fraudshield-1
Code repository: https://github.com/DevikaJ2005/Fraudshield
Colab notebook: https://colab.research.google.com/github/DevikaJ2005/Fraudshield/blob/main/notebooks/fraudshield_trl_colab.ipynb
Blog draft: HF_BLOG_DRAFT.md

The Space root can double as a quick explorer UI for judges before they open the API docs.

For final submission, make sure the README links to:

the public HF Space
the public GitHub repo
the public Colab notebook
the final Hugging Face blog post or video/slides link
the committed reward/loss plot images

Simulation vs Production

FraudShield is a simulation for training and evaluation.

What it does:

models partial observability
enforces investigation budgets
exposes hidden evidence only through actions
grades routing behavior in a reproducible way

What it does not do:

connect to live financial systems
process real customer data
move money or block real payments
provide production security, auth, or compliance guarantees

A production fraud platform would still need real data pipelines, authentication, authorization, monitoring, compliance controls, and human-review operations beyond this environment.