fraudshield-1 / README.md
DevikaJ2005's picture
Add training-first RL architecture with tracking
ce9edc2
metadata
title: FraudShield
emoji: 🛡️
colorFrom: blue
colorTo: indigo
sdk: docker
python_version: '3.12'
pinned: false
license: mit

FraudShield

FraudShield is a partial-observability OpenEnv environment for simulated fraud investigation and workflow-aware routing.

Training-First Architecture

FraudShield now includes a modular LLM + RL training stack alongside the OpenEnv runtime:

  • environment.py: text-first wrapper for multi-step rollouts
  • reward.py: decomposed numeric reward with measurable subscores
  • train.py: Colab-friendly QLoRA training pipeline
  • evaluate.py: fixed-task evaluation and comparison plots
  • config.py: experiment, model, environment, and reward configuration
  • utils.py: seeding, JSON handling, logging helpers, and moving averages
  • configs/colab_qlora_grpo.json: default Colab experiment config

This layer is designed so you can generate rollouts, score model behavior with decomposed rewards, save checkpoints, resume runs, and compare before/after performance in a repeatable way.

Experimental tracking is enabled by default through TensorBoard logs under artifacts/rl_runs/.../tb_logs, and the training pipeline also writes plot artifacts such as loss_vs_steps.png and reward_vs_steps.png. If you want hosted tracking, set report_to=["wandb"] or ["tensorboard","wandb"] in the experiment config before the run.

What This Is

FraudShield is an RL-ready simulation, not a live fraud platform. An agent receives a limited triage view of a case, chooses investigation actions to reveal hidden evidence, and then routes the case with one of the supported final resolutions.

The environment is built for OpenEnv evaluation and training. It keeps the runtime fully offline by using the frozen snapshot in data/fraudshield_cases.json.

Why It Matters For Theme 3.1

Theme 3.1 is about professional tasks, tool use, and world modeling under partial observability. FraudShield fits that directly:

  • the agent starts with incomplete information
  • useful evidence appears only after the right action is taken
  • the environment rewards workflow quality, not just final correctness
  • harder tasks require multi-step investigation and linked-case reasoning

This makes it a better fit for training decision-making agents than a one-shot fraud classifier.

Lightweight Explorer UI

FraudShield now includes a small browser explorer at / so you can inspect the environment without sending raw API requests by hand. The explorer lets you:

  • reset an easy, medium, or hard episode
  • click investigation and resolution actions one step at a time
  • inspect the live observation and full environment state
  • run the current heuristic baseline as a walkthrough before RL training

This UI is intentionally lightweight. It is there to make the environment easier to understand, not to turn FraudShield into a fake production product.

Environment Design

Action Space

FraudShield keeps a fixed typed action space:

  • review_transaction: open the operational transaction trace for the active case
  • fetch_customer_profile: reveal buyer age, dispute history, and repeat-buyer status
  • fetch_merchant_profile: reveal seller age, rating, reviews, and chargeback rate
  • fetch_network_graph: reveal shared-device activity, prior flags, cluster risk, linked cards, and linked case IDs when present
  • check_policy: reveal routing policy guidance
  • add_case_note: write the required audit note before final closure
  • resolve_case: submit one final resolution

Supported final resolutions:

  • approve
  • block
  • hold
  • request_docs
  • escalate

Observation Space

The public observation model stays the same, but the reset-time contents are intentionally sparse.

At reset, the agent only sees:

  • case_id
  • task_name
  • remaining_steps
  • episode_step
  • case_summary.amount_usd
  • a short triage summary in case_summary.queue_reason
  • coarse context in app_context:
    • item_category
    • timestamp
    • investigation_budget_remaining
    • available_investigations
  • the currently valid public actions in allowed_actions

Hidden details do not appear until the matching action is taken. In particular, seller profile, buyer profile, network risk, payment method, shipping behavior, and linked-case structure are progressively revealed through revealed_evidence.

Reward Design

FraudShield keeps the existing correctness-driven terminal structure and adds workflow-shaped rewards:

  • +0.05 for a first-time useful fetch
  • +0.08 for review_transaction on cases with hidden high-risk payment or fulfillment facts
  • +0.08 for fetch_network_graph on cases with high hidden cluster risk
  • -0.05 for redundant repeated fetches
  • -0.03 for fetches after the case fetch budget is exhausted
  • -0.10 for resolving a medium or hard case with no fetch-based evidence
  • +0.15 terminal bonus for correct medium or hard routing when at least one investigation was used

The grader in graders.py is unchanged. Final task scores still depend on resolution accuracy, evidence coverage, policy compliance, workflow completion, efficiency, and linked-case consistency.

Task Difficulty

FraudShield has three graded tasks:

Task Design goal What makes it hard
Easy obvious routing with minimal investigation strong visible cues, 1 fetch budget
Medium mixed-signal routing at least 1 investigation needed, 2 evidence points typically matter
Hard linked-case reasoning misleading triage, hidden linkage, 3 fetch budget, graph evidence usually required

How To Run Locally

Install the package:

pip install -e .

Run the heuristic or configured agent:

python inference.py

FraudShield supports three agent modes:

  • heuristic by default when no model credentials are set
  • llm_local when LOCAL_MODEL_PATH points to a trained Hugging Face / PEFT checkpoint
  • llm_remote when an API-compatible model is configured

For a no-paid-model open-source setup, the recommended options are:

Option 1: Use your locally trained model

LOCAL_MODEL_PATH=trained_policy python inference.py

Option 2: Use a Hugging Face hosted open-source model

HF_TOKEN=your_token_here \
MODEL_NAME=Qwen/Qwen2.5-1.5B-Instruct \
API_BASE_URL=https://router.huggingface.co/v1 \
python inference.py

If HF_TOKEN is present and API_BASE_URL is not set, FraudShield defaults to the Hugging Face router automatically.

Run the OpenEnv API server:

python -m server.app

Then open the lightweight explorer:

  • http://127.0.0.1:7860/

Important endpoints:

  • GET /health
  • POST /reset?task=easy|medium|hard
  • POST /step
  • GET /state
  • GET /info
  • GET /tasks
  • GET /metadata
  • GET /schema
  • POST /mcp
  • GET /docs

Validation:

python validate_api.py
python -m openenv.cli validate .
docker build -t fraudshield .
docker run -p 7860:7860 fraudshield

How To Run The Training Notebook

The Colab notebook lives at:

  • notebooks/fraudshield_trl_colab.ipynb

It is designed to:

  1. install openenv-core, trl, unsloth, transformers, datasets, and peft
  2. clone the repo and install FraudShield
  3. load a public fraud curriculum dataset from Hugging Face
  4. build a second-stage training set from real FraudShield rollouts
  5. run two-stage fine-tuning with Unsloth LoRA and TRL SFTTrainer
    • stage 1: public fraud-data adaptation
    • stage 2: FraudShield policy adaptation
  6. save a reusable local policy checkpoint
  7. save:
    • reward_curve.png
    • loss_curve.png
    • training_summary.json
  8. evaluate:
    • heuristic via python inference.py
    • trained model via LOCAL_MODEL_PATH=... python inference.py

The notebook is designed for Colab + GPU execution and does not require a paid proprietary LLM. The current public curriculum source is Phoenix21/mock_fraud-detection-dataset, which gives the model broader fraud-signal exposure before it is adapted to FraudShield actions.

Results

Current heuristic baseline, measured with python inference.py:

  • Easy: 0.9900
  • Medium: 0.3500
  • Hard: 0.7425
  • Final: 0.6942

This baseline is intentionally rule-based and not trained. It is strong on easy, weaker on medium, and still imperfect on hard, which leaves headroom for a trained policy that can learn broader fraud patterns from public data and then adapt them to FraudShield.

Once training is completed, this section should include:

  • reward curve image
  • loss curve image
  • trained-vs-heuristic comparison table
  • one short qualitative trace comparison

The preferred final story is:

  • heuristic baseline
  • base open-source LLM or hosted HF model
  • fine-tuned local policy checkpoint

Live Links

  • Hugging Face Space: https://huggingface.co/spaces/DevikaJ2005/fraudshield-1
  • Code repository: https://github.com/DevikaJ2005/Fraudshield
  • Colab notebook: https://colab.research.google.com/github/DevikaJ2005/Fraudshield/blob/main/notebooks/fraudshield_trl_colab.ipynb
  • Blog draft: HF_BLOG_DRAFT.md

The Space root can double as a quick explorer UI for judges before they open the API docs.

For final submission, make sure the README links:

  • the public HF Space
  • the public GitHub repo
  • the public Colab notebook
  • the final Hugging Face blog post or video/slides link
  • the committed reward/loss plot images

Simulation vs Production

FraudShield is a simulation for training and evaluation.

What it does:

  • models partial observability
  • enforces investigation budgets
  • exposes hidden evidence only through actions
  • grades routing behavior in a reproducible way

What it does not do:

  • connect to live financial systems
  • process real customer data
  • move money or block real payments
  • provide production security, auth, or compliance guarantees

A production fraud platform would still need real data pipelines, authentication, authorization, monitoring, compliance controls, and human-review operations beyond this environment.