File size: 10,070 Bytes
37e25c4
 
1510f7f
37e25c4
 
 
1510f7f
37e25c4
50ef6b4
37e25c4
 
338316c
3818a51
30533d1
af4e958
ce9edc2
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
30533d1
af4e958
30533d1
3818a51
30533d1
3818a51
30533d1
e4ffe61
30533d1
e4ffe61
30533d1
 
 
 
3818a51
30533d1
3818a51
30533d1
 
 
 
 
 
 
 
 
 
3818a51
30533d1
3818a51
30533d1
3818a51
30533d1
3818a51
30533d1
 
 
 
 
 
 
 
 
1510f7f
 
 
 
 
 
 
30533d1
3818a51
30533d1
3818a51
30533d1
3818a51
1510f7f
 
 
30533d1
 
 
 
 
 
 
 
 
3818a51
30533d1
3818a51
30533d1
3818a51
30533d1
3818a51
30533d1
 
 
 
 
 
 
3818a51
30533d1
3818a51
30533d1
dbee4da
30533d1
dbee4da
30533d1
 
 
 
 
 
 
3818a51
30533d1
3818a51
ea9eade
af4e958
 
 
30533d1
af4e958
 
 
 
 
ccd0934
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
30533d1
af4e958
 
30533d1
3818a51
 
30533d1
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
af4e958
 
30533d1
 
 
 
af4e958
 
30533d1
 
 
af4e958
30533d1
 
 
af4e958
30533d1
 
ccd0934
 
 
 
 
 
30533d1
 
 
 
 
 
 
1510f7f
ccd0934
1510f7f
30533d1
3818a51
30533d1
af4e958
1510f7f
 
30533d1
 
af4e958
ccd0934
af4e958
30533d1
e4ffe61
30533d1
 
 
 
3818a51
ccd0934
 
 
 
 
 
30533d1
3818a51
30533d1
 
 
 
3818a51
30533d1
3818a51
30533d1
3818a51
30533d1
 
 
 
 
3818a51
30533d1
3818a51
30533d1
3818a51
30533d1
3818a51
30533d1
 
 
 
1510f7f
30533d1
1510f7f
30533d1
 
 
 
1510f7f
30533d1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
---
title: FraudShield
emoji: "🛡️"
colorFrom: blue
colorTo: indigo
sdk: docker
python_version: "3.12"
pinned: false
license: mit
---

# FraudShield

FraudShield is a partial-observability OpenEnv environment for simulated fraud investigation and workflow-aware routing.

## Training-First Architecture

FraudShield now includes a modular LLM + RL training stack alongside the OpenEnv runtime:

- `environment.py`: text-first wrapper for multi-step rollouts
- `reward.py`: decomposed numeric reward with measurable subscores
- `train.py`: Colab-friendly QLoRA training pipeline
- `evaluate.py`: fixed-task evaluation and comparison plots
- `config.py`: experiment, model, environment, and reward configuration
- `utils.py`: seeding, JSON handling, logging helpers, and moving averages
- `configs/colab_qlora_grpo.json`: default Colab experiment config

This layer is designed so you can generate rollouts, score model behavior with decomposed rewards, save checkpoints, resume runs, and compare before/after performance in a repeatable way.

Experimental tracking is enabled by default through TensorBoard logs under `artifacts/rl_runs/.../tb_logs`, and the training pipeline also writes plot artifacts such as `loss_vs_steps.png` and `reward_vs_steps.png`. If you want hosted tracking, set `report_to=["wandb"]` or `["tensorboard","wandb"]` in the experiment config before the run.

## What This Is

FraudShield is an RL-ready simulation, not a live fraud platform. An agent receives a limited triage view of a case, chooses investigation actions to reveal hidden evidence, and then routes the case with one of the supported final resolutions.

The environment is built for OpenEnv evaluation and training. It keeps the runtime fully offline by using the frozen snapshot in `data/fraudshield_cases.json`.

## Why It Matters For Theme 3.1

Theme 3.1 is about professional tasks, tool use, and world modeling under partial observability. FraudShield fits that directly:

- the agent starts with incomplete information
- useful evidence appears only after the right action is taken
- the environment rewards workflow quality, not just final correctness
- harder tasks require multi-step investigation and linked-case reasoning

This makes it a better fit for training decision-making agents than a one-shot fraud classifier.

## Lightweight Explorer UI

FraudShield now includes a small browser explorer at `/` so you can inspect the environment without sending raw API requests by hand. The explorer lets you:

- reset an easy, medium, or hard episode
- click investigation and resolution actions one step at a time
- inspect the live observation and full environment state
- run the current heuristic baseline as a walkthrough before RL training

This UI is intentionally lightweight. It is there to make the environment easier to understand, not to turn FraudShield into a fake production product.

## Environment Design

### Action Space

FraudShield keeps a fixed typed action space:

- `review_transaction`: open the operational transaction trace for the active case
- `fetch_customer_profile`: reveal buyer age, dispute history, and repeat-buyer status
- `fetch_merchant_profile`: reveal seller age, rating, reviews, and chargeback rate
- `fetch_network_graph`: reveal shared-device activity, prior flags, cluster risk, linked cards, and linked case IDs when present
- `check_policy`: reveal routing policy guidance
- `add_case_note`: write the required audit note before final closure
- `resolve_case`: submit one final resolution

Supported final resolutions:

- `approve`
- `block`
- `hold`
- `request_docs`
- `escalate`

### Observation Space

The public observation model stays the same, but the reset-time contents are intentionally sparse.

At reset, the agent only sees:

- `case_id`
- `task_name`
- `remaining_steps`
- `episode_step`
- `case_summary.amount_usd`
- a short triage summary in `case_summary.queue_reason`
- coarse context in `app_context`:
  - `item_category`
  - `timestamp`
  - `investigation_budget_remaining`
  - `available_investigations`
- the currently valid public actions in `allowed_actions`

Hidden details do **not** appear until the matching action is taken. In particular, seller profile, buyer profile, network risk, payment method, shipping behavior, and linked-case structure are progressively revealed through `revealed_evidence`.

### Reward Design

FraudShield keeps the existing correctness-driven terminal structure and adds workflow-shaped rewards:

- `+0.05` for a first-time useful fetch
- `+0.08` for `review_transaction` on cases with hidden high-risk payment or fulfillment facts
- `+0.08` for `fetch_network_graph` on cases with high hidden cluster risk
- `-0.05` for redundant repeated fetches
- `-0.03` for fetches after the case fetch budget is exhausted
- `-0.10` for resolving a medium or hard case with no fetch-based evidence
- `+0.15` terminal bonus for correct medium or hard routing when at least one investigation was used

The grader in `graders.py` is unchanged. Final task scores still depend on resolution accuracy, evidence coverage, policy compliance, workflow completion, efficiency, and linked-case consistency.

### Task Difficulty

FraudShield has three graded tasks:

| Task | Design goal | What makes it hard |
| --- | --- | --- |
| Easy | obvious routing with minimal investigation | strong visible cues, 1 fetch budget |
| Medium | mixed-signal routing | at least 1 investigation needed, 2 evidence points typically matter |
| Hard | linked-case reasoning | misleading triage, hidden linkage, 3 fetch budget, graph evidence usually required |

## How To Run Locally

Install the package:

```bash
pip install -e .
```

Run the heuristic or configured agent:

```bash
python inference.py
```

FraudShield supports three agent modes:

- `heuristic` by default when no model credentials are set
- `llm_local` when `LOCAL_MODEL_PATH` points to a trained Hugging Face / PEFT checkpoint
- `llm_remote` when an API-compatible model is configured

For a no-paid-model open-source setup, the recommended options are:

### Option 1: Use your locally trained model

```bash
LOCAL_MODEL_PATH=trained_policy python inference.py
```

### Option 2: Use a Hugging Face hosted open-source model

```bash
HF_TOKEN=your_token_here \
MODEL_NAME=Qwen/Qwen2.5-1.5B-Instruct \
API_BASE_URL=https://router.huggingface.co/v1 \
python inference.py
```

If `HF_TOKEN` is present and `API_BASE_URL` is not set, FraudShield defaults to the Hugging Face router automatically.

Run the OpenEnv API server:

```bash
python -m server.app
```

Then open the lightweight explorer:

- `http://127.0.0.1:7860/`

Important endpoints:

- `GET /health`
- `POST /reset?task=easy|medium|hard`
- `POST /step`
- `GET /state`
- `GET /info`
- `GET /tasks`
- `GET /metadata`
- `GET /schema`
- `POST /mcp`
- `GET /docs`

Validation:

```bash
python validate_api.py
python -m openenv.cli validate .
docker build -t fraudshield .
docker run -p 7860:7860 fraudshield
```

## How To Run The Training Notebook

The Colab notebook lives at:

- `notebooks/fraudshield_trl_colab.ipynb`

It is designed to:

1. install `openenv-core`, `trl`, `unsloth`, `transformers`, `datasets`, and `peft`
2. clone the repo and install FraudShield
3. load a public fraud curriculum dataset from Hugging Face
4. build a second-stage training set from real FraudShield rollouts
5. run two-stage fine-tuning with Unsloth LoRA and TRL `SFTTrainer`
   - stage 1: public fraud-data adaptation
   - stage 2: FraudShield policy adaptation
5. save a reusable local policy checkpoint
6. save:
   - `reward_curve.png`
   - `loss_curve.png`
   - `training_summary.json`
7. evaluate:
   - heuristic via `python inference.py`
   - trained model via `LOCAL_MODEL_PATH=... python inference.py`

The notebook is designed for Colab + GPU execution and does not require a paid proprietary LLM. The current public curriculum source is `Phoenix21/mock_fraud-detection-dataset`, which gives the model broader fraud-signal exposure before it is adapted to FraudShield actions.

## Results

Current heuristic baseline, measured with `python inference.py`:

- Easy: `0.9900`
- Medium: `0.3500`
- Hard: `0.7425`
- Final: `0.6942`

This baseline is intentionally rule-based and not trained. It is strong on easy, weaker on medium, and still imperfect on hard, which leaves headroom for a trained policy that can learn broader fraud patterns from public data and then adapt them to FraudShield.

Once training is completed, this section should include:

- reward curve image
- loss curve image
- trained-vs-heuristic comparison table
- one short qualitative trace comparison

The preferred final story is:

- heuristic baseline
- base open-source LLM or hosted HF model
- fine-tuned local policy checkpoint

## Live Links

- Hugging Face Space: `https://huggingface.co/spaces/DevikaJ2005/fraudshield-1`
- Code repository: `https://github.com/DevikaJ2005/Fraudshield`
- Colab notebook: `https://colab.research.google.com/github/DevikaJ2005/Fraudshield/blob/main/notebooks/fraudshield_trl_colab.ipynb`
- Blog draft: `HF_BLOG_DRAFT.md`

The Space root can double as a quick explorer UI for judges before they open the API docs.

For final submission, make sure the README links:

- the public HF Space
- the public GitHub repo
- the public Colab notebook
- the final Hugging Face blog post or video/slides link
- the committed reward/loss plot images

## Simulation vs Production

FraudShield is a simulation for training and evaluation.

What it does:

- models partial observability
- enforces investigation budgets
- exposes hidden evidence only through actions
- grades routing behavior in a reproducible way

What it does **not** do:

- connect to live financial systems
- process real customer data
- move money or block real payments
- provide production security, auth, or compliance guarantees

A production fraud platform would still need real data pipelines, authentication, authorization, monitoring, compliance controls, and human-review operations beyond this environment.