title: Ad Fraud Investigation Environment
emoji: 🕵️
colorFrom: red
colorTo: yellow
sdk: docker
pinned: false
app_port: 8000
tags:
- openenv
- ad-fraud
- reinforcement-learning
base_path: /web
Ad Fraud Investigation Environment
An OpenEnv environment that simulates ad fraud review - a real-world task where AI agents investigate queues of advertisements, uncover fraud signals, and render verdicts under budget constraints.
Ad fraud costs the digital advertising industry over $100 billion annually. Platforms like Meta process billions of ads daily and ban advertisers only at high confidence thresholds. Unlike simple classification, real ad review is a sequential decision-making problem: a reviewer starts with limited surface-level signals, actively chooses what to investigate within a constrained budget, and must decide when enough evidence exists to commit to a verdict. This environment captures that workflow and provides a training ground for agents to learn it.
Quick Start
Install
pip install -e .
Run the server
uvicorn server.app:app --host 0.0.0.0 --port 8000
Use the client
from ad_fraud_env import AdFraudEnv, AdReviewAction
with AdFraudEnv(base_url="http://localhost:8000").sync() as env:
    result = env.reset(seed=42, task_id="task_1")
    print(result.observation.queue_summary)

    # Investigate an ad
    result = env.step(AdReviewAction(
        action_type="investigate",
        ad_id="ad_001",
        investigation_target="landing_page",
    ))
    print(result.observation.feedback)

    # Render a verdict
    result = env.step(AdReviewAction(
        action_type="verdict",
        ad_id="ad_001",
        verdict="reject",
        confidence=0.9,
    ))
    print(f"Reward: {result.reward}, Done: {result.done}")
Run with Docker
docker build -t ad-fraud-env .
docker run -p 8000:8000 ad-fraud-env
Environment Design
Episode flow
Each episode is a review session. The agent receives a queue of ads and must process them within a limited action budget:
reset(task_id, seed)
                   |
                   v
+----------------------------------+<----------------------+
|  Observe queue + first ad info   |                       |
+------------------+---------------+                       |
                   |                                       |
                   v                                       |
            +-------------+     +-------------------+      |
            | investigate |---->| Reveal one signal |------+
            +-------------+     | (costs 1 budget)  |
                   |            +-------------------+
                   v
            +-------------+     +-------------------+
            |   verdict   |---->| approve / reject  |------+
            +-------------+     | / escalate        |      |
                   |            +-------------------+      |
                   v                                       |
           +--------------+     +-------------------+      |
           | link_accounts|---->| Flag fraud ring   |------+
           +--------------+     | (Task 3 only)     |
                   |            +-------------------+
                   v
Budget exhausted or all ads reviewed -> episode ends
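The control flow above can be sketched without a running server. `ToyQueue` and `run_episode` below are hypothetical stand-ins, not the environment's implementation: they track only the budget and the pending ads, and they assume that a verdict consumes one action just as an investigation does.

```python
from dataclasses import dataclass, field


@dataclass
class ToyQueue:
    """Hypothetical stand-in for the environment: tracks budget and pending ads only."""

    pending: list[str]
    budget: int
    findings: dict[str, list[str]] = field(default_factory=dict)

    @property
    def done(self) -> bool:
        # The episode ends when the budget is exhausted or every ad has a verdict.
        return self.budget <= 0 or not self.pending

    def investigate(self, ad_id: str, target: str) -> None:
        self.budget -= 1  # investigation costs 1 budget
        self.findings.setdefault(ad_id, []).append(target)

    def verdict(self, ad_id: str) -> None:
        self.budget -= 1  # assumption: a verdict also consumes one action
        self.pending.remove(ad_id)


def run_episode(queue: ToyQueue, probes_per_ad: int = 1) -> int:
    """Process the queue front to back and return the number of steps taken."""
    steps = 0
    while not queue.done:
        ad_id = queue.pending[0]
        probes = 0
        # Probe only while enough budget remains for one verdict per pending ad.
        while probes < probes_per_ad and queue.budget > len(queue.pending):
            queue.investigate(ad_id, "landing_page")
            probes += 1
            steps += 1
        queue.verdict(ad_id)
        steps += 1
    return steps


# Task 1 sized queue: 5 ads, budget 25.
queue = ToyQueue(pending=[f"ad_{i:03d}" for i in range(1, 6)], budget=25)
steps = run_episode(queue)
print(steps, queue.budget, queue.pending)
```

The inner guard reserves one action per pending ad, so the toy agent never runs out of budget with ads still unreviewed. A real agent would pick investigation targets adaptively instead of always probing `landing_page`.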
Tasks
Three tasks with increasing difficulty test different capabilities:
| Task | Name | Ads | Budget | Composition | Challenge |
|---|---|---|---|---|---|
| 1 | Basic Ad Triage | 5 | 25 | 2 legit, 3 obvious fraud | Learn the investigate -> verdict loop |
| 2 | Sophisticated Fraud | 12 | 30 | 5 legit, 5 sophisticated scams, 2 gray-area | Triage under budget pressure (~2.5 actions/ad) |
| 3 | Fraud Network Detection | 20 | 35 | 6 legit, 10 fraud (3 hidden rings), 4 gray-area | Cross-ad reasoning to detect coordinated networks (~1.75 actions/ad) |
Task 3 introduces fraud rings - clusters of 3-5 ads controlled by the same actor, using varied topologies (cliques, chains, hub-and-spoke). Individual ring members look borderline; the fraud signal is only visible by cross-referencing investigation data across ads (shared payment IDs, matching template hashes, overlapping targeting fingerprints).
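The three ring topologies can be illustrated as edge sets over ad IDs. This is a minimal sketch in plain Python so it stays self-contained; the function names are hypothetical and it does not reproduce the project's networkx-based generator.

```python
from itertools import combinations


def clique_edges(members):
    """Every pair of ring members is connected."""
    return {frozenset(pair) for pair in combinations(members, 2)}


def chain_edges(members):
    """Each member is connected only to the next one in sequence."""
    return {frozenset(pair) for pair in zip(members, members[1:])}


def hub_and_spoke_edges(members):
    """The first member is the hub; every other member connects only to it."""
    hub, *spokes = members
    return {frozenset((hub, spoke)) for spoke in spokes}


ring = ["ad_003", "ad_007", "ad_011", "ad_015"]  # illustrative IDs
for build in (clique_edges, chain_edges, hub_and_spoke_edges):
    print(build.__name__, len(build(ring)))
```

For a ring of the same size, a clique exposes far more shared-signal pairs than a chain or a hub-and-spoke layout, which is one reason sparse topologies are harder to detect from a handful of investigations.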
Action Space
Actions are JSON objects. Three types:
investigate - spend one budget point to reveal a signal about an ad.
{
"action_type": "investigate",
"ad_id": "ad_001",
"investigation_target": "landing_page"
}
Each ad has six investigation dimensions:
| Target | What it reveals |
|---|---|
| advertiser_history | Account age, spend history, violation record, verification status |
| landing_page | Domain age, SSL, registrar, redirect chains, scam template similarity |
| payment_method | Payment type, chargeback history, cross-account velocity |
| targeting_overlap | Targeting fingerprint, audience overlap percentages |
| creative_similarity | Template hash, image dimensions, scam template similarity score |
| campaign_structure | Objective, bid strategy, budget/age ratio, placement distribution |
verdict - render a final decision on an ad.
{
"action_type": "verdict",
"ad_id": "ad_001",
"verdict": "reject",
"confidence": 0.9
}
verdict options: approve, reject, escalate. confidence: 0.0-1.0.
link_accounts - flag two ads as part of the same fraud network (Task 3).
{
"action_type": "link_accounts",
"ad_id": "ad_003",
"linked_ad_id": "ad_007",
"link_reason": "shared payment ID pmt_ring_48231 and matching template hash"
}
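A client may want to check an action dictionary before sending it. The helper below is a hypothetical sketch built only from the fields and allowed values listed in this section; the server's own validation rules may differ, and treating `confidence` as required for verdicts is an assumption.

```python
INVESTIGATION_TARGETS = {
    "advertiser_history", "landing_page", "payment_method",
    "targeting_overlap", "creative_similarity", "campaign_structure",
}
VERDICTS = {"approve", "reject", "escalate"}


def validate_action(action: dict) -> list[str]:
    """Return a list of problems; an empty list means the action looks well formed."""
    errors = []
    kind = action.get("action_type")
    if not action.get("ad_id"):
        errors.append("ad_id is required")
    if kind == "investigate":
        if action.get("investigation_target") not in INVESTIGATION_TARGETS:
            errors.append("unknown investigation_target")
    elif kind == "verdict":
        if action.get("verdict") not in VERDICTS:
            errors.append("verdict must be approve, reject, or escalate")
        conf = action.get("confidence")
        # Assumption: confidence is mandatory and must lie in [0.0, 1.0].
        if not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
            errors.append("confidence must be a number in [0.0, 1.0]")
    elif kind == "link_accounts":
        if not action.get("linked_ad_id"):
            errors.append("linked_ad_id is required")
        elif action["linked_ad_id"] == action.get("ad_id"):
            errors.append("an ad cannot be linked to itself")
    else:
        errors.append(f"unknown action_type: {kind!r}")
    return errors


print(validate_action({"action_type": "verdict", "ad_id": "ad_001",
                       "verdict": "reject", "confidence": 0.9}))
print(validate_action({"action_type": "investigate", "ad_id": "ad_001",
                       "investigation_target": "whois"}))
```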
Observation Space
Observations are text-heavy by design so LLM agents can reason naturally:
| Field | Type | Description |
|---|---|---|
| queue_summary | str | Task name, total/reviewed/pending counts, budget remaining |
| current_ad_info | str | Ad copy, category, targeting, risk signals for the focused ad |
| investigation_findings | str | Accumulated findings from all investigations so far |
| verdict_history_summary | str | Verdicts rendered so far |
| feedback | str | Natural language feedback on the last action |
| available_ads | list[str] | Ad IDs still pending review |
| queue_status | dict | Structured status for programmatic access |
| done | bool | Whether the episode is complete |
| reward | float | Step reward |
Reward Design
| Action | Reward | Rationale |
|---|---|---|
| Investigation | -0.02 | Simulates time/latency cost |
| Correct rejection (fraud -> reject) | +0.30 to +0.40 | Scaled by fraud severity |
| Correct approval (legit -> approve) | +0.10 | Revenue preserved |
| Correct escalation | +0.15 | Appropriate caution |
| False positive (legit -> reject) | -0.35 | Lost advertiser revenue |
| False negative (fraud -> approve) | -0.50 | Worst outcome - fraud goes live |
| Incorrect escalation | -0.05 | Human reviewer cost |
| Correct network link | +0.40 | High-value coordinated fraud detection |
| Incorrect network link | -0.25 | False accusation cost |
Unreviewed ads are auto-approved at episode end - missed fraud incurs the full -0.50 false-negative penalty.
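To see how these values trade off, the table can be turned into a lookup and summed over an episode. The outcome labels and the `episode_reward` helper are hypothetical names for illustration; correct rejections use the lower bound of the stated range because the severity scaling is not specified here.

```python
REWARDS = {
    "investigation": -0.02,
    "correct_rejection": 0.30,   # table gives +0.30 to +0.40; lower bound used
    "correct_approval": 0.10,
    "correct_escalation": 0.15,
    "false_positive": -0.35,
    "false_negative": -0.50,
    "incorrect_escalation": -0.05,
    "correct_link": 0.40,
    "incorrect_link": -0.25,
}


def episode_reward(outcomes, unreviewed_fraud=0):
    """Sum step rewards; unreviewed fraud is auto-approved and counted as a false negative."""
    total = sum(REWARDS[outcome] for outcome in outcomes)
    total += unreviewed_fraud * REWARDS["false_negative"]
    return round(total, 2)


# Task 1 sized episode: one probe per ad, every verdict correct.
careful = ["investigation"] * 5 + ["correct_rejection"] * 3 + ["correct_approval"] * 2
# Same queue, but two fraudulent ads are never reviewed.
rushed = ["correct_approval"] * 2 + ["correct_rejection"]

print(episode_reward(careful))
print(episode_reward(rushed, unreviewed_fraud=2))
```

Under these assumptions, skipping two fraudulent ads costs far more than the five investigations in the careful run, so spending a cheap probe is usually preferable to leaving an ad unreviewed.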
Grading & Scoring
Each task has a dedicated grader that produces a normalized score between 0.0 and 1.0. The raw reward is normalized between the theoretical worst case (every decision wrong and the full budget wasted) and the theoretical best case (every decision correct with efficient budget use).
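One common way to implement this kind of normalization is min-max scaling. The sketch below assumes that form and also clamps the result to the unit interval; the function name, the clamping, and the example bounds are illustrative assumptions, not values taken from the graders.

```python
def normalize_score(raw: float, worst: float, best: float) -> float:
    """Map a raw episode reward onto [0.0, 1.0] using worst-case and best-case bounds."""
    if best <= worst:
        raise ValueError("best must be greater than worst")
    scaled = (raw - worst) / (best - worst)
    # Clamp in case bonuses or penalties push the raw reward past the bounds.
    return min(1.0, max(0.0, scaled))


# Hypothetical bounds for illustration only.
print(normalize_score(0.25, worst=-2.0, best=1.0))
```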
| Component | Task 1 | Task 2 | Task 3 |
|---|---|---|---|
| Verdict accuracy | Yes | Yes | Yes |
| Budget efficiency bonus | Yes | Yes | Yes |
| Calibration bonus | - | Yes | Yes |
| Network detection (edge coverage) | - | - | Yes |
| Investigation coverage bonus | - | - | Yes |
Calibration bonus rewards agents whose stated confidence correlates with actual accuracy - high confidence on correct verdicts and low confidence on uncertain ones.
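The exact calibration formula is not given here. As one plausible illustration, the sketch below scores calibration as one minus the Brier score over (confidence, correct) pairs; this is a swapped-in standard measure and the `calibration_score` name is hypothetical.

```python
def calibration_score(records):
    """records: iterable of (confidence, was_correct) pairs. Higher is better calibrated."""
    records = list(records)
    if not records:
        return 0.0
    # Brier score: mean squared gap between stated confidence and the 0/1 outcome.
    brier = sum((conf - (1.0 if ok else 0.0)) ** 2 for conf, ok in records) / len(records)
    return 1.0 - brier


calibrated = [(0.9, True), (0.9, True), (0.5, False)]
overconfident = [(0.9, True), (0.9, True), (0.9, False)]
print(calibration_score(calibrated), calibration_score(overconfident))
```

An agent that lowers its confidence on the verdict it gets wrong scores higher than one that reports 0.9 everywhere, which matches the intent described above.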
Network detection uses edge coverage: what fraction of ground-truth fraud ring connections did the agent discover via link_accounts?
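Edge coverage as described reduces to a set intersection. In this minimal sketch the function name and the sample ring are hypothetical; edges are treated as unordered pairs so that linking `ad_007` to `ad_003` counts the same as the reverse.

```python
def edge_coverage(true_edges, linked_pairs) -> float:
    """Fraction of ground-truth ring edges that the agent flagged via link_accounts."""
    truth = {frozenset(edge) for edge in true_edges}
    found = {frozenset(pair) for pair in linked_pairs}
    if not truth:
        return 0.0
    return len(truth & found) / len(truth)


# A three-ad clique; the agent finds one real edge and makes one wrong link.
ring_edges = [("ad_003", "ad_007"), ("ad_007", "ad_011"), ("ad_003", "ad_011")]
agent_links = [("ad_007", "ad_003"), ("ad_003", "ad_015")]
print(edge_coverage(ring_edges, agent_links))
```

The wrong link does not lower coverage in this sketch; per the reward table, it is penalized separately as an incorrect network link.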
Coverage bonus rewards breadth over depth - agents that review more ads (rather than deep-diving a single one) score higher on Task 3.
Baseline Scores
Generated with seed=42 using meta-llama/Llama-3.1-8B-Instruct. Reproducible via python inference.py.
| Task | Score | Steps | Verdicts |
|---|---|---|---|
| Task 1 (Easy) | 0.953 | 10 | 5/5 |
| Task 2 (Medium) | 0.882 | 23 | 12/12 |
| Task 3 (Hard) | 0.415 | 35 | 20/20 |
The sharp drop on Task 3 reflects the difficulty of cross-ad reasoning under a tight budget: the baseline agent investigates and renders verdicts well but struggles to detect coordinated fraud rings.
Project Structure
ad_fraud_env/
+-- __init__.py                  # Package exports
+-- client.py                    # WebSocket client (extends EnvClient)
+-- models.py                    # Action, Observation, State types
+-- inference.py                 # Baseline LLM agent with mandatory stdout logging
+-- openenv.yaml                 # OpenEnv manifest
+-- pyproject.toml               # Dependencies and package config
+-- Dockerfile                   # Multi-stage Docker build
+-- baseline_scores.json         # Cached baseline results
+-- data/
|   +-- ad_generator.py          # Episode generation, task configs, campaign profiles
|   +-- advertiser_profiles.py   # Synthetic advertiser history
|   +-- fraud_patterns.py        # Fraud + legit ad templates (easy/medium/hard)
|   +-- landing_pages.py         # Simulated landing page investigation data
|   +-- network_generator.py     # Fraud ring topologies via networkx
+-- graders/
|   +-- base_grader.py           # Shared normalization and reward logic
|   +-- task1_grader.py          # Verdict accuracy only
|   +-- task2_grader.py          # + calibration bonus
|   +-- task3_grader.py          # + network detection + coverage bonus
+-- server/
|   +-- app.py                   # FastAPI app with /tasks, /baseline, /grader endpoints
|   +-- environment.py           # Core environment (reset/step/state)
|   +-- investigate_ui.py        # HTML dashboard routes (/investigate, /web redirect)
|   +-- static/
|   |   +-- investigate_hq.html  # Interactive investigation dashboard
|   +-- requirements.txt         # Server dependencies
+-- tests/
    +-- test_data_generation.py  # Determinism, cross-ref checks, decoy validation
    +-- test_environment.py      # Step logic, state tracking, anti-exploit
    +-- test_graders.py          # Score ranges, calibration, network scoring
API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Health check |
| /schema | GET | Action/Observation JSON schemas |
| /ws | WS | WebSocket for step() / reset() / state() |
| /tasks | GET | Task list with configs and action schema |
| /baseline | GET | Baseline scores (cached or live) |
| /grader | GET | Last episode's grader result |
| /investigate | GET | HTML investigation dashboard (also / redirects here) |
License
BSD 3-Clause License