---
title: Ad Fraud Investigation Environment
emoji: "\U0001F575\uFE0F"
colorFrom: red
colorTo: yellow
sdk: docker
pinned: false
app_port: 8000
tags:
  - openenv
  - ad-fraud
  - reinforcement-learning
base_path: /web
---

# Ad Fraud Investigation Environment

An OpenEnv environment that simulates ad fraud review - a real-world task where AI agents investigate queues of advertisements, uncover fraud signals, and render verdicts under budget constraints.

Ad fraud costs the digital advertising industry over **$100 billion annually**. Platforms like Meta process billions of ads daily and ban advertisers only at high confidence thresholds. Unlike simple classification, real ad review is a **sequential decision-making** problem: a reviewer starts with limited surface-level signals, actively chooses what to investigate within a constrained budget, and must decide when enough evidence exists to commit to a verdict.

This environment captures that workflow and provides a training ground for agents to learn it.

## Quick Start

### Install

```bash
pip install -e .
```

### Run the server

```bash
uvicorn server.app:app --host 0.0.0.0 --port 8000
```

### Use the client

```python
from ad_fraud_env import AdFraudEnv, AdReviewAction

with AdFraudEnv(base_url="http://localhost:8000").sync() as env:
    result = env.reset(seed=42, task_id="task_1")
    print(result.observation.queue_summary)

    # Investigate an ad
    result = env.step(AdReviewAction(
        action_type="investigate",
        ad_id="ad_001",
        investigation_target="landing_page",
    ))
    print(result.observation.feedback)

    # Render a verdict
    result = env.step(AdReviewAction(
        action_type="verdict",
        ad_id="ad_001",
        verdict="reject",
        confidence=0.9,
    ))
    print(f"Reward: {result.reward}, Done: {result.done}")
```

### Run with Docker

```bash
docker build -t ad-fraud-env .
docker run -p 8000:8000 ad-fraud-env
```

## Environment Design

### Episode flow

Each episode is a review session.
The agent receives a queue of ads and must process them within a limited action budget:

```
reset(task_id, seed)
                 |
                 v
+----------------------------------+<-----------------------+
|  Observe queue + first ad info   |                        |
+----------------+-----------------+                        |
                 |                                          |
                 v                                          |
        +----------------+     +-------------------+        |
        |  investigate   |---->| Reveal one signal |--------+
        +----------------+     | (costs 1 budget)  |        |
                 |             +-------------------+        |
                 v                                          |
        +----------------+     +-------------------+        |
        |    verdict     |---->| approve / reject  |--------+
        +----------------+     | / escalate        |        |
                 |             +-------------------+        |
                 v                                          |
        +----------------+     +-------------------+        |
        | link_accounts  |---->| Flag fraud ring   |--------+
        +----------------+     | (Task 3 only)     |
                 |             +-------------------+
                 v
Budget exhausted or all ads reviewed -> episode ends
```

### Tasks

Three tasks with increasing difficulty test different capabilities:

| Task | Name | Ads | Budget | Composition | Challenge |
|---|---|---:|---:|---|---|
| 1 | Basic Ad Triage | 5 | 25 | 2 legit, 3 obvious fraud | Learn the investigate -> verdict loop |
| 2 | Sophisticated Fraud | 12 | 30 | 5 legit, 5 sophisticated scams, 2 gray-area | Triage under budget pressure (~2.5 actions/ad) |
| 3 | Fraud Network Detection | 20 | 35 | 6 legit, 10 fraud (3 hidden rings), 4 gray-area | Cross-ad reasoning to detect coordinated networks (~1.75 actions/ad) |

Task 3 introduces **fraud rings** - clusters of 3-5 ads controlled by the same actor, using varied topologies (cliques, chains, hub-and-spoke). Individual ring members look borderline; the fraud signal is only visible by cross-referencing investigation data across ads (shared payment IDs, matching template hashes, overlapping targeting fingerprints).

### Action Space

Actions are JSON objects. Three types:

**`investigate`** - spend one budget point to reveal a signal about an ad.

```json
{
  "action_type": "investigate",
  "ad_id": "ad_001",
  "investigation_target": "landing_page"
}
```

Each ad has six investigation dimensions:

| Target | What it reveals |
|---|---|
| `advertiser_history` | Account age, spend history, violation record, verification status |
| `landing_page` | Domain age, SSL, registrar, redirect chains, scam template similarity |
| `payment_method` | Payment type, chargeback history, cross-account velocity |
| `targeting_overlap` | Targeting fingerprint, audience overlap percentages |
| `creative_similarity` | Template hash, image dimensions, scam template similarity score |
| `campaign_structure` | Objective, bid strategy, budget/age ratio, placement distribution |

**`verdict`** - render a final decision on an ad.

```json
{
  "action_type": "verdict",
  "ad_id": "ad_001",
  "verdict": "reject",
  "confidence": 0.9
}
```

`verdict` options: `approve`, `reject`, `escalate`. `confidence`: 0.0-1.0.

**`link_accounts`** - flag two ads as part of the same fraud network (Task 3).

```json
{
  "action_type": "link_accounts",
  "ad_id": "ad_003",
  "linked_ad_id": "ad_007",
  "link_reason": "shared payment ID pmt_ring_48231 and matching template hash"
}
```

### Observation Space

Observations are text-heavy by design so LLM agents can reason naturally:

| Field | Type | Description |
|---|---|---|
| `queue_summary` | `str` | Task name, total/reviewed/pending counts, budget remaining |
| `current_ad_info` | `str` | Ad copy, category, targeting, risk signals for the focused ad |
| `investigation_findings` | `str` | Accumulated findings from all investigations so far |
| `verdict_history_summary` | `str` | Verdicts rendered so far |
| `feedback` | `str` | Natural language feedback on the last action |
| `available_ads` | `list[str]` | Ad IDs still pending review |
| `queue_status` | `dict` | Structured status for programmatic access |
| `done` | `bool` | Whether the episode is complete |
| `reward` | `float` | Step reward |

## Reward Design

| Action | Reward | Rationale |
|---|---:|---|
| Investigation | -0.02 | Simulates time/latency cost |
| Correct rejection (fraud -> reject) | +0.30 to +0.40 | Scaled by fraud severity |
| Correct approval (legit -> approve) | +0.10 | Revenue preserved |
| Correct escalation | +0.15 | Appropriate caution |
| False positive (legit -> reject) | -0.35 | Lost advertiser revenue |
| False negative (fraud -> approve) | -0.50 | Worst outcome - fraud goes live |
| Escalate (when wrong) | -0.05 | Human reviewer cost |
| Correct network link | +0.40 | High-value coordinated fraud detection |
| Incorrect network link | -0.25 | False accusation cost |

Unreviewed ads are auto-approved at episode end - missed fraud incurs the full -0.50 false-negative penalty.

## Grading & Scoring

Each task has a dedicated grader that produces a normalized **0.0-1.0 score**. Raw reward is normalized between theoretical worst-case (every decision wrong + full budget wasted) and best-case (every decision correct + efficient budget use).
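
As a rough illustration of the normalization described above, the sketch below min-max scales a raw episode reward into the 0.0-1.0 range. The helper names `normalize_score` and `task1_bounds_sketch` are hypothetical, and the bounds are a guess built from the reward table for Task 1 (2 legit, 3 fraud). In particular, it assumes verdicts consume budget, leaving 20 of the 25 actions to waste on investigations. The actual graders may define the bounds differently and add the bonuses listed below.

```python
# Per-action rewards copied from the reward table above.
INVESTIGATION_COST = -0.02
CORRECT_REJECT_MAX = 0.40
CORRECT_APPROVE = 0.10
FALSE_POSITIVE = -0.35
FALSE_NEGATIVE = -0.50


def task1_bounds_sketch(n_legit=2, n_fraud=3, wasted_investigations=20):
    """Return (worst, best) raw reward for a Task 1-sized queue.

    Assumption: the worst case gets every verdict wrong and spends the
    remaining budget on investigations; the best case gets every verdict
    right without investigating at all.
    """
    best = n_fraud * CORRECT_REJECT_MAX + n_legit * CORRECT_APPROVE
    worst = (
        n_fraud * FALSE_NEGATIVE
        + n_legit * FALSE_POSITIVE
        + wasted_investigations * INVESTIGATION_COST
    )
    return worst, best


def normalize_score(raw_reward, worst_case, best_case):
    """Min-max normalize a raw reward into [0.0, 1.0], clipping outliers."""
    if best_case <= worst_case:
        raise ValueError("best_case must be greater than worst_case")
    score = (raw_reward - worst_case) / (best_case - worst_case)
    return max(0.0, min(1.0, score))
```

Under these assumed bounds (roughly -2.6 and +1.4), a raw reward of -0.6 would map to a score of about 0.5.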

| Component | Task 1 | Task 2 | Task 3 |
|---|:---:|:---:|:---:|
| Verdict accuracy | Yes | Yes | Yes |
| Budget efficiency bonus | Yes | Yes | Yes |
| Calibration bonus | - | Yes | Yes |
| Network detection (edge coverage) | - | - | Yes |
| Investigation coverage bonus | - | - | Yes |

**Calibration bonus** rewards agents whose stated confidence correlates with actual accuracy - high confidence on correct verdicts and low confidence on uncertain ones.

**Network detection** uses edge coverage: what fraction of ground-truth fraud ring connections did the agent discover via `link_accounts`?

**Coverage bonus** rewards breadth over depth - agents that review more ads (rather than deep-diving a single one) score higher on Task 3.

## Baseline Scores

Generated with `seed=42` using `meta-llama/Llama-3.1-8B-Instruct`. Reproducible via `python inference.py`.

| Task | Score | Steps | Verdicts |
|---|---:|---:|---:|
| Task 1 (Easy) | 0.953 | 10 | 5/5 |
| Task 2 (Medium) | 0.882 | 23 | 12/12 |
| Task 3 (Hard) | 0.415 | 35 | 20/20 |

The sharp drop on Task 3 reflects the difficulty of cross-ad reasoning under a tight budget - the baseline agent investigates and renders verdicts well but struggles to detect coordinated fraud rings.
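
Since the Task 3 gap comes down to network detection, it may help to see what the edge-coverage metric from Grading & Scoring measures. The following is a minimal sketch, not the grader's actual code: `edge_coverage` is a hypothetical helper that assumes links are undirected, that wrong links do not lower coverage (they are already penalized per step), and that an episode with no rings scores 0.0.

```python
def edge_coverage(true_edges, predicted_links):
    """Fraction of ground-truth ring edges recovered via link_accounts.

    Both arguments are iterables of (ad_id, ad_id) pairs.
    """
    # Treat links as undirected: (a, b) and (b, a) are the same edge.
    truth = {frozenset(edge) for edge in true_edges}
    found = {frozenset(edge) for edge in predicted_links}
    if not truth:
        return 0.0  # assumption: no rings means nothing to cover
    return len(truth & found) / len(truth)


# Hypothetical 3-ad clique; ad_011 and ad_015 are made-up IDs.
ring = [("ad_003", "ad_007"), ("ad_007", "ad_011"), ("ad_003", "ad_011")]
agent_links = [("ad_007", "ad_003"), ("ad_003", "ad_015")]
coverage = edge_coverage(ring, agent_links)  # one of three ring edges found
```

An agent that never calls `link_accounts` gets zero coverage under this definition, however accurate its individual verdicts are.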

## Project Structure

```
ad_fraud_env/
+-- __init__.py                  # Package exports
+-- client.py                    # WebSocket client (extends EnvClient)
+-- models.py                    # Action, Observation, State types
+-- inference.py                 # Baseline LLM agent with mandatory stdout logging
+-- openenv.yaml                 # OpenEnv manifest
+-- pyproject.toml               # Dependencies and package config
+-- Dockerfile                   # Multi-stage Docker build
+-- baseline_scores.json         # Cached baseline results
+-- data/
|   +-- ad_generator.py          # Episode generation, task configs, campaign profiles
|   +-- advertiser_profiles.py   # Synthetic advertiser history
|   +-- fraud_patterns.py        # Fraud + legit ad templates (easy/medium/hard)
|   +-- landing_pages.py         # Simulated landing page investigation data
|   +-- network_generator.py     # Fraud ring topologies via networkx
+-- graders/
|   +-- base_grader.py           # Shared normalization and reward logic
|   +-- task1_grader.py          # Verdict accuracy only
|   +-- task2_grader.py          # + calibration bonus
|   +-- task3_grader.py          # + network detection + coverage bonus
+-- server/
|   +-- app.py                   # FastAPI app with /tasks, /baseline, /grader endpoints
|   +-- environment.py           # Core environment (reset/step/state)
|   +-- investigate_ui.py        # HTML dashboard routes (/investigate, /web redirect)
|   +-- static/
|   |   +-- investigate_hq.html  # Interactive investigation dashboard
|   +-- requirements.txt         # Server dependencies
+-- tests/
    +-- test_data_generation.py  # Determinism, cross-ref checks, decoy validation
    +-- test_environment.py      # Step logic, state tracking, anti-exploit
    +-- test_graders.py          # Score ranges, calibration, network scoring
```

## API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/schema` | GET | Action/Observation JSON schemas |
| `/ws` | WS | WebSocket for `step()` / `reset()` / `state()` |
| `/tasks` | GET | Task list with configs and action schema |
| `/baseline` | GET | Baseline scores (cached or live) |
| `/grader` | GET | Last episode's grader result |
| `/investigate` | GET | HTML investigation dashboard (also `/` redirects here) |

## License

BSD 3-Clause License