metadata
title: Ad Fraud Investigation Environment
emoji: 🕵️
colorFrom: red
colorTo: yellow
sdk: docker
pinned: false
app_port: 8000
tags:
  - openenv
  - ad-fraud
  - reinforcement-learning
base_path: /web

Ad Fraud Investigation Environment

An OpenEnv environment that simulates ad fraud review - a real-world task where AI agents investigate queues of advertisements, uncover fraud signals, and render verdicts under budget constraints.

Ad fraud costs the digital advertising industry over $100 billion annually. Platforms like Meta process billions of ads daily and ban advertisers only at high confidence thresholds. Unlike simple classification, real ad review is a sequential decision-making problem: a reviewer starts with limited surface-level signals, actively chooses what to investigate within a constrained budget, and must decide when enough evidence exists to commit to a verdict. This environment captures that workflow and provides a training ground for agents to learn it.

Quick Start

Install

pip install -e .

Run the server

uvicorn server.app:app --host 0.0.0.0 --port 8000

Use the client

from ad_fraud_env import AdFraudEnv, AdReviewAction

with AdFraudEnv(base_url="http://localhost:8000").sync() as env:
    result = env.reset(seed=42, task_id="task_1")
    print(result.observation.queue_summary)

    # Investigate an ad
    result = env.step(AdReviewAction(
        action_type="investigate",
        ad_id="ad_001",
        investigation_target="landing_page",
    ))
    print(result.observation.feedback)

    # Render a verdict
    result = env.step(AdReviewAction(
        action_type="verdict",
        ad_id="ad_001",
        verdict="reject",
        confidence=0.9,
    ))
    print(f"Reward: {result.reward}, Done: {result.done}")

Run with Docker

docker build -t ad-fraud-env .
docker run -p 8000:8000 ad-fraud-env

Environment Design

Episode flow

Each episode is a review session. The agent receives a queue of ads and must process them within a limited action budget:

reset(task_id, seed)
  |
  v
+----------------------------------+<----------------------+
|  Observe queue + first ad info   |                       |
+------------------+---------------+                       |
                   |                                       |
                   v                                       |
        +---------------+     +-------------------+        |
        |  investigate  |---->| Reveal one signal |--------+
        +---------------+     | (costs 1 budget)  |        |
                |             +-------------------+        |
                v                                          |
        +---------------+     +-------------------+        |
        |    verdict    |---->| approve / reject  |--------+
        +---------------+     |  / escalate       |        |
                |             +-------------------+        |
                v                                          |
        +---------------+     +-------------------+        |
        | link_accounts |---->| Flag fraud ring   |--------+
        +---------------+     | (Task 3 only)     |
                |             +-------------------+
                v
        Budget exhausted or all ads reviewed -> episode ends

Tasks

Three tasks with increasing difficulty test different capabilities:

| Task | Name | Ads | Budget | Composition | Challenge |
|------|------|-----|--------|-------------|-----------|
| 1 | Basic Ad Triage | 5 | 25 | 2 legit, 3 obvious fraud | Learn the investigate -> verdict loop |
| 2 | Sophisticated Fraud | 12 | 30 | 5 legit, 5 sophisticated scams, 2 gray-area | Triage under budget pressure (~2.5 actions/ad) |
| 3 | Fraud Network Detection | 20 | 35 | 6 legit, 10 fraud (3 hidden rings), 4 gray-area | Cross-ad reasoning to detect coordinated networks (~1.75 actions/ad) |
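
The per-ad figures in the Challenge column come from dividing the budget by the queue size. The sketch below reproduces that arithmetic and also estimates how many investigations remain per ad once each ad has received a verdict. It assumes, which the README does not state outright, that every action (verdicts included) consumes one unit of budget. The `task_2` and `task_3` identifiers are guesses modeled on the `task_1` ID used in the client example.

```python
# Queue sizes and budgets copied from the task table.
# "task_2" and "task_3" are assumed IDs, modeled on "task_1".
TASKS = {
    "task_1": {"ads": 5, "budget": 25},
    "task_2": {"ads": 12, "budget": 30},
    "task_3": {"ads": 20, "budget": 35},
}


def budget_breakdown(ads: int, budget: int) -> dict:
    """Return actions per ad and investigations left after one verdict per ad.

    Assumption: every action (investigate, verdict, link_accounts) costs one
    budget unit. The real environment may account for budget differently.
    """
    return {
        "actions_per_ad": budget / ads,
        "investigations_per_ad": (budget - ads) / ads,
    }


for task_id, cfg in TASKS.items():
    print(task_id, budget_breakdown(cfg["ads"], cfg["budget"]))
```

Under that assumption, Task 3 leaves less than one investigation per ad on average, so an agent has to decide which ads deserve a closer look at all.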

Task 3 introduces fraud rings - clusters of 3-5 ads controlled by the same actor, using varied topologies (cliques, chains, hub-and-spoke). Individual ring members look borderline; the fraud signal is only visible by cross-referencing investigation data across ads (shared payment IDs, matching template hashes, overlapping targeting fingerprints).
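
One way an agent might cross-reference findings is to index ads by each revealed signal value and propose links between ads that share one. The sketch below is illustrative only: the environment returns findings as text, so the pre-parsed `findings` dictionary, the signal names `payment_id` and `template_hash`, and the helper `candidate_links` are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical, already-parsed findings: ad_id -> {signal name: value}.
# The real environment returns findings as text; parsing is omitted here.
findings = {
    "ad_003": {"payment_id": "pmt_ring_48231", "template_hash": "t_9f2"},
    "ad_007": {"payment_id": "pmt_ring_48231", "template_hash": "t_9f2"},
    "ad_011": {"payment_id": "pmt_55120", "template_hash": "t_9f2"},
    "ad_014": {"payment_id": "pmt_77001", "template_hash": "t_c41"},
}


def candidate_links(findings, min_shared=1):
    """Return {(ad_a, ad_b): [shared signal names]} for ads sharing values."""
    # Index ads by (signal, value) so only ads with identical values meet.
    index = defaultdict(set)
    for ad_id, signals in findings.items():
        for name, value in signals.items():
            index[(name, value)].add(ad_id)

    # Collect, for every pair of ads, the signals on which they agree.
    shared = defaultdict(list)
    for (name, _value), ad_ids in index.items():
        for a, b in combinations(sorted(ad_ids), 2):
            shared[(a, b)].append(name)

    return {
        pair: sorted(names)
        for pair, names in shared.items()
        if len(names) >= min_shared
    }


for (a, b), names in sorted(candidate_links(findings).items()):
    print(a, b, names)
```

Raising `min_shared` to 2 keeps only pairs that agree on several signals, which trades recall for fewer false accusations.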

Action Space

Actions are JSON objects. Three types:

investigate - spend one budget point to reveal a signal about an ad.

{
  "action_type": "investigate",
  "ad_id": "ad_001",
  "investigation_target": "landing_page"
}

Each ad has six investigation dimensions:

| Target | What it reveals |
|--------|-----------------|
| advertiser_history | Account age, spend history, violation record, verification status |
| landing_page | Domain age, SSL, registrar, redirect chains, scam template similarity |
| payment_method | Payment type, chargeback history, cross-account velocity |
| targeting_overlap | Targeting fingerprint, audience overlap percentages |
| creative_similarity | Template hash, image dimensions, scam template similarity score |
| campaign_structure | Objective, bid strategy, budget/age ratio, placement distribution |

verdict - render a final decision on an ad.

{
  "action_type": "verdict",
  "ad_id": "ad_001",
  "verdict": "reject",
  "confidence": 0.9
}

verdict must be one of approve, reject, or escalate. confidence is a float between 0.0 and 1.0.

link_accounts - flag two ads as part of the same fraud network (Task 3 only).

{
  "action_type": "link_accounts",
  "ad_id": "ad_003",
  "linked_ad_id": "ad_007",
  "link_reason": "shared payment ID pmt_ring_48231 and matching template hash"
}
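
For agents that emit raw JSON, it can help to build and validate payloads before sending them. The helpers below are a minimal sketch, not part of the package: the function names are hypothetical, the field names and allowed values are taken from the examples above, and the self-link check is an assumption about what the server would reject.

```python
import json

# Allowed values as listed in this README.
INVESTIGATION_TARGETS = {
    "advertiser_history", "landing_page", "payment_method",
    "targeting_overlap", "creative_similarity", "campaign_structure",
}
VERDICTS = {"approve", "reject", "escalate"}


def investigate(ad_id: str, target: str) -> dict:
    """Build an investigate action, rejecting unknown targets early."""
    if target not in INVESTIGATION_TARGETS:
        raise ValueError(f"unknown investigation target: {target!r}")
    return {
        "action_type": "investigate",
        "ad_id": ad_id,
        "investigation_target": target,
    }


def verdict(ad_id: str, decision: str, confidence: float) -> dict:
    """Build a verdict action with a confidence in [0.0, 1.0]."""
    if decision not in VERDICTS:
        raise ValueError(f"unknown verdict: {decision!r}")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    return {
        "action_type": "verdict",
        "ad_id": ad_id,
        "verdict": decision,
        "confidence": confidence,
    }


def link_accounts(ad_id: str, linked_ad_id: str, reason: str) -> dict:
    """Build a link_accounts action (Task 3 only)."""
    if ad_id == linked_ad_id:  # assumed to be invalid; not confirmed by the README
        raise ValueError("an ad cannot be linked to itself")
    return {
        "action_type": "link_accounts",
        "ad_id": ad_id,
        "linked_ad_id": linked_ad_id,
        "link_reason": reason,
    }


print(json.dumps(verdict("ad_001", "reject", 0.9), indent=2))
```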

Observation Space

Observations are text-heavy by design so LLM agents can reason naturally:

| Field | Type | Description |
|-------|------|-------------|
| queue_summary | str | Task name, total/reviewed/pending counts, budget remaining |
| current_ad_info | str | Ad copy, category, targeting, risk signals for the focused ad |
| investigation_findings | str | Accumulated findings from all investigations so far |
| verdict_history_summary | str | Verdicts rendered so far |
| feedback | str | Natural language feedback on the last action |
| available_ads | list[str] | Ad IDs still pending review |
| queue_status | dict | Structured status for programmatic access |
| done | bool | Whether the episode is complete |
| reward | float | Step reward |

Reward Design

| Action | Reward | Rationale |
|--------|--------|-----------|
| Investigation | -0.02 | Simulates time/latency cost |
| Correct rejection (fraud -> reject) | +0.30 to +0.40 | Scaled by fraud severity |
| Correct approval (legit -> approve) | +0.10 | Revenue preserved |
| Correct escalation | +0.15 | Appropriate caution |
| False positive (legit -> reject) | -0.35 | Lost advertiser revenue |
| False negative (fraud -> approve) | -0.50 | Worst outcome - fraud goes live |
| Escalate (when wrong) | -0.05 | Human reviewer cost |
| Correct network link | +0.40 | High-value coordinated fraud detection |
| Incorrect network link | -0.25 | False accusation cost |

Unreviewed ads are auto-approved at episode end - missed fraud incurs the full -0.50 false-negative penalty.
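
The asymmetric penalties imply a decision threshold. As a worked sketch, the code below computes the expected step reward of approving versus rejecting an ad given an estimated fraud probability, and the probability at which the two are equal. It uses the lower bound (+0.30) for a correct rejection and ignores escalation and investigation costs. The function names are illustrative and not part of the environment.

```python
# Verdict rewards copied from the reward table. Correct rejections range from
# +0.30 to +0.40 depending on severity; the lower bound is used here.
R_CORRECT_REJECT = 0.30
R_FALSE_POSITIVE = -0.35
R_CORRECT_APPROVE = 0.10
R_FALSE_NEGATIVE = -0.50


def expected_rewards(p_fraud: float) -> dict:
    """Expected step reward of approve vs. reject for a given fraud probability."""
    p_legit = 1.0 - p_fraud
    return {
        "approve": p_legit * R_CORRECT_APPROVE + p_fraud * R_FALSE_NEGATIVE,
        "reject": p_fraud * R_CORRECT_REJECT + p_legit * R_FALSE_POSITIVE,
    }


def break_even_p_fraud() -> float:
    """Fraud probability at which approve and reject are equally good.

    Solves (1 - p) * A + p * FN = p * R + (1 - p) * FP for p.
    """
    numerator = R_CORRECT_APPROVE - R_FALSE_POSITIVE
    denominator = numerator + R_CORRECT_REJECT - R_FALSE_NEGATIVE
    return numerator / denominator


print(round(break_even_p_fraud(), 2))  # 0.36
print(expected_rewards(0.5))
```

With these numbers, rejecting has the higher expected reward once the estimated fraud probability exceeds roughly 0.36, which reflects how heavily the false-negative penalty weighs.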

Grading & Scoring

Each task has a dedicated grader that produces a normalized 0.0-1.0 score. Raw reward is normalized between theoretical worst-case (every decision wrong + full budget wasted) and best-case (every decision correct + efficient budget use).
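
The normalization described above amounts to min-max scaling. The following is a minimal sketch under that reading. `normalize_score` is a hypothetical name, the clamping step is an assumption, and the bounds in the example are made-up numbers rather than the graders' actual worst and best cases.

```python
def normalize_score(raw: float, worst: float, best: float) -> float:
    """Min-max normalize a raw episode reward into the range [0.0, 1.0]."""
    if best <= worst:
        raise ValueError("best must be greater than worst")
    score = (raw - worst) / (best - worst)
    # Clamp in case the raw reward falls outside the theoretical bounds.
    return min(1.0, max(0.0, score))


# Illustrative bounds only; the real graders derive them per task.
print(normalize_score(raw=0.8, worst=-2.0, best=1.5))
```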

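| Component | Task 1 | Task 2 | Task 3 |
|-----------|--------|--------|--------|
| Verdict accuracy | Yes | Yes | Yes |
| Budget efficiency bonus | Yes | Yes | Yes |
| Calibration bonus | - | Yes | Yes |
| Network detection (edge coverage) | - | - | Yes |
| Investigation coverage bonus | - | - | Yes |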

Calibration bonus rewards agents whose stated confidence correlates with actual accuracy - high confidence on correct verdicts and low confidence on uncertain ones.
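
The README does not give the grader's calibration formula. One common way to measure how well stated confidence tracks correctness is the Brier score, and the sketch below maps it to a bonus. This is a stand-in technique and not the grader's actual implementation; `brier_calibration_bonus` and `max_bonus` are hypothetical.

```python
def brier_calibration_bonus(confidences, correct, max_bonus=0.1):
    """Map the Brier score of stated confidences to a bonus in [0, max_bonus].

    confidences: stated confidence per verdict, each in [0.0, 1.0]
    correct:     whether each verdict matched ground truth
    A Brier score of 0 (perfect) yields max_bonus; a score of 1 yields 0.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        return 0.0
    brier = sum(
        (conf - (1.0 if ok else 0.0)) ** 2
        for conf, ok in zip(confidences, correct)
    ) / len(confidences)
    return max_bonus * (1.0 - brier)


# A hedged wrong verdict costs less than an overconfident wrong verdict.
print(brier_calibration_bonus([0.9, 0.6], [True, False]))
print(brier_calibration_bonus([0.9, 0.9], [True, False]))
```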

Network detection uses edge coverage: what fraction of ground-truth fraud ring connections did the agent discover via link_accounts?
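
Edge coverage can be sketched as a set intersection over undirected ad pairs. Treating links as undirected (so that linking ad_007 to ad_003 counts the same as the reverse) is an assumption, and `edge_coverage` is an illustrative helper rather than the grader's code.

```python
def edge_coverage(true_edges, predicted_links):
    """Fraction of ground-truth ring edges the agent flagged.

    Edges are treated as undirected pairs of ad IDs (an assumption).
    Duplicate and incorrect predictions do not increase the score.
    """
    truth = {frozenset(edge) for edge in true_edges}
    if not truth:
        return 0.0
    found = {frozenset(edge) for edge in predicted_links} & truth
    return len(found) / len(truth)


# A three-ad chain ring: ad_003 - ad_007 - ad_011 (two true edges).
true_edges = [("ad_003", "ad_007"), ("ad_007", "ad_011")]
predicted = [("ad_007", "ad_003"), ("ad_003", "ad_011")]
print(edge_coverage(true_edges, predicted))  # 0.5
```

In the chain example the agent finds one of the two true edges. The second prediction connects two ring members that are not directly adjacent in the chain, so it does not count toward coverage.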

Coverage bonus rewards breadth over depth - agents that review more ads (rather than deep-diving a single one) score higher on Task 3.

Baseline Scores

Generated with seed=42 using meta-llama/Llama-3.1-8B-Instruct. Reproducible via python inference.py.

| Task | Score | Steps | Verdicts |
|------|-------|-------|----------|
| Task 1 (Easy) | 0.953 | 10 | 5/5 |
| Task 2 (Medium) | 0.882 | 23 | 12/12 |
| Task 3 (Hard) | 0.415 | 35 | 20/20 |

The sharp drop on Task 3 reflects the difficulty of cross-ad reasoning under a tight budget: the baseline agent investigates and renders verdicts well but struggles to detect coordinated fraud rings.

Project Structure

ad_fraud_env/
+-- __init__.py              # Package exports
+-- client.py                # WebSocket client (extends EnvClient)
+-- models.py                # Action, Observation, State types
+-- inference.py             # Baseline LLM agent with mandatory stdout logging
+-- openenv.yaml             # OpenEnv manifest
+-- pyproject.toml           # Dependencies and package config
+-- Dockerfile               # Multi-stage Docker build
+-- baseline_scores.json     # Cached baseline results
+-- data/
|   +-- ad_generator.py      # Episode generation, task configs, campaign profiles
|   +-- advertiser_profiles.py  # Synthetic advertiser history
|   +-- fraud_patterns.py    # Fraud + legit ad templates (easy/medium/hard)
|   +-- landing_pages.py     # Simulated landing page investigation data
|   +-- network_generator.py # Fraud ring topologies via networkx
+-- graders/
|   +-- base_grader.py       # Shared normalization and reward logic
|   +-- task1_grader.py      # Verdict accuracy + budget efficiency bonus
|   +-- task2_grader.py      # + calibration bonus
|   +-- task3_grader.py      # + network detection + coverage bonus
+-- server/
|   +-- app.py               # FastAPI app with /tasks, /baseline, /grader endpoints
|   +-- environment.py       # Core environment (reset/step/state)
|   +-- investigate_ui.py    # HTML dashboard routes (/investigate, /web redirect)
|   +-- static/
|   |   +-- investigate_hq.html  # Interactive investigation dashboard
|   +-- requirements.txt     # Server dependencies
+-- tests/
    +-- test_data_generation.py  # Determinism, cross-ref checks, decoy validation
    +-- test_environment.py      # Step logic, state tracking, anti-exploit
    +-- test_graders.py          # Score ranges, calibration, network scoring

API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| /health | GET | Health check |
| /schema | GET | Action/Observation JSON schemas |
| /ws | WS | WebSocket for step() / reset() / state() |
| /tasks | GET | Task list with configs and action schema |
| /baseline | GET | Baseline scores (cached or live) |
| /grader | GET | Last episode's grader result |
| /investigate | GET | HTML investigation dashboard (also / redirects here) |
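
The GET endpoints can be queried with nothing but the standard library. The helpers below are a sketch that assumes the server from Quick Start is running on `http://localhost:8000` and that `/health`, `/tasks`, `/baseline`, and `/grader` return JSON. The response shapes are not documented here, so the code only decodes them. `endpoint_url` and `get_json` are hypothetical names.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen


def endpoint_url(base_url: str, path: str) -> str:
    """Join a base URL and an endpoint path, tolerating stray slashes."""
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))


def get_json(base_url: str, path: str, timeout: float = 5.0):
    """GET an endpoint and decode its body as JSON (assumed response format)."""
    with urlopen(endpoint_url(base_url, path), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage against a locally running server:
#   print(get_json("http://localhost:8000", "/health"))
#   print(get_json("http://localhost:8000", "/tasks"))
print(endpoint_url("http://localhost:8000", "/tasks"))
```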

License

BSD 3-Clause License