metadata

title: Ad Fraud Investigation Environment
emoji: 🕵️
colorFrom: red
colorTo: yellow
sdk: docker
pinned: false
app_port: 8000
tags:
  - openenv
  - ad-fraud
  - reinforcement-learning
base_path: /web

Ad Fraud Investigation Environment

An OpenEnv environment that simulates ad fraud review - a real-world task where AI agents investigate queues of advertisements, uncover fraud signals, and render verdicts under budget constraints.

Ad fraud costs the digital advertising industry over $100 billion annually. Platforms like Meta process billions of ads daily and ban advertisers only at high confidence thresholds. Unlike simple classification, real ad review is a sequential decision-making problem: a reviewer starts with limited surface-level signals, actively chooses what to investigate within a constrained budget, and must decide when enough evidence exists to commit to a verdict. This environment captures that workflow and provides a training ground for agents to learn it.

Quick Start

Install

pip install -e .

Run the server

uvicorn server.app:app --host 0.0.0.0 --port 8000

Use the client

from ad_fraud_env import AdFraudEnv, AdReviewAction

with AdFraudEnv(base_url="http://localhost:8000").sync() as env:
    result = env.reset(seed=42, task_id="task_1")
    print(result.observation.queue_summary)

    # Investigate an ad
    result = env.step(AdReviewAction(
        action_type="investigate",
        ad_id="ad_001",
        investigation_target="landing_page",
    ))
    print(result.observation.feedback)

    # Render a verdict
    result = env.step(AdReviewAction(
        action_type="verdict",
        ad_id="ad_001",
        verdict="reject",
        confidence=0.9,
    ))
    print(f"Reward: {result.reward}, Done: {result.done}")

Run with Docker

docker build -t ad-fraud-env .
docker run -p 8000:8000 ad-fraud-env

Environment Design

Episode flow

Each episode is a review session. The agent receives a queue of ads and must process them within a limited action budget:

reset(task_id, seed)
  |
  v
+----------------------------------+<----------------------+
|  Observe queue + first ad info   |                       |
+---------------+------------------+                       |
                |                                          |
                v                                          |
        +---------------+     +-------------------+        |
        |  investigate  |---->| Reveal one signal |--------+
        +---------------+     | (costs 1 budget)  |        |
                |             +-------------------+        |
                v                                          |
        +---------------+     +-------------------+        |
        |    verdict    |---->| approve / reject  |--------+
        +---------------+     |  / escalate       |        |
                |             +-------------------+        |
                v                                          |
        +---------------+     +-------------------+        |
        | link_accounts |---->| Flag fraud ring   |--------+
        +---------------+     | (Task 3 only)     |
                |             +-------------------+
                v
        Budget exhausted or all ads reviewed -> episode ends

Tasks

Three tasks with increasing difficulty test different capabilities:

Task	Name	Ads	Budget	Composition	Challenge
1	Basic Ad Triage	5	25	2 legit, 3 obvious fraud	Learn the investigate -> verdict loop
2	Sophisticated Fraud	12	30	5 legit, 5 sophisticated scams, 2 gray-area	Triage under budget pressure (~2.5 actions/ad)
3	Fraud Network Detection	20	35	6 legit, 10 fraud (3 hidden rings), 4 gray-area	Cross-ad reasoning to detect coordinated networks (~1.75 actions/ad)
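
The per-ad figures in the Challenge column come from dividing the budget by the queue size. The sketch below reproduces that arithmetic and also shows how many investigations would remain if every verdict consumed one budget point. That cost model is an assumption (this README only states the cost of investigate), the task IDs task_2 and task_3 are extrapolated from task_1, and budget_plan is a hypothetical helper rather than part of the package.

```python
# Queue sizes and budgets copied from the Tasks table.
# The IDs "task_2" and "task_3" are assumed by analogy with "task_1".
TASKS = {
    "task_1": {"ads": 5, "budget": 25},
    "task_2": {"ads": 12, "budget": 30},
    "task_3": {"ads": 20, "budget": 35},
}


def budget_plan(ads: int, budget: int) -> dict:
    """Return actions per ad and investigations left after one verdict per ad.

    Assumption: each verdict costs one budget point, like an investigation.
    """
    return {
        "actions_per_ad": budget / ads,
        "investigations_left": max(budget - ads, 0),
    }


for task_id, cfg in TASKS.items():
    print(task_id, budget_plan(**cfg))
```

Under that assumption, Task 3 leaves 15 investigations to spread across 20 ads, so most ads have to be judged on at most one revealed signal.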

Task 3 introduces fraud rings - clusters of 3-5 ads controlled by the same actor, using varied topologies (cliques, chains, hub-and-spoke). Individual ring members look borderline; the fraud signal is only visible by cross-referencing investigation data across ads (shared payment IDs, matching template hashes, overlapping targeting fingerprints).
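
To make the three topologies concrete, here is a minimal standard-library sketch that builds each one as a set of undirected edges. The project structure indicates the real generator uses networkx; the function names and ad IDs below are illustrative assumptions, not the package's API.

```python
from itertools import combinations


def clique(members):
    """Every pair of ads in the ring is linked."""
    return {frozenset(pair) for pair in combinations(members, 2)}


def chain(members):
    """Each ad is linked only to the next one in sequence."""
    return {frozenset(pair) for pair in zip(members, members[1:])}


def hub_and_spoke(members):
    """The first ad is the hub; every other ad links only to it."""
    hub, *spokes = members
    return {frozenset((hub, spoke)) for spoke in spokes}


# Hypothetical ring of four ads.
ring = ["ad_003", "ad_007", "ad_011", "ad_015"]
for build in (clique, chain, hub_and_spoke):
    print(build.__name__, len(build(ring)), "edges")
```

In this sketch a four-ad clique has 6 edges, while the chain and the hub-and-spoke ring each have 3.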

Action Space

Actions are JSON objects. Three types:

investigate - spend one budget point to reveal a signal about an ad.

{
  "action_type": "investigate",
  "ad_id": "ad_001",
  "investigation_target": "landing_page"
}

Each ad has six investigation dimensions:

Target	What it reveals
`advertiser_history`	Account age, spend history, violation record, verification status
`landing_page`	Domain age, SSL, registrar, redirect chains, scam template similarity
`payment_method`	Payment type, chargeback history, cross-account velocity
`targeting_overlap`	Targeting fingerprint, audience overlap percentages
`creative_similarity`	Template hash, image dimensions, scam template similarity score
`campaign_structure`	Objective, bid strategy, budget/age ratio, placement distribution

verdict - render a final decision on an ad.

{
  "action_type": "verdict",
  "ad_id": "ad_001",
  "verdict": "reject",
  "confidence": 0.9
}

verdict is one of approve, reject, or escalate; confidence is a float between 0.0 and 1.0.

link_accounts - flag two ads as part of the same fraud network (Task 3 only).

{
  "action_type": "link_accounts",
  "ad_id": "ad_003",
  "linked_ad_id": "ad_007",
  "link_reason": "shared payment ID pmt_ring_48231 and matching template hash"
}
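
A client can check actions against these three shapes before sending them. The sketch below is one way to do that with plain dictionaries; validate_action is a hypothetical helper, and two of its rules are assumptions not stated here: that confidence is required on every verdict, and that an ad cannot be linked to itself.

```python
import json

# Allowed values copied from the Action Space section.
INVESTIGATION_TARGETS = {
    "advertiser_history", "landing_page", "payment_method",
    "targeting_overlap", "creative_similarity", "campaign_structure",
}
VERDICTS = {"approve", "reject", "escalate"}


def validate_action(action: dict) -> dict:
    """Return the action unchanged, or raise ValueError if it is malformed."""
    kind = action.get("action_type")
    if not action.get("ad_id"):
        raise ValueError("ad_id is required")
    if kind == "investigate":
        if action.get("investigation_target") not in INVESTIGATION_TARGETS:
            raise ValueError("unknown investigation_target")
    elif kind == "verdict":
        if action.get("verdict") not in VERDICTS:
            raise ValueError("verdict must be approve, reject, or escalate")
        confidence = action.get("confidence")  # assumed to be required
        if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")
    elif kind == "link_accounts":
        if not action.get("linked_ad_id"):
            raise ValueError("linked_ad_id is required")
        if action["linked_ad_id"] == action["ad_id"]:  # assumed rule
            raise ValueError("an ad cannot be linked to itself")
    else:
        raise ValueError(f"unknown action_type: {kind!r}")
    return action


action = validate_action({
    "action_type": "verdict",
    "ad_id": "ad_001",
    "verdict": "reject",
    "confidence": 0.9,
})
print(json.dumps(action))
```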

Observation Space

Observations are text-heavy by design so LLM agents can reason naturally:

Field	Type	Description
`queue_summary`	`str`	Task name, total/reviewed/pending counts, budget remaining
`current_ad_info`	`str`	Ad copy, category, targeting, risk signals for the focused ad
`investigation_findings`	`str`	Accumulated findings from all investigations so far
`verdict_history_summary`	`str`	Verdicts rendered so far
`feedback`	`str`	Natural language feedback on the last action
`available_ads`	`list[str]`	Ad IDs still pending review
`queue_status`	`dict`	Structured status for programmatic access
`done`	`bool`	Whether the episode is complete
`reward`	`float`	Step reward
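
For agents that keep a local copy of the observation, the fields in the table map naturally onto a dataclass. The class below is a hypothetical mirror for illustration only; the package's real observation type lives in models.py and may differ. The next_ad helper and its "take the first pending ad" rule are likewise assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ObservationSketch:
    """Hypothetical mirror of the observation fields listed in the table."""
    queue_summary: str = ""
    current_ad_info: str = ""
    investigation_findings: str = ""
    verdict_history_summary: str = ""
    feedback: str = ""
    available_ads: list = field(default_factory=list)  # list of ad ID strings
    queue_status: dict = field(default_factory=dict)
    done: bool = False
    reward: float = 0.0


def next_ad(obs: ObservationSketch) -> Optional[str]:
    """Pick the first pending ad, or None once the episode is over."""
    if obs.done or not obs.available_ads:
        return None
    return obs.available_ads[0]


obs = ObservationSketch(available_ads=["ad_002", "ad_003"])
print(next_ad(obs))
```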

Reward Design

Action	Reward	Rationale
Investigation	-0.02	Simulates time/latency cost
Correct rejection (fraud -> reject)	+0.30 to +0.40	Scaled by fraud severity
Correct approval (legit -> approve)	+0.10	Revenue preserved
Correct escalation	+0.15	Appropriate caution
False positive (legit -> reject)	-0.35	Lost advertiser revenue
False negative (fraud -> approve)	-0.50	Worst outcome - fraud goes live
Incorrect escalation	-0.05	Human reviewer cost
Correct network link	+0.40	High-value coordinated fraud detection
Incorrect network link	-0.25	False accusation cost

Unreviewed ads are auto-approved at episode end - missed fraud incurs the full -0.50 false-negative penalty.
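
The end-of-episode rule can be sketched as a small function that totals the penalty for fraudulent ads left without a verdict. The function name and the input shapes are assumptions for illustration. This README does not say whether auto-approved legitimate ads earn the +0.10 approval reward, so the sketch leaves them out.

```python
FALSE_NEGATIVE = -0.50  # fraud -> approve, from the reward table


def unreviewed_penalty(is_fraud: dict, reviewed: set) -> float:
    """Total false-negative penalty for fraudulent ads that never got a verdict.

    is_fraud maps ad ID -> True if the ad is fraudulent (hypothetical input).
    reviewed is the set of ad IDs that received a verdict.
    """
    missed = [ad for ad, fraud in is_fraud.items() if fraud and ad not in reviewed]
    return FALSE_NEGATIVE * len(missed)


ground_truth = {"ad_001": True, "ad_002": False, "ad_003": True}
print(unreviewed_penalty(ground_truth, reviewed={"ad_001"}))
```

Here only ad_003 is both fraudulent and unreviewed, so the episode takes a single -0.50 penalty.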

Grading & Scoring

Each task has a dedicated grader that produces a normalized 0.0-1.0 score. Raw reward is normalized between theoretical worst-case (every decision wrong + full budget wasted) and best-case (every decision correct + efficient budget use).
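
The normalization described above reads as a min-max rescaling. A minimal sketch, assuming the grader clamps out-of-range values and that the worst-case and best-case bounds are supplied by the caller (the numbers in the example are made up, not the environment's actual bounds):

```python
def normalize_score(raw: float, worst: float, best: float) -> float:
    """Min-max normalize a raw reward into [0.0, 1.0].

    Clamping to the interval is an assumption; the graders may handle
    out-of-range rewards differently.
    """
    if best <= worst:
        raise ValueError("best must be greater than worst")
    scaled = (raw - worst) / (best - worst)
    return min(max(scaled, 0.0), 1.0)


# Illustrative bounds only.
print(normalize_score(raw=0.5, worst=-2.5, best=1.5))
```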

Component	Task 1	Task 2	Task 3
Verdict accuracy	Yes	Yes	Yes
Budget efficiency bonus	Yes	Yes	Yes
Calibration bonus	-	Yes	Yes
Network detection (edge coverage)	-	-	Yes
Investigation coverage bonus	-	-	Yes

Calibration bonus rewards agents whose stated confidence correlates with actual accuracy - high confidence on correct verdicts and low confidence on uncertain ones.
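
The exact calibration formula is not given here. One common way to measure agreement between stated confidence and correctness is the Brier score, used below purely as a stand-in; calibration_score and its input format are assumptions, not the grader's implementation.

```python
def calibration_score(verdicts: list) -> float:
    """Return 1 minus the Brier score over (confidence, was_correct) pairs.

    1.0 means confidence matched correctness perfectly; lower is worse.
    Brier scoring is a swapped-in technique, not the documented grader logic.
    """
    if not verdicts:
        return 0.0
    brier = sum((conf - float(ok)) ** 2 for conf, ok in verdicts) / len(verdicts)
    return 1.0 - brier


confident_and_right = [(0.9, True), (0.9, True)]
confident_and_wrong = [(0.9, False), (0.9, False)]
print(calibration_score(confident_and_right) > calibration_score(confident_and_wrong))
```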

Network detection uses edge coverage: what fraction of ground-truth fraud ring connections did the agent discover via link_accounts?
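
Edge coverage can be computed as the share of ground-truth edges that appear among the agent's links. The sketch below assumes links are undirected (linking ad_003 to ad_007 counts the same as the reverse) and that incorrect links do not reduce coverage; the function name and inputs are hypothetical.

```python
def edge_coverage(true_edges, claimed_links) -> float:
    """Fraction of ground-truth ring edges found via link_accounts actions.

    Both arguments are iterables of (ad_id, ad_id) pairs. Pairs are treated
    as undirected, which is an assumption about the grader.
    """
    truth = {frozenset(edge) for edge in true_edges}
    if not truth:
        return 0.0
    found = {frozenset(link) for link in claimed_links} & truth
    return len(found) / len(truth)


true_edges = [("ad_003", "ad_007"), ("ad_007", "ad_011")]
claimed = [("ad_007", "ad_003"), ("ad_003", "ad_009")]  # one hit, one miss
print(edge_coverage(true_edges, claimed))
```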

The investigation coverage bonus rewards breadth over depth: agents that review more ads, rather than deep-diving on a single one, score higher on Task 3.

Baseline Scores

Generated with seed=42 using meta-llama/Llama-3.1-8B-Instruct. Reproducible via python inference.py.

Task	Score	Steps	Verdicts
Task 1 (Easy)	0.953	10	5/5
Task 2 (Medium)	0.882	23	12/12
Task 3 (Hard)	0.415	35	20/20

The sharp drop on Task 3 reflects the difficulty of cross-ad reasoning under a tight budget: the baseline agent investigates and renders verdicts well but struggles to detect coordinated fraud rings.

Project Structure

ad_fraud_env/
+-- __init__.py              # Package exports
+-- client.py                # WebSocket client (extends EnvClient)
+-- models.py                # Action, Observation, State types
+-- inference.py             # Baseline LLM agent with mandatory stdout logging
+-- openenv.yaml             # OpenEnv manifest
+-- pyproject.toml           # Dependencies and package config
+-- Dockerfile               # Multi-stage Docker build
+-- baseline_scores.json     # Cached baseline results
+-- data/
|   +-- ad_generator.py      # Episode generation, task configs, campaign profiles
|   +-- advertiser_profiles.py  # Synthetic advertiser history
|   +-- fraud_patterns.py    # Fraud + legit ad templates (easy/medium/hard)
|   +-- landing_pages.py     # Simulated landing page investigation data
|   +-- network_generator.py # Fraud ring topologies via networkx
+-- graders/
|   +-- base_grader.py       # Shared normalization and reward logic
|   +-- task1_grader.py      # Verdict accuracy + budget efficiency bonus
|   +-- task2_grader.py      # + calibration bonus
|   +-- task3_grader.py      # + network detection + coverage bonus
+-- server/
|   +-- app.py               # FastAPI app with /tasks, /baseline, /grader endpoints
|   +-- environment.py       # Core environment (reset/step/state)
|   +-- investigate_ui.py    # HTML dashboard routes (/investigate, /web redirect)
|   +-- static/
|       +-- investigate_hq.html  # Interactive investigation dashboard
|   +-- requirements.txt     # Server dependencies
+-- tests/
    +-- test_data_generation.py  # Determinism, cross-ref checks, decoy validation
    +-- test_environment.py      # Step logic, state tracking, anti-exploit
    +-- test_graders.py          # Score ranges, calibration, network scoring

API Endpoints

Endpoint	Method	Description
`/health`	GET	Health check
`/schema`	GET	Action/Observation JSON schemas
`/ws`	WS	WebSocket for `step()` / `reset()` / `state()`
`/tasks`	GET	Task list with configs and action schema
`/baseline`	GET	Baseline scores (cached or live)
`/grader`	GET	Last episode's grader result
`/investigate`	GET	HTML investigation dashboard (also `/` redirects here)

License

BSD 3-Clause License