title: OpenEnv Creative Auctioneer
colorFrom: purple
colorTo: blue
sdk: docker
app_port: 7860
tags:
  - openenv

OpenEnv Creative Auctioneer

A privacy-native real-time bidding (RTB) ad auction environment where an RL agent acts as an autonomous Account Manager: navigating a 24-hour campaign cycle, selecting ad creatives, pacing budgets, and assembling viral captions to maximise Return on Ad Spend (ROAS), all without individual user identifiers.

Motivation

Programmatic advertising is a $500 B+ industry where split-second bidding decisions determine campaign success. Existing RL benchmarks either use toy grid-worlds or require proprietary data. OpenEnv-Auctioneer fills this gap with a fully open, dataset-calibrated simulation grounded in:

| Dataset | Role |
|---|---|
| MIND (Microsoft News) | CTR calibration + headline catalog |
| iPinYou RTB | Competitor bid distributions (Lognormal/hour) |
| Vogue Dialogue | User persona bank |
| MS-COCO Captions 2017 | Ad + caption pool for hard_assembly |
| Google Trends / Reddit | Live viral hashtag scraping |

All datasets are optional: the environment falls back to published statistics, so it runs out of the box with zero downloads.
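As a rough illustration of what "Lognormal/hour" competitor bids could look like, the sketch below samples one competitor bid per auction from a per-hour lognormal distribution. The `HOURLY_PARAMS` values, the `run_auction` helper, and the rule that the winner pays the competitor's bid are all assumptions made for this example; they are not taken from the project code or from the iPinYou logs.

```python
import random

# Hypothetical (mu, sigma) per hour bucket. Real values would be fitted to
# iPinYou logs or taken from published statistics; these are placeholders
# that simply make the evening hours (18-22) a little more expensive.
HOURLY_PARAMS = {
    hour: (-0.6 + (0.3 if 18 <= hour <= 22 else 0.0), 0.5)
    for hour in range(24)
}


def run_auction(bid_price: float, hour: int, rng: random.Random) -> tuple[bool, float]:
    """Sample a competitor bid for this hour and decide the auction.

    Assumption: the agent wins when its bid is at least the competitor's bid
    and then pays that competitor bid as the clearing price.
    """
    mu, sigma = HOURLY_PARAMS[hour]
    competitor_bid = rng.lognormvariate(mu, sigma)
    won = bid_price >= competitor_bid
    clearing_price = competitor_bid if won else 0.0
    return won, clearing_price


rng = random.Random(0)  # seeded so repeated runs give the same draws
print(run_auction(0.75, hour=20, rng=rng))
```

Passing the random generator in explicitly keeps episodes reproducible, which helps when comparing agents under the same market conditions.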


Action Space

from pydantic import BaseModel

class Action(BaseModel):
    bid_price: float          # USD bid for the RTB auction (≥ 0)
    headline_id: int          # Index into the 6-slot headlines catalog (0–5)
    creative_id: int          # Index into the 6-slot creatives catalog (0–5)
    generated_caption: str | None    # [hard_assembly] Rewritten caption with viral hashtags
    generated_hashtags: list[str] | None  # [hard_assembly] Chosen hashtags (e.g. ["#QuietLuxury", "#OOTD"])
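To make the field constraints concrete, here is a small stand-alone sketch that builds action payloads as plain dictionaries and checks them against the ranges stated in the comments above. `validate_action` and `CATALOG_SLOTS` are hypothetical helpers for illustration only; the project itself relies on its Pydantic `Action` model, whose exact validation behaviour is not shown here.

```python
from typing import Any

CATALOG_SLOTS = 6  # both catalogs are described as 6-slot (indices 0-5)


def validate_action(payload: dict[str, Any]) -> list[str]:
    """Return a list of constraint violations; an empty list means none found."""
    errors = []
    if payload.get("bid_price", -1.0) < 0:
        errors.append("bid_price must be >= 0")
    for key in ("headline_id", "creative_id"):
        idx = payload.get(key)
        if not isinstance(idx, int) or not 0 <= idx < CATALOG_SLOTS:
            errors.append(f"{key} must be an integer in 0-{CATALOG_SLOTS - 1}")
    return errors


# A basic action, as used by tasks that do not need caption assembly.
basic_action = {"bid_price": 0.85, "headline_id": 2, "creative_id": 4}

# A hard_assembly action additionally carries the rewritten caption and hashtags.
assembly_action = {
    **basic_action,
    "generated_caption": "Layered neutrals for the commute #QuietLuxury #OOTD",
    "generated_hashtags": ["#QuietLuxury", "#OOTD"],
}

# Negative bid and out-of-range headline index: two violations.
bad_action = {"bid_price": -1.0, "headline_id": 9, "creative_id": 0}

print(validate_action(basic_action), validate_action(bad_action))
```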

Observation Space

class Observation(BaseModel):
    hour_of_day: int          # Current hour (0–23)
    remaining_budget: float   # Remaining budget in USD
    spent_so_far: float       # Cumulative spend
    current_context: str      # "Fitness" | "Tech" | "Fashion" | "Gaming"
    news_category: str        # Fine-grained MIND subcategory
    viral_trend: str          # Current cultural trend token
    market_pressure: float    # Auction competitiveness [0, 1]
    ads_shown_this_session: int
    fatigue_level: float      # User fatigue [0, 1]
    carryover_boost: float    # Brand-recall CTR boost [0, 0.30]
    last_ctr: float           # Previous step CTR
    cumulative_revenue: float # Total revenue earned

    # hard_assembly only:
    live_hashtags: list[str]      # Real-time scraped viral hashtags
    image_description: str        # Source ad image description
    base_caption: str             # Base caption to rewrite

Reward Signal

| Outcome | Reward |
|---|---|
| Auction won | adjusted_ctr × $15 − clearing_price |
| Auction lost | −$0.10 (missed opportunity) |
| Over-pacing (medium only) | −$1.00 penalty |
| Assembly bonus (hard_assembly) | +composite_score × $8.00 |

Rewards are per-step (not sparse), providing continuous gradient signal.
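The reward rules can be combined into a single per-step function. The sketch below assumes the components are additive and that the assembly bonus applies whether or not the auction was won; the function name, its parameters, and the `over_paced` flag are illustrative and not the environment's actual API.

```python
CONVERSION_VALUE = 15.00      # USD multiplier applied to adjusted CTR on a win
LOST_PENALTY = 0.10           # USD missed-opportunity penalty on a loss
OVER_PACING_PENALTY = 1.00    # USD penalty, medium_pacing only
ASSEMBLY_BONUS_SCALE = 8.00   # USD multiplier for the hard_assembly composite


def step_reward(
    won: bool,
    adjusted_ctr: float,
    clearing_price: float,
    task: str,
    over_paced: bool = False,
    composite_score: float = 0.0,
) -> float:
    """Per-step reward assembled from the outcome rules (additive, by assumption)."""
    if won:
        reward = adjusted_ctr * CONVERSION_VALUE - clearing_price
    else:
        reward = -LOST_PENALTY
    if task == "medium_pacing" and over_paced:
        reward -= OVER_PACING_PENALTY
    if task == "hard_assembly":
        reward += composite_score * ASSEMBLY_BONUS_SCALE
    return reward


# A win at 10% adjusted CTR with a $0.80 clearing price: 0.10 * 15 - 0.80.
print(step_reward(True, 0.10, 0.80, "easy_headline"))
```

Note that a win is only profitable when `adjusted_ctr × $15` exceeds the clearing price, so bidding high enough to win every auction is not automatically a good policy.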


Tasks

Level 1: easy_headline (Easy)

Objective: Select the headline with the highest CTR for each context. Budget: $100 | Grader: mean(CTR_selected / CTR_oracle) | Target: 0.75

Level 2: medium_pacing (Medium)

Objective: Pace $50 across 24 hours; retain ≥ 20% for peak hours (18–22). Budget: $50 | Grader: 0.3×smoothness + 0.3×peak_survival + 0.4×revenue | Target: 0.70

Level 3: hard_assembly (Hard)

Objective: Given an ad image description + base caption + live viral hashtags, generate a new caption that is simultaneously viral, coherent with the image, and creatively novel, while also winning auctions profitably.

Budget: $120 | Target: 0.65

The RL loop (what the LLM agent does each step):

1. Agent receives: image_description, base_caption, live_hashtags[], viral_trend
2. Agent must:
   a. Select 2–4 relevant hashtags from live_hashtags (scraped from Google Trends / Reddit)
   b. Rewrite the base caption to weave those hashtags into natural ad copy
   c. Add its own creative words (target 30–50% novel vocabulary)
   d. Keep the caption coherent with the source image
   e. Set a profitable bid price
3. Grader scores the assembled caption on 4 axes:
   • 35%: Hashtag relevance  (cosine_sim of each hashtag vs viral_trend)
   • 35%: Caption-trend alignment  (cosine_sim of caption vs viral_trend)
   • 20%: Caption-image coherence  (cosine_sim of caption vs image_description)
   • 10%: Novelty  (fraction of new words vs base_caption, target ~40%)
4. Reward = auction_reward + composite_score × $8.00 bonus
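The novelty target in step 2c can be checked before submitting a caption. The sketch below measures the fraction of caption words that do not appear in the base caption. The tokenisation (lowercased alphanumeric runs, so `#RunTok` counts as the word `runtok`) and the helper names are assumptions for this example; the grader's own tokeniser and the way it turns this ratio into an axis score may differ.

```python
import re


def words(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def novelty_ratio(caption: str, base_caption: str) -> float:
    """Fraction of caption words that do not occur in the base caption."""
    base_vocab = set(words(base_caption))
    caption_words = words(caption)
    if not caption_words:
        return 0.0
    new_words = sum(1 for w in caption_words if w not in base_vocab)
    return new_words / len(caption_words)


base = "A woman running on a beach at sunrise"
rewrite = "Sunrise miles hit different: a woman running on a beach #RunTok #MorningRun"

ratio = novelty_ratio(rewrite, base)
# 5 of the 12 caption words are new, which falls inside the 30-50% target band.
print(f"novelty = {ratio:.2f}, in target band: {0.30 <= ratio <= 0.50}")
```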

Data sources for hard_assembly:

  • Ad creatives: MS-COCO Captions 2017 (val annotations) bucketed into Fitness/Tech/Fashion/Gaming by keyword matching. Falls back to 30-entry built-in seed pool.
  • Viral hashtags: ViralHashtagScraper queries Google Trends (via pytrends) and Reddit /r/popular/hot.json (public, no auth). Blends with static seed hashtags per context and trend. Cached for 1 hour.
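The one-hour cache and the blend with seed hashtags could be organised roughly as follows. This is a sketch under stated assumptions, not the real `ViralHashtagScraper`: the `HashtagCache` class, the injected `fetch_live` callable, and the fall-back to seeds when scraping raises are all hypothetical.

```python
import time
from typing import Callable

CACHE_TTL_SECONDS = 3600  # "cached for 1 hour"


class HashtagCache:
    """Blend live hashtags with static seeds and reuse the result for one hour."""

    def __init__(
        self,
        fetch_live: Callable[[str], list[str]],
        seeds: dict[str, list[str]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_live = fetch_live   # hypothetical scraper callable
        self._seeds = seeds             # static seed hashtags per context
        self._clock = clock             # injectable for testing
        self._entries: dict[str, tuple[float, list[str]]] = {}

    def get(self, context: str) -> list[str]:
        now = self._clock()
        cached = self._entries.get(context)
        if cached is not None and now - cached[0] < CACHE_TTL_SECONDS:
            return cached[1]
        try:
            live = self._fetch_live(context)
        except Exception:
            live = []  # assumption: a failed scrape falls back to seeds only
        # dict.fromkeys removes duplicates while keeping first-seen order.
        blended = list(dict.fromkeys(live + self._seeds.get(context, [])))
        self._entries[context] = (now, blended)
        return blended


cache = HashtagCache(lambda ctx: ["#RunTok"], {"Fitness": ["#RunTok", "#GymLife"]})
print(cache.get("Fitness"))
```

Injecting the clock and the fetcher keeps the cache testable without network access.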

Level 4: hard_sequencing (Hard)

Objective: Plan 24-hour ad placements with carry-over brand-recall boosts. Winning triggers +15%/+10%/+5% CTR for the next 3 hours. Cover ≥ 3 contexts for a 20% diversity bonus. Budget: $100 | Grader: min(1.0, agent_conv/oracle_conv × diversity_mult) | Target: 0.60


Grading Details

EasyHeadlineGrader

step_score  = CTR_selected / CTR_oracle
final_score = mean(step_scores)                         // [0.0, 1.0]
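A worked example of this ratio, written as a minimal sketch. The function name and the sample CTR values are made up for illustration, and the code assumes every oracle CTR is strictly positive.

```python
from statistics import mean


def easy_headline_score(ctr_selected: list[float], ctr_oracle: list[float]) -> float:
    """Mean per-step ratio of the chosen headline's CTR to the best available CTR."""
    step_scores = [sel / best for sel, best in zip(ctr_selected, ctr_oracle)]
    return mean(step_scores)


# Three steps: slightly off, optimal, and a poor pick.
selected = [0.04, 0.06, 0.03]
oracle = [0.05, 0.06, 0.06]

# Step ratios are 0.8, 1.0 and 0.5, so the episode scores about 0.767.
print(round(easy_headline_score(selected, oracle), 3))
```

With these made-up numbers the agent would just clear the 0.75 target despite one poor pick, which shows how much a single bad step costs over a short episode.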

MediumPacingGrader

smoothness     = 1 − mean(|hourly_spend − ideal_spend| / ideal_spend)
peak_survival  = 1.0 if remaining_budget ≥ 20% at hour 18, else 0.0
revenue_factor = min(1.0, total_revenue / $30)

final_score = 0.30 × smoothness + 0.30 × peak_survival + 0.40 × revenue_factor
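The three pacing terms can be traced with a short sketch. Two details are assumptions here because the formulas do not pin them down: `ideal_spend` is taken to be a uniform `budget / 24`, and "remaining budget at hour 18" is taken to mean the budget left after hours 0 to 17. The function name is illustrative, and the smoothness term is left unclamped exactly as written.

```python
from statistics import mean


def medium_pacing_score(hourly_spend: list[float], budget: float, total_revenue: float) -> float:
    """Weighted pacing score: 0.30 smoothness + 0.30 peak survival + 0.40 revenue."""
    ideal = budget / len(hourly_spend)  # assumption: uniform ideal spend per hour
    smoothness = 1 - mean(abs(spend - ideal) / ideal for spend in hourly_spend)
    remaining_at_18 = budget - sum(hourly_spend[:18])  # budget left entering hour 18
    peak_survival = 1.0 if remaining_at_18 >= 0.20 * budget else 0.0
    revenue_factor = min(1.0, total_revenue / 30.0)
    return 0.30 * smoothness + 0.30 * peak_survival + 0.40 * revenue_factor


budget = 50.0
uniform = [budget / 24] * 24           # perfectly even pacing
front_loaded = [budget] + [0.0] * 23   # entire budget spent in hour 0

print(medium_pacing_score(uniform, budget, total_revenue=30.0))
print(medium_pacing_score(front_loaded, budget, total_revenue=15.0))
```

Under these assumptions the front-loaded plan drives the smoothness term below zero, so spending everything early is punished twice: once through smoothness and once through the lost peak-survival credit.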

HardAssemblyGrader: 4-Axis Composite

| Axis | Weight | Metric |
|---|---|---|
| Hashtag Relevance | 0.35 | mean(cosine_sim(hashtag, viral_trend)) |
| Caption-Trend Alignment | 0.35 | cosine_sim(caption, viral_trend) |
| Caption-Image Coherence | 0.20 | cosine_sim(caption, image_description) |
| Novelty | 0.10 | 1 − … |
composite = Σ (weight × axis_score)

final_score = 0.60 Γ— mean(composite_scores)
            + 0.40 Γ— min(1.0, total_revenue / $55)
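The weighting can be followed end to end with precomputed axis scores. In the sketch below the per-axis values are invented inputs: in the real grader they would come from cosine similarities and the novelty metric, neither of which is reproduced here. The dictionary keys and function names are likewise illustrative.

```python
from statistics import mean

# Axis weights as listed for the 4-axis composite.
AXIS_WEIGHTS = {
    "hashtag_relevance": 0.35,
    "caption_trend": 0.35,
    "caption_image": 0.20,
    "novelty": 0.10,
}


def composite_score(axis_scores: dict[str, float]) -> float:
    """Weighted sum of the four axis scores for one step."""
    return sum(weight * axis_scores[axis] for axis, weight in AXIS_WEIGHTS.items())


def hard_assembly_final(step_axis_scores: list[dict[str, float]], total_revenue: float) -> float:
    """0.60 x mean composite + 0.40 x revenue factor (revenue capped at $55)."""
    caption_part = mean(composite_score(scores) for scores in step_axis_scores)
    revenue_part = min(1.0, total_revenue / 55.0)
    return 0.60 * caption_part + 0.40 * revenue_part


# Hypothetical axis scores for two steps of an episode.
example_steps = [
    {"hashtag_relevance": 0.8, "caption_trend": 0.6, "caption_image": 0.7, "novelty": 0.9},
    {"hashtag_relevance": 0.6, "caption_trend": 0.6, "caption_image": 0.5, "novelty": 0.5},
]

# Composites are 0.72 and 0.57; with $27.50 revenue the final score is about 0.587.
print(round(hard_assembly_final(example_steps, total_revenue=27.5), 3))
```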

HardSequencingGrader

agent_conversions  = Σ [CTR_t × (1 + carryover_boost_t) × $15]
oracle_conversions = DP-optimal bid/skip sequence with carry-over

diversity_mult = 1.20 if ≥3 distinct contexts won, else 1.0

final_score = min(1.0, agent_conv / oracle_conv × diversity_mult)
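The carry-over mechanics and the final ratio can be sketched as below. Two points are assumptions: overlapping boosts from consecutive wins are summed (which would give the 0.30 ceiling listed for `carryover_boost`), and conversions are counted only for hours the agent wins. The DP oracle is not reproduced; its value is passed in as a plain number. All function names are illustrative.

```python
BOOST_SCHEDULE = (0.15, 0.10, 0.05)  # CTR boost 1, 2 and 3 hours after a win
CONVERSION_VALUE = 15.00


def carryover_boosts(wins: list[bool]) -> list[float]:
    """Boost in effect at each hour, summing contributions from earlier wins."""
    boosts = [0.0] * len(wins)
    for hour, won in enumerate(wins):
        if not won:
            continue
        for lag, boost in enumerate(BOOST_SCHEDULE, start=1):
            if hour + lag < len(wins):
                boosts[hour + lag] += boost
    return boosts


def agent_conversions(wins: list[bool], ctrs: list[float]) -> float:
    """Sum of CTR_t x (1 + boost_t) x $15 over the hours the agent won."""
    boosts = carryover_boosts(wins)
    return sum(
        ctr * (1 + boost) * CONVERSION_VALUE
        for won, ctr, boost in zip(wins, ctrs, boosts)
        if won
    )


def sequencing_score(agent_conv: float, oracle_conv: float, contexts_won: list[str]) -> float:
    """Agent-to-oracle ratio with a 1.20 multiplier for 3 or more distinct contexts."""
    diversity_mult = 1.20 if len(set(contexts_won)) >= 3 else 1.0
    return min(1.0, agent_conv / oracle_conv * diversity_mult)


wins = [True, True, True, True, False]
print([round(b, 2) for b in carryover_boosts(wins)])
print(sequencing_score(60.0, 100.0, ["Fitness", "Tech", "Fashion"]))
```

Because the boosts stack under this reading, clustering wins in consecutive hours raises later CTRs, which is what makes the task a sequencing problem and not a set of independent per-hour decisions.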

Architecture

┌────────────────────────────────────────────────────────────┐
│  OpenEnvAuctioneer (Gym-style environment)                 │
│                                                            │
│  ┌───────────────────┐   ┌───────────────────────────────┐ │
│  │  Market Engine    │   │   User Simulator              │ │
│  │  (Statistical)    │   │   (Semantic / LLM)            │ │
│  │                   │   │                               │ │
│  │  iPinYou RTB logs │   │  SentenceTransformer          │ │
│  │  → Lognormal per  │   │  all-MiniLM-L6-v2             │ │
│  │    hour bucket    │   │  + optional Llama-3-8B        │ │
│  └───────────────────┘   └───────────────────────────────┘ │
│                                                            │
│  ┌───────────────────────────────────────────────────────┐ │
│  │  MIND Dataset Layer  (Microsoft News Dataset)         │ │
│  │  behaviours.tsv  →  CTRCalibrator                     │ │
│  │  news.tsv        →  MINDCreativePool (headlines)      │ │
│  └───────────────────────────────────────────────────────┘ │
│                                                            │
│  ┌───────────────────────────────────────────────────────┐ │
│  │  Ad + Caption Dataset  (MS-COCO Captions 2017)        │ │
│  │  → image_description + base_caption per step          │ │
│  │  → ViralHashtagScraper (pytrends + Reddit + seeds)    │ │
│  │  → agent rewrites caption with viral hashtags         │ │
│  └───────────────────────────────────────────────────────┘ │
│                                                            │
│  ┌───────────────────────────────────────────────────────┐ │
│  │  Grader (task-specific, deterministic 0.0–1.0)        │ │
│  │   Level 1: easy_headline  → headline CTR lookup       │ │
│  │   Level 2: medium_pacing  → pacing + survival         │ │
│  │   Level 3: hard_assembly  → 4-axis composite score    │ │
│  │   Level 4: hard_sequencing → DP oracle comparison     │ │
│  └───────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────┘

Models

| Model | Role | Always Active? |
|---|---|---|
| all-MiniLM-L6-v2 (SentenceTransformer) | Semantic CTR scoring + grader cosine similarity | ✅ Yes |
| Meta-Llama-3-8B-Instruct (4-bit) | Richer LLM-based CTR scoring | ❌ Optional (USE_LLM_SIMULATOR=1) |

When the LLM simulator is active: final_ctr = 0.60 × llm_ctr + 0.40 × semantic_ctr
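As a quick numeric check of the blend, the sketch below returns the semantic CTR alone when no LLM estimate is available and the weighted mix otherwise. The `blend_ctr` helper is hypothetical; how the real `UserSimulator` obtains either estimate is not shown.

```python
from typing import Optional


def blend_ctr(semantic_ctr: float, llm_ctr: Optional[float] = None) -> float:
    """Blend the two CTR estimates when the LLM simulator supplies one."""
    if llm_ctr is None:
        return semantic_ctr  # LLM simulator disabled: semantic score only
    return 0.60 * llm_ctr + 0.40 * semantic_ctr


# Semantic estimate 5%, LLM estimate 10%: 0.60 * 0.10 + 0.40 * 0.05 = 0.08.
print(round(blend_ctr(0.05, llm_ctr=0.10), 3))
```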


Setup & Usage

Prerequisites

  • Python 3.10+
  • Docker (for containerised execution)

Local Development

pip install -r requirements.txt
python -c "from environment import OpenEnvAuctioneer; e = OpenEnvAuctioneer(); print(e.reset())"

Docker Build & Run

# Build the image
docker build -t openenv-auctioneer .

# Run the FastAPI server (default)
docker run --rm -p 7860:7860 openenv-auctioneer

# Run inference directly inside the container
docker run --rm \
  -e HF_TOKEN=<your_key> \
  openenv-auctioneer python inference.py

Inference Script

# Build image first, then run inference
docker build -t openenv-auctioneer .

LOCAL_IMAGE_NAME=openenv-auctioneer \
HF_TOKEN=<your_key> \
python inference.py

The inference script emits standardised [START]/[STEP]/[END] logs to stdout.

Environment Variables

| Variable | Required | Description |
|---|---|---|
| HF_TOKEN | Yes (inference) | API key for the LLM service |
| API_BASE_URL | No | LLM endpoint (default: HuggingFace router) |
| MODEL_NAME | No | Model identifier (default: Qwen/Qwen2.5-72B-Instruct) |
| LOCAL_IMAGE_NAME | Yes (inference) | Docker image name |
| AUCTIONEER_TASK | No | Task to run (default: all) |
| MIND_SOURCE | No | local / huggingface / azure |
| COCO_SOURCE | No | local / url (auto-download COCO annotations) |
| USE_LLM_SIMULATOR | No | Set 1 to enable Llama-3 User Simulator |
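A script could read these variables along the following lines. This is a sketch, not the actual `inference.py`: the `InferenceConfig` dataclass and `load_config` function are hypothetical, and `API_BASE_URL` is left as `None` when unset because the default router URL is not spelled out here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class InferenceConfig:
    hf_token: str
    local_image_name: str
    model_name: str
    task: str
    api_base_url: Optional[str]
    use_llm_simulator: bool


def load_config(env: Mapping[str, str] = os.environ) -> InferenceConfig:
    """Read settings from the environment, failing fast on required variables."""
    missing = [name for name in ("HF_TOKEN", "LOCAL_IMAGE_NAME") if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required environment variables: {', '.join(missing)}")
    return InferenceConfig(
        hf_token=env["HF_TOKEN"],
        local_image_name=env["LOCAL_IMAGE_NAME"],
        model_name=env.get("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct"),
        task=env.get("AUCTIONEER_TASK", "all"),
        api_base_url=env.get("API_BASE_URL"),  # None means "use the default endpoint"
        use_llm_simulator=env.get("USE_LLM_SIMULATOR") == "1",
    )


# Example with an explicit mapping instead of the real process environment.
config = load_config({"HF_TOKEN": "hf_example", "LOCAL_IMAGE_NAME": "openenv-auctioneer"})
print(config.model_name, config.task, config.use_llm_simulator)
```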

Baseline Scores (Expected Ranges)

| Task | Expected Range | Notes |
|---|---|---|
| easy_headline | 0.55 – 0.80 | Context→headline matching is learnable |
| medium_pacing | 0.45 – 0.70 | Requires budget discipline |
| hard_assembly | 0.40 – 0.65 | Caption quality + hashtag matching + auction wins |
| hard_sequencing | 0.35 – 0.60 | Compared against DP oracle |

Scores depend on LLM quality and market stochasticity. Run multiple episodes for stable estimates.


Project Structure

├── models.py          # Pydantic models: Action, Observation, Reward, Info
├── environment.py     # OpenEnvAuctioneer + graders + dataset layers
│   ├── MINDLoader          # MIND dataset loader (HF / Azure / local)
│   ├── MarketCalibrator    # iPinYou-based auction price simulator
│   ├── CTRCalibrator       # MIND-based CTR lookup tables
│   ├── MINDCreativePool    # 6-slot headline/creative catalog from news.tsv
│   ├── PersonaBank         # Vogue Dialogue persona sampling
│   ├── ViralHashtagScraper # Live hashtag scraping (pytrends + Reddit)
│   ├── AdCaptionDataset    # COCO-based ad image+caption pool
│   ├── UserSimulator       # Semantic + optional LLM CTR scoring
│   ├── EasyHeadlineGrader  # Level 1 grader
│   ├── MediumPacingGrader  # Level 2 grader
│   ├── HardAssemblyGrader  # Level 3 grader (4-axis composite)
│   ├── HardSequencingGrader # Level 4 grader (DP oracle)
│   └── OpenEnvAuctioneer   # Main Gym-style env class
├── app.py             # FastAPI server (runs inside Docker)
├── inference.py       # Baseline inference script (mandatory format)
├── openenv.yaml       # OpenEnv metadata & task definitions
├── Dockerfile         # Container build
├── requirements.txt   # Python dependencies
├── test_sequencing.py # Unit tests for DP oracle grader
└── Datasets/          # Optional dataset mount point

References

  1. MIND: Wu et al. (2020), "MIND: A Large-scale Dataset for News Recommendation", ACL 2020. msnews.github.io
  2. iPinYou RTB: Zhang et al. (2014), "Real-Time Bidding Benchmarking with iPinYou Dataset". contest.ipinyou.com
  3. MS-COCO Captions: Lin et al. (2014), "Microsoft COCO: Common Objects in Context". cocodataset.org
  4. SentenceTransformers: Reimers & Gurevych (2019), "Sentence-BERT". sbert.net

License

MIT