title: OpenEnv Creative Auctioneer
colorFrom: purple
colorTo: blue
sdk: docker
app_port: 7860
tags:
- openenv
OpenEnv Creative Auctioneer
A privacy-native real-time bidding (RTB) ad auction environment in which an RL agent acts as an autonomous Account Manager: it navigates a 24-hour campaign cycle, selects ad creatives, paces budgets, and assembles viral captions to maximise Return on Ad Spend (ROAS), all without individual user identifiers.
Motivation
Programmatic advertising is a $500 B+ industry where split-second bidding decisions determine campaign success. Existing RL benchmarks either use toy grid-worlds or require proprietary data. OpenEnv-Auctioneer fills this gap with a fully open, dataset-calibrated simulation grounded in:
| Dataset | Role |
|---|---|
| MIND (Microsoft News) | CTR calibration + headline catalog |
| iPinYou RTB | Competitor bid distributions (Lognormal/hour) |
| Vogue Dialogue | User persona bank |
| MS-COCO Captions 2017 | Ad + caption pool for hard_assembly |
| Google Trends / Reddit | Live viral hashtag scraping |
All datasets are optional: the environment falls back to published statistics, so it runs out of the box with zero downloads.
Action Space
from pydantic import BaseModel

class Action(BaseModel):
    bid_price: float                       # USD bid for the RTB auction (≥ 0)
    headline_id: int                       # Index into the 6-slot headlines catalog (0-5)
    creative_id: int                       # Index into the 6-slot creatives catalog (0-5)
    generated_caption: str | None          # [hard_assembly] Rewritten caption with viral hashtags
    generated_hashtags: list[str] | None   # [hard_assembly] Chosen hashtags (e.g. ["#QuietLuxury", "#OOTD"])
Observation Space
class Observation(BaseModel):
    hour_of_day: int             # Current hour (0-23)
    remaining_budget: float      # Remaining budget in USD
    spent_so_far: float          # Cumulative spend
    current_context: str         # "Fitness" | "Tech" | "Fashion" | "Gaming"
    news_category: str           # Fine-grained MIND subcategory
    viral_trend: str             # Current cultural trend token
    market_pressure: float       # Auction competitiveness [0, 1]
    ads_shown_this_session: int
    fatigue_level: float         # User fatigue [0, 1]
    carryover_boost: float       # Brand-recall CTR boost [0, 0.30]
    last_ctr: float              # Previous step CTR
    cumulative_revenue: float    # Total revenue earned
    # hard_assembly only:
    live_hashtags: list[str]     # Real-time scraped viral hashtags
    image_description: str       # Source ad image description
    base_caption: str            # Base caption to rewrite
Reward Signal
| Outcome | Reward |
|---|---|
| Auction won | adjusted_ctr × $15 − clearing_price |
| Auction lost | −$0.10 (missed opportunity) |
| Over-pacing (medium only) | −$1.00 penalty |
| Assembly bonus (hard_assembly) | +composite_score × $8.00 |
Rewards are per-step (not sparse), providing continuous gradient signal.
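The reward table can be read as a single per-step function. The sketch below is illustrative only: the function name and arguments are hypothetical, and it assumes the penalty and bonus terms are simply added to the auction outcome, which the table suggests but does not state outright.

```python
VALUE_PER_CLICK = 15.00      # the "$15" multiplier from the table
LOSS_PENALTY = 0.10          # missed-opportunity penalty
OVER_PACING_PENALTY = 1.00   # medium_pacing only
ASSEMBLY_BONUS = 8.00        # hard_assembly only

def step_reward(won, adjusted_ctr=0.0, clearing_price=0.0,
                over_paced=False, composite_score=None):
    """Per-step reward in USD (hypothetical helper, not the env's own API)."""
    if won:
        reward = adjusted_ctr * VALUE_PER_CLICK - clearing_price
    else:
        reward = -LOSS_PENALTY
    if over_paced:                    # assumption: penalty is additive
        reward -= OVER_PACING_PENALTY
    if composite_score is not None:   # assumption: bonus is additive
        reward += composite_score * ASSEMBLY_BONUS
    return reward

# A won auction with 20% adjusted CTR that cleared at $1.00
print(step_reward(True, adjusted_ctr=0.20, clearing_price=1.00))
```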
Tasks
Level 1: easy_headline (Easy)
Objective: Select the headline with the highest CTR for each context.
Budget: $100 | Grader: mean(CTR_selected / CTR_oracle) | Target: 0.75
Level 2: medium_pacing (Medium)
Objective: Pace $50 across 24 hours; retain ≥ 20% for peak hours (18-22).
Budget: $50 | Grader: 0.3 × smoothness + 0.3 × peak_survival + 0.4 × revenue | Target: 0.70
Level 3: hard_assembly (Hard)
Objective: Given an ad image description, a base caption, and live viral hashtags, generate a new caption that is simultaneously viral, coherent with the image, and creatively novel, while also winning auctions profitably.
Budget: $120 | Target: 0.65
The RL loop (what the LLM agent does each step):
1. Agent receives: image_description, base_caption, live_hashtags[], viral_trend
2. Agent must:
   a. Select 2-4 relevant hashtags from live_hashtags (scraped from Google Trends / Reddit)
   b. Rewrite the base caption to weave those hashtags into natural ad copy
   c. Add its own creative words (target 30-50% novel vocabulary)
   d. Keep the caption coherent with the source image
   e. Set a profitable bid price
3. Grader scores the assembled caption on 4 axes:
   • 35%: Hashtag relevance (cosine_sim of each hashtag vs viral_trend)
   • 35%: Caption-trend alignment (cosine_sim of caption vs viral_trend)
   • 20%: Caption-image coherence (cosine_sim of caption vs image_description)
   • 10%: Novelty (fraction of new words vs base_caption, target ~40%)
4. Reward = auction_reward + composite_score × $8.00 bonus
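Steps 2a and 2c can be checked locally before submitting an action. The helpers below are a minimal sketch: their names and the regex tokenisation are assumptions, and the grader's own tokeniser may count words differently.

```python
import re

def tokenize(text):
    # Assumption: lowercase alphanumeric tokens; "#OOTD" becomes "ootd"
    return re.findall(r"[a-z0-9']+", text.lower())

def novelty_fraction(base_caption, new_caption):
    """Fraction of tokens in new_caption that do not occur in base_caption."""
    base_words = set(tokenize(base_caption))
    new_words = tokenize(new_caption)
    if not new_words:
        return 0.0
    return sum(word not in base_words for word in new_words) / len(new_words)

def valid_hashtag_choice(chosen, live_hashtags):
    """True if 2-4 hashtags were chosen and all come from live_hashtags."""
    return 2 <= len(chosen) <= 4 and all(tag in live_hashtags for tag in chosen)

base = "A woman in a red coat"
new = "A woman in a bold red coat #OOTD"
print(novelty_fraction(base, new))  # 2 new tokens out of 8 -> 0.25
```

A caption at 0.25 novelty falls just short of the 30-50% band from step 2c, so the agent would want to add a little more of its own vocabulary.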
Data sources for hard_assembly:
- Ad creatives: MS-COCO Captions 2017 (val annotations), bucketed into Fitness/Tech/Fashion/Gaming by keyword matching. Falls back to a 30-entry built-in seed pool.
- Viral hashtags: ViralHashtagScraper queries Google Trends (via pytrends) and Reddit /r/popular/hot.json (public, no auth), then blends the results with static seed hashtags per context and trend. Results are cached for 1 hour.
Level 4: hard_sequencing (Hard)
Objective: Plan 24-hour ad placements with carry-over brand-recall boosts.
Winning an auction triggers +15%/+10%/+5% CTR for the next 3 hours. Cover ≥ 3 contexts for
a 20% diversity bonus.
Budget: $100 | Grader: min(1.0, agent_conv/oracle_conv × diversity_mult) | Target: 0.60
Grading Details
EasyHeadlineGrader
step_score = CTR_selected / CTR_oracle
final_score = mean(step_scores) // [0.0, 1.0]
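As a worked example, the two lines above amount to a mean of per-step CTR ratios. This is a sketch with a hypothetical function name; it assumes the oracle CTR is the best achievable CTR at each step, so every ratio already lies in [0, 1].

```python
def easy_headline_score(selected_ctrs, oracle_ctrs):
    """Mean of CTR_selected / CTR_oracle over all steps."""
    # Steps with a zero oracle CTR are skipped to avoid division by zero
    # (an assumption; the grader may handle this case differently).
    ratios = [s / o for s, o in zip(selected_ctrs, oracle_ctrs) if o > 0]
    return sum(ratios) / len(ratios) if ratios else 0.0

# Step 1 picks a headline at 80% of the oracle CTR, step 2 matches the oracle.
print(round(easy_headline_score([0.04, 0.03], [0.05, 0.03]), 3))
```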
MediumPacingGrader
smoothness = 1 − mean(|hourly_spend − ideal_spend| / ideal_spend)
peak_survival = 1.0 if remaining_budget ≥ 20% at hour 18, else 0.0
revenue_factor = min(1.0, total_revenue / $30)
final_score = 0.30 × smoothness + 0.30 × peak_survival + 0.40 × revenue_factor
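The pacing formulas can be sketched in a few lines. The version below makes two assumptions that the formulas do not spell out: ideal_spend is the budget divided evenly over the hours, and smoothness is clamped at 0 so that a very bursty schedule cannot push the score negative. The function name is hypothetical.

```python
def medium_pacing_score(hourly_spend, total_budget, total_revenue,
                        revenue_target=30.0, peak_start_hour=18,
                        reserve_fraction=0.20):
    hours = len(hourly_spend)
    ideal_spend = total_budget / hours  # assumption: uniform ideal pacing
    deviation = sum(abs(s - ideal_spend) / ideal_spend for s in hourly_spend) / hours
    smoothness = max(0.0, 1.0 - deviation)  # assumption: clamped at 0

    # Budget still available when the peak window opens at hour 18
    remaining = total_budget - sum(hourly_spend[:peak_start_hour])
    peak_survival = 1.0 if remaining >= reserve_fraction * total_budget else 0.0

    revenue_factor = min(1.0, total_revenue / revenue_target)
    return 0.30 * smoothness + 0.30 * peak_survival + 0.40 * revenue_factor

# Perfectly even spend of $50 over 24 hours, hitting the $30 revenue target
even = [50.0 / 24] * 24
print(round(medium_pacing_score(even, 50.0, 30.0), 3))
```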
HardAssemblyGrader β 4-Axis Composite
| Axis | Weight | Metric |
|---|---|---|
| Hashtag Relevance | 0.35 | mean(cosine_sim(hashtag, viral_trend)) |
| Caption-Trend Alignment | 0.35 | cosine_sim(caption, viral_trend) |
| Caption-Image Coherence | 0.20 | cosine_sim(caption, image_description) |
| Novelty | 0.10 | Fraction of new words vs base_caption (target ~40%) |
composite = Σ (weight × axis_score)
final_score = 0.60 × mean(composite_scores)
            + 0.40 × min(1.0, total_revenue / $55)
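To make the weighting concrete, here is a small sketch of the composite and final score. The axis scores are taken as given: in the environment they come from cosine similarity between sentence embeddings, and the plain-list cosine_sim below only illustrates that operation. All function and key names are hypothetical.

```python
import math

AXIS_WEIGHTS = {
    "hashtag_relevance": 0.35,
    "caption_trend": 0.35,
    "caption_image": 0.20,
    "novelty": 0.10,
}

def cosine_sim(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def composite_score(axis_scores):
    """Weighted sum over the four axes."""
    return sum(weight * axis_scores[name] for name, weight in AXIS_WEIGHTS.items())

def hard_assembly_final(composite_scores, total_revenue, revenue_target=55.0):
    mean_composite = (sum(composite_scores) / len(composite_scores)
                      if composite_scores else 0.0)
    return 0.60 * mean_composite + 0.40 * min(1.0, total_revenue / revenue_target)

step = {"hashtag_relevance": 0.8, "caption_trend": 0.6,
        "caption_image": 0.5, "novelty": 1.0}
c = composite_score(step)   # 0.28 + 0.21 + 0.10 + 0.10
print(round(c, 3), round(hard_assembly_final([c], total_revenue=27.5), 3))
```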
HardSequencingGrader
agent_conversions = Σ [CTR_t × (1 + carryover_boost_t) × $15]
oracle_conversions = DP-optimal bid/skip sequence with carry-over
diversity_mult = 1.20 if ≥ 3 distinct contexts won, else 1.0
final_score = min(1.0, agent_conv / oracle_conv × diversity_mult)
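The carry-over mechanics can be sketched as follows. Several details here are assumptions, not confirmed behaviour: boosts from overlapping wins are summed and capped at 0.30 (the upper bound of carryover_boost in the observation), conversions are only counted for hours the agent won, and the oracle value is passed in as a number because the DP itself is not reproduced here.

```python
CARRYOVER_SCHEDULE = (0.15, 0.10, 0.05)  # +15% / +10% / +5% over the next 3 hours
VALUE_PER_CLICK = 15.0
MAX_BOOST = 0.30

def carryover_boosts(wins):
    """wins[t] is True if the auction at hour t was won; returns boost per hour."""
    boosts = [0.0] * len(wins)
    for t, won in enumerate(wins):
        if not won:
            continue
        for offset, bonus in enumerate(CARRYOVER_SCHEDULE, start=1):
            if t + offset < len(wins):
                boosts[t + offset] = min(MAX_BOOST, boosts[t + offset] + bonus)
    return boosts

def agent_conversions(ctrs, wins):
    boosts = carryover_boosts(wins)
    return sum(ctr * (1.0 + boost) * VALUE_PER_CLICK
               for ctr, boost, won in zip(ctrs, boosts, wins) if won)

def hard_sequencing_score(agent_conv, oracle_conv, contexts_won):
    diversity_mult = 1.20 if len(set(contexts_won)) >= 3 else 1.0
    if oracle_conv <= 0:
        return 0.0
    return min(1.0, agent_conv / oracle_conv * diversity_mult)

print(carryover_boosts([True, False, False, False, False]))
```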
Architecture
┌───────────────────────────────────────────────────────────┐
│         OpenEnvAuctioneer (Gym-style environment)         │
│                                                           │
│  ┌──────────────────┐  ┌───────────────────────────────┐  │
│  │  Market Engine   │  │  User Simulator               │  │
│  │  (Statistical)   │  │  (Semantic / LLM)             │  │
│  │                  │  │                               │  │
│  │ iPinYou RTB logs │  │  SentenceTransformer          │  │
│  │ → Lognormal per  │  │  all-MiniLM-L6-v2             │  │
│  │   hour bucket    │  │  + optional Llama-3-8B        │  │
│  └──────────────────┘  └───────────────────────────────┘  │
│                                                           │
│ ┌───────────────────────────────────────────────────────┐ │
│ │  MIND Dataset Layer (Microsoft News Dataset)          │ │
│ │  behaviours.tsv → CTRCalibrator                       │ │
│ │  news.tsv       → MINDCreativePool (headlines)        │ │
│ └───────────────────────────────────────────────────────┘ │
│                                                           │
│ ┌───────────────────────────────────────────────────────┐ │
│ │  Ad + Caption Dataset (MS-COCO Captions 2017)         │ │
│ │  → image_description + base_caption per step          │ │
│ │  → ViralHashtagScraper (pytrends + Reddit + seeds)    │ │
│ │  → agent rewrites caption with viral hashtags         │ │
│ └───────────────────────────────────────────────────────┘ │
│                                                           │
│ ┌───────────────────────────────────────────────────────┐ │
│ │  Grader (task-specific, deterministic 0.0-1.0)        │ │
│ │  Level 1: easy_headline   → headline CTR lookup       │ │
│ │  Level 2: medium_pacing   → pacing + survival         │ │
│ │  Level 3: hard_assembly   → 4-axis composite score    │ │
│ │  Level 4: hard_sequencing → DP oracle comparison      │ │
│ └───────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────┘
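The Market Engine box describes competitor bids drawn from a lognormal distribution per hour bucket. The sketch below shows one way such a market could be sampled. The (mu, sigma) values are placeholders rather than fitted iPinYou statistics, and the second-price clearing rule (the winner pays the competitor's bid) is an assumption about how clearing_price is set.

```python
import random

# Hypothetical (mu, sigma) of the log competitor bid for each hour bucket.
HOURLY_PARAMS = {hour: (0.0, 0.5) for hour in range(24)}

def sample_competitor_bid(hour, rng, params=HOURLY_PARAMS):
    mu, sigma = params[hour]
    return rng.lognormvariate(mu, sigma)

def run_auction(bid_price, hour, rng, params=HOURLY_PARAMS):
    """Returns (won, clearing_price) for a single simulated auction."""
    competitor_bid = sample_competitor_bid(hour, rng, params)
    won = bid_price > competitor_bid
    clearing_price = competitor_bid if won else 0.0  # assumption: second-price
    return won, clearing_price

rng = random.Random(0)  # seeded for reproducible episodes
won, price = run_auction(bid_price=1.20, hour=18, rng=rng)
print(won, round(price, 3))
```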
Models
| Model | Role | Always Active? |
|---|---|---|
| all-MiniLM-L6-v2 (SentenceTransformer) | Semantic CTR scoring + grader cosine similarity | Yes |
| Meta-Llama-3-8B-Instruct (4-bit) | Richer LLM-based CTR scoring | Optional (USE_LLM_SIMULATOR=1) |
When the LLM simulator is active: final_ctr = 0.60 × llm_ctr + 0.40 × semantic_ctr
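In code, the blend might look like the sketch below. The function name is hypothetical, and falling back to the semantic score alone when no LLM score is available is an assumption based on the LLM simulator being optional.

```python
def blended_ctr(semantic_ctr, llm_ctr=None):
    """Blend the two CTR estimates; use semantic_ctr alone without an LLM score."""
    if llm_ctr is None:
        return semantic_ctr
    return 0.60 * llm_ctr + 0.40 * semantic_ctr

# 0.60 * 0.10 + 0.40 * 0.05 = 0.08
print(round(blended_ctr(semantic_ctr=0.05, llm_ctr=0.10), 4))
```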
Setup & Usage
Prerequisites
- Python 3.10+
- Docker (for containerised execution)
Local Development
pip install -r requirements.txt
python -c "from environment import OpenEnvAuctioneer; e = OpenEnvAuctioneer(); print(e.reset())"
Docker Build & Run
# Build the image
docker build -t openenv-auctioneer .
# Run the FastAPI server (default)
docker run --rm -p 7860:7860 openenv-auctioneer
# Run inference directly inside the container
docker run --rm \
-e HF_TOKEN=<your_key> \
openenv-auctioneer python inference.py
Inference Script
# Build image first, then run inference
docker build -t openenv-auctioneer .
LOCAL_IMAGE_NAME=openenv-auctioneer \
HF_TOKEN=<your_key> \
python inference.py
The inference script emits standardised [START]/[STEP]/[END] logs to stdout.
Environment Variables
| Variable | Required | Description |
|---|---|---|
| `HF_TOKEN` | Yes (inference) | API key for the LLM service |
| `API_BASE_URL` | No | LLM endpoint (default: HuggingFace router) |
| `MODEL_NAME` | No | Model identifier (default: Qwen/Qwen2.5-72B-Instruct) |
| `LOCAL_IMAGE_NAME` | Yes (inference) | Docker image name |
| `AUCTIONEER_TASK` | No | Task to run (default: all) |
| `MIND_SOURCE` | No | local / huggingface / azure |
| `COCO_SOURCE` | No | local / url (auto-download COCO annotations) |
| `USE_LLM_SIMULATOR` | No | Set 1 to enable Llama-3 User Simulator |
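A script that consumes these variables could validate them up front. The loader below is a sketch, not the code in inference.py: the InferenceConfig class and load_config function are hypothetical, and the default endpoint is left as None because the table does not give the router URL.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

@dataclass(frozen=True)
class InferenceConfig:
    hf_token: str
    local_image_name: str
    api_base_url: Optional[str]   # None means "use the client's default endpoint"
    model_name: str
    task: str
    use_llm_simulator: bool

def load_config(env: Optional[Mapping[str, str]] = None) -> InferenceConfig:
    env = os.environ if env is None else env
    # Variables the table marks as required for inference
    missing = [name for name in ("HF_TOKEN", "LOCAL_IMAGE_NAME") if not env.get(name)]
    if missing:
        raise RuntimeError("missing required variables: " + ", ".join(missing))
    return InferenceConfig(
        hf_token=env["HF_TOKEN"],
        local_image_name=env["LOCAL_IMAGE_NAME"],
        api_base_url=env.get("API_BASE_URL"),
        model_name=env.get("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct"),
        task=env.get("AUCTIONEER_TASK", "all"),
        use_llm_simulator=env.get("USE_LLM_SIMULATOR") == "1",
    )
```

Passing a plain dict instead of relying on os.environ keeps the loader easy to test.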
Baseline Scores (Expected Ranges)
| Task | Expected Range | Notes |
|---|---|---|
| `easy_headline` | 0.55-0.80 | Context-to-headline matching is learnable |
| `medium_pacing` | 0.45-0.70 | Requires budget discipline |
| `hard_assembly` | 0.40-0.65 | Caption quality + hashtag matching + auction wins |
| `hard_sequencing` | 0.35-0.60 | Compared against DP oracle |
Scores depend on LLM quality and market stochasticity. Run multiple episodes for stable estimates.
Project Structure
├── models.py                  # Pydantic models: Action, Observation, Reward, Info
├── environment.py             # OpenEnvAuctioneer + graders + dataset layers
│   ├── MINDLoader             # MIND dataset loader (HF / Azure / local)
│   ├── MarketCalibrator       # iPinYou-based auction price simulator
│   ├── CTRCalibrator          # MIND-based CTR lookup tables
│   ├── MINDCreativePool       # 6-slot headline/creative catalog from news.tsv
│   ├── PersonaBank            # Vogue Dialogue persona sampling
│   ├── ViralHashtagScraper    # Live hashtag scraping (pytrends + Reddit)
│   ├── AdCaptionDataset       # COCO-based ad image+caption pool
│   ├── UserSimulator          # Semantic + optional LLM CTR scoring
│   ├── EasyHeadlineGrader     # Level 1 grader
│   ├── MediumPacingGrader     # Level 2 grader
│   ├── HardAssemblyGrader     # Level 3 grader (4-axis composite)
│   ├── HardSequencingGrader   # Level 4 grader (DP oracle)
│   └── OpenEnvAuctioneer      # Main Gym-style env class
├── app.py                     # FastAPI server (runs inside Docker)
├── inference.py               # Baseline inference script (mandatory format)
├── openenv.yaml               # OpenEnv metadata & task definitions
├── Dockerfile                 # Container build
├── requirements.txt           # Python dependencies
├── test_sequencing.py         # Unit tests for DP oracle grader
└── Datasets/                  # Optional dataset mount point
References
- MIND: Wu et al. (2020), "MIND: A Large-scale Dataset for News Recommendation", ACL 2020. msnews.github.io
- iPinYou RTB: Zhang et al. (2014), "Real-Time Bidding Benchmarking with iPinYou Dataset". contest.ipinyou.com
- MS-COCO Captions: Lin et al. (2014), "Microsoft COCO: Common Objects in Context". cocodataset.org
- SentenceTransformers: Reimers & Gurevych (2019), "Sentence-BERT". sbert.net
License
MIT