--- title: OpenEnv Creative Auctioneer colorFrom: purple colorTo: blue sdk: docker app_port: 7860 tags: - openenv --- # OpenEnv Creative Auctioneer A **privacy-native real-time bidding (RTB) ad auction** environment where an RL agent acts as an autonomous Account Manager — navigating a 24-hour campaign cycle, selecting ad creatives, pacing budgets, and assembling viral captions to maximise Return on Ad Spend (ROAS) — all **without individual user identifiers**. ## Motivation Programmatic advertising is a \$500 B+ industry where split-second bidding decisions determine campaign success. Existing RL benchmarks either use toy grid-worlds or require proprietary data. **OpenEnv-Auctioneer** fills this gap with a fully open, dataset-calibrated simulation grounded in: | Dataset | Role | |---------|------| | [MIND](https://msnews.github.io/) (Microsoft News) | CTR calibration + headline catalog | | [iPinYou RTB](https://contest.ipinyou.com/) | Competitor bid distributions (Lognormal/hour) | | [Vogue Dialogue](https://github.com/aimagelab/Vogue-Dialogue) | User persona bank | | [MS-COCO Captions 2017](https://cocodataset.org/) | Ad + caption pool for `hard_assembly` | | [Google Trends](https://github.com/GeneralMills/pytrends) / [Reddit](https://www.reddit.com/) | Live viral hashtag scraping | All datasets are **optional** — the environment falls back to published statistics so it runs out-of-the-box with zero downloads. --- ## Action Space ```python class Action(BaseModel): bid_price: float # USD bid for the RTB auction (≥ 0) headline_id: int # Index into the 6-slot headlines catalog (0–5) creative_id: int # Index into the 6-slot creatives catalog (0–5) generated_caption: str | None # [hard_assembly] Rewritten caption with viral hashtags generated_hashtags: list[str] | None # [hard_assembly] Chosen hashtags (e.g. ["#QuietLuxury", "#OOTD"]) ``` ## Observation Space ```python class Observation(BaseModel): hour_of_day: int # Current hour (0–23) remaining_budget: float # Remaining budget in USD spent_so_far: float # Cumulative spend current_context: str # "Fitness" | "Tech" | "Fashion" | "Gaming" news_category: str # Fine-grained MIND subcategory viral_trend: str # Current cultural trend token market_pressure: float # Auction competitiveness [0, 1] ads_shown_this_session: int fatigue_level: float # User fatigue [0, 1] carryover_boost: float # Brand-recall CTR boost [0, 0.30] last_ctr: float # Previous step CTR cumulative_revenue: float # Total revenue earned # hard_assembly only: live_hashtags: list[str] # Real-time scraped viral hashtags image_description: str # Source ad image description base_caption: str # Base caption to rewrite ``` ## Reward Signal | Outcome | Reward | |---------|--------| | Auction **won** | `adjusted_ctr × $15 − clearing_price` | | Auction **lost** | `−$0.10` (missed opportunity) | | Over-pacing (medium only) | `−$1.00` penalty | | Assembly bonus (hard_assembly) | `+composite_score × $8.00` | Rewards are **per-step** (not sparse), providing continuous gradient signal. --- ## Tasks ### Level 1 — `easy_headline` (Easy) **Objective:** Select the headline with the highest CTR for each context. **Budget:** $100 | **Grader:** `mean(CTR_selected / CTR_oracle)` | **Target:** 0.75 ### Level 2 — `medium_pacing` (Medium) **Objective:** Pace $50 across 24 hours; retain ≥ 20% for peak hours (18–22). **Budget:** $50 | **Grader:** `0.3×smoothness + 0.3×peak_survival + 0.4×revenue` | **Target:** 0.70 ### Level 3 — `hard_assembly` (Hard) **Objective:** Given an ad image description + base caption + live viral hashtags, **generate a new caption** that is simultaneously viral, coherent with the image, and creatively novel — while also winning auctions profitably. **Budget:** $120 | **Target:** 0.65 **The RL loop (what the LLM agent does each step):** ``` 1. Agent receives: image_description, base_caption, live_hashtags[], viral_trend 2. Agent must: a. Select 2–4 relevant hashtags from live_hashtags (scraped from Google Trends / Reddit) b. Rewrite the base caption to weave those hashtags into natural ad copy c. Add its own creative words (target 30–50% novel vocabulary) d. Keep the caption coherent with the source image e. Set a profitable bid price 3. Grader scores the assembled caption on 4 axes: • 35% — Hashtag relevance (cosine_sim of each hashtag vs viral_trend) • 35% — Caption-trend alignment (cosine_sim of caption vs viral_trend) • 20% — Caption-image coherence (cosine_sim of caption vs image_description) • 10% — Novelty (fraction of new words vs base_caption, target ~40%) 4. Reward = auction_reward + composite_score × $8.00 bonus ``` **Data sources for hard_assembly:** - **Ad creatives**: MS-COCO Captions 2017 (val annotations) bucketed into Fitness/Tech/Fashion/Gaming by keyword matching. Falls back to 30-entry built-in seed pool. - **Viral hashtags**: `ViralHashtagScraper` queries Google Trends (via `pytrends`) and Reddit `/r/popular/hot.json` (public, no auth). Blends with static seed hashtags per context and trend. Cached for 1 hour. ### Level 4 — `hard_sequencing` (Hard) **Objective:** Plan 24-hour ad placements with carry-over brand-recall boosts. Winning triggers +15%/+10%/+5% CTR for the next 3 hours. Cover ≥ 3 contexts for a 20% diversity bonus. **Budget:** $100 | **Grader:** `min(1.0, agent_conv/oracle_conv × diversity_mult)` | **Target:** 0.60 --- ## Grading Details ### `EasyHeadlineGrader` ``` step_score = CTR_selected / CTR_oracle final_score = mean(step_scores) // [0.0, 1.0] ``` ### `MediumPacingGrader` ``` smoothness = 1 − mean(|hourly_spend − ideal_spend| / ideal_spend) peak_survival = 1.0 if remaining_budget ≥ 20% at hour 18, else 0.0 revenue_factor = min(1.0, total_revenue / $30) final_score = 0.30 × smoothness + 0.30 × peak_survival + 0.40 × revenue_factor ``` ### `HardAssemblyGrader` — 4-Axis Composite | Axis | Weight | Metric | |------|--------|--------| | Hashtag Relevance | 0.35 | `mean(cosine_sim(hashtag, viral_trend))` | | Caption-Trend Alignment | 0.35 | `cosine_sim(caption, viral_trend)` | | Caption-Image Coherence | 0.20 | `cosine_sim(caption, image_description)` | | Novelty | 0.10 | `1 − |novel_fraction − 0.40| / 0.60` | ``` composite = Σ (weight × axis_score) final_score = 0.60 × mean(composite_scores) + 0.40 × min(1.0, total_revenue / $55) ``` ### `HardSequencingGrader` ``` agent_conversions = Σ [CTR_t × (1 + carryover_boost_t) × $15] oracle_conversions = DP-optimal bid/skip sequence with carry-over diversity_mult = 1.20 if ≥3 distinct contexts won, else 1.0 final_score = min(1.0, agent_conv / oracle_conv × diversity_mult) ``` --- ## Architecture ``` ┌───────────────────────────────────────────────────────────┐ │ OpenEnvAuctioneer (Gym-style environment) │ │ │ │ ┌──────────────────┐ ┌───────────────────────────────┐ │ │ │ Market Engine │ │ User Simulator │ │ │ │ (Statistical) │ │ (Semantic / LLM) │ │ │ │ │ │ │ │ │ │ iPinYou RTB logs │ │ SentenceTransformer │ │ │ │ → Lognormal per │ │ all-MiniLM-L6-v2 │ │ │ │ hour bucket │ │ + optional Llama-3-8B │ │ │ └──────────────────┘ └───────────────────────────────┘ │ │ │ │ ┌───────────────────────────────────────────────────────┐ │ │ │ MIND Dataset Layer (Microsoft News Dataset) │ │ │ │ behaviours.tsv → CTRCalibrator │ │ │ │ news.tsv → MINDCreativePool (headlines) │ │ │ └───────────────────────────────────────────────────────┘ │ │ │ │ ┌───────────────────────────────────────────────────────┐ │ │ │ Ad + Caption Dataset (MS-COCO Captions 2017) │ │ │ │ → image_description + base_caption per step │ │ │ │ → ViralHashtagScraper (pytrends + Reddit + seeds) │ │ │ │ → agent rewrites caption with viral hashtags │ │ │ └───────────────────────────────────────────────────────┘ │ │ │ │ ┌───────────────────────────────────────────────────────┐ │ │ │ Grader (task-specific, deterministic 0.0–1.0) │ │ │ │ Level 1: easy_headline → headline CTR lookup │ │ │ │ Level 2: medium_pacing → pacing + survival │ │ │ │ Level 3: hard_assembly → 4-axis composite score │ │ │ │ Level 4: hard_sequencing→ DP oracle comparison │ │ │ └───────────────────────────────────────────────────────┘ │ └───────────────────────────────────────────────────────────┘ ``` --- ## Models | Model | Role | Always Active? | |-------|------|----------------| | `all-MiniLM-L6-v2` (SentenceTransformer) | Semantic CTR scoring + grader cosine similarity | ✅ Yes | | `Meta-Llama-3-8B-Instruct` (4-bit) | Richer LLM-based CTR scoring | ❌ Optional (`USE_LLM_SIMULATOR=1`) | When the LLM simulator is active: `final_ctr = 0.60 × llm_ctr + 0.40 × semantic_ctr` --- ## Setup & Usage ### Prerequisites - Python 3.10+ - Docker (for containerised execution) ### Local Development ```bash pip install -r requirements.txt python -c "from environment import OpenEnvAuctioneer; e = OpenEnvAuctioneer(); print(e.reset())" ``` ### Docker Build & Run ```bash # Build the image docker build -t openenv-auctioneer . # Run the FastAPI server (default) docker run --rm -p 7860:7860 openenv-auctioneer # Run inference directly inside the container docker run --rm \ -e HF_TOKEN= \ openenv-auctioneer python inference.py ``` ### Inference Script ```bash # Build image first, then run inference docker build -t openenv-auctioneer . LOCAL_IMAGE_NAME=openenv-auctioneer \ HF_TOKEN= \ python inference.py ``` The inference script emits standardised `[START]`/`[STEP]`/`[END]` logs to stdout. ### Environment Variables | Variable | Required | Description | |----------|----------|-------------| | `HF_TOKEN` | Yes (inference) | API key for the LLM service | | `API_BASE_URL` | No | LLM endpoint (default: HuggingFace router) | | `MODEL_NAME` | No | Model identifier (default: Qwen/Qwen2.5-72B-Instruct) | | `LOCAL_IMAGE_NAME` | Yes (inference) | Docker image name | | `AUCTIONEER_TASK` | No | Task to run (default: `all`) | | `MIND_SOURCE` | No | `local` / `huggingface` / `azure` | | `COCO_SOURCE` | No | `local` / `url` (auto-download COCO annotations) | | `USE_LLM_SIMULATOR` | No | Set `1` to enable Llama-3 User Simulator | --- ## Baseline Scores (Expected Ranges) | Task | Expected Range | Notes | |------|---------------|-------| | `easy_headline` | 0.55 – 0.80 | Context→headline matching is learnable | | `medium_pacing` | 0.45 – 0.70 | Requires budget discipline | | `hard_assembly` | 0.40 – 0.65 | Caption quality + hashtag matching + auction wins | | `hard_sequencing` | 0.35 – 0.60 | Compared against DP oracle | Scores depend on LLM quality and market stochasticity. Run multiple episodes for stable estimates. --- ## Project Structure ``` ├── models.py # Pydantic models: Action, Observation, Reward, Info ├── environment.py # OpenEnvAuctioneer + graders + dataset layers │ ├── MINDLoader # MIND dataset loader (HF / Azure / local) │ ├── MarketCalibrator # iPinYou-based auction price simulator │ ├── CTRCalibrator # MIND-based CTR lookup tables │ ├── MINDCreativePool # 6-slot headline/creative catalog from news.tsv │ ├── PersonaBank # Vogue Dialogue persona sampling │ ├── ViralHashtagScraper # Live hashtag scraping (pytrends + Reddit) │ ├── AdCaptionDataset # COCO-based ad image+caption pool │ ├── UserSimulator # Semantic + optional LLM CTR scoring │ ├── EasyHeadlineGrader # Level 1 grader │ ├── MediumPacingGrader # Level 2 grader │ ├── HardAssemblyGrader # Level 3 grader (4-axis composite) │ ├── HardSequencingGrader# Level 4 grader (DP oracle) │ └── OpenEnvAuctioneer # Main Gym-style env class ├── app.py # FastAPI server (runs inside Docker) ├── inference.py # Baseline inference script (mandatory format) ├── openenv.yaml # OpenEnv metadata & task definitions ├── Dockerfile # Container build ├── requirements.txt # Python dependencies ├── test_sequencing.py # Unit tests for DP oracle grader └── Datasets/ # Optional dataset mount point ``` ## References 1. **MIND**: Wu et al. (2020) — *"MIND: A Large-scale Dataset for News Recommendation"*, ACL 2020. [msnews.github.io](https://msnews.github.io/) 2. **iPinYou RTB**: Zhang et al. (2014) — *"Real-Time Bidding Benchmarking with iPinYou Dataset"*. [contest.ipinyou.com](https://contest.ipinyou.com/) 3. **MS-COCO Captions**: Lin et al. (2014) — *"Microsoft COCO: Common Objects in Context"*. [cocodataset.org](https://cocodataset.org/) 4. **SentenceTransformers**: Reimers & Gurevych (2019) — *"Sentence-BERT"*. [sbert.net](https://www.sbert.net/) ## License MIT