---
title: OpenEnv Creative Auctioneer
colorFrom: purple
colorTo: blue
sdk: docker
app_port: 7860
tags:
  - openenv
---
# OpenEnv Creative Auctioneer

A **privacy-native real-time bidding (RTB) ad auction** environment where an RL
agent acts as an autonomous Account Manager: it navigates a 24-hour campaign
cycle, selects ad creatives, paces budgets, and assembles viral captions to
maximise Return on Ad Spend (ROAS), all **without individual user identifiers**.
## Motivation

Programmatic advertising is a \$500B+ industry where split-second bidding
decisions determine campaign success. Existing RL benchmarks either use toy
grid-worlds or require proprietary data. **OpenEnv-Auctioneer** fills this gap
with a fully open, dataset-calibrated simulation grounded in:

| Dataset | Role |
|---------|------|
| [MIND](https://msnews.github.io/) (Microsoft News) | CTR calibration + headline catalog |
| [iPinYou RTB](https://contest.ipinyou.com/) | Competitor bid distributions (lognormal per hour) |
| [Vogue Dialogue](https://github.com/aimagelab/Vogue-Dialogue) | User persona bank |
| [MS-COCO Captions 2017](https://cocodataset.org/) | Ad + caption pool for `hard_assembly` |
| Google Trends (via [pytrends](https://github.com/GeneralMills/pytrends)) / [Reddit](https://www.reddit.com/) | Live viral hashtag scraping |

All datasets are **optional**: the environment falls back to published
statistics, so it runs out of the box with zero downloads.

---
## Action Space

```python
from pydantic import BaseModel


class Action(BaseModel):
    bid_price: float                             # USD bid for the RTB auction (≥ 0)
    headline_id: int                             # Index into the 6-slot headlines catalog (0–5)
    creative_id: int                             # Index into the 6-slot creatives catalog (0–5)
    generated_caption: str | None = None         # [hard_assembly] Rewritten caption with viral hashtags
    generated_hashtags: list[str] | None = None  # [hard_assembly] Chosen hashtags (e.g. ["#QuietLuxury", "#OOTD"])
```
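
The bounds in the field comments (a non-negative bid, catalog indices 0–5, and 2–4 hashtags for `hard_assembly`) can be checked on the client before an action is submitted. The following is a minimal standard-library sketch under those assumptions. `ActionSketch` and its `validate` method are hypothetical and are not the project's Pydantic `Action` model.

```python
from dataclasses import dataclass
from typing import Optional

CATALOG_SIZE = 6  # 6-slot headline and creative catalogs (valid ids 0-5)


@dataclass
class ActionSketch:
    """Hypothetical stdlib mirror of the Action schema (illustration only)."""

    bid_price: float
    headline_id: int
    creative_id: int
    generated_caption: Optional[str] = None
    generated_hashtags: Optional[list[str]] = None

    def validate(self, hard_assembly: bool = False) -> None:
        """Raise ValueError if a documented bound is violated."""
        if self.bid_price < 0:
            raise ValueError("bid_price must be >= 0")
        for name in ("headline_id", "creative_id"):
            idx = getattr(self, name)
            if not 0 <= idx < CATALOG_SIZE:
                raise ValueError(f"{name} must be in 0..{CATALOG_SIZE - 1}")
        if hard_assembly:
            # Assumed hard_assembly contract: a caption plus 2-4 hashtags.
            if not self.generated_caption:
                raise ValueError("hard_assembly requires generated_caption")
            n_tags = len(self.generated_hashtags or [])
            if not 2 <= n_tags <= 4:
                raise ValueError("hard_assembly expects 2-4 hashtags")
```

Raising `ValueError` early keeps malformed actions out of the episode log. On the real Pydantic model, the same bounds could be declared as field constraints.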
## Observation Space

```python
from pydantic import BaseModel


class Observation(BaseModel):
    hour_of_day: int             # Current hour (0–23)
    remaining_budget: float      # Remaining budget in USD
    spent_so_far: float          # Cumulative spend in USD
    current_context: str         # "Fitness" | "Tech" | "Fashion" | "Gaming"
    news_category: str           # Fine-grained MIND subcategory
    viral_trend: str             # Current cultural trend token
    market_pressure: float       # Auction competitiveness [0, 1]
    ads_shown_this_session: int  # Ads shown so far in the current session
    fatigue_level: float         # User fatigue [0, 1]
    carryover_boost: float       # Brand-recall CTR boost [0, 0.30]
    last_ctr: float              # Previous step's CTR
    cumulative_revenue: float    # Total revenue earned
    # hard_assembly only:
    live_hashtags: list[str]     # Real-time scraped viral hashtags
    image_description: str       # Source ad image description
    base_caption: str            # Base caption to rewrite
```
## Reward Signal

| Outcome | Reward |
|---------|--------|
| Auction **won** | `adjusted_ctr × $15 − clearing_price` |
| Auction **lost** | `−$0.10` (missed opportunity) |
| Over-pacing (`medium_pacing` only) | `−$1.00` penalty |
| Assembly bonus (`hard_assembly` only) | `+composite_score × $8.00` |

Rewards are **per-step** (not sparse), providing a dense learning signal.
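
The reward table maps onto a small per-step function. This sketch assumes the rows combine additively. For example, a lost auction during over-pacing would earn −$0.10 − $1.00. `step_reward` and its parameters are illustrative names, not the environment's API.

```python
from typing import Optional


def step_reward(
    won: bool,
    adjusted_ctr: float,
    clearing_price: float,
    over_pacing: bool = False,
    composite_score: Optional[float] = None,
) -> float:
    """Per-step reward assembled from the Reward Signal table (sketch)."""
    if won:
        # CTR x $15 value, minus the clearing price paid for the impression.
        reward = adjusted_ctr * 15.0 - clearing_price
    else:
        reward = -0.10  # missed-opportunity penalty
    if over_pacing:
        reward -= 1.00  # medium_pacing only
    if composite_score is not None:
        reward += composite_score * 8.00  # hard_assembly assembly bonus
    return reward
```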

---

## Tasks

### Level 1: `easy_headline` (Easy)

**Objective:** Select the headline with the highest CTR for each context.

**Budget:** $100 | **Grader:** `mean(CTR_selected / CTR_oracle)` | **Target:** 0.75

### Level 2: `medium_pacing` (Medium)

**Objective:** Pace $50 across 24 hours; retain ≥ 20% of the budget for the peak hours (18–22).

**Budget:** $50 | **Grader:** `0.3×smoothness + 0.3×peak_survival + 0.4×revenue_factor` | **Target:** 0.70
### Level 3: `hard_assembly` (Hard)

**Objective:** Given an ad image description, a base caption, and live viral hashtags,
**generate a new caption** that is simultaneously viral, coherent with the image,
and creatively novel, while also winning auctions profitably.

**Budget:** $120 | **Target:** 0.65

**The RL loop (what the LLM agent does each step):**

```
1. Agent receives: image_description, base_caption, live_hashtags[], viral_trend
2. Agent must:
   a. Select 2–4 relevant hashtags from live_hashtags (scraped from Google Trends / Reddit)
   b. Rewrite the base caption to weave those hashtags into natural ad copy
   c. Add its own creative words (target 30–50% novel vocabulary)
   d. Keep the caption coherent with the source image
   e. Set a profitable bid price
3. Grader scores the assembled caption on 4 axes:
   • 35%: Hashtag relevance (cosine_sim of each hashtag vs viral_trend)
   • 35%: Caption-trend alignment (cosine_sim of caption vs viral_trend)
   • 20%: Caption-image coherence (cosine_sim of caption vs image_description)
   • 10%: Novelty (fraction of new words vs base_caption, target ~40%)
4. Reward = auction_reward + composite_score × $8.00 bonus
```
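
A simple, hypothetical baseline for steps 2a–2c works in three moves:

1. Rank the live hashtags by similarity to `viral_trend`.
2. Keep the top few.
3. Append them to the base caption together with a short novel hook.

In this environment the similarity would naturally be the same `all-MiniLM-L6-v2` cosine similarity the grader uses. The sketch instead uses a toy word-matching score (`toy_similarity`, an assumption) so it runs without downloading a model.

```python
from typing import Callable


def toy_similarity(text: str, trend: str) -> float:
    """Toy stand-in for embedding cosine similarity: share of trend words found in text."""
    words = trend.lower().split()
    haystack = text.lower()
    return sum(w in haystack for w in words) / len(words) if words else 0.0


def assemble_caption(
    base_caption: str,
    live_hashtags: list[str],
    viral_trend: str,
    similarity: Callable[[str, str], float] = toy_similarity,
    k: int = 3,
    hook: str = "Your new everyday favourite.",
) -> tuple[str, list[str]]:
    """Pick the k most trend-relevant hashtags and append them to the caption."""
    k = max(2, min(4, k))  # the task asks for 2-4 hashtags
    # sorted() is stable, so ties keep their scraped order.
    ranked = sorted(live_hashtags, key=lambda h: similarity(h, viral_trend), reverse=True)
    chosen = ranked[:k]
    # Keep the base copy (image coherence), add novel words, then the hashtags.
    caption = f"{base_caption.rstrip('.')}. {hook} {' '.join(chosen)}"
    return caption, chosen
```

Keeping the base caption intact favours caption-image coherence, while the hook supplies novel words. A stronger agent would rewrite the copy rather than append to it.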

**Data sources for `hard_assembly`:**

- **Ad creatives**: MS-COCO Captions 2017 (val annotations), bucketed into Fitness/Tech/Fashion/Gaming by keyword matching. Falls back to a 30-entry built-in seed pool.
- **Viral hashtags**: `ViralHashtagScraper` queries Google Trends (via `pytrends`) and Reddit `/r/popular/hot.json` (public, no auth), blends the results with static seed hashtags per context and trend, and caches them for 1 hour.
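
The blend-and-cache behaviour can be sketched as a small TTL cache keyed by context. Everything here is an illustrative assumption rather than the real `ViralHashtagScraper` API:

- `HashtagCache` is hypothetical.
- The `fetchers` callables stand in for the Google Trends and Reddit queries.
- Live tags are listed before seed tags.

```python
import time
from typing import Callable

Fetcher = Callable[[str], list[str]]  # context -> hashtags from one live source


class HashtagCache:
    """Blend live hashtags with static seeds; cache each context's result for `ttl` seconds."""

    def __init__(
        self,
        fetchers: list[Fetcher],
        seeds: dict[str, list[str]],
        ttl: float = 3600.0,  # 1 hour
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.fetchers = fetchers
        self.seeds = seeds
        self.ttl = ttl
        self.clock = clock
        self._cache: dict[str, tuple[float, list[str]]] = {}

    def get(self, context: str) -> list[str]:
        now = self.clock()
        hit = self._cache.get(context)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh: no network calls
        live: list[str] = []
        for fetch in self.fetchers:
            try:
                live.extend(fetch(context))
            except Exception:
                continue  # skip a failing source (offline, rate-limited, ...)
        # Live tags first, then seeds; dict.fromkeys de-duplicates while keeping order.
        blended = list(dict.fromkeys(live + self.seeds.get(context, [])))
        self._cache[context] = (now, blended)
        return blended
```

Injecting `clock` makes the one-hour expiry testable without sleeping. A source that raises simply drops out, so the seed hashtags act as the fallback.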
### Level 4: `hard_sequencing` (Hard)

**Objective:** Plan 24-hour ad placements with carry-over brand-recall boosts.
A win raises CTR by +15%, +10%, and +5% over the following three hours. Winning
in ≥ 3 distinct contexts earns a 20% diversity bonus.

**Budget:** $100 | **Grader:** `min(1.0, agent_conv/oracle_conv × diversity_mult)` | **Target:** 0.60

---

## Grading Details

### `EasyHeadlineGrader`

```
step_score  = CTR_selected / CTR_oracle
final_score = mean(step_scores)   // [0.0, 1.0]
```
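
Here is a minimal sketch of this grader. It assumes the CTRs of all six headlines in the current context are known at each step (for example, from a lookup table like the one `CTRCalibrator` builds), and that the oracle is simply the best of them. `easy_headline_score` is a hypothetical name.

```python
def easy_headline_score(steps: list[tuple[list[float], int]]) -> float:
    """steps: one (CTRs of the 6 headlines in this context, chosen headline_id) per step."""
    scores = []
    for ctrs, chosen in steps:
        best = max(ctrs)  # oracle: the highest-CTR headline for this context
        # If every headline has zero CTR, any choice is optimal (assumption).
        scores.append(ctrs[chosen] / best if best > 0 else 1.0)
    return sum(scores) / len(scores) if scores else 0.0
```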
### `MediumPacingGrader`

```
smoothness     = 1 − mean(|hourly_spend − ideal_spend| / ideal_spend)
peak_survival  = 1.0 if remaining_budget ≥ 20% of the initial budget at hour 18, else 0.0
revenue_factor = min(1.0, total_revenue / $30)
final_score    = 0.30 × smoothness + 0.30 × peak_survival + 0.40 × revenue_factor
```
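
The pacing formula leaves two quantities implicit, so this sketch makes three assumptions:

- `ideal_spend` is the uniform rate `budget / 24`.
- "Remaining budget at hour 18" means the budget left after hours 0–17.
- `smoothness` is clamped at 0, so large overspends cannot push the score negative.

`medium_pacing_score` is a hypothetical name.

```python
def medium_pacing_score(
    hourly_spend: list[float],
    budget: float = 50.0,
    total_revenue: float = 0.0,
    revenue_target: float = 30.0,
) -> float:
    """Weighted pacing score: 0.30 smoothness + 0.30 peak survival + 0.40 revenue."""
    if len(hourly_spend) != 24:
        raise ValueError("expected one spend value per hour (24)")
    ideal = budget / 24  # assumption: uniform ideal pacing
    deviation = sum(abs(s - ideal) / ideal for s in hourly_spend) / 24
    smoothness = max(0.0, 1.0 - deviation)  # clamp at 0 (assumption)
    remaining_at_18 = budget - sum(hourly_spend[:18])  # left after hours 0-17
    peak_survival = 1.0 if remaining_at_18 >= 0.20 * budget else 0.0
    revenue_factor = min(1.0, total_revenue / revenue_target)
    return 0.30 * smoothness + 0.30 * peak_survival + 0.40 * revenue_factor
```

Under these assumptions, perfectly uniform pacing of the $50 budget leaves $12.50 (25%) entering hour 18, so it passes the 20% survival check.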
### `HardAssemblyGrader`: 4-Axis Composite

| Axis | Weight | Metric |
|------|--------|--------|
| Hashtag Relevance | 0.35 | `mean(cosine_sim(hashtag, viral_trend))` |
| Caption-Trend Alignment | 0.35 | `cosine_sim(caption, viral_trend)` |
| Caption-Image Coherence | 0.20 | `cosine_sim(caption, image_description)` |
| Novelty | 0.10 | `1 − \|novel_fraction − 0.40\| / 0.60` |

```
composite   = Σ (weight × axis_score)
final_score = 0.60 × mean(composite_scores)
            + 0.40 × min(1.0, total_revenue / $55)
```
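
The four axes can be sketched with plain vectors. The sketch assumes two things:

- The hashtag, caption, trend, and image-description embeddings have already been computed (in this environment, presumably with `all-MiniLM-L6-v2`).
- `novel_fraction` is the share of lowercased, whitespace-split caption words that do not occur in `base_caption`.

All function names are hypothetical.

```python
import math

NOVELTY_TARGET = 0.40  # ideal share of new words


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def novelty_score(caption: str, base_caption: str) -> float:
    """1 - |novel_fraction - 0.40| / 0.60, with novel_fraction over lowercase words."""
    words = caption.lower().split()
    base_words = set(base_caption.lower().split())
    novel_fraction = sum(w not in base_words for w in words) / len(words) if words else 0.0
    return 1.0 - abs(novel_fraction - NOVELTY_TARGET) / 0.60


def composite_score(
    hashtag_vecs: list[list[float]],
    caption_vec: list[float],
    trend_vec: list[float],
    image_vec: list[float],
    novelty: float,
) -> float:
    """Weighted sum of the four grading axes."""
    relevance = (
        sum(cosine(h, trend_vec) for h in hashtag_vecs) / len(hashtag_vecs)
        if hashtag_vecs else 0.0
    )
    return (0.35 * relevance
            + 0.35 * cosine(caption_vec, trend_vec)
            + 0.20 * cosine(caption_vec, image_vec)
            + 0.10 * novelty)


def hard_assembly_score(composites: list[float], total_revenue: float) -> float:
    """Blend mean caption quality (60%) with revenue against the $55 target (40%)."""
    mean_composite = sum(composites) / len(composites) if composites else 0.0
    return 0.60 * mean_composite + 0.40 * min(1.0, total_revenue / 55.0)
```

Because `novel_fraction` lies in [0, 1], the novelty axis ranges from 0 (an entirely new caption) to 1 (exactly 40% new words). Copying the base caption verbatim still scores about 0.33.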
### `HardSequencingGrader`

```
agent_conversions  = Σ [CTR_t × (1 + carryover_boost_t) × $15]
oracle_conversions = DP-optimal bid/skip sequence with carry-over
diversity_mult     = 1.20 if ≥ 3 distinct contexts won, else 1.0
final_score        = min(1.0, agent_conv / oracle_conv × diversity_mult)
```
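
The oracle can be sketched as a memoised dynamic program over (hour, remaining budget, recent wins). The README does not pin down the details, so this sketch assumes the following:

- Hourly CTRs and clearing prices are known in hindsight, with prices in whole cents.
- A bid always wins at its clearing price.
- Boosts from wins in the last three hours stack (+15% / +10% / +5%). This matches the `[0, 0.30]` range of `carryover_boost`.

`oracle_conversions` and `hard_sequencing_score` are hypothetical names.

```python
from functools import lru_cache

BOOST_WEIGHTS = (0.15, 0.10, 0.05)  # boost from a win 1, 2 and 3 hours earlier


def oracle_conversions(ctrs: list[float], prices_cents: list[int], budget_cents: int) -> float:
    """Best achievable sum of CTR_t * (1 + boost_t) * $15 over hindsight bid/skip plans."""
    n = len(ctrs)

    @lru_cache(maxsize=None)
    def best(t: int, budget: int, recent: int) -> float:
        # recent: bit 0 = won at t-1, bit 1 = won at t-2, bit 2 = won at t-3
        if t == n:
            return 0.0
        skip = best(t + 1, budget, (recent << 1) & 0b111)
        if prices_cents[t] > budget:
            return skip  # cannot afford this hour's auction
        boost = sum(w for bit, w in enumerate(BOOST_WEIGHTS) if (recent >> bit) & 1)
        value = ctrs[t] * (1 + boost) * 15.0
        win = value + best(t + 1, budget - prices_cents[t], ((recent << 1) | 1) & 0b111)
        return max(skip, win)

    return best(0, budget_cents, 0)


def hard_sequencing_score(agent_conv: float, oracle_conv: float, contexts_won: set[str]) -> float:
    """min(1.0, agent/oracle x diversity_mult), with a 1.20 bonus for >= 3 contexts."""
    diversity_mult = 1.20 if len(contexts_won) >= 3 else 1.0
    if oracle_conv <= 0:
        return 0.0  # assumption: nothing achievable means nothing to score
    return min(1.0, agent_conv / oracle_conv * diversity_mult)
```

For a $100 budget the state space is bounded by 24 × 10,001 × 8 entries. The reachable states are usually far fewer, but a bottom-up table would be more memory-friendly at full scale.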

---

## Architecture

```
┌───────────────────────────────────────────────────────────────┐
│  OpenEnvAuctioneer (Gym-style environment)                    │
│                                                               │
│  ┌────────────────────┐   ┌────────────────────────────────┐  │
│  │ Market Engine      │   │ User Simulator                 │  │
│  │ (Statistical)      │   │ (Semantic / LLM)               │  │
│  │                    │   │                                │  │
│  │ iPinYou RTB logs   │   │ SentenceTransformer            │  │
│  │ → Lognormal per    │   │ all-MiniLM-L6-v2               │  │
│  │   hour bucket      │   │ + optional Llama-3-8B          │  │
│  └────────────────────┘   └────────────────────────────────┘  │
│                                                               │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │ MIND Dataset Layer (Microsoft News Dataset)             │  │
│  │ behaviours.tsv → CTRCalibrator                          │  │
│  │ news.tsv       → MINDCreativePool (headlines)           │  │
│  └─────────────────────────────────────────────────────────┘  │
│                                                               │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │ Ad + Caption Dataset (MS-COCO Captions 2017)            │  │
│  │ → image_description + base_caption per step             │  │
│  │ → ViralHashtagScraper (pytrends + Reddit + seeds)       │  │
│  │ → agent rewrites caption with viral hashtags            │  │
│  └─────────────────────────────────────────────────────────┘  │
│                                                               │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │ Grader (task-specific, deterministic 0.0–1.0)           │  │
│  │ Level 1: easy_headline   → headline CTR lookup          │  │
│  │ Level 2: medium_pacing   → pacing + survival            │  │
│  │ Level 3: hard_assembly   → 4-axis composite score       │  │
│  │ Level 4: hard_sequencing → DP oracle comparison         │  │
│  └─────────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────────┘
```
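
The Market Engine's role can be illustrated with a single draw per auction. This sketch assumes one highest competing bid per auction, drawn from an hour-specific lognormal distribution. It also assumes second-price clearing, where the winner pays the competitor's bid. The `(mu, sigma)` values in `HOURLY_PARAMS` are placeholders, not the iPinYou-fitted parameters, and `run_auction` is a hypothetical name.

```python
import random
from typing import Optional

# Placeholder (mu, sigma) per hour: pricier competition in the 18-22 evening peak.
HOURLY_PARAMS: dict[int, tuple[float, float]] = {
    h: ((0.3 if 18 <= h <= 22 else -0.2), 0.5) for h in range(24)
}


def run_auction(
    bid: float,
    hour: int,
    params: dict[int, tuple[float, float]],
    rng: random.Random,
) -> tuple[bool, Optional[float]]:
    """Return (won, clearing_price) against one lognormal competitor bid in USD."""
    mu, sigma = params[hour]
    competitor = rng.lognormvariate(mu, sigma)  # highest rival bid this hour
    if bid > competitor:
        return True, competitor  # second-price: the winner pays the rival's bid
    return False, None
```

The returned clearing price is the `clearing_price` subtracted in the won-auction row of the reward table.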

---

## Models

| Model | Role | Always Active? |
|-------|------|----------------|
| `all-MiniLM-L6-v2` (SentenceTransformer) | Semantic CTR scoring + grader cosine similarity | Yes |
| `Meta-Llama-3-8B-Instruct` (4-bit) | Richer LLM-based CTR scoring | Optional (`USE_LLM_SIMULATOR=1`) |

When the LLM simulator is active: `final_ctr = 0.60 × llm_ctr + 0.40 × semantic_ctr`.

---

## Setup & Usage

### Prerequisites

- Python 3.10+
- Docker (for containerised execution)

### Local Development

```bash
pip install -r requirements.txt
python -c "from environment import OpenEnvAuctioneer; e = OpenEnvAuctioneer(); print(e.reset())"
```

### Docker Build & Run

```bash
# Build the image
docker build -t openenv-auctioneer .

# Run the FastAPI server (default)
docker run --rm -p 7860:7860 openenv-auctioneer

# Run inference directly inside the container
docker run --rm \
  -e HF_TOKEN=<your_key> \
  openenv-auctioneer python inference.py
```

### Inference Script

```bash
# Build the image first, then run inference
docker build -t openenv-auctioneer .
LOCAL_IMAGE_NAME=openenv-auctioneer \
HF_TOKEN=<your_key> \
python inference.py
```

The inference script emits standardised `[START]`/`[STEP]`/`[END]` logs to stdout.
### Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| `HF_TOKEN` | Yes (inference) | API key for the LLM service |
| `API_BASE_URL` | No | LLM endpoint (default: Hugging Face router) |
| `MODEL_NAME` | No | Model identifier (default: `Qwen/Qwen2.5-72B-Instruct`) |
| `LOCAL_IMAGE_NAME` | Yes (inference) | Docker image name |
| `AUCTIONEER_TASK` | No | Task to run (default: `all`) |
| `MIND_SOURCE` | No | `local` / `huggingface` / `azure` |
| `COCO_SOURCE` | No | `local` / `url` (auto-download COCO annotations) |
| `USE_LLM_SIMULATOR` | No | Set to `1` to enable the Llama-3 User Simulator |

---

## Baseline Scores (Expected Ranges)

| Task | Expected Range | Notes |
|------|----------------|-------|
| `easy_headline` | 0.55–0.80 | Context-to-headline matching is learnable |
| `medium_pacing` | 0.45–0.70 | Requires budget discipline |
| `hard_assembly` | 0.40–0.65 | Caption quality + hashtag matching + auction wins |
| `hard_sequencing` | 0.35–0.60 | Compared against the DP oracle |

Scores depend on LLM quality and market stochasticity; run multiple episodes
for stable estimates.

---

## Project Structure

```
├── models.py                 # Pydantic models: Action, Observation, Reward, Info
├── environment.py            # OpenEnvAuctioneer + graders + dataset layers
│   ├── MINDLoader            # MIND dataset loader (HF / Azure / local)
│   ├── MarketCalibrator      # iPinYou-based auction price simulator
│   ├── CTRCalibrator         # MIND-based CTR lookup tables
│   ├── MINDCreativePool      # 6-slot headline/creative catalog from news.tsv
│   ├── PersonaBank           # Vogue Dialogue persona sampling
│   ├── ViralHashtagScraper   # Live hashtag scraping (pytrends + Reddit)
│   ├── AdCaptionDataset      # COCO-based ad image+caption pool
│   ├── UserSimulator         # Semantic + optional LLM CTR scoring
│   ├── EasyHeadlineGrader    # Level 1 grader
│   ├── MediumPacingGrader    # Level 2 grader
│   ├── HardAssemblyGrader    # Level 3 grader (4-axis composite)
│   ├── HardSequencingGrader  # Level 4 grader (DP oracle)
│   └── OpenEnvAuctioneer     # Main Gym-style env class
├── app.py                    # FastAPI server (runs inside Docker)
├── inference.py              # Baseline inference script (mandatory format)
├── openenv.yaml              # OpenEnv metadata & task definitions
├── Dockerfile                # Container build
├── requirements.txt          # Python dependencies
├── test_sequencing.py        # Unit tests for DP oracle grader
└── Datasets/                 # Optional dataset mount point
```

## References

1. **MIND**: Wu et al. (2020), *"MIND: A Large-scale Dataset for News Recommendation"*, ACL 2020. [msnews.github.io](https://msnews.github.io/)
2. **iPinYou RTB**: Zhang et al. (2014), *"Real-Time Bidding Benchmarking with iPinYou Dataset"*. [contest.ipinyou.com](https://contest.ipinyou.com/)
3. **MS-COCO Captions**: Lin et al. (2014), *"Microsoft COCO: Common Objects in Context"*. [cocodataset.org](https://cocodataset.org/)
4. **SentenceTransformers**: Reimers & Gurevych (2019), *"Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks"*. [sbert.net](https://www.sbert.net/)

## License

MIT