---
title: OpenEnv Creative Auctioneer
colorFrom: purple
colorTo: blue
sdk: docker
app_port: 7860
tags:
- openenv
---
# OpenEnv Creative Auctioneer
A **privacy-native real-time bidding (RTB) ad auction** environment where an RL
agent acts as an autonomous Account Manager: it navigates a 24-hour campaign
cycle, selects ad creatives, paces budgets, and assembles viral captions to
maximise Return on Ad Spend (ROAS), all **without individual user identifiers**.
## Motivation
Programmatic advertising is a \$500 B+ industry where split-second bidding
decisions determine campaign success. Existing RL benchmarks either use toy
grid-worlds or require proprietary data. **OpenEnv-Auctioneer** fills this gap
with a fully open, dataset-calibrated simulation grounded in:
| Dataset | Role |
|---------|------|
| [MIND](https://msnews.github.io/) (Microsoft News) | CTR calibration + headline catalog |
| [iPinYou RTB](https://contest.ipinyou.com/) | Competitor bid distributions (Lognormal/hour) |
| [Vogue Dialogue](https://github.com/aimagelab/Vogue-Dialogue) | User persona bank |
| [MS-COCO Captions 2017](https://cocodataset.org/) | Ad + caption pool for `hard_assembly` |
| [Google Trends](https://github.com/GeneralMills/pytrends) / [Reddit](https://www.reddit.com/) | Live viral hashtag scraping |
All datasets are **optional**: the environment falls back to published
statistics, so it runs out-of-the-box with zero downloads.
---
## Action Space
```python
from pydantic import BaseModel


class Action(BaseModel):
    bid_price: float                      # USD bid for the RTB auction (≥ 0)
    headline_id: int                      # Index into the 6-slot headlines catalog (0–5)
    creative_id: int                      # Index into the 6-slot creatives catalog (0–5)
    generated_caption: str | None         # [hard_assembly] Rewritten caption with viral hashtags
    generated_hashtags: list[str] | None  # [hard_assembly] Chosen hashtags (e.g. ["#QuietLuxury", "#OOTD"])
```
## Observation Space
```python
from pydantic import BaseModel


class Observation(BaseModel):
    hour_of_day: int             # Current hour (0–23)
    remaining_budget: float      # Remaining budget in USD
    spent_so_far: float          # Cumulative spend
    current_context: str         # "Fitness" | "Tech" | "Fashion" | "Gaming"
    news_category: str           # Fine-grained MIND subcategory
    viral_trend: str             # Current cultural trend token
    market_pressure: float       # Auction competitiveness [0, 1]
    ads_shown_this_session: int
    fatigue_level: float         # User fatigue [0, 1]
    carryover_boost: float       # Brand-recall CTR boost [0, 0.30]
    last_ctr: float              # Previous step CTR
    cumulative_revenue: float    # Total revenue earned
    # hard_assembly only:
    live_hashtags: list[str]     # Real-time scraped viral hashtags
    image_description: str       # Source ad image description
    base_caption: str            # Base caption to rewrite
```
## Reward Signal
| Outcome | Reward |
|---------|--------|
| Auction **won** | `adjusted_ctr × $15 − clearing_price` |
| Auction **lost** | `−$0.10` (missed opportunity) |
| Over-pacing (medium only) | `−$1.00` penalty |
| Assembly bonus (hard_assembly) | `+composite_score × $8.00` |
Rewards are **per-step** (not sparse), providing continuous gradient signal.
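
As a rough illustration of how the terms in the table combine, here is a minimal sketch. The function `step_reward` and its parameter names are hypothetical and not taken from `environment.py`; the sketch assumes the pacing penalty and the assembly bonus are simply added to the auction outcome.

```python
# Constants copied from the reward table above.
CONVERSION_VALUE = 15.00      # USD multiplier applied to the adjusted CTR on a win
LOSS_PENALTY = 0.10           # USD charged for a lost auction
OVERPACING_PENALTY = 1.00     # USD penalty (medium_pacing only)
ASSEMBLY_BONUS_SCALE = 8.00   # USD scale for the hard_assembly composite score


def step_reward(won, adjusted_ctr=0.0, clearing_price=0.0,
                over_paced=False, composite_score=None):
    """Hypothetical per-step reward; assumes all terms are additive."""
    if won:
        reward = adjusted_ctr * CONVERSION_VALUE - clearing_price
    else:
        reward = -LOSS_PENALTY
    if over_paced:
        reward -= OVERPACING_PENALTY
    if composite_score is not None:
        reward += composite_score * ASSEMBLY_BONUS_SCALE
    return reward
```

Under these assumptions, a won auction with an adjusted CTR of 0.10 and a clearing price of $0.80 yields roughly $0.70, while any lost auction costs $0.10.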
---
## Tasks
### Level 1: `easy_headline` (Easy)
**Objective:** Select the headline with the highest CTR for each context.
**Budget:** $100 | **Grader:** `mean(CTR_selected / CTR_oracle)` | **Target:** 0.75
### Level 2: `medium_pacing` (Medium)
**Objective:** Pace $50 across 24 hours; retain ≥ 20% for peak hours (18–22).
**Budget:** $50 | **Grader:** `0.3×smoothness + 0.3×peak_survival + 0.4×revenue` | **Target:** 0.70
### Level 3: `hard_assembly` (Hard)
**Objective:** Given an ad image description + base caption + live viral hashtags,
**generate a new caption** that is simultaneously viral, coherent with the image,
and creatively novel, while also winning auctions profitably.
**Budget:** $120 | **Target:** 0.65
**The RL loop (what the LLM agent does each step):**
```
1. Agent receives: image_description, base_caption, live_hashtags[], viral_trend
2. Agent must:
   a. Select 2–4 relevant hashtags from live_hashtags (scraped from Google Trends / Reddit)
   b. Rewrite the base caption to weave those hashtags into natural ad copy
   c. Add its own creative words (target 30–50% novel vocabulary)
   d. Keep the caption coherent with the source image
   e. Set a profitable bid price
3. Grader scores the assembled caption on 4 axes:
   • 35%: Hashtag relevance (cosine_sim of each hashtag vs viral_trend)
   • 35%: Caption-trend alignment (cosine_sim of caption vs viral_trend)
   • 20%: Caption-image coherence (cosine_sim of caption vs image_description)
   • 10%: Novelty (fraction of new words vs base_caption, target ~40%)
4. Reward = auction_reward + composite_score × $8.00 bonus
```
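
The novelty axis depends on the "fraction of new words vs base_caption". One plausible way to compute that fraction is sketched below. The tokenisation rule (lowercasing, keeping hashtags as single tokens) and the helper names `tokenize` and `novel_fraction` are assumptions for illustration; the actual grader may count words differently.

```python
import re


def tokenize(text):
    """Lowercase and split into word tokens; a hashtag stays one token (assumed rule)."""
    return re.findall(r"[#\w']+", text.lower())


def novel_fraction(base_caption, new_caption):
    """Share of tokens in new_caption that do not occur in base_caption."""
    base_words = set(tokenize(base_caption))
    new_words = tokenize(new_caption)
    if not new_words:
        return 0.0
    novel = [word for word in new_words if word not in base_words]
    return len(novel) / len(new_words)
```

For example, rewriting "A woman jogging in the park" as "Morning miles in the park #RunClub" keeps three of six tokens, so the novel fraction under this rule is 0.5.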
**Data sources for hard_assembly:**
- **Ad creatives**: MS-COCO Captions 2017 (val annotations), bucketed into Fitness/Tech/Fashion/Gaming by keyword matching. Falls back to a 30-entry built-in seed pool.
- **Viral hashtags**: `ViralHashtagScraper` queries Google Trends (via `pytrends`) and Reddit `/r/popular/hot.json` (public, no auth), then blends the results with static seed hashtags per context and trend. Results are cached for 1 hour.
### Level 4: `hard_sequencing` (Hard)
**Objective:** Plan 24-hour ad placements with carry-over brand-recall boosts.
Winning triggers +15%/+10%/+5% CTR for the next 3 hours. Cover ≥ 3 contexts for
a 20% diversity bonus.
**Budget:** $100 | **Grader:** `min(1.0, agent_conv/oracle_conv × diversity_mult)` | **Target:** 0.60
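
To make the carry-over rule concrete, the sketch below computes the boost in effect at each hour from a list of per-hour win flags. It assumes that boosts from overlapping wins add up and are capped at 0.30 (the upper bound of `carryover_boost` in the observation space); this README does not state the stacking rule, and `carryover_boosts` is a hypothetical helper.

```python
BOOST_SCHEDULE = (0.15, 0.10, 0.05)  # boost 1, 2, and 3 hours after a win
MAX_BOOST = 0.30                     # documented upper bound of carryover_boost


def carryover_boosts(wins):
    """Return the CTR boost active at each hour (assumes additive stacking with a cap)."""
    boosts = [0.0] * len(wins)
    for hour, won in enumerate(wins):
        if not won:
            continue
        for offset, boost in enumerate(BOOST_SCHEDULE, start=1):
            if hour + offset < len(wins):
                boosts[hour + offset] += boost
    return [min(MAX_BOOST, round(value, 2)) for value in boosts]
```

A single win at hour 0 gives boosts of 0.15, 0.10, and 0.05 at hours 1 to 3; three consecutive wins reach the 0.30 cap at the hour after the third win.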
---
## Grading Details
### `EasyHeadlineGrader`
```
step_score = CTR_selected / CTR_oracle
final_score = mean(step_scores) // [0.0, 1.0]
```
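
A minimal sketch of this grader follows. It assumes the oracle CTR for a step is the highest CTR among the headlines available in that step's context; the function name and the CTR table layout are hypothetical.

```python
def easy_headline_score(ctr_table, choices):
    """Mean ratio of chosen CTR to best available CTR.

    ctr_table: maps context -> list of CTRs, one per headline slot (assumed layout).
    choices:   list of (context, headline_id) pairs, one per step.
    """
    step_scores = []
    for context, headline_id in choices:
        ctrs = ctr_table[context]
        oracle_ctr = max(ctrs)  # assumption: the oracle picks the best headline
        step_scores.append(ctrs[headline_id] / oracle_ctr)
    return sum(step_scores) / len(step_scores) if step_scores else 0.0
```

With illustrative CTRs of `[0.02, 0.04]` for Tech and `[0.03, 0.01]` for Fitness, always choosing headline 1 scores 1.0 on the Tech step and about 0.33 on the Fitness step, for a final score of about 0.67.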
### `MediumPacingGrader`
```
smoothness     = 1 − mean(|hourly_spend − ideal_spend| / ideal_spend)
peak_survival  = 1.0 if remaining_budget ≥ 20% at hour 18, else 0.0
revenue_factor = min(1.0, total_revenue / $30)
final_score    = 0.30 × smoothness + 0.30 × peak_survival + 0.40 × revenue_factor
```
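
The formula can be sketched in Python as follows. Two details are assumptions rather than documented behaviour: `ideal_spend` is taken to be a uniform share of the budget per hour, and `smoothness` is clamped at 0 so that very bursty spending cannot push the score negative. The function name is hypothetical.

```python
def medium_pacing_score(hourly_spend, total_budget, total_revenue,
                        revenue_target=30.0, peak_start_hour=18,
                        reserve_fraction=0.20):
    """Weighted pacing score built from the three documented components."""
    hours = len(hourly_spend)
    ideal_spend = total_budget / hours  # assumption: uniform pacing is ideal
    deviation = sum(abs(s - ideal_spend) / ideal_spend for s in hourly_spend) / hours
    smoothness = max(0.0, 1.0 - deviation)  # assumption: clamp at zero

    # Budget still available when the peak window opens.
    remaining_at_peak = total_budget - sum(hourly_spend[:peak_start_hour])
    peak_survival = 1.0 if remaining_at_peak >= reserve_fraction * total_budget else 0.0

    revenue_factor = min(1.0, total_revenue / revenue_target)
    return 0.30 * smoothness + 0.30 * peak_survival + 0.40 * revenue_factor
```

Spending the $50 budget evenly and earning $30 gives a score of 1.0 under these assumptions; spending everything in the first hour and earning $15 gives 0.2, all of it from the revenue term.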
### `HardAssemblyGrader`: 4-Axis Composite
| Axis | Weight | Metric |
|------|--------|--------|
| Hashtag Relevance | 0.35 | `mean(cosine_sim(hashtag, viral_trend))` |
| Caption-Trend Alignment | 0.35 | `cosine_sim(caption, viral_trend)` |
| Caption-Image Coherence | 0.20 | `cosine_sim(caption, image_description)` |
| Novelty | 0.10 | `1 − \|novel_fraction − 0.40\| / 0.60` |
```
composite   = Σ (weight × axis_score)
final_score = 0.60 × mean(composite_scores)
            + 0.40 × min(1.0, total_revenue / $55)
```
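
A sketch of how the axis scores might be combined is shown below. The three cosine-similarity axes are passed in as precomputed numbers, since computing them requires the embedding model. The function names and the dictionary keys are hypothetical.

```python
# Weights copied from the table above; the key names are illustrative.
AXIS_WEIGHTS = {
    "hashtag_relevance": 0.35,
    "caption_trend": 0.35,
    "caption_image": 0.20,
    "novelty": 0.10,
}


def novelty_score(novel_fraction, target=0.40):
    """Peaks at 1.0 when novel_fraction equals the target, falls off linearly."""
    return max(0.0, 1.0 - abs(novel_fraction - target) / 0.60)


def composite_score(axes):
    """Weighted sum of the four axis scores for one step."""
    return sum(AXIS_WEIGHTS[name] * axes[name] for name in AXIS_WEIGHTS)


def hard_assembly_final(composite_scores, total_revenue, revenue_target=55.0):
    """Episode score: 60% caption quality, 40% revenue attainment."""
    mean_composite = (sum(composite_scores) / len(composite_scores)
                      if composite_scores else 0.0)
    return 0.60 * mean_composite + 0.40 * min(1.0, total_revenue / revenue_target)
```

For instance, composite scores of 0.5 and 0.7 with $27.50 of revenue give `0.60 × 0.6 + 0.40 × 0.5 = 0.56`.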
### `HardSequencingGrader`
```
agent_conversions  = Σ [CTR_t × (1 + carryover_boost_t) × $15]
oracle_conversions = DP-optimal bid/skip sequence with carry-over
diversity_mult     = 1.20 if ≥3 distinct contexts won, else 1.0
final_score        = min(1.0, agent_conv / oracle_conv × diversity_mult)
```
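
The final-score arithmetic can be sketched as below. The DP oracle is not reproduced here, so `oracle_conv` is taken as an input. The sketch also assumes that only won hours contribute to the agent's sum; the function names are hypothetical.

```python
CONVERSION_VALUE = 15.0  # USD per unit of boosted CTR, from the formula above


def agent_conversions(won_steps):
    """Sum boosted conversion value over won hours.

    won_steps: list of (ctr, carryover_boost) pairs for hours the agent won (assumed).
    """
    return sum(ctr * (1.0 + boost) * CONVERSION_VALUE for ctr, boost in won_steps)


def hard_sequencing_score(agent_conv, oracle_conv, contexts_won):
    """Ratio to the oracle, scaled by the diversity multiplier and capped at 1.0."""
    if oracle_conv <= 0:
        return 0.0
    diversity_mult = 1.20 if len(set(contexts_won)) >= 3 else 1.0
    return min(1.0, agent_conv / oracle_conv * diversity_mult)
```

An agent that reaches half of the oracle's conversions scores 0.5 when it wins in fewer than three contexts and 0.6 when it wins in three or more.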
---
## Architecture
```
┌─────────────────────────────────────────────────────────────┐
│  OpenEnvAuctioneer (Gym-style environment)                  │
│                                                             │
│  ┌──────────────────┐  ┌─────────────────────────────────┐  │
│  │ Market Engine    │  │ User Simulator                  │  │
│  │ (Statistical)    │  │ (Semantic / LLM)                │  │
│  │                  │  │                                 │  │
│  │ iPinYou RTB logs │  │ SentenceTransformer             │  │
│  │ → Lognormal per  │  │ all-MiniLM-L6-v2                │  │
│  │   hour bucket    │  │ + optional Llama-3-8B           │  │
│  └──────────────────┘  └─────────────────────────────────┘  │
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ MIND Dataset Layer (Microsoft News Dataset)           │  │
│  │ behaviours.tsv → CTRCalibrator                        │  │
│  │ news.tsv       → MINDCreativePool (headlines)         │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ Ad + Caption Dataset (MS-COCO Captions 2017)          │  │
│  │ → image_description + base_caption per step           │  │
│  │ → ViralHashtagScraper (pytrends + Reddit + seeds)     │  │
│  │ → agent rewrites caption with viral hashtags          │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ Grader (task-specific, deterministic 0.0–1.0)         │  │
│  │ Level 1: easy_headline   → headline CTR lookup        │  │
│  │ Level 2: medium_pacing   → pacing + survival          │  │
│  │ Level 3: hard_assembly   → 4-axis composite score     │  │
│  │ Level 4: hard_sequencing → DP oracle comparison       │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
```
---
## Models
| Model | Role | Always Active? |
|-------|------|----------------|
| `all-MiniLM-L6-v2` (SentenceTransformer) | Semantic CTR scoring + grader cosine similarity | ✅ Yes |
| `Meta-Llama-3-8B-Instruct` (4-bit) | Richer LLM-based CTR scoring | ❌ Optional (`USE_LLM_SIMULATOR=1`) |
When the LLM simulator is active: `final_ctr = 0.60 × llm_ctr + 0.40 × semantic_ctr`
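
The blend can be sketched as a small function. It assumes the semantic score is used unchanged when the LLM simulator is off; `blended_ctr` is a hypothetical name.

```python
def blended_ctr(semantic_ctr, llm_ctr=None):
    """Return the final CTR; llm_ctr is None when the LLM simulator is disabled."""
    if llm_ctr is None:
        return semantic_ctr  # assumption: semantic score alone when the LLM is off
    return 0.60 * llm_ctr + 0.40 * semantic_ctr
```

With a semantic CTR of 0.05 and an LLM CTR of 0.10, the blended value is `0.60 × 0.10 + 0.40 × 0.05 = 0.08`.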
---
## Setup & Usage
### Prerequisites
- Python 3.10+
- Docker (for containerised execution)
### Local Development
```bash
pip install -r requirements.txt
python -c "from environment import OpenEnvAuctioneer; e = OpenEnvAuctioneer(); print(e.reset())"
```
### Docker Build & Run
```bash
# Build the image
docker build -t openenv-auctioneer .
# Run the FastAPI server (default)
docker run --rm -p 7860:7860 openenv-auctioneer
# Run inference directly inside the container
docker run --rm \
-e HF_TOKEN=<your_key> \
openenv-auctioneer python inference.py
```
### Inference Script
```bash
# Build image first, then run inference
docker build -t openenv-auctioneer .
LOCAL_IMAGE_NAME=openenv-auctioneer \
HF_TOKEN=<your_key> \
python inference.py
```
The inference script emits standardised `[START]`/`[STEP]`/`[END]` logs to stdout.
### Environment Variables
| Variable | Required | Description |
|----------|----------|-------------|
| `HF_TOKEN` | Yes (inference) | API key for the LLM service |
| `API_BASE_URL` | No | LLM endpoint (default: HuggingFace router) |
| `MODEL_NAME` | No | Model identifier (default: Qwen/Qwen2.5-72B-Instruct) |
| `LOCAL_IMAGE_NAME` | Yes (inference) | Docker image name |
| `AUCTIONEER_TASK` | No | Task to run (default: `all`) |
| `MIND_SOURCE` | No | `local` / `huggingface` / `azure` |
| `COCO_SOURCE` | No | `local` / `url` (auto-download COCO annotations) |
| `USE_LLM_SIMULATOR` | No | Set `1` to enable Llama-3 User Simulator |
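
The sketch below shows one way a script could read a subset of these variables with the documented defaults. It is not the code in `inference.py`: `InferenceConfig` and `load_config` are hypothetical. The default endpoint URL is not given in this README, so `api_base_url` is left as `None` when unset.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class InferenceConfig:
    hf_token: str
    local_image_name: str
    api_base_url: Optional[str]  # None means "fall back to the script's default endpoint"
    model_name: str
    task: str
    use_llm_simulator: bool


def load_config(env: Mapping[str, str] = os.environ) -> InferenceConfig:
    """Read settings from the environment and fail early on missing required values."""
    missing = [name for name in ("HF_TOKEN", "LOCAL_IMAGE_NAME") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return InferenceConfig(
        hf_token=env["HF_TOKEN"],
        local_image_name=env["LOCAL_IMAGE_NAME"],
        api_base_url=env.get("API_BASE_URL"),
        model_name=env.get("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct"),
        task=env.get("AUCTIONEER_TASK", "all"),
        use_llm_simulator=env.get("USE_LLM_SIMULATOR") == "1",
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests without touching the real environment.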
---
## Baseline Scores (Expected Ranges)
| Task | Expected Range | Notes |
|------|---------------|-------|
| `easy_headline` | 0.55 – 0.80 | Context→headline matching is learnable |
| `medium_pacing` | 0.45 – 0.70 | Requires budget discipline |
| `hard_assembly` | 0.40 – 0.65 | Caption quality + hashtag matching + auction wins |
| `hard_sequencing` | 0.35 – 0.60 | Compared against DP oracle |
Scores depend on LLM quality and market stochasticity. Run multiple episodes
for stable estimates.
---
## Project Structure
```
├── models.py               # Pydantic models: Action, Observation, Reward, Info
├── environment.py          # OpenEnvAuctioneer + graders + dataset layers
│   ├── MINDLoader            # MIND dataset loader (HF / Azure / local)
│   ├── MarketCalibrator      # iPinYou-based auction price simulator
│   ├── CTRCalibrator         # MIND-based CTR lookup tables
│   ├── MINDCreativePool      # 6-slot headline/creative catalog from news.tsv
│   ├── PersonaBank           # Vogue Dialogue persona sampling
│   ├── ViralHashtagScraper   # Live hashtag scraping (pytrends + Reddit)
│   ├── AdCaptionDataset      # COCO-based ad image+caption pool
│   ├── UserSimulator         # Semantic + optional LLM CTR scoring
│   ├── EasyHeadlineGrader    # Level 1 grader
│   ├── MediumPacingGrader    # Level 2 grader
│   ├── HardAssemblyGrader    # Level 3 grader (4-axis composite)
│   ├── HardSequencingGrader  # Level 4 grader (DP oracle)
│   └── OpenEnvAuctioneer     # Main Gym-style env class
├── app.py                  # FastAPI server (runs inside Docker)
├── inference.py            # Baseline inference script (mandatory format)
├── openenv.yaml            # OpenEnv metadata & task definitions
├── Dockerfile              # Container build
├── requirements.txt        # Python dependencies
├── test_sequencing.py      # Unit tests for DP oracle grader
└── Datasets/               # Optional dataset mount point
```
## References
1. **MIND**: Wu et al. (2020), *"MIND: A Large-scale Dataset for News Recommendation"*, ACL 2020. [msnews.github.io](https://msnews.github.io/)
2. **iPinYou RTB**: Zhang et al. (2014), *"Real-Time Bidding Benchmarking with iPinYou Dataset"*. [contest.ipinyou.com](https://contest.ipinyou.com/)
3. **MS-COCO Captions**: Lin et al. (2014), *"Microsoft COCO: Common Objects in Context"*. [cocodataset.org](https://cocodataset.org/)
4. **SentenceTransformers**: Reimers & Gurevych (2019), *"Sentence-BERT"*. [sbert.net](https://www.sbert.net/)
## License
MIT