Commit 09e32d2
initial: CDN Cache Optimizer OpenEnv
- Dockerfile +24 -0
- README.md +165 -0
- api/__init__.py +0 -0
- api/__pycache__/__init__.cpython-312.pyc +0 -0
- api/__pycache__/main.cpython-312.pyc +0 -0
- api/main.py +113 -0
- env/__init__.py +4 -0
- env/__pycache__/__init__.cpython-312.pyc +0 -0
- env/__pycache__/cache.cpython-312.pyc +0 -0
- env/__pycache__/graders.cpython-312.pyc +0 -0
- env/__pycache__/models.cpython-312.pyc +0 -0
- env/__pycache__/traffic.cpython-312.pyc +0 -0
- env/cache.py +266 -0
- env/graders.py +188 -0
- env/models.py +67 -0
- env/traffic.py +119 -0
- inference.py +221 -0
- openenv.yaml +68 -0
- requirements.txt +6 -0
Dockerfile
ADDED
```dockerfile
FROM python:3.11-slim

# HF Spaces expects port 7860
EXPOSE 7860

WORKDIR /app

# Install deps
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy source
COPY env/ ./env/
COPY api/ ./api/
COPY inference.py .
COPY openenv.yaml .

# Environment variables (override at runtime)
ENV API_BASE_URL="https://api.openai.com/v1"
ENV MODEL_NAME="gpt-4o-mini"
ENV HF_TOKEN=""

# Start FastAPI server
CMD ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "7860"]
```
README.md
ADDED
# CDN Cache Optimizer – OpenEnv RL Environment

An RL environment simulating **edge CDN cache management** – the problem companies like Meta solve at planetary scale. An agent manages a cache of limited size, deciding which files to evict when new content arrives, balancing **hit rate**, **bandwidth efficiency**, and **thrash avoidance**.

---

## Motivation

Content Delivery Networks serve billions of files daily. Edge servers have limited storage, so they must constantly decide: *which cached files to keep, and which to evict?* Standard algorithms like LRU aren't optimal – especially when traffic has **viral bursts** (a file suddenly gets 50x more requests for 20 minutes, then drops back to zero).

A smarter agent can:
- Predict viral spikes from queue previews
- Avoid evicting high-frequency files
- Prevent cache thrashing (evicting then immediately re-requesting)
- Maximize bandwidth saved for users

---

## Environment Description

At each step, a file is requested from the network. If it's already in the cache → **cache hit** (reward). If not → **cache miss**, and the agent must decide whether to evict an existing file to make room.

### Traffic Model
- **Steady files**: Consistent, cyclical demand
- **Viral files**: Bell-curve spike in popularity, then fade back to baseline

---

## Action & Observation Space

### Observation Space
| Field | Type | Description |
|-------|------|-------------|
| `step` | int | Current episode step |
| `cache_used_mb` | float | MB currently used |
| `cache_capacity_mb` | float | Total cache size |
| `cache_fill_ratio` | float | 0.0–1.0 fill level |
| `cached_files` | List[FileEntry] | All files in cache with metadata |
| `incoming_file_id` | str | File being requested |
| `incoming_file_size_mb` | float | Size of incoming file |
| `incoming_file_is_viral` | bool | Is this file currently viral? |
| `cache_hit` | bool | Is incoming file already cached? |
| `recent_hit_rate` | float | Rolling hit rate (last 20 steps) |
| `time_of_day` | float | Normalized 0.0–1.0 daily cycle |
| `queue_preview` | List[str] | Next 3 file IDs (prefetch hint) |

### FileEntry Fields
| Field | Type | Description |
|-------|------|-------------|
| `file_id` | str | Unique identifier |
| `size_mb` | float | File size in MB |
| `request_frequency` | float | Requests since cached |
| `is_viral` | bool | Currently viral |
| `last_accessed` | int | Step number of last access |

### Action Space
| Field | Type | Description |
|-------|------|-------------|
| `evict_file_id` | str \| null | File to evict (null = no eviction) |

### Reward Function
| Component | Range | Description |
|-----------|-------|-------------|
| `cache_hit_bonus` | +1.0 to +1.5 | Hit reward (viral hits = +1.5) |
| `bandwidth_saved` | +0.0 to +0.2 | Reward for bandwidth efficiency |
| `eviction_penalty` | -0.0 to -0.5 | Penalty for evicting popular files |
| `thrash_penalty` | 0.0 or -0.5 | Penalty for evicting same file twice |
| `wasted_capacity_penalty` | -0.0 to -0.3 | Penalty for leaving cache empty |

---

## Tasks

### Task 1: Steady Traffic Cache (Easy)
- **Cache**: 100MB | **Files**: 30 | **Steps**: 100
- No viral files – steady demand only
- Agent learns basic LRU-style eviction
- **Target hit rate**: ≥ 0.60 → score 1.0
- **Baseline score**: ~0.75

### Task 2: Mixed Traffic Cache (Medium)
- **Cache**: 80MB | **Files**: 50 | **Steps**: 150
- 20% viral files mixed with steady demand
- Agent must handle spikes and prioritize popular content
- **Score**: 70% hit rate + 30% bandwidth
- **Baseline score**: ~0.60

### Task 3: Constrained Cache with Viral Bursts (Hard)
- **Cache**: 50MB | **Files**: 80 | **Steps**: 200
- 35% viral files, tight capacity, large file sizes
- Agent must predict spikes and avoid thrashing
- **Score**: 50% hit rate + 25% bandwidth + 25% reward quality
- **Baseline score**: ~0.45

---

## Setup & Usage

### Local Setup
```bash
git clone <repo>
cd cdn-cache-env
pip install -r requirements.txt
```

### Run API Server
```bash
uvicorn api.main:app --host 0.0.0.0 --port 7860
```

### Run Inference (Baseline Agent)
```bash
export API_BASE_URL="https://api.openai.com/v1"
export MODEL_NAME="gpt-4o-mini"
export HF_TOKEN="your_token_here"

python inference.py
```

### Docker
```bash
docker build -t cdn-cache-env .
docker run -p 7860:7860 \
  -e API_BASE_URL="https://api.openai.com/v1" \
  -e MODEL_NAME="gpt-4o-mini" \
  -e HF_TOKEN="your_token" \
  cdn-cache-env
```

---

## API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/health` | Health check (returns 200) |
| GET | `/tasks` | List all tasks |
| POST | `/reset` | Start episode `{"task_id": "task_easy", "seed": 42}` |
| POST | `/step` | Take action `{"evict_file_id": "file_001" or null}` |
| GET | `/state` | Full environment state |

---

## Baseline Scores

Using the built-in `smart_policy` (non-LLM baseline):

| Task | Hit Rate | Score |
|------|----------|-------|
| Easy | ~0.72 | ~1.00 |
| Medium | ~0.61 | ~0.82 |
| Hard | ~0.48 | ~0.78 |
| **Overall** | | **~0.87** |

---

## Log Format

`inference.py` emits structured JSON logs:

```
{"type": "START", "task_id": "task_easy", ...}
{"type": "STEP", "step": 0, "action": {...}, "reward": 1.0, ...}
{"type": "END", "total_reward": 87.3, "final_hit_rate": 0.72, "score": 1.0}
```
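The medium task's scoring rule above (70% hit rate + 30% bandwidth) can be sketched as a pure function. The 0.55 hit-rate target and the 500 MB "perfect bandwidth" normalization are the constants used in `env/graders.py`; the function name `grade_medium` here is illustrative, not the grader's actual entry point:

```python
def grade_medium(hit_rate: float, bandwidth_saved_mb: float) -> float:
    """Score = 70% hit-rate component + 30% bandwidth component, each capped at 1.0."""
    hr_score = min(1.0, hit_rate / 0.55)             # hit rate of 0.55+ earns full credit
    bw_score = min(1.0, bandwidth_saved_mb / 500.0)  # 500 MB saved counts as perfect
    return round(0.70 * hr_score + 0.30 * bw_score, 4)
```

For example, a policy at the 0.55 hit-rate target that saves half the reference bandwidth scores 0.70 + 0.15 = 0.85.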
api/__init__.py
ADDED
File without changes

api/__pycache__/__init__.cpython-312.pyc
ADDED
Binary file (139 Bytes)

api/__pycache__/main.cpython-312.pyc
ADDED
Binary file (5.07 kB)
api/main.py
ADDED
```python
"""
FastAPI server exposing OpenEnv interface over HTTP.
Endpoints: POST /reset, POST /step, GET /state, GET /health, GET /tasks
"""

import sys
import os
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from typing import Optional, Dict
import uvicorn

from env.cache import CDNCacheEnv, TASK_CONFIGS
from env.models import Action, StepResult

app = FastAPI(
    title="CDN Cache Optimizer - OpenEnv",
    description=(
        "RL environment simulating edge CDN cache management. "
        "Agent decides which files to evict when cache is full. "
        "Implements full OpenEnv spec."
    ),
    version="1.0.0",
)

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)

# Global env instance (stateful per session)
_env: Optional[CDNCacheEnv] = None


class ResetRequest(BaseModel):
    task_id: str = "task_easy"
    seed: int = 42


class StepRequest(BaseModel):
    evict_file_id: Optional[str] = None


@app.get("/health")
def health():
    return {"status": "ok", "env": "cdn-cache-optimizer"}


@app.get("/tasks")
def list_tasks():
    return {
        task_id: {
            "name": cfg.name,
            "difficulty": cfg.difficulty,
            "description": cfg.description,
            "cache_capacity_mb": cfg.cache_capacity_mb,
            "episode_length": cfg.episode_length,
        }
        for task_id, cfg in TASK_CONFIGS.items()
    }


@app.post("/reset")
def reset(req: ResetRequest):
    global _env
    if req.task_id not in TASK_CONFIGS:
        raise HTTPException(
            status_code=400,
            detail=f"Unknown task_id '{req.task_id}'. Valid: {list(TASK_CONFIGS.keys())}"
        )
    _env = CDNCacheEnv(task_id=req.task_id, seed=req.seed)
    obs = _env.reset()
    return {"observation": obs.dict(), "task": _env.config.dict()}


@app.post("/step")
def step(req: StepRequest):
    global _env
    if _env is None:
        raise HTTPException(status_code=400, detail="Call /reset first.")
    if _env._done:
        raise HTTPException(status_code=400, detail="Episode done. Call /reset.")

    action = Action(evict_file_id=req.evict_file_id)
    result: StepResult = _env.step(action)
    return result.dict()


@app.get("/state")
def state():
    global _env
    if _env is None:
        raise HTTPException(status_code=400, detail="Call /reset first.")
    return _env.state()


@app.get("/")
def root():
    return {
        "name": "CDN Cache Optimizer",
        "spec": "OpenEnv v1",
        "endpoints": ["/reset", "/step", "/state", "/health", "/tasks"],
        "tasks": list(TASK_CONFIGS.keys()),
    }


if __name__ == "__main__":
    uvicorn.run("api.main:app", host="0.0.0.0", port=7860, reload=False)
```
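The `/reset` and `/step` endpoints above can be driven by a small client loop. A sketch using only the standard library; the `localhost:7860` base URL is an assumption (whatever host/port the server was started on), and the LRU eviction choice is just one illustrative policy, not part of the server:

```python
import json
from urllib import request as urlrequest

BASE = "http://localhost:7860"  # assumption: server running locally on the default port


def post(path: str, payload: dict) -> dict:
    """POST a JSON payload to the environment server and decode the JSON reply."""
    req = urlrequest.Request(
        BASE + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    return json.loads(urlrequest.urlopen(req).read())


def choose_eviction(obs: dict):
    """LRU choice: evict the least recently accessed cached file.

    Returns None (no eviction) on a cache hit or when the cache is empty.
    """
    files = obs["cached_files"]
    if obs["cache_hit"] or not files:
        return None
    return min(files, key=lambda f: f["last_accessed"])["file_id"]


def run_episode(task_id: str = "task_easy", seed: int = 42) -> float:
    """Run one episode against the HTTP API, returning the total reward."""
    obs = post("/reset", {"task_id": task_id, "seed": seed})["observation"]
    total = 0.0
    while True:
        result = post("/step", {"evict_file_id": choose_eviction(obs)})
        total += result["reward"]["total"]
        obs = result["observation"]
        if result["done"]:
            return total
```

`choose_eviction` works on the plain JSON dicts the API returns, so the same logic can be unit-tested without a running server.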
env/__init__.py
ADDED
```python
from env.cache import CDNCacheEnv, TASK_CONFIGS
from env.models import Observation, Action, Reward, StepResult, TaskConfig
from env.traffic import TrafficGenerator
from env.graders import run_all_graders, grade_task_easy, grade_task_medium, grade_task_hard
```
env/__pycache__/__init__.cpython-312.pyc
ADDED
Binary file (524 Bytes)

env/__pycache__/cache.cpython-312.pyc
ADDED
Binary file (11.3 kB)

env/__pycache__/graders.cpython-312.pyc
ADDED
Binary file (7.16 kB)

env/__pycache__/models.cpython-312.pyc
ADDED
Binary file (2.89 kB)

env/__pycache__/traffic.cpython-312.pyc
ADDED
Binary file (7.31 kB)
env/cache.py
ADDED
```python
"""
Core CDN Cache simulation.
Implements full OpenEnv interface: reset(), step(), state()
"""

from collections import defaultdict
from typing import Dict, Optional, List, Tuple
from env.models import (
    Observation, Action, Reward, StepResult, FileEntry, TaskConfig
)
from env.traffic import TrafficGenerator


TASK_CONFIGS = {
    "task_easy": TaskConfig(
        task_id="task_easy",
        name="Steady Traffic Cache",
        difficulty="easy",
        cache_capacity_mb=100.0,
        num_files=30,
        viral_ratio=0.0,  # no viral files
        episode_length=100,
        description=(
            "Cache has 100MB capacity. Only steady traffic files. "
            "Agent must learn LRU-style eviction. Target hit rate >= 0.60."
        ),
    ),
    "task_medium": TaskConfig(
        task_id="task_medium",
        name="Mixed Traffic Cache",
        difficulty="medium",
        cache_capacity_mb=80.0,
        num_files=50,
        viral_ratio=0.2,
        episode_length=150,
        description=(
            "80MB cache, mix of steady and viral files. "
            "Agent must prioritize popular content and handle viral spikes. "
            "Target hit rate >= 0.55 with efficient eviction."
        ),
    ),
    "task_hard": TaskConfig(
        task_id="task_hard",
        name="Constrained Cache with Viral Bursts",
        difficulty="hard",
        cache_capacity_mb=50.0,
        num_files=80,
        viral_ratio=0.35,
        episode_length=200,
        description=(
            "Tight 50MB cache, many viral bursts, large file sizes. "
            "Agent must predict spikes, avoid cache thrashing, "
            "and maximize bandwidth saved. Target hit rate >= 0.45."
        ),
    ),
}


class CDNCacheEnv:
    """
    CDN Cache Optimizer Environment.
    At each step, a file is requested. If not cached, agent must decide
    which file (if any) to evict to make room for the new one.
    """

    def __init__(self, task_id: str = "task_easy", seed: int = 42):
        if task_id not in TASK_CONFIGS:
            raise ValueError(f"Unknown task_id: {task_id}. Choose from {list(TASK_CONFIGS.keys())}")
        self.config = TASK_CONFIGS[task_id]
        self.seed = seed
        self._cache: Dict[str, FileEntry] = {}  # file_id -> FileEntry
        self._cache_used_mb: float = 0.0
        self._step: int = 0
        self._hits: int = 0
        self._misses: int = 0
        self._recent_hits: List[bool] = []
        self._last_evicted: Optional[str] = None
        self._eviction_counts: Dict[str, int] = defaultdict(int)
        self._total_bandwidth_saved: float = 0.0
        self._done: bool = False
        self.traffic = TrafficGenerator(
            num_files=self.config.num_files,
            viral_ratio=self.config.viral_ratio,
            episode_length=self.config.episode_length,
            seed=seed,
        )

    # ─────────────────────────────────────────────
    # OpenEnv Interface
    # ─────────────────────────────────────────────

    def reset(self) -> Observation:
        """Reset environment to initial state."""
        self._cache = {}
        self._cache_used_mb = 0.0
        self._step = 0
        self._hits = 0
        self._misses = 0
        self._recent_hits = []
        self._last_evicted = None
        self._eviction_counts = defaultdict(int)
        self._total_bandwidth_saved = 0.0
        self._done = False
        self.traffic = TrafficGenerator(
            num_files=self.config.num_files,
            viral_ratio=self.config.viral_ratio,
            episode_length=self.config.episode_length,
            seed=self.seed,
        )
        return self._make_observation(cache_hit=False)

    def step(self, action: Action) -> StepResult:
        """Process one step: handle eviction, then serve the request."""
        if self._done:
            raise RuntimeError("Episode done. Call reset() first.")

        file_id, size_mb, is_viral = self.traffic.get_request(self._step)
        cache_hit = file_id in self._cache
        reward = self._process_step(action, file_id, size_mb, is_viral, cache_hit)

        self._step += 1
        self._done = self._step >= self.config.episode_length

        obs = self._make_observation(cache_hit=cache_hit)
        info = {
            "total_hits": self._hits,
            "total_misses": self._misses,
            "hit_rate": self._hits / max(1, self._hits + self._misses),
            "cache_fill_ratio": self._cache_used_mb / self.config.cache_capacity_mb,
            "bandwidth_saved_mb": self._total_bandwidth_saved,
        }
        return StepResult(observation=obs, reward=reward, done=self._done, info=info)

    def state(self) -> dict:
        """Return current full environment state."""
        return {
            "step": self._step,
            "done": self._done,
            "cache": {k: v.dict() for k, v in self._cache.items()},
            "cache_used_mb": self._cache_used_mb,
            "cache_capacity_mb": self.config.cache_capacity_mb,
            "hits": self._hits,
            "misses": self._misses,
            "hit_rate": self._hits / max(1, self._hits + self._misses),
            "bandwidth_saved_mb": self._total_bandwidth_saved,
            "task": self.config.dict(),
        }

    # ─────────────────────────────────────────────
    # Internal Logic
    # ─────────────────────────────────────────────

    def _process_step(
        self,
        action: Action,
        file_id: str,
        size_mb: float,
        is_viral: bool,
        cache_hit: bool,
    ) -> Reward:
        hit_bonus = 0.0
        eviction_penalty = 0.0
        thrash_penalty = 0.0
        bandwidth_saved = 0.0
        wasted_penalty = 0.0

        if cache_hit:
            self._hits += 1
            self._recent_hits.append(True)
            hit_bonus = 1.0 + (0.5 if is_viral else 0.0)  # viral hits worth more
            bandwidth_saved = size_mb * 0.01  # normalized
            self._total_bandwidth_saved += size_mb
            # Update frequency
            entry = self._cache[file_id]
            entry.request_frequency = min(entry.request_frequency + 1, 50)
            entry.last_accessed = self._step
        else:
            self._misses += 1
            self._recent_hits.append(False)

            # Try to insert new file
            if self._cache_used_mb + size_mb <= self.config.cache_capacity_mb:
                # Fits without eviction
                self._insert_file(file_id, size_mb, is_viral)
            else:
                # Need to evict
                if action.evict_file_id and action.evict_file_id in self._cache:
                    evicted = self._cache[action.evict_file_id]

                    # Penalize evicting high-frequency files
                    if evicted.request_frequency > 10:
                        eviction_penalty -= 0.3
                    if evicted.is_viral:
                        eviction_penalty -= 0.2

                    # Thrash penalty: evicted and re-requested soon
                    if action.evict_file_id == self._last_evicted:
                        thrash_penalty = -0.5

                    self._eviction_counts[action.evict_file_id] += 1
                    self._remove_file(action.evict_file_id)
                    self._last_evicted = action.evict_file_id

                    if self._cache_used_mb + size_mb <= self.config.cache_capacity_mb:
                        self._insert_file(file_id, size_mb, is_viral)
                else:
                    # No valid eviction action -> wasted capacity penalty
                    wasted_penalty = -0.2

        # Wasted capacity: cache too empty when we could be caching
        fill_ratio = self._cache_used_mb / self.config.cache_capacity_mb
        if fill_ratio < 0.3 and self._step > 10:
            wasted_penalty -= 0.1

        # Keep recent_hits window at 20
        if len(self._recent_hits) > 20:
            self._recent_hits.pop(0)

        total = hit_bonus + eviction_penalty + thrash_penalty + bandwidth_saved + wasted_penalty
        return Reward(
            total=round(total, 4),
            cache_hit_bonus=hit_bonus,
            eviction_penalty=eviction_penalty,
            thrash_penalty=thrash_penalty,
            bandwidth_saved=bandwidth_saved,
            wasted_capacity_penalty=wasted_penalty,
        )

    def _insert_file(self, file_id: str, size_mb: float, is_viral: bool):
        self._cache[file_id] = FileEntry(
            file_id=file_id,
            size_mb=size_mb,
            request_frequency=1.0,
            is_viral=is_viral,
            last_accessed=self._step,
        )
        self._cache_used_mb += size_mb

    def _remove_file(self, file_id: str):
        if file_id in self._cache:
            self._cache_used_mb -= self._cache[file_id].size_mb
            self._cache_used_mb = max(0.0, self._cache_used_mb)
            del self._cache[file_id]

    def _make_observation(self, cache_hit: bool) -> Observation:
        file_id, size_mb, is_viral = self.traffic.get_request(self._step)
        preview = self.traffic.get_preview(self._step)
        recent_hit_rate = (
            sum(self._recent_hits) / len(self._recent_hits)
            if self._recent_hits else 0.0
        )
        fill = self._cache_used_mb / self.config.cache_capacity_mb
        return Observation(
            step=self._step,
            cache_used_mb=round(self._cache_used_mb, 2),
            cache_capacity_mb=self.config.cache_capacity_mb,
            cache_fill_ratio=round(fill, 4),
            cached_files=list(self._cache.values()),
            incoming_file_id=file_id,
            incoming_file_size_mb=size_mb,
            incoming_file_is_viral=is_viral,
            cache_hit=cache_hit,
            recent_hit_rate=round(recent_hit_rate, 4),
            time_of_day=round(self.traffic.time_of_day(self._step), 4),
            queue_preview=preview,
        )
```
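The 20-step rolling hit-rate window that the environment maintains (append to `_recent_hits`, pop the oldest once the list exceeds 20) can be isolated into a small helper. This is a standalone sketch of the same idea, not part of the module; `collections.deque` with `maxlen` gives O(1) window trimming instead of `list.pop(0)`:

```python
from collections import deque


class RollingHitRate:
    """Track hit/miss outcomes over the last `window` steps."""

    def __init__(self, window: int = 20):
        # deque with maxlen drops the oldest entry automatically on append
        self._hits = deque(maxlen=window)

    def record(self, hit: bool) -> None:
        self._hits.append(hit)

    def rate(self) -> float:
        """Fraction of hits in the current window (0.0 when empty)."""
        return sum(self._hits) / len(self._hits) if self._hits else 0.0
```

This matches the observation semantics above: before any request is recorded, `recent_hit_rate` is 0.0, and once more than `window` steps have passed only the most recent outcomes count.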
env/graders.py
ADDED
```python
"""
Deterministic graders for all 3 tasks.
Each grader runs a full episode and returns a score in [0.0, 1.0].
"""

from typing import Callable, Dict, List
from env.cache import CDNCacheEnv, TASK_CONFIGS
from env.models import Action, Observation


GraderPolicy = Callable[[Observation], Action]


def _run_episode(task_id: str, policy: GraderPolicy, seed: int = 42) -> Dict:
    """Run one full episode with a given policy. Returns stats dict."""
    env = CDNCacheEnv(task_id=task_id, seed=seed)
    obs = env.reset()
    total_reward = 0.0
    steps = 0

    while True:
        action = policy(obs)
        result = env.step(action)
        total_reward += result.reward.total
        obs = result.observation
        steps += 1
        if result.done:
            break

    state = env.state()
    return {
        "hit_rate": state["hit_rate"],
        "total_reward": total_reward,
        "bandwidth_saved_mb": state["bandwidth_saved_mb"],
        "steps": steps,
        "hits": state["hits"],
        "misses": state["misses"],
    }


# ─────────────────────────────────────────────
# Built-in Policies (for baseline + grading)
# ─────────────────────────────────────────────

def lru_policy(obs: Observation) -> Action:
    """Evict Least Recently Used file."""
    if not obs.cached_files:
        return Action(evict_file_id=None)
    lru = min(obs.cached_files, key=lambda f: f.last_accessed)
    return Action(evict_file_id=lru.file_id)


def lfu_policy(obs: Observation) -> Action:
    """Evict Least Frequently Used file."""
    if not obs.cached_files:
        return Action(evict_file_id=None)
    lfu = min(obs.cached_files, key=lambda f: f.request_frequency)
    return Action(evict_file_id=lfu.file_id)


def smart_policy(obs: Observation) -> Action:
    """
    Smarter policy:
    - Never evict viral files
    - Evict the lowest-frequency, largest file (wastes least value, frees most space)
    """
    if not obs.cached_files:
        return Action(evict_file_id=None)

    # Filter out viral files from eviction candidates
    candidates = [f for f in obs.cached_files if not f.is_viral]
    if not candidates:
        candidates = obs.cached_files  # fallback: evict anything

    # Score: low frequency = good eviction, large size = good eviction
    def eviction_score(f):
        return -f.request_frequency + f.size_mb * 0.1

    best = max(candidates, key=eviction_score)
    return Action(evict_file_id=best.file_id)


def no_op_policy(obs: Observation) -> Action:
    """Never evict anything (baseline floor)."""
    return Action(evict_file_id=None)


# ─────────────────────────────────────────────
# Grader Functions
# ─────────────────────────────────────────────

def grade_task_easy(policy: GraderPolicy, seed: int = 42) -> float:
    """
    Easy: steady traffic, 100MB cache.
    Score based purely on hit rate.
    >= 0.60 hit rate = 1.0, scales down to 0.0.
    """
    stats = _run_episode("task_easy", policy, seed)
    hit_rate = stats["hit_rate"]

    # Linear scale: 0.0 hit_rate -> 0.0 score, 0.60+ -> 1.0
    score = min(1.0, hit_rate / 0.60)
    return round(score, 4)


def grade_task_medium(policy: GraderPolicy, seed: int = 42) -> float:
    """
    Medium: mixed traffic, viral files.
    Score = weighted combo of hit rate + bandwidth saved.
    """
    stats = _run_episode("task_medium", policy, seed)
    hit_rate = stats["hit_rate"]
    bandwidth = stats["bandwidth_saved_mb"]

    # Normalize bandwidth: assume 500MB = perfect
    bw_score = min(1.0, bandwidth / 500.0)

    # Hit rate: 0.55 = 1.0
    hr_score = min(1.0, hit_rate / 0.55)

    # 70% hit rate, 30% bandwidth
    score = 0.70 * hr_score + 0.30 * bw_score
    return round(score, 4)


def grade_task_hard(policy: GraderPolicy, seed: int = 42) -> float:
    """
    Hard: constrained cache, many viral bursts.
    Score = hit rate + bandwidth + thrash avoidance.
    """
    stats = _run_episode("task_hard", policy, seed)
    hit_rate = stats["hit_rate"]
    bandwidth = stats["bandwidth_saved_mb"]
    total_reward = stats["total_reward"]
```
|
| 135 |
+
|
| 136 |
+
# Hit rate target: 0.45 = 1.0
|
| 137 |
+
hr_score = min(1.0, hit_rate / 0.45)
|
| 138 |
+
|
| 139 |
+
# Bandwidth: 400MB = 1.0
|
| 140 |
+
bw_score = min(1.0, bandwidth / 400.0)
|
| 141 |
+
|
| 142 |
+
# Reward signal (captures thrash penalties implicitly)
|
| 143 |
+
# Normalize: 200 reward = 1.0
|
| 144 |
+
rw_score = max(0.0, min(1.0, total_reward / 200.0))
|
| 145 |
+
|
| 146 |
+
# 50% hit rate, 25% bandwidth, 25% reward quality
|
| 147 |
+
score = 0.50 * hr_score + 0.25 * bw_score + 0.25 * rw_score
|
| 148 |
+
return round(score, 4)
|
| 149 |
+
|
| 150 |
+
|
| 151 |
+
# ββββββββββββββββββββββββοΏ½οΏ½οΏ½ββββββββββββββββββββ
|
| 152 |
+
# Master Grader
|
| 153 |
+
# βββββββββββββββββββββββββββββββββββββββββββββ
|
| 154 |
+
|
| 155 |
+
def run_all_graders(policy: GraderPolicy, seed: int = 42) -> Dict:
|
| 156 |
+
"""Run all 3 graders and return scores + summary."""
|
| 157 |
+
easy = grade_task_easy(policy, seed)
|
| 158 |
+
medium = grade_task_medium(policy, seed)
|
| 159 |
+
hard = grade_task_hard(policy, seed)
|
| 160 |
+
overall = round((easy + medium + hard) / 3, 4)
|
| 161 |
+
|
| 162 |
+
return {
|
| 163 |
+
"task_easy": easy,
|
| 164 |
+
"task_medium": medium,
|
| 165 |
+
"task_hard": hard,
|
| 166 |
+
"overall": overall,
|
| 167 |
+
"all_in_range": all(0.0 <= s <= 1.0 for s in [easy, medium, hard]),
|
| 168 |
+
}
|
| 169 |
+
|
| 170 |
+
|
| 171 |
+
if __name__ == "__main__":
|
| 172 |
+
print("=== Running Grader Validation ===\n")
|
| 173 |
+
|
| 174 |
+
policies = {
|
| 175 |
+
"no_op": no_op_policy,
|
| 176 |
+
"lru": lru_policy,
|
| 177 |
+
"lfu": lfu_policy,
|
| 178 |
+
"smart": smart_policy,
|
| 179 |
+
}
|
| 180 |
+
|
| 181 |
+
for name, policy in policies.items():
|
| 182 |
+
results = run_all_graders(policy)
|
| 183 |
+
print(f"Policy: {name}")
|
| 184 |
+
print(f" Easy: {results['task_easy']}")
|
| 185 |
+
print(f" Medium: {results['task_medium']}")
|
| 186 |
+
print(f" Hard: {results['task_hard']}")
|
| 187 |
+
print(f" Overall:{results['overall']}")
|
| 188 |
+
print(f" Valid: {results['all_in_range']}\n")
|
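As a quick sanity check on the scoring arithmetic, here is the task_medium formula applied to a hypothetical stats dict. The hit rate and bandwidth figures below are made up for illustration, not produced by a real episode:

```python
# Hypothetical episode stats -- illustrative numbers only
stats = {"hit_rate": 0.44, "bandwidth_saved_mb": 350.0}

# task_medium scoring, mirroring grade_task_medium above
hr_score = min(1.0, stats["hit_rate"] / 0.55)             # hit-rate target: 0.55
bw_score = min(1.0, stats["bandwidth_saved_mb"] / 500.0)  # bandwidth target: 500 MB
score = round(0.70 * hr_score + 0.30 * bw_score, 4)
print(score)  # 0.77
```

Both sub-scores are clamped at 1.0 before weighting, so the final score always stays in [0, 1].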
env/models.py
ADDED
@@ -0,0 +1,67 @@

"""
Typed Pydantic models for the CDN Cache Optimizer environment.
Implements the OpenEnv spec: Observation, Action, Reward.
"""

from pydantic import BaseModel, Field
from typing import List, Optional, Dict


class FileEntry(BaseModel):
    """Represents a file currently in the cache."""
    file_id: str
    size_mb: float
    request_frequency: float  # requests over the last N steps
    is_viral: bool
    last_accessed: int  # step number


class Observation(BaseModel):
    """What the agent sees at each step."""
    step: int
    cache_used_mb: float
    cache_capacity_mb: float
    cache_fill_ratio: float
    cached_files: List[FileEntry]
    incoming_file_id: str
    incoming_file_size_mb: float
    incoming_file_is_viral: bool
    cache_hit: bool  # was incoming_file already cached?
    recent_hit_rate: float  # rolling hit rate over the last 20 steps
    time_of_day: float  # 0.0 to 1.0 (normalized)
    queue_preview: List[str]  # next 3 file_ids coming


class Action(BaseModel):
    """What the agent decides to do."""
    evict_file_id: Optional[str] = None  # None = do nothing / already cached


class Reward(BaseModel):
    """Reward breakdown for transparency."""
    total: float
    cache_hit_bonus: float
    eviction_penalty: float
    thrash_penalty: float
    bandwidth_saved: float
    wasted_capacity_penalty: float


class StepResult(BaseModel):
    """Full result returned by step()."""
    observation: Observation
    reward: Reward
    done: bool
    info: Dict


class TaskConfig(BaseModel):
    """Configuration for a specific task."""
    task_id: str
    name: str
    difficulty: str
    cache_capacity_mb: float
    num_files: int
    viral_ratio: float
    episode_length: int
    description: str
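These models define the JSON contract between client and server, so it is worth seeing what an Action looks like on the wire. A minimal sketch using only the standard library; the file_id is hypothetical:

```python
import json

# Hypothetical Action payloads -- the field name matches the Action model,
# the file_id is made up for illustration
evict = json.dumps({"evict_file_id": "file_007"})
noop = json.dumps({"evict_file_id": None})  # Optional[str] serializes to JSON null

print(evict)  # {"evict_file_id": "file_007"}
print(noop)   # {"evict_file_id": null}
```

The `None` case is the no-op action: the server interprets a JSON `null` as "do not evict anything this step".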
env/traffic.py
ADDED
@@ -0,0 +1,119 @@

"""
Traffic generator for the CDN Cache Optimizer.
Simulates realistic web traffic: steady files plus viral bursts.
"""

import random
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class FileProfile:
    file_id: str
    size_mb: float
    base_popularity: float  # base request probability
    is_viral: bool = False
    viral_start: int = -1
    viral_duration: int = 0
    viral_peak: float = 0.0


class TrafficGenerator:
    """
    Generates a stream of file requests.
    - Steady files: consistent low-level demand
    - Viral files: spike suddenly, dominate for a window, then die off
    """

    def __init__(
        self,
        num_files: int = 50,
        viral_ratio: float = 0.2,
        episode_length: int = 200,
        seed: int = 42,
    ):
        self.num_files = num_files
        self.viral_ratio = viral_ratio
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.files: List[FileProfile] = []
        self.request_log: List[str] = []  # precomputed episode
        self._build_file_profiles()
        self._precompute_requests()

    def _build_file_profiles(self):
        num_viral = max(1, int(self.num_files * self.viral_ratio))
        for i in range(self.num_files):
            fid = f"file_{i:03d}"
            size = round(self.rng.uniform(1.0, 20.0), 1)
            is_viral = i < num_viral

            if is_viral:
                viral_start = self.rng.randint(
                    5, max(6, self.episode_length - 30)
                )
                viral_duration = self.rng.randint(10, 30)
                viral_peak = self.rng.uniform(0.4, 0.8)
                base_pop = self.rng.uniform(0.01, 0.05)
                self.files.append(FileProfile(
                    file_id=fid,
                    size_mb=size,
                    base_popularity=base_pop,
                    is_viral=True,
                    viral_start=viral_start,
                    viral_duration=viral_duration,
                    viral_peak=viral_peak,
                ))
            else:
                base_pop = self.rng.uniform(0.02, 0.15)
                self.files.append(FileProfile(
                    file_id=fid,
                    size_mb=size,
                    base_popularity=base_pop,
                ))

    def _get_popularity_at_step(self, fp: FileProfile, step: int) -> float:
        if not fp.is_viral:
            # Steady demand with a slight daily cycle
            cycle = 0.3 * math.sin(2 * math.pi * step / 50)
            return max(0.001, fp.base_popularity + cycle * fp.base_popularity)

        # Viral: bell-curve spike
        if step < fp.viral_start or step > fp.viral_start + fp.viral_duration:
            return fp.base_popularity
        center = fp.viral_start + fp.viral_duration / 2
        spread = fp.viral_duration / 4
        spike = fp.viral_peak * math.exp(-((step - center) ** 2) / (2 * spread ** 2))
        return fp.base_popularity + spike

    def _precompute_requests(self):
        self.request_log = []
        for step in range(self.episode_length):
            weights = [
                self._get_popularity_at_step(fp, step) for fp in self.files
            ]
            total = sum(weights)
            norm = [w / total for w in weights]
            chosen = self.rng.choices(self.files, weights=norm, k=1)[0]
            self.request_log.append(chosen.file_id)

    def get_request(self, step: int) -> Tuple[str, float, bool]:
        """Return (file_id, size_mb, is_viral) for a given step."""
        if step >= len(self.request_log):
            return self.request_log[-1], 1.0, False
        fid = self.request_log[step]
        fp = next(f for f in self.files if f.file_id == fid)
        return fid, fp.size_mb, fp.is_viral

    def get_preview(self, step: int, n: int = 3) -> List[str]:
        """Peek at the next n file_ids (simulates prefetch hints)."""
        return self.request_log[step + 1: step + 1 + n]

    def get_file_profile(self, file_id: str) -> Optional[FileProfile]:
        return next((f for f in self.files if f.file_id == file_id), None)

    def time_of_day(self, step: int) -> float:
        """Normalized 0.0 to 1.0 cycle."""
        return (step % 50) / 50.0
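The viral spike in `_get_popularity_at_step` is a Gaussian bump centered on the burst window. Here is a standalone sketch of the same formula, with illustrative parameter values rather than the environment's actual random draws:

```python
import math

def viral_popularity(step, base=0.03, start=40, duration=20, peak=0.6):
    """Bell-curve popularity spike, mirroring _get_popularity_at_step.
    Parameter values are illustrative, not taken from the environment."""
    if step < start or step > start + duration:
        return base  # outside the burst window: steady baseline
    center = start + duration / 2
    spread = duration / 4
    # Gaussian bump: peaks at the window center, decays toward the edges
    return base + peak * math.exp(-((step - center) ** 2) / (2 * spread ** 2))

print(viral_popularity(39))  # 0.03  (before the burst)
print(viral_popularity(50))  # 0.63  (base + peak, at the center)
```

A spread of `duration / 4` means the spike has effectively decayed to baseline by the window edges (two standard deviations out), so the hard cutoff at `start + duration` introduces only a small discontinuity.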
inference.py
ADDED
@@ -0,0 +1,221 @@

"""
inference.py - CDN Cache Optimizer baseline agent.
Uses the OpenAI client to run an LLM agent against the environment.
Emits structured [START], [STEP], and [END] logs to stdout.

Required environment variables:
    API_BASE_URL - LLM API endpoint
    MODEL_NAME   - model identifier
    HF_TOKEN     - Hugging Face / API key
"""

import os
import sys
import json
import time
import requests
from openai import OpenAI
from env.cache import CDNCacheEnv, TASK_CONFIGS
from env.models import Action, Observation

# ─────────────────────────────────────────────
# Config from environment
# ─────────────────────────────────────────────
API_BASE_URL = os.environ.get("API_BASE_URL", "https://api.openai.com/v1")
MODEL_NAME = os.environ.get("MODEL_NAME", "gpt-4o-mini")
HF_TOKEN = os.environ.get("HF_TOKEN", "")

if not HF_TOKEN:
    print("[WARN] HF_TOKEN not set. Using API_BASE_URL without an auth header override.")

client = OpenAI(
    base_url=API_BASE_URL,
    api_key=HF_TOKEN or "placeholder",
)

TASKS = ["task_easy", "task_medium", "task_hard"]
SEED = 42

# ─────────────────────────────────────────────
# LLM Agent
# ─────────────────────────────────────────────

SYSTEM_PROMPT = """You are an intelligent CDN cache management agent.

At each step you receive the current cache state and an incoming file request.
Your job: decide which file to evict (if any) to make room for new content.

Rules:
- Only evict a file if the cache is nearly full and the incoming file is NOT already cached
- Prefer evicting files with LOW request_frequency that are NOT viral
- Never re-evict a file that was just cached (cache thrashing)
- If the cache has space, respond with null (no eviction needed)

You MUST respond with ONLY valid JSON in this exact format:
{"evict_file_id": "<file_id>" or null}

No explanation. No markdown. Only the JSON object."""


def build_user_prompt(obs: Observation) -> str:
    cached_summary = []
    for f in obs.cached_files:
        cached_summary.append(
            f"  - {f.file_id}: size={f.size_mb}MB freq={f.request_frequency:.1f} "
            f"viral={f.is_viral} last_accessed=step_{f.last_accessed}"
        )
    cached_str = "\n".join(cached_summary) if cached_summary else "  (empty)"

    space_needed = obs.incoming_file_size_mb
    space_free = obs.cache_capacity_mb - obs.cache_used_mb

    # Computed outside the f-string below: nested same-type quotes inside a
    # replacement field are a syntax error on Python 3.11 (allowed only in 3.12+)
    if space_free >= space_needed:
        deficit_str = "none (fits)"
    else:
        deficit_str = f"{space_needed - space_free:.1f}MB deficit"

    return f"""Step {obs.step} | Time of day: {obs.time_of_day:.2f} | Hit rate: {obs.recent_hit_rate:.2f}

Cache: {obs.cache_used_mb:.1f}MB / {obs.cache_capacity_mb:.1f}MB used ({obs.cache_fill_ratio*100:.1f}% full)
Free space: {space_free:.1f}MB

Incoming request:
  file_id: {obs.incoming_file_id}
  size: {obs.incoming_file_size_mb}MB
  viral: {obs.incoming_file_is_viral}
  already_cached: {obs.cache_hit}
  space_needed_to_cache: {deficit_str}

Next 3 requests preview: {obs.queue_preview}

Currently cached files ({len(obs.cached_files)} files):
{cached_str}

Decide: which file to evict? (null if no eviction needed)"""


def llm_action(obs: Observation, step_num: int) -> Action:
    """Call the LLM and parse its action. Fall back to LRU on any failure."""
    prompt = build_user_prompt(obs)
    try:
        response = client.chat.completions.create(
            model=MODEL_NAME,
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": prompt},
            ],
            max_tokens=50,
            temperature=0.0,
        )
        raw = response.choices[0].message.content.strip()
        parsed = json.loads(raw)
        return Action(evict_file_id=parsed.get("evict_file_id"))
    except Exception:
        # Fallback: LRU
        if obs.cached_files:
            lru = min(obs.cached_files, key=lambda f: f.last_accessed)
            return Action(evict_file_id=lru.file_id)
        return Action(evict_file_id=None)


# ─────────────────────────────────────────────
# Run one task episode
# ─────────────────────────────────────────────

def run_task(task_id: str) -> dict:
    config = TASK_CONFIGS[task_id]
    env = CDNCacheEnv(task_id=task_id, seed=SEED)
    obs = env.reset()

    total_reward = 0.0
    step_num = 0

    # ── [START] ──
    print(json.dumps({
        "type": "START",
        "task_id": task_id,
        "task_name": config.name,
        "difficulty": config.difficulty,
        "episode_length": config.episode_length,
        "cache_capacity_mb": config.cache_capacity_mb,
        "model": MODEL_NAME,
        "seed": SEED,
    }))
    sys.stdout.flush()

    while True:
        action = llm_action(obs, step_num)
        result = env.step(action)

        total_reward += result.reward.total

        # ── [STEP] ──
        print(json.dumps({
            "type": "STEP",
            "task_id": task_id,
            "step": step_num,
            "action": {"evict_file_id": action.evict_file_id},
            "cache_hit": result.observation.cache_hit,
            "reward": result.reward.total,
            "reward_breakdown": {
                "cache_hit_bonus": result.reward.cache_hit_bonus,
                "eviction_penalty": result.reward.eviction_penalty,
                "thrash_penalty": result.reward.thrash_penalty,
                "bandwidth_saved": result.reward.bandwidth_saved,
                "wasted_capacity_penalty": result.reward.wasted_capacity_penalty,
            },
            "cumulative_reward": round(total_reward, 4),
            "hit_rate": result.observation.recent_hit_rate,
            "cache_fill": result.observation.cache_fill_ratio,
            "done": result.done,
        }))
        sys.stdout.flush()

        obs = result.observation
        step_num += 1

        if result.done:
            break

    final_state = env.state()
    final_hit_rate = final_state["hit_rate"]

    # Per-task hit-rate targets used for scoring (mirrors the graders)
    hit_rate_targets = {"task_easy": 0.60, "task_medium": 0.55, "task_hard": 0.45}
    score = round(min(1.0, final_hit_rate / hit_rate_targets[task_id]), 4)

    # ── [END] ──
    print(json.dumps({
        "type": "END",
        "task_id": task_id,
        "task_name": config.name,
        "total_steps": step_num,
        "total_reward": round(total_reward, 4),
        "final_hit_rate": round(final_hit_rate, 4),
        "bandwidth_saved_mb": round(final_state["bandwidth_saved_mb"], 2),
        "total_hits": final_state["hits"],
        "total_misses": final_state["misses"],
        "score": score,
    }))
    sys.stdout.flush()

    return {
        "task_id": task_id,
        "total_reward": round(total_reward, 4),
        "final_hit_rate": round(final_hit_rate, 4),
        "score": score,
    }


# ─────────────────────────────────────────────
# Main
# ─────────────────────────────────────────────

if __name__ == "__main__":
    print("[INFO] Starting CDN Cache Optimizer inference", file=sys.stderr)
    print(f"[INFO] Model: {MODEL_NAME} | API: {API_BASE_URL}", file=sys.stderr)

    results = []
    for task_id in TASKS:
        print(f"\n[INFO] Running {task_id}...", file=sys.stderr)
        r = run_task(task_id)
        results.append(r)
        print(f"[INFO] {task_id} done | score={r['score']} hit_rate={r['final_hit_rate']}", file=sys.stderr)

    print("\n[INFO] === FINAL RESULTS ===", file=sys.stderr)
    for r in results:
        print(f"[INFO] {r['task_id']}: score={r['score']} reward={r['total_reward']}", file=sys.stderr)

    overall = round(sum(r["score"] for r in results) / len(results), 4)
    print(f"[INFO] Overall score: {overall}", file=sys.stderr)
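`llm_action` expects the model to return bare JSON and falls back to LRU whenever parsing fails. In practice, models often wrap JSON in markdown fences despite the "no markdown" instruction, so a slightly more tolerant parser avoids unnecessary fallbacks. A sketch of one approach (not part of the script above):

```python
import json

def parse_evict_response(raw: str):
    """Tolerant parser for the {"evict_file_id": ...} reply.
    Strips a markdown code fence if the model added one, then
    returns None if the JSON is still malformed."""
    text = raw.strip()
    if text.startswith("```"):
        # Take the fenced body and drop an optional "json" language tag
        text = text.split("```")[1]
        if text.startswith("json"):
            text = text[4:]
    try:
        return json.loads(text.strip()).get("evict_file_id")
    except (json.JSONDecodeError, AttributeError):
        return None
```

Note that this conflates "model said null" with "parse failed"; both map to a no-op, which matches the environment's safe default.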
openenv.yaml
ADDED
@@ -0,0 +1,68 @@

name: cdn-cache-optimizer
version: "1.0.0"
description: >
  Edge CDN Cache Optimizer: an RL environment where an agent manages
  a content delivery network cache. The agent decides which files to evict
  when the cache is full, balancing hit rate, bandwidth efficiency, and
  thrash avoidance. Simulates real-world viral traffic spikes
  alongside steady baseline demand.

author: umar
tags:
  - openenv
  - cdn
  - cache
  - infrastructure
  - real-world

tasks:
  - id: task_easy
    name: Steady Traffic Cache
    difficulty: easy
    episode_length: 100
    cache_capacity_mb: 100.0

  - id: task_medium
    name: Mixed Traffic Cache
    difficulty: medium
    episode_length: 150
    cache_capacity_mb: 80.0

  - id: task_hard
    name: Constrained Cache with Viral Bursts
    difficulty: hard
    episode_length: 200
    cache_capacity_mb: 50.0

observation_space:
  type: structured
  fields:
    - step: int
    - cache_used_mb: float
    - cache_capacity_mb: float
    - cache_fill_ratio: float
    - cached_files: list[FileEntry]
    - incoming_file_id: str
    - incoming_file_size_mb: float
    - incoming_file_is_viral: bool
    - cache_hit: bool
    - recent_hit_rate: float
    - time_of_day: float
    - queue_preview: list[str]

action_space:
  type: structured
  fields:
    - evict_file_id: str | null

reward_range: [-1.0, 1.5]

endpoints:
  reset: POST /reset
  step: POST /step
  state: GET /state

runtime:
  framework: fastapi
  python: "3.11"
  port: 7860
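For orientation, here is a sketch of the payloads a client might send to the endpoints declared above. The `/reset` body fields (`task_id`, `seed`) are assumptions based on the `CDNCacheEnv(task_id=..., seed=...)` constructor, and the host is a placeholder:

```python
import json

BASE = "http://localhost:7860"  # placeholder host; HF Spaces serves on port 7860

def reset_request(task_id="task_easy", seed=42):
    # Assumed /reset body; field names follow CDNCacheEnv(task_id=..., seed=...)
    return ("POST", f"{BASE}/reset", {"task_id": task_id, "seed": seed})

def step_request(evict_file_id=None):
    # /step body mirrors the Action model: {"evict_file_id": <str or null>}
    return ("POST", f"{BASE}/step", {"evict_file_id": evict_file_id})
```

A driver loop would POST `/reset` once, then POST `/step` with each Action until the response reports `done: true`, optionally polling GET `/state` for cumulative stats.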
requirements.txt
ADDED
@@ -0,0 +1,6 @@

fastapi==0.111.0
uvicorn==0.29.0
pydantic==2.7.1
openai==1.30.1
requests==2.31.0
python-multipart==0.0.9