umar-sharif821 committed
Commit 09e32d2 · 0 parent(s)

initial: CDN Cache Optimizer OpenEnv
Dockerfile ADDED
```dockerfile
FROM python:3.11-slim

# HF Spaces expects port 7860
EXPOSE 7860

WORKDIR /app

# Install deps
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy source
COPY env/ ./env/
COPY api/ ./api/
COPY inference.py .
COPY openenv.yaml .

# Environment variables (override at runtime)
ENV API_BASE_URL="https://api.openai.com/v1"
ENV MODEL_NAME="gpt-4o-mini"
ENV HF_TOKEN=""

# Start FastAPI server
CMD ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "7860"]
```
README.md ADDED
# 🌐 CDN Cache Optimizer — OpenEnv RL Environment

An RL environment simulating **edge CDN cache management**, the kind of problem companies like Meta solve at planetary scale. An agent manages a fixed-size cache, deciding which files to evict when new content arrives, balancing **hit rate**, **bandwidth efficiency**, and **thrash avoidance**.

---

## 🎯 Motivation

Content Delivery Networks serve billions of files daily. Edge servers have limited storage, so they must constantly decide which cached files to keep and which to evict. Standard algorithms like LRU are not optimal, especially when traffic has **viral bursts**: a file suddenly gets 50x more requests for 20 minutes, then drops back to baseline.

A smarter agent can:
- Predict viral spikes from queue previews
- Avoid evicting high-frequency files
- Prevent cache thrashing (evicting a file, then immediately re-caching it)
- Maximize bandwidth saved for users

---

## 🔧 Environment Description

At each step, a file is requested from the network. If it is already in the cache, that is a **cache hit** (reward). If not, it is a **cache miss**, and the agent must decide whether to evict an existing file to make room.

### Traffic Model
- **Steady files**: consistent, cyclical demand
- **Viral files**: a bell-curve spike in popularity, then a fade back to baseline

---
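The viral popularity curve is a Gaussian bump over a fixed window, as implemented in `env/traffic.py`; a self-contained sketch of the same formula:

```python
import math

def viral_popularity(step, base=0.03, start=50, duration=20, peak=0.6):
    """Popularity of a viral file at a given step: flat baseline outside
    the viral window, bell-curve spike centred in the middle of it."""
    if step < start or step > start + duration:
        return base
    center = start + duration / 2   # peak of the bell curve
    spread = duration / 4           # most of the spike falls inside the window
    return base + peak * math.exp(-((step - center) ** 2) / (2 * spread ** 2))
```

At the window centre the popularity is `base + peak`; outside the window it is back to `base`.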

## 📐 Action & Observation Space

### Observation Space
| Field | Type | Description |
|-------|------|-------------|
| `step` | int | Current episode step |
| `cache_used_mb` | float | MB currently used |
| `cache_capacity_mb` | float | Total cache size |
| `cache_fill_ratio` | float | 0.0–1.0 fill level |
| `cached_files` | List[FileEntry] | All cached files with metadata |
| `incoming_file_id` | str | File being requested |
| `incoming_file_size_mb` | float | Size of incoming file |
| `incoming_file_is_viral` | bool | Is this file currently viral? |
| `cache_hit` | bool | Is the incoming file already cached? |
| `recent_hit_rate` | float | Rolling hit rate (last 20 steps) |
| `time_of_day` | float | Normalized 0.0–1.0 daily cycle |
| `queue_preview` | List[str] | Next 3 file IDs (prefetch hint) |

### FileEntry Fields
| Field | Type | Description |
|-------|------|-------------|
| `file_id` | str | Unique identifier |
| `size_mb` | float | File size in MB |
| `request_frequency` | float | Requests since cached |
| `is_viral` | bool | Currently viral |
| `last_accessed` | int | Step number of last access |

### Action Space
| Field | Type | Description |
|-------|------|-------------|
| `evict_file_id` | str \| null | File to evict (null = no eviction) |

### Reward Function
| Component | Range | Description |
|-----------|-------|-------------|
| `cache_hit_bonus` | +1.0 to +1.5 | Hit reward (viral hits = +1.5) |
| `bandwidth_saved` | +0.0 to +0.2 | Reward for bandwidth efficiency |
| `eviction_penalty` | -0.5 to -0.0 | Penalty for evicting popular files |
| `thrash_penalty` | -0.5 or 0.0 | Penalty for evicting the same file twice in a row |
| `wasted_capacity_penalty` | -0.3 to -0.0 | Penalty for leaving the cache underfilled |

---
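The hit bonus and eviction penalty rows above map directly onto small formulas; these match the logic in `env/cache.py`:

```python
def hit_reward(is_viral: bool, size_mb: float) -> float:
    """Reward for a cache hit: +1.0 base, +0.5 viral bonus,
    plus 1% of the file size as a bandwidth-saved credit."""
    return (1.5 if is_viral else 1.0) + size_mb * 0.01

def eviction_penalty(request_frequency: float, is_viral: bool) -> float:
    """Penalty for evicting a file: -0.3 if it was popular
    (more than 10 requests), a further -0.2 if it was viral."""
    penalty = 0.0
    if request_frequency > 10:
        penalty -= 0.3
    if is_viral:
        penalty -= 0.2
    return penalty
```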

## 📋 Tasks

### Task 1: Steady Traffic Cache (Easy)
- **Cache**: 100MB | **Files**: 30 | **Steps**: 100
- No viral files; steady demand only
- Agent learns basic LRU-style eviction
- **Target hit rate**: ≥ 0.60 → score 1.0
- **Baseline score**: ~0.75

### Task 2: Mixed Traffic Cache (Medium)
- **Cache**: 80MB | **Files**: 50 | **Steps**: 150
- 20% viral files mixed with steady demand
- Agent must handle spikes and prioritize popular content
- **Score**: 70% hit rate + 30% bandwidth
- **Baseline score**: ~0.60

### Task 3: Constrained Cache with Viral Bursts (Hard)
- **Cache**: 50MB | **Files**: 80 | **Steps**: 200
- 35% viral files, tight capacity, large file sizes
- Agent must predict spikes and avoid thrashing
- **Score**: 50% hit rate + 25% bandwidth + 25% reward quality
- **Baseline score**: ~0.45

---
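The Task 2 scoring rule can be written out directly; this matches `grade_task_medium` in `env/graders.py`:

```python
def medium_score(hit_rate: float, bandwidth_saved_mb: float) -> float:
    """Task 2 score: 70% hit rate (target 0.55) + 30% bandwidth (cap 500 MB)."""
    hr_score = min(1.0, hit_rate / 0.55)
    bw_score = min(1.0, bandwidth_saved_mb / 500.0)
    return round(0.70 * hr_score + 0.30 * bw_score, 4)
```

A policy that reaches the 0.55 hit-rate target and saves 500 MB scores exactly 1.0.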

## 🚀 Setup & Usage

### Local Setup
```bash
git clone <repo>
cd cdn-cache-env
pip install -r requirements.txt
```

### Run API Server
```bash
uvicorn api.main:app --host 0.0.0.0 --port 7860
```

### Run Inference (Baseline Agent)
```bash
export API_BASE_URL="https://api.openai.com/v1"
export MODEL_NAME="gpt-4o-mini"
export HF_TOKEN="your_token_here"

python inference.py
```

### Docker
```bash
docker build -t cdn-cache-env .
docker run -p 7860:7860 \
  -e API_BASE_URL="https://api.openai.com/v1" \
  -e MODEL_NAME="gpt-4o-mini" \
  -e HF_TOKEN="your_token" \
  cdn-cache-env
```

---

## 🌐 API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/health` | Health check (returns 200) |
| GET | `/tasks` | List all tasks |
| POST | `/reset` | Start an episode: `{"task_id": "task_easy", "seed": 42}` |
| POST | `/step` | Take an action: `{"evict_file_id": "file_001"}` (or `null`) |
| GET | `/state` | Full environment state |

---

## 📊 Baseline Scores

Using the built-in `smart_policy` (non-LLM baseline):

| Task | Hit Rate | Score |
|------|----------|-------|
| Easy | ~0.72 | ~1.00 |
| Medium | ~0.61 | ~0.82 |
| Hard | ~0.48 | ~0.78 |
| **Overall** | | **~0.87** |

---

## 📝 Log Format

`inference.py` emits structured JSON logs, one object per line:

```
{"type": "START", "task_id": "task_easy", ...}
{"type": "STEP", "step": 0, "action": {...}, "reward": 1.0, ...}
{"type": "END", "total_reward": 87.3, "final_hit_rate": 0.72, "score": 1.0}
```
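A sketch for aggregating these logs offline, assuming one `END` record per episode as in the format above:

```python
import json

def summarize_runs(log_lines):
    """Collect the END records from a JSONL log and average their scores."""
    ends = [
        rec for rec in (json.loads(line) for line in log_lines if line.strip())
        if rec.get("type") == "END"
    ]
    if not ends:
        return {"episodes": 0, "mean_score": 0.0}
    return {
        "episodes": len(ends),
        "mean_score": round(sum(e["score"] for e in ends) / len(ends), 4),
    }
```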
api/__init__.py ADDED
File without changes
api/__pycache__/__init__.cpython-312.pyc ADDED
Binary file (139 Bytes)
api/__pycache__/main.cpython-312.pyc ADDED
Binary file (5.07 kB)
api/main.py ADDED
```python
"""
FastAPI server exposing the OpenEnv interface over HTTP.
Endpoints: POST /reset, POST /step, GET /state, GET /health, GET /tasks
"""

import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from typing import Optional
import uvicorn

from env.cache import CDNCacheEnv, TASK_CONFIGS
from env.models import Action, StepResult

app = FastAPI(
    title="CDN Cache Optimizer - OpenEnv",
    description=(
        "RL environment simulating edge CDN cache management. "
        "Agent decides which files to evict when the cache is full. "
        "Implements the full OpenEnv spec."
    ),
    version="1.0.0",
)

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)

# Global env instance (stateful; one episode at a time per process)
_env: Optional[CDNCacheEnv] = None


class ResetRequest(BaseModel):
    task_id: str = "task_easy"
    seed: int = 42


class StepRequest(BaseModel):
    evict_file_id: Optional[str] = None


@app.get("/health")
def health():
    return {"status": "ok", "env": "cdn-cache-optimizer"}


@app.get("/tasks")
def list_tasks():
    return {
        task_id: {
            "name": cfg.name,
            "difficulty": cfg.difficulty,
            "description": cfg.description,
            "cache_capacity_mb": cfg.cache_capacity_mb,
            "episode_length": cfg.episode_length,
        }
        for task_id, cfg in TASK_CONFIGS.items()
    }


@app.post("/reset")
def reset(req: ResetRequest):
    global _env
    if req.task_id not in TASK_CONFIGS:
        raise HTTPException(
            status_code=400,
            detail=f"Unknown task_id '{req.task_id}'. Valid: {list(TASK_CONFIGS.keys())}",
        )
    _env = CDNCacheEnv(task_id=req.task_id, seed=req.seed)
    obs = _env.reset()
    return {"observation": obs.dict(), "task": _env.config.dict()}


@app.post("/step")
def step(req: StepRequest):
    if _env is None:
        raise HTTPException(status_code=400, detail="Call /reset first.")
    if _env._done:
        raise HTTPException(status_code=400, detail="Episode done. Call /reset.")

    action = Action(evict_file_id=req.evict_file_id)
    result: StepResult = _env.step(action)
    return result.dict()


@app.get("/state")
def state():
    if _env is None:
        raise HTTPException(status_code=400, detail="Call /reset first.")
    return _env.state()


@app.get("/")
def root():
    return {
        "name": "CDN Cache Optimizer",
        "spec": "OpenEnv v1",
        "endpoints": ["/reset", "/step", "/state", "/health", "/tasks"],
        "tasks": list(TASK_CONFIGS.keys()),
    }


if __name__ == "__main__":
    uvicorn.run("api.main:app", host="0.0.0.0", port=7860, reload=False)
```
env/__init__.py ADDED
```python
from env.cache import CDNCacheEnv, TASK_CONFIGS
from env.models import Observation, Action, Reward, StepResult, TaskConfig
from env.traffic import TrafficGenerator
from env.graders import run_all_graders, grade_task_easy, grade_task_medium, grade_task_hard
```
env/__pycache__/__init__.cpython-312.pyc ADDED
Binary file (524 Bytes)
env/__pycache__/cache.cpython-312.pyc ADDED
Binary file (11.3 kB)
env/__pycache__/graders.cpython-312.pyc ADDED
Binary file (7.16 kB)
env/__pycache__/models.cpython-312.pyc ADDED
Binary file (2.89 kB)
env/__pycache__/traffic.cpython-312.pyc ADDED
Binary file (7.31 kB)
env/cache.py ADDED
```python
"""
Core CDN cache simulation.
Implements the full OpenEnv interface: reset(), step(), state()
"""

from collections import defaultdict
from typing import Dict, List, Optional

from env.models import (
    Observation, Action, Reward, StepResult, FileEntry, TaskConfig
)
from env.traffic import TrafficGenerator


TASK_CONFIGS = {
    "task_easy": TaskConfig(
        task_id="task_easy",
        name="Steady Traffic Cache",
        difficulty="easy",
        cache_capacity_mb=100.0,
        num_files=30,
        viral_ratio=0.0,  # no viral files
        episode_length=100,
        description=(
            "Cache has 100MB capacity. Only steady traffic files. "
            "Agent must learn LRU-style eviction. Target hit rate >= 0.60."
        ),
    ),
    "task_medium": TaskConfig(
        task_id="task_medium",
        name="Mixed Traffic Cache",
        difficulty="medium",
        cache_capacity_mb=80.0,
        num_files=50,
        viral_ratio=0.2,
        episode_length=150,
        description=(
            "80MB cache, mix of steady and viral files. "
            "Agent must prioritize popular content and handle viral spikes. "
            "Target hit rate >= 0.55 with efficient eviction."
        ),
    ),
    "task_hard": TaskConfig(
        task_id="task_hard",
        name="Constrained Cache with Viral Bursts",
        difficulty="hard",
        cache_capacity_mb=50.0,
        num_files=80,
        viral_ratio=0.35,
        episode_length=200,
        description=(
            "Tight 50MB cache, many viral bursts, large file sizes. "
            "Agent must predict spikes, avoid cache thrashing, "
            "and maximize bandwidth saved. Target hit rate >= 0.45."
        ),
    ),
}


class CDNCacheEnv:
    """
    CDN Cache Optimizer environment.
    At each step, a file is requested. If it is not cached, the agent must
    decide which file (if any) to evict to make room for the new one.
    """

    def __init__(self, task_id: str = "task_easy", seed: int = 42):
        if task_id not in TASK_CONFIGS:
            raise ValueError(f"Unknown task_id: {task_id}. Choose from {list(TASK_CONFIGS.keys())}")
        self.config = TASK_CONFIGS[task_id]
        self.seed = seed
        self._cache: Dict[str, FileEntry] = {}  # file_id -> FileEntry
        self._cache_used_mb: float = 0.0
        self._step: int = 0
        self._hits: int = 0
        self._misses: int = 0
        self._recent_hits: List[bool] = []
        self._last_evicted: Optional[str] = None
        self._eviction_counts: Dict[str, int] = defaultdict(int)
        self._total_bandwidth_saved: float = 0.0
        self._done: bool = False
        self.traffic = TrafficGenerator(
            num_files=self.config.num_files,
            viral_ratio=self.config.viral_ratio,
            episode_length=self.config.episode_length,
            seed=seed,
        )

    # ─────────────────────────────────────────────
    # OpenEnv Interface
    # ─────────────────────────────────────────────

    def reset(self) -> Observation:
        """Reset environment to initial state."""
        self._cache = {}
        self._cache_used_mb = 0.0
        self._step = 0
        self._hits = 0
        self._misses = 0
        self._recent_hits = []
        self._last_evicted = None
        self._eviction_counts = defaultdict(int)
        self._total_bandwidth_saved = 0.0
        self._done = False
        self.traffic = TrafficGenerator(
            num_files=self.config.num_files,
            viral_ratio=self.config.viral_ratio,
            episode_length=self.config.episode_length,
            seed=self.seed,
        )
        return self._make_observation(cache_hit=False)

    def step(self, action: Action) -> StepResult:
        """Process one step: handle eviction, then serve the request."""
        if self._done:
            raise RuntimeError("Episode done. Call reset() first.")

        file_id, size_mb, is_viral = self.traffic.get_request(self._step)
        cache_hit = file_id in self._cache
        reward = self._process_step(action, file_id, size_mb, is_viral, cache_hit)

        self._step += 1
        self._done = self._step >= self.config.episode_length

        obs = self._make_observation(cache_hit=cache_hit)
        info = {
            "total_hits": self._hits,
            "total_misses": self._misses,
            "hit_rate": self._hits / max(1, self._hits + self._misses),
            "cache_fill_ratio": self._cache_used_mb / self.config.cache_capacity_mb,
            "bandwidth_saved_mb": self._total_bandwidth_saved,
        }
        return StepResult(observation=obs, reward=reward, done=self._done, info=info)

    def state(self) -> dict:
        """Return the current full environment state."""
        return {
            "step": self._step,
            "done": self._done,
            "cache": {k: v.dict() for k, v in self._cache.items()},
            "cache_used_mb": self._cache_used_mb,
            "cache_capacity_mb": self.config.cache_capacity_mb,
            "hits": self._hits,
            "misses": self._misses,
            "hit_rate": self._hits / max(1, self._hits + self._misses),
            "bandwidth_saved_mb": self._total_bandwidth_saved,
            "task": self.config.dict(),
        }

    # ─────────────────────────────────────────────
    # Internal Logic
    # ─────────────────────────────────────────────

    def _process_step(
        self,
        action: Action,
        file_id: str,
        size_mb: float,
        is_viral: bool,
        cache_hit: bool,
    ) -> Reward:
        hit_bonus = 0.0
        eviction_penalty = 0.0
        thrash_penalty = 0.0
        bandwidth_saved = 0.0
        wasted_penalty = 0.0

        if cache_hit:
            self._hits += 1
            self._recent_hits.append(True)
            hit_bonus = 1.0 + (0.5 if is_viral else 0.0)  # viral hits worth more
            bandwidth_saved = size_mb * 0.01  # normalized
            self._total_bandwidth_saved += size_mb
            # Update frequency
            entry = self._cache[file_id]
            entry.request_frequency = min(entry.request_frequency + 1, 50)
            entry.last_accessed = self._step
        else:
            self._misses += 1
            self._recent_hits.append(False)

            # Try to insert the new file
            if self._cache_used_mb + size_mb <= self.config.cache_capacity_mb:
                # Fits without eviction
                self._insert_file(file_id, size_mb, is_viral)
            else:
                # Need to evict
                if action.evict_file_id and action.evict_file_id in self._cache:
                    evicted = self._cache[action.evict_file_id]

                    # Penalize evicting high-frequency files
                    if evicted.request_frequency > 10:
                        eviction_penalty -= 0.3
                    if evicted.is_viral:
                        eviction_penalty -= 0.2

                    # Thrash penalty: same file evicted twice in a row
                    if action.evict_file_id == self._last_evicted:
                        thrash_penalty = -0.5

                    self._eviction_counts[action.evict_file_id] += 1
                    self._remove_file(action.evict_file_id)
                    self._last_evicted = action.evict_file_id

                    if self._cache_used_mb + size_mb <= self.config.cache_capacity_mb:
                        self._insert_file(file_id, size_mb, is_viral)
                else:
                    # No valid eviction action - wasted capacity penalty
                    wasted_penalty = -0.2

        # Wasted capacity: cache too empty when we could be caching
        fill_ratio = self._cache_used_mb / self.config.cache_capacity_mb
        if fill_ratio < 0.3 and self._step > 10:
            wasted_penalty -= 0.1

        # Keep the recent_hits window at 20
        if len(self._recent_hits) > 20:
            self._recent_hits.pop(0)

        total = hit_bonus + eviction_penalty + thrash_penalty + bandwidth_saved + wasted_penalty
        return Reward(
            total=round(total, 4),
            cache_hit_bonus=hit_bonus,
            eviction_penalty=eviction_penalty,
            thrash_penalty=thrash_penalty,
            bandwidth_saved=bandwidth_saved,
            wasted_capacity_penalty=wasted_penalty,
        )

    def _insert_file(self, file_id: str, size_mb: float, is_viral: bool):
        self._cache[file_id] = FileEntry(
            file_id=file_id,
            size_mb=size_mb,
            request_frequency=1.0,
            is_viral=is_viral,
            last_accessed=self._step,
        )
        self._cache_used_mb += size_mb

    def _remove_file(self, file_id: str):
        if file_id in self._cache:
            self._cache_used_mb -= self._cache[file_id].size_mb
            self._cache_used_mb = max(0.0, self._cache_used_mb)
            del self._cache[file_id]

    def _make_observation(self, cache_hit: bool) -> Observation:
        file_id, size_mb, is_viral = self.traffic.get_request(self._step)
        preview = self.traffic.get_preview(self._step)
        recent_hit_rate = (
            sum(self._recent_hits) / len(self._recent_hits)
            if self._recent_hits else 0.0
        )
        fill = self._cache_used_mb / self.config.cache_capacity_mb
        return Observation(
            step=self._step,
            cache_used_mb=round(self._cache_used_mb, 2),
            cache_capacity_mb=self.config.cache_capacity_mb,
            cache_fill_ratio=round(fill, 4),
            cached_files=list(self._cache.values()),
            incoming_file_id=file_id,
            incoming_file_size_mb=size_mb,
            incoming_file_is_viral=is_viral,
            cache_hit=cache_hit,
            recent_hit_rate=round(recent_hit_rate, 4),
            time_of_day=round(self.traffic.time_of_day(self._step), 4),
            queue_preview=preview,
        )
```
```python
"""
Deterministic graders for all 3 tasks.
Each grader runs a full episode and returns a score in [0.0, 1.0].
"""

from typing import Callable, Dict

from env.cache import CDNCacheEnv, TASK_CONFIGS
from env.models import Action, Observation


GraderPolicy = Callable[[Observation], Action]


def _run_episode(task_id: str, policy: GraderPolicy, seed: int = 42) -> Dict:
    """Run one full episode with a given policy. Returns a stats dict."""
    env = CDNCacheEnv(task_id=task_id, seed=seed)
    obs = env.reset()
    total_reward = 0.0
    steps = 0

    while True:
        action = policy(obs)
        result = env.step(action)
        total_reward += result.reward.total
        obs = result.observation
        steps += 1
        if result.done:
            break

    state = env.state()
    return {
        "hit_rate": state["hit_rate"],
        "total_reward": total_reward,
        "bandwidth_saved_mb": state["bandwidth_saved_mb"],
        "steps": steps,
        "hits": state["hits"],
        "misses": state["misses"],
    }


# ─────────────────────────────────────────────
# Built-in Policies (for baseline + grading)
# ─────────────────────────────────────────────

def lru_policy(obs: Observation) -> Action:
    """Evict the least recently used file."""
    if not obs.cached_files:
        return Action(evict_file_id=None)
    lru = min(obs.cached_files, key=lambda f: f.last_accessed)
    return Action(evict_file_id=lru.file_id)


def lfu_policy(obs: Observation) -> Action:
    """Evict the least frequently used file."""
    if not obs.cached_files:
        return Action(evict_file_id=None)
    lfu = min(obs.cached_files, key=lambda f: f.request_frequency)
    return Action(evict_file_id=lfu.file_id)


def smart_policy(obs: Observation) -> Action:
    """
    Smarter policy:
    - Never evict viral files
    - Evict the lowest-frequency, largest file (wastes least value, frees most space)
    """
    if not obs.cached_files:
        return Action(evict_file_id=None)

    # Filter out viral files from eviction candidates
    candidates = [f for f in obs.cached_files if not f.is_viral]
    if not candidates:
        candidates = obs.cached_files  # fallback: evict anything

    # Score: low frequency = good eviction, large size = good eviction
    def eviction_score(f):
        return -f.request_frequency + f.size_mb * 0.1

    best = max(candidates, key=eviction_score)
    return Action(evict_file_id=best.file_id)


def no_op_policy(obs: Observation) -> Action:
    """Never evict anything (baseline floor)."""
    return Action(evict_file_id=None)


# ─────────────────────────────────────────────
# Grader Functions
# ─────────────────────────────────────────────

def grade_task_easy(policy: GraderPolicy, seed: int = 42) -> float:
    """
    Easy: steady traffic, 100MB cache.
    Score based purely on hit rate:
    >= 0.60 hit rate = 1.0, scaling linearly down to 0.0.
    """
    stats = _run_episode("task_easy", policy, seed)
    hit_rate = stats["hit_rate"]

    # Linear scale: 0.0 hit_rate -> 0.0 score, 0.60+ -> 1.0
    score = min(1.0, hit_rate / 0.60)
    return round(score, 4)


def grade_task_medium(policy: GraderPolicy, seed: int = 42) -> float:
    """
    Medium: mixed traffic, viral files.
    Score = weighted combination of hit rate and bandwidth saved.
    """
    stats = _run_episode("task_medium", policy, seed)
    hit_rate = stats["hit_rate"]
    bandwidth = stats["bandwidth_saved_mb"]

    # Normalize bandwidth: assume 500MB = perfect
    bw_score = min(1.0, bandwidth / 500.0)

    # Hit rate: 0.55 = 1.0
    hr_score = min(1.0, hit_rate / 0.55)

    # 70% hit rate, 30% bandwidth
    score = 0.70 * hr_score + 0.30 * bw_score
    return round(score, 4)


def grade_task_hard(policy: GraderPolicy, seed: int = 42) -> float:
    """
    Hard: constrained cache, many viral bursts.
    Score = hit rate + bandwidth + thrash avoidance.
    """
    stats = _run_episode("task_hard", policy, seed)
    hit_rate = stats["hit_rate"]
    bandwidth = stats["bandwidth_saved_mb"]
    total_reward = stats["total_reward"]

    # Hit rate target: 0.45 = 1.0
    hr_score = min(1.0, hit_rate / 0.45)

    # Bandwidth: 400MB = 1.0
    bw_score = min(1.0, bandwidth / 400.0)

    # Reward signal (captures thrash penalties implicitly)
    # Normalize: 200 reward = 1.0
    rw_score = max(0.0, min(1.0, total_reward / 200.0))

    # 50% hit rate, 25% bandwidth, 25% reward quality
    score = 0.50 * hr_score + 0.25 * bw_score + 0.25 * rw_score
    return round(score, 4)


# ─────────────────────────────────────────────
# Master Grader
# ─────────────────────────────────────────────

def run_all_graders(policy: GraderPolicy, seed: int = 42) -> Dict:
    """Run all 3 graders and return scores + summary."""
    easy = grade_task_easy(policy, seed)
    medium = grade_task_medium(policy, seed)
    hard = grade_task_hard(policy, seed)
    overall = round((easy + medium + hard) / 3, 4)

    return {
        "task_easy": easy,
        "task_medium": medium,
        "task_hard": hard,
        "overall": overall,
        "all_in_range": all(0.0 <= s <= 1.0 for s in [easy, medium, hard]),
    }


if __name__ == "__main__":
    print("=== Running Grader Validation ===\n")

    policies = {
        "no_op": no_op_policy,
        "lru": lru_policy,
        "lfu": lfu_policy,
        "smart": smart_policy,
    }

    for name, policy in policies.items():
        results = run_all_graders(policy)
        print(f"Policy: {name}")
        print(f"  Easy:    {results['task_easy']}")
        print(f"  Medium:  {results['task_medium']}")
        print(f"  Hard:    {results['task_hard']}")
        print(f"  Overall: {results['overall']}")
        print(f"  Valid:   {results['all_in_range']}\n")
```
```python
"""
Typed Pydantic models for the CDN Cache Optimizer environment.
Implements the OpenEnv spec: Observation, Action, Reward.
"""

from pydantic import BaseModel
from typing import List, Optional, Dict


class FileEntry(BaseModel):
    """Represents a file currently in the cache."""
    file_id: str
    size_mb: float
    request_frequency: float  # requests since cached (capped at 50)
    is_viral: bool
    last_accessed: int  # step number


class Observation(BaseModel):
    """What the agent sees at each step."""
    step: int
    cache_used_mb: float
    cache_capacity_mb: float
    cache_fill_ratio: float
    cached_files: List[FileEntry]
    incoming_file_id: str
    incoming_file_size_mb: float
    incoming_file_is_viral: bool
    cache_hit: bool  # was incoming_file already cached?
    recent_hit_rate: float  # rolling hit rate over the last 20 steps
    time_of_day: float  # 0.0 to 1.0 (normalized)
    queue_preview: List[str]  # next 3 file_ids coming


class Action(BaseModel):
    """What the agent decides to do."""
    evict_file_id: Optional[str] = None  # None = do nothing / already cached


class Reward(BaseModel):
    """Reward breakdown for transparency."""
    total: float
    cache_hit_bonus: float
    eviction_penalty: float
    thrash_penalty: float
    bandwidth_saved: float
    wasted_capacity_penalty: float


class StepResult(BaseModel):
    """Full result returned by step()."""
    observation: Observation
    reward: Reward
    done: bool
    info: Dict


class TaskConfig(BaseModel):
    """Configuration for a specific task."""
    task_id: str
    name: str
    difficulty: str
    cache_capacity_mb: float
    num_files: int
    viral_ratio: float
    episode_length: int
    description: str
```
env/traffic.py ADDED
@@ -0,0 +1,119 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
+ """
+ Traffic generator for CDN Cache Optimizer.
+ Simulates realistic web traffic: steady files plus viral bursts.
+ """
+
+ import random
+ import math
+ from dataclasses import dataclass
+ from typing import List, Optional, Tuple
+
+
+ @dataclass
+ class FileProfile:
+     file_id: str
+     size_mb: float
+     base_popularity: float  # base request probability
+     is_viral: bool = False
+     viral_start: int = -1
+     viral_duration: int = 0
+     viral_peak: float = 0.0
+
+
+ class TrafficGenerator:
+     """
+     Generates a stream of file requests.
+     - Steady files: consistent low-level demand
+     - Viral files: spike suddenly, dominate for a window, then die off
+     """
+
+     def __init__(
+         self,
+         num_files: int = 50,
+         viral_ratio: float = 0.2,
+         episode_length: int = 200,
+         seed: int = 42,
+     ):
+         self.num_files = num_files
+         self.viral_ratio = viral_ratio
+         self.episode_length = episode_length
+         self.rng = random.Random(seed)
+         self.files: List[FileProfile] = []
+         self.request_log: List[str] = []  # precomputed episode
+         self._build_file_profiles()
+         self._precompute_requests()
+
+     def _build_file_profiles(self):
+         num_viral = max(1, int(self.num_files * self.viral_ratio))
+         for i in range(self.num_files):
+             fid = f"file_{i:03d}"
+             size = round(self.rng.uniform(1.0, 20.0), 1)
+             is_viral = i < num_viral
+
+             if is_viral:
+                 viral_start = self.rng.randint(
+                     5, max(6, self.episode_length - 30)
+                 )
+                 viral_duration = self.rng.randint(10, 30)
+                 viral_peak = self.rng.uniform(0.4, 0.8)
+                 base_pop = self.rng.uniform(0.01, 0.05)
+                 self.files.append(FileProfile(
+                     file_id=fid,
+                     size_mb=size,
+                     base_popularity=base_pop,
+                     is_viral=True,
+                     viral_start=viral_start,
+                     viral_duration=viral_duration,
+                     viral_peak=viral_peak,
+                 ))
+             else:
+                 base_pop = self.rng.uniform(0.02, 0.15)
+                 self.files.append(FileProfile(
+                     file_id=fid,
+                     size_mb=size,
+                     base_popularity=base_pop,
+                 ))
+
+     def _get_popularity_at_step(self, fp: FileProfile, step: int) -> float:
+         if not fp.is_viral:
+             # Steady demand with a slight cyclical (time-of-day) wobble
+             cycle = 0.3 * math.sin(2 * math.pi * step / 50)
+             return max(0.001, fp.base_popularity + cycle * fp.base_popularity)
+
+         # Viral: bell-curve spike over the viral window
+         if step < fp.viral_start or step > fp.viral_start + fp.viral_duration:
+             return fp.base_popularity
+         center = fp.viral_start + fp.viral_duration / 2
+         spread = fp.viral_duration / 4
+         spike = fp.viral_peak * math.exp(-((step - center) ** 2) / (2 * spread ** 2))
+         return fp.base_popularity + spike
+
+     def _precompute_requests(self):
+         self.request_log = []
+         for step in range(self.episode_length):
+             # random.choices accepts relative weights; no need to normalize
+             weights = [
+                 self._get_popularity_at_step(fp, step) for fp in self.files
+             ]
+             chosen = self.rng.choices(self.files, weights=weights, k=1)[0]
+             self.request_log.append(chosen.file_id)
+
+     def get_request(self, step: int) -> Tuple[str, float, bool]:
+         """Return (file_id, size_mb, is_viral) for a given step."""
+         if step >= len(self.request_log):
+             return self.request_log[-1], 1.0, False
+         fid = self.request_log[step]
+         fp = next(f for f in self.files if f.file_id == fid)
+         return fid, fp.size_mb, fp.is_viral
+
+     def get_preview(self, step: int, n: int = 3) -> List[str]:
+         """Peek at the next n file_ids (simulates prefetch hints)."""
+         return self.request_log[step + 1: step + 1 + n]
+
+     def get_file_profile(self, file_id: str) -> Optional[FileProfile]:
+         return next((f for f in self.files if f.file_id == file_id), None)
+
+     def time_of_day(self, step: int) -> float:
+         """Normalized 0.0–1.0 cycle."""
+         return (step % 50) / 50.0
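The Gaussian spike in `_get_popularity_at_step` can be sanity-checked in isolation. A minimal sketch with illustrative parameter values (not the generator's sampled defaults):

```python
import math

def viral_popularity(step, base=0.03, start=50, duration=20, peak=0.6):
    # Flat base popularity outside the viral window; inside it, a bell-curve
    # bump centered on the window's midpoint, mirroring the generator's math.
    if step < start or step > start + duration:
        return base
    center = start + duration / 2
    spread = duration / 4
    return base + peak * math.exp(-((step - center) ** 2) / (2 * spread ** 2))

print(viral_popularity(10))  # outside the window: base popularity only
print(viral_popularity(60))  # window center: base + full peak
```

Popularity ramps smoothly up to `base + peak` at the center and decays back, so a cache agent that reads the queue preview can see the burst building before it peaks.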
inference.py ADDED
@@ -0,0 +1,221 @@
+ """
+ inference.py - CDN Cache Optimizer baseline agent.
+ Uses the OpenAI client to run an LLM agent against the environment and
+ emits structured [START], [STEP], [END] logs to stdout.
+
+ Required env vars:
+     API_BASE_URL - LLM API endpoint
+     MODEL_NAME   - model identifier
+     HF_TOKEN     - Hugging Face / API key
+ """
+
+ import os
+ import sys
+ import json
+ from openai import OpenAI
+ from env.cache import CDNCacheEnv, TASK_CONFIGS
+ from env.models import Action, Observation
+
+ # ─────────────────────────────────────────────
+ # Config from environment
+ # ─────────────────────────────────────────────
+ API_BASE_URL = os.environ.get("API_BASE_URL", "https://api.openai.com/v1")
+ MODEL_NAME = os.environ.get("MODEL_NAME", "gpt-4o-mini")
+ HF_TOKEN = os.environ.get("HF_TOKEN", "")
+
+ if not HF_TOKEN:
+     print("[WARN] HF_TOKEN not set. Using API_BASE_URL without auth header override.")
+
+ client = OpenAI(
+     base_url=API_BASE_URL,
+     api_key=HF_TOKEN or "placeholder",
+ )
+
+ TASKS = ["task_easy", "task_medium", "task_hard"]
+ SEED = 42
+ # Hit-rate thresholds used to normalize the per-task score
+ SCORE_THRESHOLDS = {"task_easy": 0.60, "task_medium": 0.55, "task_hard": 0.45}
+
+ # ─────────────────────────────────────────────
+ # LLM Agent
+ # ─────────────────────────────────────────────
+
+ SYSTEM_PROMPT = """You are an intelligent CDN cache management agent.
+
+ At each step you receive the current cache state and an incoming file request.
+ Your job: decide which file to evict (if any) to make room for new content.
+
+ Rules:
+ - Only evict a file if the cache is nearly full and the incoming file is NOT already cached
+ - Prefer evicting files with LOW request_frequency that are NOT viral
+ - Avoid evicting a file that was only just cached (this causes cache thrashing)
+ - If the cache has space, respond with null (no eviction needed)
+
+ You MUST respond with ONLY valid JSON in this exact format:
+ {"evict_file_id": "<file_id>" or null}
+
+ No explanation. No markdown. Only the JSON object."""
+
+
+ def build_user_prompt(obs: Observation) -> str:
+     cached_summary = []
+     for f in obs.cached_files:
+         cached_summary.append(
+             f"  - {f.file_id}: size={f.size_mb}MB freq={f.request_frequency:.1f} "
+             f"viral={f.is_viral} last_accessed=step_{f.last_accessed}"
+         )
+     cached_str = "\n".join(cached_summary) if cached_summary else "  (empty)"
+
+     space_needed = obs.incoming_file_size_mb
+     space_free = obs.cache_capacity_mb - obs.cache_used_mb
+
+     return f"""Step {obs.step} | Time of day: {obs.time_of_day:.2f} | Hit rate: {obs.recent_hit_rate:.2f}
+
+ Cache: {obs.cache_used_mb:.1f}MB / {obs.cache_capacity_mb:.1f}MB used ({obs.cache_fill_ratio*100:.1f}% full)
+ Free space: {space_free:.1f}MB
+
+ Incoming request:
+   file_id: {obs.incoming_file_id}
+   size: {obs.incoming_file_size_mb}MB
+   viral: {obs.incoming_file_is_viral}
+   already_cached: {obs.cache_hit}
+   space_needed_to_cache: {"none (fits)" if space_free >= space_needed else f"{space_needed - space_free:.1f}MB deficit"}
+
+ Next 3 requests preview: {obs.queue_preview}
+
+ Currently cached files ({len(obs.cached_files)} files):
+ {cached_str}
+
+ Decide: which file to evict? (null if no eviction needed)"""
+
+
+ def llm_action(obs: Observation, step_num: int) -> Action:
+     """Call the LLM and parse its action. Fall back to LRU on any failure."""
+     prompt = build_user_prompt(obs)
+     try:
+         response = client.chat.completions.create(
+             model=MODEL_NAME,
+             messages=[
+                 {"role": "system", "content": SYSTEM_PROMPT},
+                 {"role": "user", "content": prompt},
+             ],
+             max_tokens=50,
+             temperature=0.0,
+         )
+         raw = response.choices[0].message.content.strip()
+         parsed = json.loads(raw)
+         return Action(evict_file_id=parsed.get("evict_file_id"))
+     except Exception:
+         # Fallback: evict the least-recently-used file
+         if obs.cached_files:
+             lru = min(obs.cached_files, key=lambda f: f.last_accessed)
+             return Action(evict_file_id=lru.file_id)
+         return Action(evict_file_id=None)
+
+
+ # ─────────────────────────────────────────────
+ # Run one task episode
+ # ─────────────────────────────────────────────
+
+ def run_task(task_id: str) -> dict:
+     config = TASK_CONFIGS[task_id]
+     env = CDNCacheEnv(task_id=task_id, seed=SEED)
+     obs = env.reset()
+
+     total_reward = 0.0
+     step_num = 0
+
+     # ── [START] ──
+     print(json.dumps({
+         "type": "START",
+         "task_id": task_id,
+         "task_name": config.name,
+         "difficulty": config.difficulty,
+         "episode_length": config.episode_length,
+         "cache_capacity_mb": config.cache_capacity_mb,
+         "model": MODEL_NAME,
+         "seed": SEED,
+     }))
+     sys.stdout.flush()
+
+     while True:
+         action = llm_action(obs, step_num)
+         result = env.step(action)
+
+         total_reward += result.reward.total
+
+         # ── [STEP] ──
+         print(json.dumps({
+             "type": "STEP",
+             "task_id": task_id,
+             "step": step_num,
+             "action": {"evict_file_id": action.evict_file_id},
+             "cache_hit": result.observation.cache_hit,
+             "reward": result.reward.total,
+             "reward_breakdown": {
+                 "cache_hit_bonus": result.reward.cache_hit_bonus,
+                 "eviction_penalty": result.reward.eviction_penalty,
+                 "thrash_penalty": result.reward.thrash_penalty,
+                 "bandwidth_saved": result.reward.bandwidth_saved,
+                 "wasted_capacity_penalty": result.reward.wasted_capacity_penalty,
+             },
+             "cumulative_reward": round(total_reward, 4),
+             "hit_rate": result.observation.recent_hit_rate,
+             "cache_fill": result.observation.cache_fill_ratio,
+             "done": result.done,
+         }))
+         sys.stdout.flush()
+
+         obs = result.observation
+         step_num += 1
+
+         if result.done:
+             break
+
+     final_state = env.state()
+     final_hit_rate = final_state["hit_rate"]
+     score = round(min(1.0, final_hit_rate / SCORE_THRESHOLDS[task_id]), 4)
+
+     # ── [END] ──
+     print(json.dumps({
+         "type": "END",
+         "task_id": task_id,
+         "task_name": config.name,
+         "total_steps": step_num,
+         "total_reward": round(total_reward, 4),
+         "final_hit_rate": round(final_hit_rate, 4),
+         "bandwidth_saved_mb": round(final_state["bandwidth_saved_mb"], 2),
+         "total_hits": final_state["hits"],
+         "total_misses": final_state["misses"],
+         "score": score,
+     }))
+     sys.stdout.flush()
+
+     return {
+         "task_id": task_id,
+         "total_reward": round(total_reward, 4),
+         "final_hit_rate": round(final_hit_rate, 4),
+         "score": score,
+     }
+
+
+ # ─────────────────────────────────────────────
+ # Main
+ # ─────────────────────────────────────────────
+
+ if __name__ == "__main__":
+     print("[INFO] Starting CDN Cache Optimizer inference", file=sys.stderr)
+     print(f"[INFO] Model: {MODEL_NAME} | API: {API_BASE_URL}", file=sys.stderr)
+
+     results = []
+     for task_id in TASKS:
+         print(f"\n[INFO] Running {task_id}...", file=sys.stderr)
+         r = run_task(task_id)
+         results.append(r)
+         print(f"[INFO] {task_id} done | score={r['score']} hit_rate={r['final_hit_rate']}", file=sys.stderr)
+
+     print("\n[INFO] === FINAL RESULTS ===", file=sys.stderr)
+     for r in results:
+         print(f"[INFO] {r['task_id']}: score={r['score']} reward={r['total_reward']}", file=sys.stderr)
+
+     overall = round(sum(r["score"] for r in results) / len(results), 4)
+     print(f"[INFO] Overall score: {overall}", file=sys.stderr)
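The exception path in `llm_action` falls back to plain LRU. The policy can be sketched as a standalone helper; the `CachedFile` stand-in below is hypothetical and carries only the fields the fallback actually reads:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class CachedFile:
    # hypothetical stand-in for the env's cached-file entries
    file_id: str
    last_accessed: int

def lru_evict(cached_files: List[CachedFile]) -> Optional[str]:
    # Evict the file with the oldest last_accessed step, or nothing
    # when the cache is empty.
    if not cached_files:
        return None
    return min(cached_files, key=lambda f: f.last_accessed).file_id

files = [CachedFile("file_001", 12), CachedFile("file_002", 3)]
print(lru_evict(files))  # file_002 (oldest last_accessed)
```

Falling back to a deterministic heuristic keeps episodes running even when the LLM returns malformed JSON or the API call fails.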
openenv.yaml ADDED
@@ -0,0 +1,68 @@
+ name: cdn-cache-optimizer
+ version: "1.0.0"
+ description: >
+   Edge CDN Cache Optimizer: an RL environment where an agent manages
+   a content delivery network cache. The agent decides which files to evict
+   when the cache is full, balancing hit rate, bandwidth efficiency, and
+   avoidance of cache thrashing. Simulates real-world viral traffic spikes
+   alongside steady baseline demand.
+
+ author: umar
+ tags:
+   - openenv
+   - cdn
+   - cache
+   - infrastructure
+   - real-world
+
+ tasks:
+   - id: task_easy
+     name: Steady Traffic Cache
+     difficulty: easy
+     episode_length: 100
+     cache_capacity_mb: 100.0
+
+   - id: task_medium
+     name: Mixed Traffic Cache
+     difficulty: medium
+     episode_length: 150
+     cache_capacity_mb: 80.0
+
+   - id: task_hard
+     name: Constrained Cache with Viral Bursts
+     difficulty: hard
+     episode_length: 200
+     cache_capacity_mb: 50.0
+
+ observation_space:
+   type: structured
+   fields:
+     - step: int
+     - cache_used_mb: float
+     - cache_capacity_mb: float
+     - cache_fill_ratio: float
+     - cached_files: list[FileEntry]
+     - incoming_file_id: str
+     - incoming_file_size_mb: float
+     - incoming_file_is_viral: bool
+     - cache_hit: bool
+     - recent_hit_rate: float
+     - time_of_day: float
+     - queue_preview: list[str]
+
+ action_space:
+   type: structured
+   fields:
+     - evict_file_id: str | null
+
+ reward_range: [-1.0, 1.5]
+
+ endpoints:
+   reset: POST /reset
+   step: POST /step
+   state: GET /state
+
+ runtime:
+   framework: fastapi
+   python: "3.11"
+   port: 7860
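The `action_space` above is a single nullable string, so a client posting to `/step` only needs to validate that one field before sending. A minimal stdlib-only sketch of checking a model reply against that shape (the `parse_action` helper is illustrative, not part of the env):

```python
import json

def parse_action(raw: str):
    # Accept a JSON object whose evict_file_id is a string or null,
    # matching the action_space declared in openenv.yaml.
    payload = json.loads(raw)
    value = payload.get("evict_file_id")
    if value is not None and not isinstance(value, str):
        raise ValueError("evict_file_id must be a string or null")
    return value

print(parse_action('{"evict_file_id": "file_007"}'))  # file_007
print(parse_action('{"evict_file_id": null}'))        # None
```

Validating before the POST keeps malformed LLM output from reaching the `/step` endpoint at all.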
requirements.txt ADDED
@@ -0,0 +1,6 @@
+ fastapi==0.111.0
+ uvicorn==0.29.0
+ pydantic==2.7.1
+ openai==1.30.1
+ requests==2.31.0
+ python-multipart==0.0.9