umar-sharif821 committed
Commit 09e32d2 · 0 parent(s)

initial: CDN Cache Optimizer OpenEnv
Dockerfile ADDED
```dockerfile
FROM python:3.11-slim

# HF Spaces expects port 7860
EXPOSE 7860

WORKDIR /app

# Install deps
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy source
COPY env/ ./env/
COPY api/ ./api/
COPY inference.py .
COPY openenv.yaml .

# Environment variables (override at runtime)
ENV API_BASE_URL="https://api.openai.com/v1"
ENV MODEL_NAME="gpt-4o-mini"
ENV HF_TOKEN=""

# Start FastAPI server
CMD ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "7860"]
```
README.md ADDED
# 🌐 CDN Cache Optimizer — OpenEnv RL Environment

An RL environment simulating **edge CDN cache management**, the kind of problem companies like Meta solve at planetary scale. An agent manages a fixed-size cache, deciding which files to evict when new content arrives, balancing **hit rate**, **bandwidth efficiency**, and **thrash avoidance**.

---

## 🎯 Motivation

Content Delivery Networks serve billions of files daily. Edge servers have limited storage, so they must constantly decide which cached files to keep and which to evict. Standard algorithms like LRU are not optimal, especially when traffic has **viral bursts**: a file suddenly gets 50x more requests for 20 minutes, then drops back to baseline.

A smarter agent can:
- Predict viral spikes from queue previews
- Avoid evicting high-frequency files
- Prevent cache thrashing (evicting a file, then immediately re-caching it)
- Maximize bandwidth saved for users

---

## 🔧 Environment Description

At each step, a file is requested from the network. If it is already in the cache, that is a **cache hit** (reward). If not, it is a **cache miss**, and the agent must decide whether to evict an existing file to make room.

### Traffic Model
- **Steady files**: consistent, cyclical demand
- **Viral files**: a bell-curve spike in popularity, then a fade back to baseline

---
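The viral popularity curve is a Gaussian bump over a fixed window, as implemented in `env/traffic.py`; a self-contained sketch of the same formula:

```python
import math

def viral_popularity(step, base=0.03, start=50, duration=20, peak=0.6):
    """Popularity of a viral file at a given step: flat baseline outside
    the viral window, bell-curve spike centred in the middle of it."""
    if step < start or step > start + duration:
        return base
    center = start + duration / 2   # peak of the bell curve
    spread = duration / 4           # most of the spike falls inside the window
    return base + peak * math.exp(-((step - center) ** 2) / (2 * spread ** 2))
```

At the window centre the popularity is `base + peak`; outside the window it is back to `base`.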

## 📐 Action & Observation Space

### Observation Space
| Field | Type | Description |
|-------|------|-------------|
| `step` | int | Current episode step |
| `cache_used_mb` | float | MB currently used |
| `cache_capacity_mb` | float | Total cache size |
| `cache_fill_ratio` | float | 0.0–1.0 fill level |
| `cached_files` | List[FileEntry] | All cached files with metadata |
| `incoming_file_id` | str | File being requested |
| `incoming_file_size_mb` | float | Size of incoming file |
| `incoming_file_is_viral` | bool | Is this file currently viral? |
| `cache_hit` | bool | Is the incoming file already cached? |
| `recent_hit_rate` | float | Rolling hit rate (last 20 steps) |
| `time_of_day` | float | Normalized 0.0–1.0 daily cycle |
| `queue_preview` | List[str] | Next 3 file IDs (prefetch hint) |

### FileEntry Fields
| Field | Type | Description |
|-------|------|-------------|
| `file_id` | str | Unique identifier |
| `size_mb` | float | File size in MB |
| `request_frequency` | float | Requests since cached |
| `is_viral` | bool | Currently viral |
| `last_accessed` | int | Step number of last access |

### Action Space
| Field | Type | Description |
|-------|------|-------------|
| `evict_file_id` | str \| null | File to evict (null = no eviction) |

### Reward Function
| Component | Range | Description |
|-----------|-------|-------------|
| `cache_hit_bonus` | +1.0 to +1.5 | Hit reward (viral hits = +1.5) |
| `bandwidth_saved` | +0.0 to +0.2 | Reward for bandwidth efficiency |
| `eviction_penalty` | -0.5 to -0.0 | Penalty for evicting popular files |
| `thrash_penalty` | -0.5 or 0.0 | Penalty for evicting the same file twice in a row |
| `wasted_capacity_penalty` | -0.3 to -0.0 | Penalty for leaving the cache underfilled |

---
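The hit bonus and eviction penalty rows above map directly onto small formulas; these match the logic in `env/cache.py`:

```python
def hit_reward(is_viral: bool, size_mb: float) -> float:
    """Reward for a cache hit: +1.0 base, +0.5 viral bonus,
    plus 1% of the file size as a bandwidth-saved credit."""
    return (1.5 if is_viral else 1.0) + size_mb * 0.01

def eviction_penalty(request_frequency: float, is_viral: bool) -> float:
    """Penalty for evicting a file: -0.3 if it was popular
    (more than 10 requests), a further -0.2 if it was viral."""
    penalty = 0.0
    if request_frequency > 10:
        penalty -= 0.3
    if is_viral:
        penalty -= 0.2
    return penalty
```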

## 📋 Tasks

### Task 1: Steady Traffic Cache (Easy)
- **Cache**: 100MB | **Files**: 30 | **Steps**: 100
- No viral files; steady demand only
- Agent learns basic LRU-style eviction
- **Target hit rate**: ≥ 0.60 → score 1.0
- **Baseline score**: ~0.75

### Task 2: Mixed Traffic Cache (Medium)
- **Cache**: 80MB | **Files**: 50 | **Steps**: 150
- 20% viral files mixed with steady demand
- Agent must handle spikes and prioritize popular content
- **Score**: 70% hit rate + 30% bandwidth
- **Baseline score**: ~0.60

### Task 3: Constrained Cache with Viral Bursts (Hard)
- **Cache**: 50MB | **Files**: 80 | **Steps**: 200
- 35% viral files, tight capacity, large file sizes
- Agent must predict spikes and avoid thrashing
- **Score**: 50% hit rate + 25% bandwidth + 25% reward quality
- **Baseline score**: ~0.45

---
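The Task 2 scoring rule can be written out directly; this matches `grade_task_medium` in `env/graders.py`:

```python
def medium_score(hit_rate: float, bandwidth_saved_mb: float) -> float:
    """Task 2 score: 70% hit rate (target 0.55) + 30% bandwidth (cap 500 MB)."""
    hr_score = min(1.0, hit_rate / 0.55)
    bw_score = min(1.0, bandwidth_saved_mb / 500.0)
    return round(0.70 * hr_score + 0.30 * bw_score, 4)
```

A policy that reaches the 0.55 hit-rate target and saves 500 MB scores exactly 1.0.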

## 🚀 Setup & Usage

### Local Setup
```bash
git clone <repo>
cd cdn-cache-env
pip install -r requirements.txt
```

### Run API Server
```bash
uvicorn api.main:app --host 0.0.0.0 --port 7860
```

### Run Inference (Baseline Agent)
```bash
export API_BASE_URL="https://api.openai.com/v1"
export MODEL_NAME="gpt-4o-mini"
export HF_TOKEN="your_token_here"

python inference.py
```

### Docker
```bash
docker build -t cdn-cache-env .
docker run -p 7860:7860 \
  -e API_BASE_URL="https://api.openai.com/v1" \
  -e MODEL_NAME="gpt-4o-mini" \
  -e HF_TOKEN="your_token" \
  cdn-cache-env
```

---

## 🌐 API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/health` | Health check (returns 200) |
| GET | `/tasks` | List all tasks |
| POST | `/reset` | Start an episode: `{"task_id": "task_easy", "seed": 42}` |
| POST | `/step` | Take an action: `{"evict_file_id": "file_001"}` (or `null`) |
| GET | `/state` | Full environment state |

---

## 📊 Baseline Scores

Using the built-in `smart_policy` (non-LLM baseline):

| Task | Hit Rate | Score |
|------|----------|-------|
| Easy | ~0.72 | ~1.00 |
| Medium | ~0.61 | ~0.82 |
| Hard | ~0.48 | ~0.78 |
| **Overall** | | **~0.87** |

---

## 📝 Log Format

`inference.py` emits structured JSON logs, one object per line:

```
{"type": "START", "task_id": "task_easy", ...}
{"type": "STEP", "step": 0, "action": {...}, "reward": 1.0, ...}
{"type": "END", "total_reward": 87.3, "final_hit_rate": 0.72, "score": 1.0}
```
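A sketch for aggregating these logs offline, assuming one `END` record per episode as in the format above:

```python
import json

def summarize_runs(log_lines):
    """Collect the END records from a JSONL log and average their scores."""
    ends = [
        rec for rec in (json.loads(line) for line in log_lines if line.strip())
        if rec.get("type") == "END"
    ]
    if not ends:
        return {"episodes": 0, "mean_score": 0.0}
    return {
        "episodes": len(ends),
        "mean_score": round(sum(e["score"] for e in ends) / len(ends), 4),
    }
```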
api/__init__.py ADDED
File without changes
api/__pycache__/__init__.cpython-312.pyc ADDED
Binary file (139 Bytes)
api/__pycache__/main.cpython-312.pyc ADDED
Binary file (5.07 kB)
api/main.py ADDED
```python
"""
FastAPI server exposing the OpenEnv interface over HTTP.
Endpoints: POST /reset, POST /step, GET /state, GET /health, GET /tasks
"""

import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from typing import Optional
import uvicorn

from env.cache import CDNCacheEnv, TASK_CONFIGS
from env.models import Action, StepResult

app = FastAPI(
    title="CDN Cache Optimizer - OpenEnv",
    description=(
        "RL environment simulating edge CDN cache management. "
        "Agent decides which files to evict when the cache is full. "
        "Implements the full OpenEnv spec."
    ),
    version="1.0.0",
)

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)

# Global env instance (stateful; one episode at a time per process)
_env: Optional[CDNCacheEnv] = None


class ResetRequest(BaseModel):
    task_id: str = "task_easy"
    seed: int = 42


class StepRequest(BaseModel):
    evict_file_id: Optional[str] = None


@app.get("/health")
def health():
    return {"status": "ok", "env": "cdn-cache-optimizer"}


@app.get("/tasks")
def list_tasks():
    return {
        task_id: {
            "name": cfg.name,
            "difficulty": cfg.difficulty,
            "description": cfg.description,
            "cache_capacity_mb": cfg.cache_capacity_mb,
            "episode_length": cfg.episode_length,
        }
        for task_id, cfg in TASK_CONFIGS.items()
    }


@app.post("/reset")
def reset(req: ResetRequest):
    global _env
    if req.task_id not in TASK_CONFIGS:
        raise HTTPException(
            status_code=400,
            detail=f"Unknown task_id '{req.task_id}'. Valid: {list(TASK_CONFIGS.keys())}",
        )
    _env = CDNCacheEnv(task_id=req.task_id, seed=req.seed)
    obs = _env.reset()
    return {"observation": obs.dict(), "task": _env.config.dict()}


@app.post("/step")
def step(req: StepRequest):
    if _env is None:
        raise HTTPException(status_code=400, detail="Call /reset first.")
    if _env._done:
        raise HTTPException(status_code=400, detail="Episode done. Call /reset.")

    action = Action(evict_file_id=req.evict_file_id)
    result: StepResult = _env.step(action)
    return result.dict()


@app.get("/state")
def state():
    if _env is None:
        raise HTTPException(status_code=400, detail="Call /reset first.")
    return _env.state()


@app.get("/")
def root():
    return {
        "name": "CDN Cache Optimizer",
        "spec": "OpenEnv v1",
        "endpoints": ["/reset", "/step", "/state", "/health", "/tasks"],
        "tasks": list(TASK_CONFIGS.keys()),
    }


if __name__ == "__main__":
    uvicorn.run("api.main:app", host="0.0.0.0", port=7860, reload=False)
```
env/__init__.py ADDED
```python
from env.cache import CDNCacheEnv, TASK_CONFIGS
from env.models import Observation, Action, Reward, StepResult, TaskConfig
from env.traffic import TrafficGenerator
from env.graders import run_all_graders, grade_task_easy, grade_task_medium, grade_task_hard
```
env/__pycache__/__init__.cpython-312.pyc ADDED
Binary file (524 Bytes)
env/__pycache__/cache.cpython-312.pyc ADDED
Binary file (11.3 kB)
env/__pycache__/graders.cpython-312.pyc ADDED
Binary file (7.16 kB)
env/__pycache__/models.cpython-312.pyc ADDED
Binary file (2.89 kB)
env/__pycache__/traffic.cpython-312.pyc ADDED
Binary file (7.31 kB)
env/cache.py ADDED
```python
"""
Core CDN cache simulation.
Implements the full OpenEnv interface: reset(), step(), state()
"""

from collections import defaultdict
from typing import Dict, List, Optional

from env.models import (
    Observation, Action, Reward, StepResult, FileEntry, TaskConfig
)
from env.traffic import TrafficGenerator


TASK_CONFIGS = {
    "task_easy": TaskConfig(
        task_id="task_easy",
        name="Steady Traffic Cache",
        difficulty="easy",
        cache_capacity_mb=100.0,
        num_files=30,
        viral_ratio=0.0,  # no viral files
        episode_length=100,
        description=(
            "Cache has 100MB capacity. Only steady traffic files. "
            "Agent must learn LRU-style eviction. Target hit rate >= 0.60."
        ),
    ),
    "task_medium": TaskConfig(
        task_id="task_medium",
        name="Mixed Traffic Cache",
        difficulty="medium",
        cache_capacity_mb=80.0,
        num_files=50,
        viral_ratio=0.2,
        episode_length=150,
        description=(
            "80MB cache, mix of steady and viral files. "
            "Agent must prioritize popular content and handle viral spikes. "
            "Target hit rate >= 0.55 with efficient eviction."
        ),
    ),
    "task_hard": TaskConfig(
        task_id="task_hard",
        name="Constrained Cache with Viral Bursts",
        difficulty="hard",
        cache_capacity_mb=50.0,
        num_files=80,
        viral_ratio=0.35,
        episode_length=200,
        description=(
            "Tight 50MB cache, many viral bursts, large file sizes. "
            "Agent must predict spikes, avoid cache thrashing, "
            "and maximize bandwidth saved. Target hit rate >= 0.45."
        ),
    ),
}


class CDNCacheEnv:
    """
    CDN Cache Optimizer environment.
    At each step, a file is requested. If it is not cached, the agent must
    decide which file (if any) to evict to make room for the new one.
    """

    def __init__(self, task_id: str = "task_easy", seed: int = 42):
        if task_id not in TASK_CONFIGS:
            raise ValueError(f"Unknown task_id: {task_id}. Choose from {list(TASK_CONFIGS.keys())}")
        self.config = TASK_CONFIGS[task_id]
        self.seed = seed
        self._cache: Dict[str, FileEntry] = {}  # file_id -> FileEntry
        self._cache_used_mb: float = 0.0
        self._step: int = 0
        self._hits: int = 0
        self._misses: int = 0
        self._recent_hits: List[bool] = []
        self._last_evicted: Optional[str] = None
        self._eviction_counts: Dict[str, int] = defaultdict(int)
        self._total_bandwidth_saved: float = 0.0
        self._done: bool = False
        self.traffic = TrafficGenerator(
            num_files=self.config.num_files,
            viral_ratio=self.config.viral_ratio,
            episode_length=self.config.episode_length,
            seed=seed,
        )

    # ─────────────────────────────────────────────
    # OpenEnv Interface
    # ─────────────────────────────────────────────

    def reset(self) -> Observation:
        """Reset environment to initial state."""
        self._cache = {}
        self._cache_used_mb = 0.0
        self._step = 0
        self._hits = 0
        self._misses = 0
        self._recent_hits = []
        self._last_evicted = None
        self._eviction_counts = defaultdict(int)
        self._total_bandwidth_saved = 0.0
        self._done = False
        self.traffic = TrafficGenerator(
            num_files=self.config.num_files,
            viral_ratio=self.config.viral_ratio,
            episode_length=self.config.episode_length,
            seed=self.seed,
        )
        return self._make_observation(cache_hit=False)

    def step(self, action: Action) -> StepResult:
        """Process one step: handle eviction, then serve the request."""
        if self._done:
            raise RuntimeError("Episode done. Call reset() first.")

        file_id, size_mb, is_viral = self.traffic.get_request(self._step)
        cache_hit = file_id in self._cache
        reward = self._process_step(action, file_id, size_mb, is_viral, cache_hit)

        self._step += 1
        self._done = self._step >= self.config.episode_length

        obs = self._make_observation(cache_hit=cache_hit)
        info = {
            "total_hits": self._hits,
            "total_misses": self._misses,
            "hit_rate": self._hits / max(1, self._hits + self._misses),
            "cache_fill_ratio": self._cache_used_mb / self.config.cache_capacity_mb,
            "bandwidth_saved_mb": self._total_bandwidth_saved,
        }
        return StepResult(observation=obs, reward=reward, done=self._done, info=info)

    def state(self) -> dict:
        """Return the current full environment state."""
        return {
            "step": self._step,
            "done": self._done,
            "cache": {k: v.dict() for k, v in self._cache.items()},
            "cache_used_mb": self._cache_used_mb,
            "cache_capacity_mb": self.config.cache_capacity_mb,
            "hits": self._hits,
            "misses": self._misses,
            "hit_rate": self._hits / max(1, self._hits + self._misses),
            "bandwidth_saved_mb": self._total_bandwidth_saved,
            "task": self.config.dict(),
        }

    # ─────────────────────────────────────────────
    # Internal Logic
    # ─────────────────────────────────────────────

    def _process_step(
        self,
        action: Action,
        file_id: str,
        size_mb: float,
        is_viral: bool,
        cache_hit: bool,
    ) -> Reward:
        hit_bonus = 0.0
        eviction_penalty = 0.0
        thrash_penalty = 0.0
        bandwidth_saved = 0.0
        wasted_penalty = 0.0

        if cache_hit:
            self._hits += 1
            self._recent_hits.append(True)
            hit_bonus = 1.0 + (0.5 if is_viral else 0.0)  # viral hits worth more
            bandwidth_saved = size_mb * 0.01  # normalized
            self._total_bandwidth_saved += size_mb
            # Update frequency
            entry = self._cache[file_id]
            entry.request_frequency = min(entry.request_frequency + 1, 50)
            entry.last_accessed = self._step
        else:
            self._misses += 1
            self._recent_hits.append(False)

            # Try to insert the new file
            if self._cache_used_mb + size_mb <= self.config.cache_capacity_mb:
                # Fits without eviction
                self._insert_file(file_id, size_mb, is_viral)
            else:
                # Need to evict
                if action.evict_file_id and action.evict_file_id in self._cache:
                    evicted = self._cache[action.evict_file_id]

                    # Penalize evicting high-frequency files
                    if evicted.request_frequency > 10:
                        eviction_penalty -= 0.3
                    if evicted.is_viral:
                        eviction_penalty -= 0.2

                    # Thrash penalty: same file evicted twice in a row
                    if action.evict_file_id == self._last_evicted:
                        thrash_penalty = -0.5

                    self._eviction_counts[action.evict_file_id] += 1
                    self._remove_file(action.evict_file_id)
                    self._last_evicted = action.evict_file_id

                    if self._cache_used_mb + size_mb <= self.config.cache_capacity_mb:
                        self._insert_file(file_id, size_mb, is_viral)
                else:
                    # No valid eviction action - wasted capacity penalty
                    wasted_penalty = -0.2

        # Wasted capacity: cache too empty when we could be caching
        fill_ratio = self._cache_used_mb / self.config.cache_capacity_mb
        if fill_ratio < 0.3 and self._step > 10:
            wasted_penalty -= 0.1

        # Keep the recent_hits window at 20
        if len(self._recent_hits) > 20:
            self._recent_hits.pop(0)

        total = hit_bonus + eviction_penalty + thrash_penalty + bandwidth_saved + wasted_penalty
        return Reward(
            total=round(total, 4),
            cache_hit_bonus=hit_bonus,
            eviction_penalty=eviction_penalty,
            thrash_penalty=thrash_penalty,
            bandwidth_saved=bandwidth_saved,
            wasted_capacity_penalty=wasted_penalty,
        )

    def _insert_file(self, file_id: str, size_mb: float, is_viral: bool):
        self._cache[file_id] = FileEntry(
            file_id=file_id,
            size_mb=size_mb,
            request_frequency=1.0,
            is_viral=is_viral,
            last_accessed=self._step,
        )
        self._cache_used_mb += size_mb

    def _remove_file(self, file_id: str):
        if file_id in self._cache:
            self._cache_used_mb -= self._cache[file_id].size_mb
            self._cache_used_mb = max(0.0, self._cache_used_mb)
            del self._cache[file_id]

    def _make_observation(self, cache_hit: bool) -> Observation:
        file_id, size_mb, is_viral = self.traffic.get_request(self._step)
        preview = self.traffic.get_preview(self._step)
        recent_hit_rate = (
            sum(self._recent_hits) / len(self._recent_hits)
            if self._recent_hits else 0.0
        )
        fill = self._cache_used_mb / self.config.cache_capacity_mb
        return Observation(
            step=self._step,
            cache_used_mb=round(self._cache_used_mb, 2),
            cache_capacity_mb=self.config.cache_capacity_mb,
            cache_fill_ratio=round(fill, 4),
            cached_files=list(self._cache.values()),
            incoming_file_id=file_id,
            incoming_file_size_mb=size_mb,
            incoming_file_is_viral=is_viral,
            cache_hit=cache_hit,
            recent_hit_rate=round(recent_hit_rate, 4),
            time_of_day=round(self.traffic.time_of_day(self._step), 4),
            queue_preview=preview,
        )
```
```python
"""
Deterministic graders for all 3 tasks.
Each grader runs a full episode and returns a score in [0.0, 1.0].
"""

from typing import Callable, Dict

from env.cache import CDNCacheEnv, TASK_CONFIGS
from env.models import Action, Observation


GraderPolicy = Callable[[Observation], Action]


def _run_episode(task_id: str, policy: GraderPolicy, seed: int = 42) -> Dict:
    """Run one full episode with a given policy. Returns a stats dict."""
    env = CDNCacheEnv(task_id=task_id, seed=seed)
    obs = env.reset()
    total_reward = 0.0
    steps = 0

    while True:
        action = policy(obs)
        result = env.step(action)
        total_reward += result.reward.total
        obs = result.observation
        steps += 1
        if result.done:
            break

    state = env.state()
    return {
        "hit_rate": state["hit_rate"],
        "total_reward": total_reward,
        "bandwidth_saved_mb": state["bandwidth_saved_mb"],
        "steps": steps,
        "hits": state["hits"],
        "misses": state["misses"],
    }


# ─────────────────────────────────────────────
# Built-in Policies (for baseline + grading)
# ─────────────────────────────────────────────

def lru_policy(obs: Observation) -> Action:
    """Evict the least recently used file."""
    if not obs.cached_files:
        return Action(evict_file_id=None)
    lru = min(obs.cached_files, key=lambda f: f.last_accessed)
    return Action(evict_file_id=lru.file_id)


def lfu_policy(obs: Observation) -> Action:
    """Evict the least frequently used file."""
    if not obs.cached_files:
        return Action(evict_file_id=None)
    lfu = min(obs.cached_files, key=lambda f: f.request_frequency)
    return Action(evict_file_id=lfu.file_id)


def smart_policy(obs: Observation) -> Action:
    """
    Smarter policy:
    - Never evict viral files
    - Evict the lowest-frequency, largest file (wastes least value, frees most space)
    """
    if not obs.cached_files:
        return Action(evict_file_id=None)

    # Filter out viral files from eviction candidates
    candidates = [f for f in obs.cached_files if not f.is_viral]
    if not candidates:
        candidates = obs.cached_files  # fallback: evict anything

    # Score: low frequency = good eviction, large size = good eviction
    def eviction_score(f):
        return -f.request_frequency + f.size_mb * 0.1

    best = max(candidates, key=eviction_score)
    return Action(evict_file_id=best.file_id)


def no_op_policy(obs: Observation) -> Action:
    """Never evict anything (baseline floor)."""
    return Action(evict_file_id=None)


# ─────────────────────────────────────────────
# Grader Functions
# ─────────────────────────────────────────────

def grade_task_easy(policy: GraderPolicy, seed: int = 42) -> float:
    """
    Easy: steady traffic, 100MB cache.
    Score based purely on hit rate:
    >= 0.60 hit rate = 1.0, scaling linearly down to 0.0.
    """
    stats = _run_episode("task_easy", policy, seed)
    hit_rate = stats["hit_rate"]

    # Linear scale: 0.0 hit_rate -> 0.0 score, 0.60+ -> 1.0
    score = min(1.0, hit_rate / 0.60)
    return round(score, 4)


def grade_task_medium(policy: GraderPolicy, seed: int = 42) -> float:
    """
    Medium: mixed traffic, viral files.
    Score = weighted combination of hit rate and bandwidth saved.
    """
    stats = _run_episode("task_medium", policy, seed)
    hit_rate = stats["hit_rate"]
    bandwidth = stats["bandwidth_saved_mb"]

    # Normalize bandwidth: assume 500MB = perfect
    bw_score = min(1.0, bandwidth / 500.0)

    # Hit rate: 0.55 = 1.0
    hr_score = min(1.0, hit_rate / 0.55)

    # 70% hit rate, 30% bandwidth
    score = 0.70 * hr_score + 0.30 * bw_score
    return round(score, 4)


def grade_task_hard(policy: GraderPolicy, seed: int = 42) -> float:
    """
    Hard: constrained cache, many viral bursts.
    Score = hit rate + bandwidth + thrash avoidance.
    """
    stats = _run_episode("task_hard", policy, seed)
    hit_rate = stats["hit_rate"]
    bandwidth = stats["bandwidth_saved_mb"]
    total_reward = stats["total_reward"]

    # Hit rate target: 0.45 = 1.0
    hr_score = min(1.0, hit_rate / 0.45)

    # Bandwidth: 400MB = 1.0
    bw_score = min(1.0, bandwidth / 400.0)

    # Reward signal (captures thrash penalties implicitly)
    # Normalize: 200 reward = 1.0
    rw_score = max(0.0, min(1.0, total_reward / 200.0))

    # 50% hit rate, 25% bandwidth, 25% reward quality
    score = 0.50 * hr_score + 0.25 * bw_score + 0.25 * rw_score
    return round(score, 4)


# ─────────────────────────────────────────────
# Master Grader
# ─────────────────────────────────────────────

def run_all_graders(policy: GraderPolicy, seed: int = 42) -> Dict:
    """Run all 3 graders and return scores + summary."""
    easy = grade_task_easy(policy, seed)
    medium = grade_task_medium(policy, seed)
    hard = grade_task_hard(policy, seed)
    overall = round((easy + medium + hard) / 3, 4)

    return {
        "task_easy": easy,
        "task_medium": medium,
        "task_hard": hard,
        "overall": overall,
        "all_in_range": all(0.0 <= s <= 1.0 for s in [easy, medium, hard]),
    }


if __name__ == "__main__":
    print("=== Running Grader Validation ===\n")

    policies = {
        "no_op": no_op_policy,
        "lru": lru_policy,
        "lfu": lfu_policy,
        "smart": smart_policy,
    }

    for name, policy in policies.items():
        results = run_all_graders(policy)
        print(f"Policy: {name}")
        print(f"  Easy:    {results['task_easy']}")
        print(f"  Medium:  {results['task_medium']}")
        print(f"  Hard:    {results['task_hard']}")
        print(f"  Overall: {results['overall']}")
        print(f"  Valid:   {results['all_in_range']}\n")
```
```python
"""
Typed Pydantic models for the CDN Cache Optimizer environment.
Implements the OpenEnv spec: Observation, Action, Reward.
"""

from pydantic import BaseModel
from typing import List, Optional, Dict


class FileEntry(BaseModel):
    """Represents a file currently in the cache."""
    file_id: str
    size_mb: float
    request_frequency: float  # requests since cached (capped at 50)
    is_viral: bool
    last_accessed: int  # step number


class Observation(BaseModel):
    """What the agent sees at each step."""
    step: int
    cache_used_mb: float
    cache_capacity_mb: float
    cache_fill_ratio: float
    cached_files: List[FileEntry]
    incoming_file_id: str
    incoming_file_size_mb: float
    incoming_file_is_viral: bool
    cache_hit: bool  # was incoming_file already cached?
    recent_hit_rate: float  # rolling hit rate over the last 20 steps
    time_of_day: float  # 0.0 to 1.0 (normalized)
    queue_preview: List[str]  # next 3 file_ids coming


class Action(BaseModel):
    """What the agent decides to do."""
    evict_file_id: Optional[str] = None  # None = do nothing / already cached


class Reward(BaseModel):
    """Reward breakdown for transparency."""
    total: float
    cache_hit_bonus: float
    eviction_penalty: float
    thrash_penalty: float
    bandwidth_saved: float
    wasted_capacity_penalty: float


class StepResult(BaseModel):
    """Full result returned by step()."""
    observation: Observation
    reward: Reward
    done: bool
    info: Dict


class TaskConfig(BaseModel):
    """Configuration for a specific task."""
    task_id: str
    name: str
    difficulty: str
    cache_capacity_mb: float
    num_files: int
    viral_ratio: float
    episode_length: int
    description: str
```
env/traffic.py ADDED
@@ -0,0 +1,119 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
+ """
+ Traffic generator for CDN Cache Optimizer.
+ Simulates realistic web traffic: steady files plus viral bursts.
+ """
+
+ import random
+ import math
+ from dataclasses import dataclass
+ from typing import List, Optional, Tuple
+
+
+ @dataclass
+ class FileProfile:
+     file_id: str
+     size_mb: float
+     base_popularity: float  # base request probability
+     is_viral: bool = False
+     viral_start: int = -1
+     viral_duration: int = 0
+     viral_peak: float = 0.0
+
+
+ class TrafficGenerator:
+     """
+     Generates a stream of file requests.
+     - Steady files: consistent low-level demand
+     - Viral files: spike suddenly, dominate for a window, then die off
+     """
+
+     def __init__(
+         self,
+         num_files: int = 50,
+         viral_ratio: float = 0.2,
+         episode_length: int = 200,
+         seed: int = 42,
+     ):
+         self.num_files = num_files
+         self.viral_ratio = viral_ratio
+         self.episode_length = episode_length
+         self.rng = random.Random(seed)
+         self.files: List[FileProfile] = []
+         self.request_log: List[str] = []  # precomputed episode
+         self._build_file_profiles()
+         self._precompute_requests()
+
+     def _build_file_profiles(self):
+         num_viral = max(1, int(self.num_files * self.viral_ratio))
+         for i in range(self.num_files):
+             fid = f"file_{i:03d}"
+             size = round(self.rng.uniform(1.0, 20.0), 1)
+             is_viral = i < num_viral
+
+             if is_viral:
+                 viral_start = self.rng.randint(
+                     5, max(6, self.episode_length - 30)
+                 )
+                 viral_duration = self.rng.randint(10, 30)
+                 viral_peak = self.rng.uniform(0.4, 0.8)
+                 base_pop = self.rng.uniform(0.01, 0.05)
+                 self.files.append(FileProfile(
+                     file_id=fid,
+                     size_mb=size,
+                     base_popularity=base_pop,
+                     is_viral=True,
+                     viral_start=viral_start,
+                     viral_duration=viral_duration,
+                     viral_peak=viral_peak,
+                 ))
+             else:
+                 base_pop = self.rng.uniform(0.02, 0.15)
+                 self.files.append(FileProfile(
+                     file_id=fid,
+                     size_mb=size,
+                     base_popularity=base_pop,
+                 ))
+
+     def _get_popularity_at_step(self, fp: FileProfile, step: int) -> float:
+         if not fp.is_viral:
+             # Steady demand with a slight cyclical (time-of-day) wobble
+             cycle = 0.3 * math.sin(2 * math.pi * step / 50)
+             return max(0.001, fp.base_popularity + cycle * fp.base_popularity)
+
+         # Viral: bell-curve spike over the viral window
+         if step < fp.viral_start or step > fp.viral_start + fp.viral_duration:
+             return fp.base_popularity
+         center = fp.viral_start + fp.viral_duration / 2
+         spread = fp.viral_duration / 4
+         spike = fp.viral_peak * math.exp(-((step - center) ** 2) / (2 * spread ** 2))
+         return fp.base_popularity + spike
+
+     def _precompute_requests(self):
+         self.request_log = []
+         for step in range(self.episode_length):
+             # random.choices accepts relative weights; no need to normalize
+             weights = [
+                 self._get_popularity_at_step(fp, step) for fp in self.files
+             ]
+             chosen = self.rng.choices(self.files, weights=weights, k=1)[0]
+             self.request_log.append(chosen.file_id)
+
+     def get_request(self, step: int) -> Tuple[str, float, bool]:
+         """Return (file_id, size_mb, is_viral) for a given step."""
+         if step >= len(self.request_log):
+             return self.request_log[-1], 1.0, False
+         fid = self.request_log[step]
+         fp = next(f for f in self.files if f.file_id == fid)
+         return fid, fp.size_mb, fp.is_viral
+
+     def get_preview(self, step: int, n: int = 3) -> List[str]:
+         """Peek at the next n file_ids (simulates prefetch hints)."""
+         return self.request_log[step + 1: step + 1 + n]
+
+     def get_file_profile(self, file_id: str) -> Optional[FileProfile]:
+         return next((f for f in self.files if f.file_id == file_id), None)
+
+     def time_of_day(self, step: int) -> float:
+         """Normalized 0.0–1.0 cycle."""
+         return (step % 50) / 50.0
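The Gaussian spike in `_get_popularity_at_step` can be sanity-checked in isolation. A minimal sketch with illustrative parameter values (not the generator's sampled defaults):

```python
import math

def viral_popularity(step, base=0.03, start=50, duration=20, peak=0.6):
    # Flat base popularity outside the viral window; inside it, a bell-curve
    # bump centered on the window's midpoint, mirroring the generator's math.
    if step < start or step > start + duration:
        return base
    center = start + duration / 2
    spread = duration / 4
    return base + peak * math.exp(-((step - center) ** 2) / (2 * spread ** 2))

print(viral_popularity(10))  # outside the window: base popularity only
print(viral_popularity(60))  # window center: base + full peak
```

Popularity ramps smoothly up to `base + peak` at the center and decays back, so a cache agent that reads the queue preview can see the burst building before it peaks.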
inference.py ADDED
@@ -0,0 +1,221 @@
+ """
+ inference.py - CDN Cache Optimizer baseline agent.
+ Uses the OpenAI client to run an LLM agent against the environment and
+ emits structured [START], [STEP], [END] logs to stdout.
+
+ Required env vars:
+     API_BASE_URL - LLM API endpoint
+     MODEL_NAME   - model identifier
+     HF_TOKEN     - Hugging Face / API key
+ """
+
+ import os
+ import sys
+ import json
+ from openai import OpenAI
+ from env.cache import CDNCacheEnv, TASK_CONFIGS
+ from env.models import Action, Observation
+
+ # ─────────────────────────────────────────────
+ # Config from environment
+ # ─────────────────────────────────────────────
+ API_BASE_URL = os.environ.get("API_BASE_URL", "https://api.openai.com/v1")
+ MODEL_NAME = os.environ.get("MODEL_NAME", "gpt-4o-mini")
+ HF_TOKEN = os.environ.get("HF_TOKEN", "")
+
+ if not HF_TOKEN:
+     print("[WARN] HF_TOKEN not set. Using API_BASE_URL without auth header override.")
+
+ client = OpenAI(
+     base_url=API_BASE_URL,
+     api_key=HF_TOKEN or "placeholder",
+ )
+
+ TASKS = ["task_easy", "task_medium", "task_hard"]
+ SEED = 42
+ # Hit-rate thresholds used to normalize the per-task score
+ SCORE_THRESHOLDS = {"task_easy": 0.60, "task_medium": 0.55, "task_hard": 0.45}
+
+ # ─────────────────────────────────────────────
+ # LLM Agent
+ # ─────────────────────────────────────────────
+
+ SYSTEM_PROMPT = """You are an intelligent CDN cache management agent.
+
+ At each step you receive the current cache state and an incoming file request.
+ Your job: decide which file to evict (if any) to make room for new content.
+
+ Rules:
+ - Only evict a file if the cache is nearly full and the incoming file is NOT already cached
+ - Prefer evicting files with LOW request_frequency that are NOT viral
+ - Avoid evicting a file that was only just cached (this causes cache thrashing)
+ - If the cache has space, respond with null (no eviction needed)
+
+ You MUST respond with ONLY valid JSON in this exact format:
+ {"evict_file_id": "<file_id>" or null}
+
+ No explanation. No markdown. Only the JSON object."""
+
+
+ def build_user_prompt(obs: Observation) -> str:
+     cached_summary = []
+     for f in obs.cached_files:
+         cached_summary.append(
+             f"  - {f.file_id}: size={f.size_mb}MB freq={f.request_frequency:.1f} "
+             f"viral={f.is_viral} last_accessed=step_{f.last_accessed}"
+         )
+     cached_str = "\n".join(cached_summary) if cached_summary else "  (empty)"
+
+     space_needed = obs.incoming_file_size_mb
+     space_free = obs.cache_capacity_mb - obs.cache_used_mb
+
+     return f"""Step {obs.step} | Time of day: {obs.time_of_day:.2f} | Hit rate: {obs.recent_hit_rate:.2f}
+
+ Cache: {obs.cache_used_mb:.1f}MB / {obs.cache_capacity_mb:.1f}MB used ({obs.cache_fill_ratio*100:.1f}% full)
+ Free space: {space_free:.1f}MB
+
+ Incoming request:
+   file_id: {obs.incoming_file_id}
+   size: {obs.incoming_file_size_mb}MB
+   viral: {obs.incoming_file_is_viral}
+   already_cached: {obs.cache_hit}
+   space_needed_to_cache: {"none (fits)" if space_free >= space_needed else f"{space_needed - space_free:.1f}MB deficit"}
+
+ Next 3 requests preview: {obs.queue_preview}
+
+ Currently cached files ({len(obs.cached_files)} files):
+ {cached_str}
+
+ Decide: which file to evict? (null if no eviction needed)"""
+
+
+ def llm_action(obs: Observation, step_num: int) -> Action:
+     """Call the LLM and parse its action. Fall back to LRU on any failure."""
+     prompt = build_user_prompt(obs)
+     try:
+         response = client.chat.completions.create(
+             model=MODEL_NAME,
+             messages=[
+                 {"role": "system", "content": SYSTEM_PROMPT},
+                 {"role": "user", "content": prompt},
+             ],
+             max_tokens=50,
+             temperature=0.0,
+         )
+         raw = response.choices[0].message.content.strip()
+         parsed = json.loads(raw)
+         return Action(evict_file_id=parsed.get("evict_file_id"))
+     except Exception:
+         # Fallback: evict the least-recently-used file
+         if obs.cached_files:
+             lru = min(obs.cached_files, key=lambda f: f.last_accessed)
+             return Action(evict_file_id=lru.file_id)
+         return Action(evict_file_id=None)
+
+
+ # ─────────────────────────────────────────────
+ # Run one task episode
+ # ─────────────────────────────────────────────
+
+ def run_task(task_id: str) -> dict:
+     config = TASK_CONFIGS[task_id]
+     env = CDNCacheEnv(task_id=task_id, seed=SEED)
+     obs = env.reset()
+
+     total_reward = 0.0
+     step_num = 0
+
+     # ── [START] ──
+     print(json.dumps({
+         "type": "START",
+         "task_id": task_id,
+         "task_name": config.name,
+         "difficulty": config.difficulty,
+         "episode_length": config.episode_length,
+         "cache_capacity_mb": config.cache_capacity_mb,
+         "model": MODEL_NAME,
+         "seed": SEED,
+     }))
+     sys.stdout.flush()
+
+     while True:
+         action = llm_action(obs, step_num)
+         result = env.step(action)
+
+         total_reward += result.reward.total
+
+         # ── [STEP] ──
+         print(json.dumps({
+             "type": "STEP",
+             "task_id": task_id,
+             "step": step_num,
+             "action": {"evict_file_id": action.evict_file_id},
+             "cache_hit": result.observation.cache_hit,
+             "reward": result.reward.total,
+             "reward_breakdown": {
+                 "cache_hit_bonus": result.reward.cache_hit_bonus,
+                 "eviction_penalty": result.reward.eviction_penalty,
+                 "thrash_penalty": result.reward.thrash_penalty,
+                 "bandwidth_saved": result.reward.bandwidth_saved,
+                 "wasted_capacity_penalty": result.reward.wasted_capacity_penalty,
+             },
+             "cumulative_reward": round(total_reward, 4),
+             "hit_rate": result.observation.recent_hit_rate,
+             "cache_fill": result.observation.cache_fill_ratio,
+             "done": result.done,
+         }))
+         sys.stdout.flush()
+
+         obs = result.observation
+         step_num += 1
+
+         if result.done:
+             break
+
+     final_state = env.state()
+     final_hit_rate = final_state["hit_rate"]
+     score = round(min(1.0, final_hit_rate / SCORE_THRESHOLDS[task_id]), 4)
+
+     # ── [END] ──
+     print(json.dumps({
+         "type": "END",
+         "task_id": task_id,
+         "task_name": config.name,
+         "total_steps": step_num,
+         "total_reward": round(total_reward, 4),
+         "final_hit_rate": round(final_hit_rate, 4),
+         "bandwidth_saved_mb": round(final_state["bandwidth_saved_mb"], 2),
+         "total_hits": final_state["hits"],
+         "total_misses": final_state["misses"],
+         "score": score,
+     }))
+     sys.stdout.flush()
+
+     return {
+         "task_id": task_id,
+         "total_reward": round(total_reward, 4),
+         "final_hit_rate": round(final_hit_rate, 4),
+         "score": score,
+     }
+
+
+ # ─────────────────────────────────────────────
+ # Main
+ # ─────────────────────────────────────────────
+
+ if __name__ == "__main__":
+     print("[INFO] Starting CDN Cache Optimizer inference", file=sys.stderr)
+     print(f"[INFO] Model: {MODEL_NAME} | API: {API_BASE_URL}", file=sys.stderr)
+
+     results = []
+     for task_id in TASKS:
+         print(f"\n[INFO] Running {task_id}...", file=sys.stderr)
+         r = run_task(task_id)
+         results.append(r)
+         print(f"[INFO] {task_id} done | score={r['score']} hit_rate={r['final_hit_rate']}", file=sys.stderr)
+
+     print("\n[INFO] === FINAL RESULTS ===", file=sys.stderr)
+     for r in results:
+         print(f"[INFO] {r['task_id']}: score={r['score']} reward={r['total_reward']}", file=sys.stderr)
+
+     overall = round(sum(r["score"] for r in results) / len(results), 4)
+     print(f"[INFO] Overall score: {overall}", file=sys.stderr)
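The exception path in `llm_action` falls back to plain LRU. The policy can be sketched as a standalone helper; the `CachedFile` stand-in below is hypothetical and carries only the fields the fallback actually reads:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class CachedFile:
    # hypothetical stand-in for the env's cached-file entries
    file_id: str
    last_accessed: int

def lru_evict(cached_files: List[CachedFile]) -> Optional[str]:
    # Evict the file with the oldest last_accessed step, or nothing
    # when the cache is empty.
    if not cached_files:
        return None
    return min(cached_files, key=lambda f: f.last_accessed).file_id

files = [CachedFile("file_001", 12), CachedFile("file_002", 3)]
print(lru_evict(files))  # file_002 (oldest last_accessed)
```

Falling back to a deterministic heuristic keeps episodes running even when the LLM returns malformed JSON or the API call fails.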
openenv.yaml ADDED
@@ -0,0 +1,68 @@
+ name: cdn-cache-optimizer
+ version: "1.0.0"
+ description: >
+   Edge CDN Cache Optimizer: an RL environment where an agent manages
+   a content delivery network cache. The agent decides which files to evict
+   when the cache is full, balancing hit rate, bandwidth efficiency, and
+   avoidance of cache thrashing. Simulates real-world viral traffic spikes
+   alongside steady baseline demand.
+
+ author: umar
+ tags:
+   - openenv
+   - cdn
+   - cache
+   - infrastructure
+   - real-world
+
+ tasks:
+   - id: task_easy
+     name: Steady Traffic Cache
+     difficulty: easy
+     episode_length: 100
+     cache_capacity_mb: 100.0
+
+   - id: task_medium
+     name: Mixed Traffic Cache
+     difficulty: medium
+     episode_length: 150
+     cache_capacity_mb: 80.0
+
+   - id: task_hard
+     name: Constrained Cache with Viral Bursts
+     difficulty: hard
+     episode_length: 200
+     cache_capacity_mb: 50.0
+
+ observation_space:
+   type: structured
+   fields:
+     - step: int
+     - cache_used_mb: float
+     - cache_capacity_mb: float
+     - cache_fill_ratio: float
+     - cached_files: list[FileEntry]
+     - incoming_file_id: str
+     - incoming_file_size_mb: float
+     - incoming_file_is_viral: bool
+     - cache_hit: bool
+     - recent_hit_rate: float
+     - time_of_day: float
+     - queue_preview: list[str]
+
+ action_space:
+   type: structured
+   fields:
+     - evict_file_id: str | null
+
+ reward_range: [-1.0, 1.5]
+
+ endpoints:
+   reset: POST /reset
+   step: POST /step
+   state: GET /state
+
+ runtime:
+   framework: fastapi
+   python: "3.11"
+   port: 7860
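The `action_space` above is a single nullable string, so a client posting to `/step` only needs to validate that one field before sending. A minimal stdlib-only sketch of checking a model reply against that shape (the `parse_action` helper is illustrative, not part of the env):

```python
import json

def parse_action(raw: str):
    # Accept a JSON object whose evict_file_id is a string or null,
    # matching the action_space declared in openenv.yaml.
    payload = json.loads(raw)
    value = payload.get("evict_file_id")
    if value is not None and not isinstance(value, str):
        raise ValueError("evict_file_id must be a string or null")
    return value

print(parse_action('{"evict_file_id": "file_007"}'))  # file_007
print(parse_action('{"evict_file_id": null}'))        # None
```

Validating before the POST keeps malformed LLM output from reaching the `/step` endpoint at all.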
requirements.txt ADDED
@@ -0,0 +1,6 @@
+ fastapi==0.111.0
+ uvicorn==0.29.0
+ pydantic==2.7.1
+ openai==1.30.1
+ requests==2.31.0
+ python-multipart==0.0.9