Upload 6 files

Files changed (6)

Dockerfile +16 -0
README.md +154 -30
openenv.yaml +68 -0
pyproject.toml +25 -0
requirements.txt +7 -0
uv.lock +0 -0

Dockerfile ADDED

	@@ -0,0 +1,16 @@

+FROM python:3.11-slim
+WORKDIR /app
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+COPY . .
+ENV API_BASE_URL="https://api.openai.com/v1"
+ENV MODEL_NAME="gpt-4o-mini"
+ENV HF_TOKEN=""
+EXPOSE 7860
+CMD ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "7860"]

README.md CHANGED

@@ -1,52 +1,176 @@
 ---
-license: apache-2.0
-language:
-- en
 tags:
-- reinforcement-learning
-- cdn
-- optimization
-- agent-training
 ---
-# CDNs are stuck in the 60s. We used GRPO to fix that.
-If you’ve ever watched a YouTube video or scrolled through Instagram without it buffering, you’re looking at a **Content Delivery Network (CDN)** at work. It’s a simple promise: keep a copy of the file on a server close to the user so things load fast.
-But here’s the problem: **Cache management is basically ancient.**
-When a cache gets full, it has to decide what to kick out. Most of the internet still uses policies from the **1960s**—things like **LRU (Least Recently Used)** or **LFU (Least Frequently Used)**. These aren't "smart" algorithms; they’re just basic rules. They don’t care about file size, they don't know if a video is about to go viral, and they definitely don't handle it well when things change.
-### The "Chaos" Factor: Schema Drift
-In the real world, infrastructure isn't static. A server's capacity might drop 40% because of a maintenance issue, or a cricket match might start, and suddenly 50 million people want the exact same stream at the same time.
-We call this **Schema Drift**. Most Reinforcement Learning (RL) agents are trained in "perfect" environments where the rules never change. But when we threw standard policies (and even "smart" baselines) into our drift-heavy environment, they didn't just slow down—they **collapsed.**
-* **LRU hit rate:** tanked to 10.5%
-* **LFU hit rate:** basically flatlined at 3%
-You can see exactly what that collapse looks like in the performance data:
-![Performance Graph](https://huggingface.co/umar-sharif821/CDN-cache-optimizer/resolve/main/results.png)
-### How We Built It (The Tech Stack)
-We didn't want to build just another "toy" environment. We wanted something that felt like real-world infrastructure.
-* **The Environment:** Built using the latest **OpenEnv** framework. We baked "drifts" directly into the episodes—Step 50 might see a capacity crash, while Step 100 might trigger a viral traffic surge.
-* **The Brain:** We used **Qwen 1.5B** and trained it via **GRPO (Group Relative Policy Optimization)** using the Hugging Face TRL library.
-* **The Goal:** Moving past academic "hit rates" and looking at the actual dollar value. In the real world, a 1% hit-rate improvement for a company like Cloudflare means millions of dollars saved in bandwidth costs.
-### Why This Matters
-This isn't just about making the internet 5% faster. It’s about building systems that don't need a human to go in and manually rewrite the rules every time the traffic patterns change. Our agent doesn't just follow a rule; it learns a strategy.
-It’s the difference between a static blueprint and a living system that adapts as it goes.
 ---
-## Code Repository
-https://github.com/umar-sharif821/cdn-cache-env
-All source code, environment, and training scripts are in the GitHub repo.
-**Built for the Meta × Scaler OpenEnv Hackathon 2026.** If you're interested in RL for infrastructure, check out the model files and the environment graders in the repo!

 ---
+title: Cdn Cache Optimizer
+emoji: 🌐
+colorFrom: blue
+colorTo: green
+sdk: docker
+pinned: false
 tags:
+  - openenv
 ---
+# 🌐 CDN Cache Optimizer — OpenEnv RL Environment
+An RL environment simulating **edge CDN cache management**, the kind of problem large platforms such as Meta solve at global scale. An agent manages a fixed-size cache and decides which files to evict when new content arrives, balancing **hit rate**, **bandwidth efficiency**, and **thrash avoidance**.
+---
+## 🎯 Motivation
+Content Delivery Networks serve billions of files daily. Edge servers have limited storage, so they must constantly decide: *which cached files to keep, and which to evict?* Standard algorithms like LRU are often suboptimal, especially when traffic has **viral bursts** (for example, a file suddenly gets 50x more requests for 20 minutes, then fades back to its baseline).
+A smarter agent can:
+- Predict viral spikes from queue previews
+- Avoid evicting high-frequency files
+- Prevent cache thrashing (evicting a file that is requested again shortly afterward)
+- Maximize bandwidth saved for users
+---
+## 🔧 Environment Description
+At each step, a file is requested from the network. If it's already in the cache → **cache hit** (reward). If not → **cache miss**, and the agent must decide whether to evict an existing file to make room.
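A minimal sketch of the hit/miss handling described above, not the environment's actual code. The function name `apply_request`, the dict-based cache, and the rule that a missed file is admitted only if it fits after the optional eviction are all assumptions made for illustration.

```python
from typing import Dict, Optional


def apply_request(
    cache: Dict[str, float],
    capacity_mb: float,
    file_id: str,
    size_mb: float,
    evict_file_id: Optional[str] = None,
) -> bool:
    """Process one request and return True on a cache hit.

    `cache` maps file_id -> size in MB and is updated in place.
    """
    if file_id in cache:
        return True  # hit: nothing to fetch, nothing to evict

    # Miss: apply the agent's eviction choice, if it made one.
    if evict_file_id is not None:
        cache.pop(evict_file_id, None)

    # Assumed admission rule: cache the new file only if it now fits.
    if sum(cache.values()) + size_mb <= capacity_mb:
        cache[file_id] = size_mb
    return False
```

With a 100 MB cache holding 90 MB, a 20 MB miss is only admitted if the agent evicts enough first.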
+### Traffic Model
+- **Steady files**: Consistent, cyclical demand
+- **Viral files**: Bell-curve spike in popularity, then fade back to baseline
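One way to picture the two demand shapes, as a rough sketch only. The sine wave for steady demand, the Gaussian bump for viral demand, and every parameter name and default below are assumptions; only the 50x peak figure comes from the Motivation section.

```python
import math


def steady_rate(step: int, base: float = 1.0, period: int = 50,
                amplitude: float = 0.5) -> float:
    """Cyclical demand: a base rate modulated by a sine wave (assumed shape)."""
    return base * (1.0 + amplitude * math.sin(2.0 * math.pi * step / period))


def viral_rate(step: int, base: float = 1.0, peak_step: int = 60,
               width: float = 10.0, peak_multiplier: float = 50.0) -> float:
    """Bell-curve spike that rises to `peak_multiplier * base`, then fades."""
    bump = math.exp(-((step - peak_step) ** 2) / (2.0 * width ** 2))
    return base * (1.0 + (peak_multiplier - 1.0) * bump)
```

Far from `peak_step` the bump is negligible, so the viral file behaves like a baseline file again.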
+---
+## 📐 Action & Observation Space
+### Observation Space
+| Field | Type | Description |
+|-------|------|-------------|
+| `step` | int | Current episode step |
+| `cache_used_mb` | float | MB currently used |
+| `cache_capacity_mb` | float | Total cache size |
+| `cache_fill_ratio` | float | 0.0–1.0 fill level |
+| `cached_files` | List[FileEntry] | All files in cache with metadata |
+| `incoming_file_id` | str | File being requested |
+| `incoming_file_size_mb` | float | Size of incoming file |
+| `incoming_file_is_viral` | bool | Is this file currently viral? |
+| `cache_hit` | bool | Is incoming file already cached? |
+| `recent_hit_rate` | float | Rolling hit rate (last 20 steps) |
+| `time_of_day` | float | Normalized 0.0–1.0 daily cycle |
+| `queue_preview` | List[str] | Next 3 file IDs (prefetch hint) |
+### FileEntry Fields
+| Field | Type | Description |
+|-------|------|-------------|
+| `file_id` | str | Unique identifier |
+| `size_mb` | float | File size in MB |
+| `request_frequency` | float | Requests since cached |
+| `is_viral` | bool | Currently viral |
+| `last_accessed` | int | Step number of last access |
+### Action Space
+| Field | Type | Description |
+|-------|------|-------------|
+| `evict_file_id` | str \| null | File to evict (null = no eviction) |
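As an illustration of how an agent might map the observation to an action, here is a hedged LRU-style baseline. It assumes the observation arrives as a plain dict using the field names from the tables above, and that eviction is only worthwhile on a miss when the incoming file does not fit in the free space; neither assumption is confirmed by this diff.

```python
from typing import Any, Dict, Optional


def lru_action(obs: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Pick the least recently accessed file to evict, or no eviction."""
    no_eviction: Dict[str, Optional[str]] = {"evict_file_id": None}

    # Nothing to do on a hit or with an empty cache.
    if obs["cache_hit"] or not obs["cached_files"]:
        return no_eviction

    # Assumed rule: skip eviction when the incoming file already fits.
    free_mb = obs["cache_capacity_mb"] - obs["cache_used_mb"]
    if obs["incoming_file_size_mb"] <= free_mb:
        return no_eviction

    # LRU: the entry with the smallest `last_accessed` step is the victim.
    victim = min(obs["cached_files"], key=lambda f: f["last_accessed"])
    return {"evict_file_id": victim["file_id"]}
```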
+### Reward Function
+| Component | Range | Description |
+|-----------|-------|-------------|
+| `cache_hit_bonus` | +1.0 to +1.5 | Hit reward (viral hits = +1.5) |
+| `bandwidth_saved` | +0.0 to +0.2 | Reward for bandwidth efficiency |
+| `eviction_penalty` | -0.0 to -0.5 | Penalty for evicting popular files |
+| `thrash_penalty` | 0.0 or -0.5 | Penalty for evicting the same file twice |
+| `wasted_capacity_penalty` | -0.0 to -0.3 | Penalty for leaving cache capacity unused |
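A small arithmetic check on the table above, written as a sketch. Summing the per-component extremes gives roughly -1.3 to +1.7, which is wider than the `reward_range: [-1.0, 1.5]` declared in `openenv.yaml`. The diff does not say whether the environment clips the total or whether some components never co-occur, so the optional `clip_range` argument here is an assumption, as is the 0.0 lower bound for `cache_hit_bonus` on a miss.

```python
from typing import Dict, Optional, Tuple

# (min, max) contribution per component, copied from the table above.
REWARD_COMPONENTS: Dict[str, Tuple[float, float]] = {
    "cache_hit_bonus": (0.0, 1.5),  # 0.0 on a miss is an assumption
    "bandwidth_saved": (0.0, 0.2),
    "eviction_penalty": (-0.5, 0.0),
    "thrash_penalty": (-0.5, 0.0),
    "wasted_capacity_penalty": (-0.3, 0.0),
}


def reward_bounds(
    components: Dict[str, Tuple[float, float]] = REWARD_COMPONENTS,
) -> Tuple[float, float]:
    """Sum the per-component extremes to get the widest possible range."""
    low = sum(lo for lo, _ in components.values())
    high = sum(hi for _, hi in components.values())
    return low, high


def total_reward(parts: Dict[str, float],
                 clip_range: Optional[Tuple[float, float]] = None) -> float:
    """Add up component values, optionally clipping (hypothetical behavior)."""
    total = sum(parts.values())
    if clip_range is not None:
        lo, hi = clip_range
        total = max(lo, min(hi, total))
    return total
```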
+---
+## 📋 Tasks
+### Task 1: Steady Traffic Cache (Easy)
+- **Cache**: 100MB | **Files**: 30 | **Steps**: 100
+- No viral files — steady demand only
+- Agent learns basic LRU-style eviction
+- **Target hit rate**: ≥ 0.60 → score 1.0
+- **Baseline score**: ~0.75
+### Task 2: Mixed Traffic Cache (Medium)
+- **Cache**: 80MB | **Files**: 50 | **Steps**: 150
+- 20% viral files mixed with steady demand
+- Agent must handle spikes and prioritize popular content
+- **Score**: 70% hit rate + 30% bandwidth
+- **Baseline score**: ~0.60
+### Task 3: Constrained Cache with Viral Bursts (Hard)
+- **Cache**: 50MB | **Files**: 80 | **Steps**: 200
+- 35% viral files, tight capacity, large file sizes
+- Agent must predict spikes, avoid thrashing
+- **Score**: 50% hit rate + 25% bandwidth + 25% reward quality
+- **Baseline score**: ~0.45
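The three scoring rules can be sketched as weighted sums. This is an illustration under assumptions: the function names are hypothetical, all inputs are taken to be already normalized to 0.0–1.0, and the linear ramp below the 0.60 target in the easy task is a guess, since the README only states that a hit rate of at least 0.60 scores 1.0. The grader's actual normalization is not shown in this diff.

```python
def score_easy(hit_rate: float, target: float = 0.60) -> float:
    """Full score at or above the target; linear ramp below it (assumed)."""
    return max(0.0, min(hit_rate / target, 1.0))


def score_medium(hit_rate: float, bandwidth: float) -> float:
    """70% hit rate + 30% bandwidth, inputs assumed normalized to [0, 1]."""
    return 0.70 * hit_rate + 0.30 * bandwidth


def score_hard(hit_rate: float, bandwidth: float, reward_quality: float) -> float:
    """50% hit rate + 25% bandwidth + 25% reward quality."""
    return 0.50 * hit_rate + 0.25 * bandwidth + 0.25 * reward_quality
```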
+---
+## 🚀 Setup & Usage
+### Local Setup
+```bash
+git clone <repo>
+cd cdn-cache-env
+pip install -r requirements.txt
+```
+### Run API Server
+```bash
+uvicorn api.main:app --host 0.0.0.0 --port 7860
+```
+### Run Inference (Baseline Agent)
+```bash
+export API_BASE_URL="https://api.openai.com/v1"
+export MODEL_NAME="gpt-4o-mini"
+export HF_TOKEN="your_token_here"
+python inference.py
+```
+### Docker
+```bash
+docker build -t cdn-cache-env .
+docker run -p 7860:7860 \
+  -e API_BASE_URL="https://api.openai.com/v1" \
+  -e MODEL_NAME="gpt-4o-mini" \
+  -e HF_TOKEN="your_token" \
+  cdn-cache-env
+```
+---
+## 🌐 API Endpoints
+| Method | Endpoint | Description |
+|--------|----------|-------------|
+| GET | `/health` | Health check (returns 200) |
+| GET | `/tasks` | List all tasks |
+| POST | `/reset` | Start episode `{"task_id": "task_easy", "seed": 42}` |
+| POST | `/step` | Take action `{"evict_file_id": "file_001"}`; pass `null` for no eviction |
+| GET | `/state` | Full environment state |
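A hedged client sketch for the endpoints above, using only the standard library. The request bodies match the table; the response shape is not documented in this diff, so the code returns whatever JSON the server sends without interpreting it. The helper names `http_transport` and `run_no_evict_episode` are hypothetical.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, List, Optional

Transport = Callable[[str, str, Optional[Dict[str, Any]]], Any]


def http_transport(base_url: str) -> Transport:
    """Build a function that sends JSON requests to `base_url`."""
    def send(method: str, path: str,
             payload: Optional[Dict[str, Any]] = None) -> Any:
        data = None if payload is None else json.dumps(payload).encode("utf-8")
        req = urllib.request.Request(
            base_url + path, data=data, method=method,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read().decode("utf-8"))
    return send


def run_no_evict_episode(send: Transport, task_id: str = "task_easy",
                         seed: int = 42, steps: int = 3) -> List[Any]:
    """Reset the environment, then take `steps` actions that never evict."""
    responses = [send("POST", "/reset", {"task_id": task_id, "seed": seed})]
    for _ in range(steps):
        # None serializes to JSON null, meaning "no eviction".
        responses.append(send("POST", "/step", {"evict_file_id": None}))
    return responses
```

Against a locally running server this would be called as `run_no_evict_episode(http_transport("http://localhost:7860"))`; passing the transport in as a parameter also makes the loop easy to exercise with a stub.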
 ---
+## 📊 Baseline Scores
+Using the built-in `smart_policy` (non-LLM baseline):
+| Task | Hit Rate | Score |
+|------|----------|-------|
+| Easy | ~0.72 | ~1.00 |
+| Medium | ~0.61 | ~0.82 |
+| Hard | ~0.48 | ~0.78 |
+| **Overall** | | **~0.87** |
+---
+## 📝 Log Format
+`inference.py` emits structured JSON logs:
+```
+{"type": "START", "task_id": "task_easy", ...}
+{"type": "STEP",  "step": 0, "action": {...}, "reward": 1.0, ...}
+{"type": "END",   "total_reward": 87.3, "final_hit_rate": 0.72, "score": 1.0}
+```
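Since the log is one JSON object per line, it can be post-processed with a few lines of Python. This is a sketch that assumes only the `type` and `reward` keys shown in the sample above; the elided fields (`...`) are left alone, and lines that are not valid JSON are skipped rather than treated as errors.

```python
import json
from typing import Any, Dict, Iterable, List


def parse_log_lines(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse JSON-lines output, ignoring anything that is not a JSON object."""
    records: List[Dict[str, Any]] = []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # tolerate partial or non-JSON lines
    return records


def summarize(records: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Count STEP records, sum their rewards, and return the last END record."""
    steps = [r for r in records if r.get("type") == "STEP"]
    ends = [r for r in records if r.get("type") == "END"]
    return {
        "steps": len(steps),
        "step_reward_sum": sum(r.get("reward", 0.0) for r in steps),
        "final": ends[-1] if ends else None,
    }
```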

openenv.yaml ADDED

	@@ -0,0 +1,68 @@

+name: cdn-cache-optimizer
+version: "1.0.0"
+description: >
+  Edge CDN Cache Optimizer — an RL environment where an agent manages
+  a content delivery network cache. The agent decides which files to evict
+  when the cache is full, balancing hit rate, bandwidth efficiency, and
+  avoiding cache thrashing. Simulates real-world viral traffic spikes
+  alongside steady baseline demand.
+author: umar
+tags:
+  - openenv
+  - cdn
+  - cache
+  - infrastructure
+  - real-world
+tasks:
+  - id: task_easy
+    name: Steady Traffic Cache
+    difficulty: easy
+    episode_length: 100
+    cache_capacity_mb: 100.0
+  - id: task_medium
+    name: Mixed Traffic Cache
+    difficulty: medium
+    episode_length: 150
+    cache_capacity_mb: 80.0
+  - id: task_hard
+    name: Constrained Cache with Viral Bursts
+    difficulty: hard
+    episode_length: 200
+    cache_capacity_mb: 50.0
+observation_space:
+  type: structured
+  fields:
+    - step: int
+    - cache_used_mb: float
+    - cache_capacity_mb: float
+    - cache_fill_ratio: float
+    - cached_files: list[FileEntry]
+    - incoming_file_id: str
+    - incoming_file_size_mb: float
+    - incoming_file_is_viral: bool
+    - cache_hit: bool
+    - recent_hit_rate: float
+    - time_of_day: float
+    - queue_preview: list[str]
+action_space:
+  type: structured
+  fields:
+    - evict_file_id: str | null
+reward_range: [-1.0, 1.5]
+endpoints:
+  reset: POST /reset
+  step:  POST /step
+  state: GET  /state
+runtime:
+  framework: fastapi
+  python: "3.11"
+  port: 7860

pyproject.toml ADDED

	@@ -0,0 +1,25 @@

+[build-system]
+requires = ["setuptools>=68.0", "wheel"]
+build-backend = "setuptools.build_meta"
+[project]
+name = "cdn-cache-optimizer"
+version = "1.0.0"
+description = "Edge CDN Cache Optimizer - OpenEnv RL Environment"
+requires-python = ">=3.11"
+dependencies = [
+    "fastapi==0.111.0",
+    "uvicorn==0.29.0",
+    "pydantic==2.7.1",
+    "openai>=2.7.2",
+    "requests==2.31.0",
+    "python-multipart==0.0.9",
+    "openenv-core>=0.2.0",
+]
+[project.scripts]
+server = "server.app:main"
+[tool.setuptools.packages.find]
+where = ["."]
+include = ["env*", "api*", "server*"]

requirements.txt ADDED

	@@ -0,0 +1,7 @@

+fastapi==0.111.0
+uvicorn==0.29.0
+pydantic==2.7.1
+openai>=2.7.2
+requests==2.31.0
+python-multipart==0.0.9
+openenv-core>=0.2.0

uv.lock ADDED

The diff for this file is too large to render. See raw diff