Spaces:

umar-sharif821
/

cdn-cache-env-improvedone

Running on CPU Upgrade

App Files Files Community

cdn-cache-env-improvedone / README.md

umar-sharif821

feat: reproducible eval + shared policy module; fix smart_agent wasted-capacity bug

ddf831c 24 days ago

preview code

raw

history blame contribute delete

11.4 kB

metadata

title: CDN Cache Optimizer
emoji: 🌐
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
tags:
  - openenv
  - reinforcement-learning
  - cdn
  - caching
  - hackathon

CDN Cache Optimizer - OpenEnv RL Agent

Hackathon-ready OpenEnv project for edge CDN cache admission and eviction. It simulates the real production tradeoff between serving from a fast edge cache and falling back to slower origin fetches, while handling schema drift in CDN logs.

Judges Quickstart (30 seconds)

# 1. Click the live demo (no install)
open https://huggingface.co/spaces/umar-sharif821/cdn-cache-env-improvedone

# 2. Run locally
pip install -r requirements.txt && python app.py   # Gradio UI on :7860

# 3. Reproduce training in Colab
!wget -q https://raw.githubusercontent.com/umar-sharif821/cdn-cache-env-improvedone/master/colab_submission_script.py
!python colab_submission_script.py

Results at a Glance

Mean over 5 seeds (0–4), generated by python scripts/eval.py --seeds 0 1 2 3 4 from a clean checkout:

Task	Policy	Hit Rate	Reward (mean +/- std)	Bandwidth MB
task_easy	random	35.2%	35.70 +/- 9.70	381.9
task_easy	lru_baseline	31.4%	31.43 +/- 4.95	319.5
task_easy	smart_agent	41.2%	41.97 +/- 6.67	397.4
task_medium	random	18.5%	27.75 +/- 7.30	270.8
task_medium	lru_baseline	18.7%	29.59 +/- 8.78	253.2
task_medium	smart_agent	24.5%	43.34 +/- 11.70	274.0
task_hard	random	10.3%	14.86 +/- 6.60	168.2
task_hard	lru_baseline	11.3%	21.03 +/- 6.87	188.8
task_hard	smart_agent	12.2%	26.93 +/- 8.39	199.0

smart_agent beats the LRU baseline on hit rate, total reward, and bandwidth saved across all three tasks. The same code path is used by the Hugging Face Space UI, so what you measure locally is what the live demo shows. training_results.png (produced by colab_submission_script.py) shows the 2x2 RL training/comparison chart judges can reference.

Reproducible Evaluation

Judges can regenerate the numbers above in ~30 seconds with no GPU and no external services:

pip install -r requirements.txt
python scripts/eval.py            # full 3-task x 3-seed sweep
python scripts/eval.py --quick    # 1 seed per task (fast smoke)

The script runs three policies (random, lru_baseline, smart_agent) against all three OpenEnv tasks, prints a markdown table to stdout, and writes eval_results.json with per-episode rewards, hit rates, and bandwidth saved. The same smart_agent code path powers the Hugging Face Space, so what you see in the UI and what you get from the evaluator are the same policy.

Hackathon writeup: Blog.MD

Live Space: https://huggingface.co/spaces/umar-sharif821/cdn-cache-env-improvedone

GitHub: https://github.com/umar-sharif821/cdn-cache-env-improvedone

Colab training script: colab_submission_script.py Direct raw URL for Colab download: https://raw.githubusercontent.com/umar-sharif821/cdn-cache-env-improvedone/master/colab_submission_script.py

OpenEnv manifest: openenv.yaml

OpenEnv Compliance

This environment is built against the OpenEnv spec:

openenv.yaml describes tasks, observation space, action space, reward range, and endpoints.
Pydantic models in env/models.py define Observation, Action, Reward, StepResult, FileEntry, and TaskConfig.
env/cache.py implements the standard reset() / step() / state() loop.
The FastAPI server (api/main.py) exposes /reset, /step, /state, /tasks, and /health.
The Colab RL environment uses a gymnasium.Env subclass that emits the canonical 5-tuple (obs, reward, terminated, truncated, info) with deterministic reset(seed=...) and registered metadata including openenv_version.

To install the OpenEnv core package alongside the training script (outside the HF Space, where Gradio pins FastAPI):

pip install openenv-core

Why It Matters

Content Delivery Networks serve billions of files daily. Edge servers have limited storage, so they must constantly decide: which cached files to keep, and which to evict? Standard algorithms like LRU aren't optimal — especially when traffic has viral bursts (a file suddenly gets 50x more requests for 20 minutes, then drops back to zero).

A smarter agent can:

Predict viral spikes from queue previews
Avoid evicting high-frequency files
Prevent cache thrashing (evicting then immediately re-requesting)
Maximize bandwidth saved for users

Live Demo

This repo is Hugging Face Spaces-ready. The Docker Space runs app.py, a Gradio UI that compares:

Baseline LRU: always evicts the least recently used file.
Fine-tuned Agent: preserves viral/previewed objects, avoids bulky cold admissions, and evicts low-value content under cache pressure.

Run locally:

pip install -r requirements.txt
python app.py

Open http://localhost:7860.

Google Colab Submission

For judges who want a single reproducible run:

!python /content/colab_submission_script.py

The script installs dependencies, mounts Drive when available, trains/evaluates the agent, verifies schema drift normalization, and saves:

training_results.png
policy.pt
drift_report.json
metrics.json

Environment Description

At each step, a file is requested from the network. If it is already in cache, the user is served from the edge. If not, the request goes to origin and the agent decides whether to admit the file and what to evict.

Traffic Model

Steady files: consistent, cyclical demand.
Viral files: bell-curve spikes that fade back to baseline.
Queue preview: short lookahead signal similar to CDN prefetch telemetry.

Reward Grounding

The Colab RL environment uses a multi-component reward:

R = w1 * Perf - w2 * Cost

Perf captures edge-latency savings versus origin fetch, while Cost penalizes cache churn and write/admission cost.

Schema Drift

SchemaDriftGuard in colab_submission_script.py normalizes CDN logs across renamed, missing, extra, and type-shifted fields, for example:

ts, time, event_time -> timestamp
fid, object_id, oid -> file_id
bytes, size_bytes -> size_mb
cache_hit, is_hit -> hit

📐 Action & Observation Space

Observation Space

Field	Type	Description
`step`	int	Current episode step
`cache_used_mb`	float	MB currently used
`cache_capacity_mb`	float	Total cache size
`cache_fill_ratio`	float	0.0–1.0 fill level
`cached_files`	List[FileEntry]	All files in cache with metadata
`incoming_file_id`	str	File being requested
`incoming_file_size_mb`	float	Size of incoming file
`incoming_file_is_viral`	bool	Is this file currently viral?
`cache_hit`	bool	Is incoming file already cached?
`recent_hit_rate`	float	Rolling hit rate (last 20 steps)
`time_of_day`	float	Normalized 0.0–1.0 daily cycle
`queue_preview`	List[str]	Next 3 file IDs (prefetch hint)

FileEntry Fields

Field	Type	Description
`file_id`	str	Unique identifier
`size_mb`	float	File size in MB
`request_frequency`	float	Requests since cached
`is_viral`	bool	Currently viral
`last_accessed`	int	Step number of last access

Action Space

Field	Type	Description
`evict_file_id`	str \| null	File to evict (null = no eviction)

Reward Function

Component	Range	Description
`cache_hit_bonus`	+1.0 to +1.5	Hit reward (viral hits = +1.5)
`bandwidth_saved`	+0.0 to +0.2	Reward for bandwidth efficiency
`eviction_penalty`	-0.0 to -0.5	Penalty for evicting popular files
`thrash_penalty`	0.0 or -0.5	Penalty for evicting same file twice
`wasted_capacity_penalty`	-0.0 to -0.3	Penalty for leaving cache empty

📋 Tasks

Task 1: Steady Traffic Cache (Easy)

Cache: 100MB | Files: 30 | Steps: 100
No viral files — steady demand only
Agent learns basic LRU-style eviction
Target hit rate: ≥ 0.60 → score 1.0
Baseline score: ~0.75

Task 2: Mixed Traffic Cache (Medium)

Cache: 80MB | Files: 50 | Steps: 150
20% viral files mixed with steady demand
Agent must handle spikes and prioritize popular content
Score: 70% hit rate + 30% bandwidth
Baseline score: ~0.60

Task 3: Constrained Cache with Viral Bursts (Hard)

Cache: 50MB | Files: 80 | Steps: 200
35% viral files, tight capacity, large file sizes
Agent must predict spikes, avoid thrashing
Score: 50% hit rate + 25% bandwidth + 25% reward quality
Baseline score: ~0.45

Hugging Face Deployment

Create a new Hugging Face Space.
Choose Docker as the SDK.
Push this repository to the Space remote.
The Space starts automatically from Dockerfile and serves app.py on port 7860.

git remote add space https://huggingface.co/spaces/<username>/cdn-cache-optimizer
git push space main

GitHub Deployment

git add .
git commit -m "Prepare CDN Cache Optimizer hackathon submission"
git branch -M main
git remote add origin https://github.com/<username>/cdn-cache-optimizer.git
git push -u origin main

🚀 Setup & Usage

Local Setup

git clone <repo>
cd cdn-cache-env
pip install -r requirements.txt

Run API Server

uvicorn api.main:app --host 0.0.0.0 --port 7860

Run Inference (Baseline Agent)

export API_BASE_URL="https://api.openai.com/v1"
export MODEL_NAME="gpt-4o-mini"
export HF_TOKEN="your_token_here"

python inference.py

Docker

docker build -t cdn-cache-env .
docker run -p 7860:7860 cdn-cache-env

🌐 API Endpoints

Method	Endpoint	Description
GET	`/health`	Health check (returns 200)
GET	`/tasks`	List all tasks
POST	`/reset`	Start episode `{"task_id": "task_easy", "seed": 42}`
POST	`/step`	Take action `{"evict_file_id": "file_001" or null}`
GET	`/state`	Full environment state

📊 Baseline Scores

Using the built-in smart_policy (non-LLM baseline):

Task	Hit Rate	Score
Easy	~0.72	~1.00
Medium	~0.61	~0.82
Hard	~0.48	~0.78
Overall		~0.87

📝 Log Format

inference.py emits structured JSON logs:

{"type": "START", "task_id": "task_easy", ...}
{"type": "STEP",  "step": 0, "action": {...}, "reward": 1.0, ...}
{"type": "END",   "total_reward": 87.3, "final_hit_rate": 0.72, "score": 1.0}