---
title: Cdn Cache Optimizer
emoji: 🌐
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
tags:
  - openenv
---

# 🌐 CDN Cache Optimizer – OpenEnv RL Environment

An RL environment simulating **edge CDN cache management**, the same problem companies like Meta solve at planetary scale. An agent manages a fixed-size cache, deciding which files to evict when new content arrives, balancing **hit rate**, **bandwidth efficiency**, and **thrash avoidance**.

---

## 🎯 Motivation

Content Delivery Networks serve billions of files daily. Edge servers have limited storage, so they must constantly decide: *which cached files to keep, and which to evict?* Standard algorithms like LRU aren't optimal, especially when traffic has **viral bursts** (a file suddenly gets 50x more requests for 20 minutes, then drops back to zero).

A smarter agent can:
- Predict viral spikes from queue previews
- Avoid evicting high-frequency files
- Prevent cache thrashing (evicting then immediately re-requesting)
- Maximize bandwidth saved for users

---

## πŸ”§ Environment Description

At each step, a file is requested from the network. If it's already in the cache → **cache hit** (reward). If not → **cache miss**, and the agent must decide whether to evict an existing file to make room.

### Traffic Model
- **Steady files**: Consistent, cyclical demand
- **Viral files**: Bell-curve spike in popularity, then fade back to baseline
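
As a rough sketch of this traffic model, steady demand can be modeled as a cosine cycle with a Gaussian bump layered on top for viral files. The function shape, parameter names, and constants below are illustrative assumptions, not the environment's exact implementation:

```python
import math

def request_weight(t, base=1.0, viral=False, peak_step=100,
                   spike_width=20.0, spike_gain=50.0):
    """Relative request weight for a file at step t (illustrative only).

    Steady files follow a cyclical demand curve; viral files add a
    bell-curve spike centered on peak_step that fades back to baseline.
    """
    # Cyclical "time of day" demand for steady files
    steady = base * (1.0 + 0.5 * math.cos(2 * math.pi * t / 100))
    if not viral:
        return steady
    # Gaussian spike: sharp rise, then decay back toward baseline
    spike = spike_gain * math.exp(-((t - peak_step) ** 2) / (2 * spike_width ** 2))
    return steady + spike
```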

---

## πŸ“ Action & Observation Space

### Observation Space
| Field | Type | Description |
|-------|------|-------------|
| `step` | int | Current episode step |
| `cache_used_mb` | float | MB currently used |
| `cache_capacity_mb` | float | Total cache size |
| `cache_fill_ratio` | float | 0.0–1.0 fill level |
| `cached_files` | List[FileEntry] | All files in cache with metadata |
| `incoming_file_id` | str | File being requested |
| `incoming_file_size_mb` | float | Size of incoming file |
| `incoming_file_is_viral` | bool | Is this file currently viral? |
| `cache_hit` | bool | Is incoming file already cached? |
| `recent_hit_rate` | float | Rolling hit rate (last 20 steps) |
| `time_of_day` | float | Normalized 0.0–1.0 daily cycle |
| `queue_preview` | List[str] | Next 3 file IDs (prefetch hint) |

### FileEntry Fields
| Field | Type | Description |
|-------|------|-------------|
| `file_id` | str | Unique identifier |
| `size_mb` | float | File size in MB |
| `request_frequency` | float | Requests since cached |
| `is_viral` | bool | Currently viral |
| `last_accessed` | int | Step number of last access |

### Action Space
| Field | Type | Description |
|-------|------|-------------|
| `evict_file_id` | str \| null | File to evict (null = no eviction) |

### Reward Function
| Component | Range | Description |
|-----------|-------|-------------|
| `cache_hit_bonus` | +1.0 to +1.5 | Hit reward (viral hits = +1.5) |
| `bandwidth_saved` | +0.0 to +0.2 | Reward for bandwidth efficiency |
| `eviction_penalty` | -0.0 to -0.5 | Penalty for evicting popular files |
| `thrash_penalty` | 0.0 or -0.5 | Penalty for evicting same file twice |
| `wasted_capacity_penalty` | -0.0 to -0.3 | Penalty for leaving cache empty |
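
The tables above suggest a simple eviction heuristic: never touch viral files, and drop the least-requested, least-recently-used file otherwise. The sketch below is a hypothetical policy in that spirit; the dict field names mirror the observation and FileEntry tables, but this is not the environment's built-in `smart_policy`:

```python
def choose_eviction(obs):
    """Return a file_id to evict on a cache miss, or None.

    Heuristic: evict nothing on a hit or when the incoming file fits;
    otherwise prefer non-viral files with the lowest request_frequency,
    breaking ties by oldest last_accessed (avoids the eviction and
    thrash penalties on popular content).
    """
    if obs["cache_hit"]:
        return None  # nothing to do on a hit
    free = obs["cache_capacity_mb"] - obs["cache_used_mb"]
    if free >= obs["incoming_file_size_mb"]:
        return None  # incoming file fits without eviction
    candidates = [f for f in obs["cached_files"] if not f["is_viral"]]
    if not candidates:
        candidates = obs["cached_files"]  # forced choice: everything is viral
    victim = min(candidates,
                 key=lambda f: (f["request_frequency"], f["last_accessed"]))
    return victim["file_id"]
```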

---

## πŸ“‹ Tasks

### Task 1: Steady Traffic Cache (Easy)
- **Cache**: 100MB | **Files**: 30 | **Steps**: 100
- No viral files β€” steady demand only
- Agent learns basic LRU-style eviction
- **Target hit rate**: ≥ 0.60 → score 1.0
- **Baseline score**: ~0.75

### Task 2: Mixed Traffic Cache (Medium)
- **Cache**: 80MB | **Files**: 50 | **Steps**: 150  
- 20% viral files mixed with steady demand
- Agent must handle spikes and prioritize popular content
- **Score**: 70% hit rate + 30% bandwidth
- **Baseline score**: ~0.60

### Task 3: Constrained Cache with Viral Bursts (Hard)
- **Cache**: 50MB | **Files**: 80 | **Steps**: 200
- 35% viral files, tight capacity, large file sizes
- Agent must predict spikes, avoid thrashing
- **Score**: 50% hit rate + 25% bandwidth + 25% reward quality
- **Baseline score**: ~0.45
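
The per-task scores above are weighted blends of the listed components. A minimal sketch of that blending, assuming each component has already been normalized to [0, 1] (the normalization itself is an assumption here, not taken from the environment's code):

```python
def blended_score(hit_rate, bandwidth_ratio, reward_quality=0.0,
                  weights=(0.70, 0.30, 0.0)):
    """Weighted blend of task score components, each assumed in [0, 1].

    Default weights match Task 2 (70% hit rate + 30% bandwidth);
    pass (0.50, 0.25, 0.25) for Task 3's three-way blend.
    """
    w_hit, w_bw, w_rq = weights
    return w_hit * hit_rate + w_bw * bandwidth_ratio + w_rq * reward_quality
```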

## Code Repository

Full source: https://github.com/umar-sharif821/cdn-cache-env

## Files Included

- **env/cache.py** - DriftCDNEnv environment implementation
- **server/app.py** - OpenEnv FastAPI server
- **training/train.py** - Fine-tuning script
- **training_results_finetuned.png** - Training results chart
- **baseline_drift.png** - Baseline comparison chart

---

## πŸš€ Setup & Usage

### Local Setup
```bash
git clone <repo>
cd cdn-cache-env
pip install -r requirements.txt
```

### Run API Server
```bash
uvicorn api.main:app --host 0.0.0.0 --port 7860
```

### Run Inference (Baseline Agent)
```bash
export API_BASE_URL="https://api.openai.com/v1"
export MODEL_NAME="gpt-4o-mini"
export HF_TOKEN="your_token_here"

python inference.py
```

### Docker
```bash
docker build -t cdn-cache-env .
docker run -p 7860:7860 \
  -e API_BASE_URL="https://api.openai.com/v1" \
  -e MODEL_NAME="gpt-4o-mini" \
  -e HF_TOKEN="your_token" \
  cdn-cache-env
```

---

## 🌐 API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/health` | Health check (returns 200) |
| GET | `/tasks` | List all tasks |
| POST | `/reset` | Start episode `{"task_id": "task_easy", "seed": 42}` |
| POST | `/step` | Take action `{"evict_file_id": "file_001" or null}` |
| GET | `/state` | Full environment state |
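
A minimal stdlib-only client for these endpoints might look like the sketch below. The local URL and the `done` field in the step response are assumptions; adjust both for your deployment:

```python
import json
from urllib import request as urlrequest

API = "http://localhost:7860"  # assumed local server address

def build_request(method, path, payload=None):
    """Build a JSON request for one of the endpoints above."""
    data = json.dumps(payload).encode() if payload is not None else None
    return urlrequest.Request(API + path, data=data, method=method,
                              headers={"Content-Type": "application/json"})

def call(method, path, payload=None):
    """Send the request and decode the JSON response."""
    with urlrequest.urlopen(build_request(method, path, payload)) as resp:
        return json.loads(resp.read())

# Typical episode loop (requires a running server):
# obs = call("POST", "/reset", {"task_id": "task_easy", "seed": 42})
# while not obs.get("done", False):  # "done" field is an assumption
#     obs = call("POST", "/step", {"evict_file_id": None})  # never-evict baseline
```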

---

## πŸ“Š Baseline Scores

Using the built-in `smart_policy` (non-LLM baseline):

| Task | Hit Rate | Score |
|------|----------|-------|
| Easy | ~0.72 | ~1.00 |
| Medium | ~0.61 | ~0.82 |
| Hard | ~0.48 | ~0.78 |
| **Overall** | | **~0.87** |

---

## πŸ“ Log Format

`inference.py` emits structured JSON logs:

```
{"type": "START", "task_id": "task_easy", ...}
{"type": "STEP",  "step": 0, "action": {...}, "reward": 1.0, ...}
{"type": "END",   "total_reward": 87.3, "final_hit_rate": 0.72, "score": 1.0}
```