---
license: apache-2.0
language:
- en
tags:
- reinforcement-learning
- cdn
- optimization
- agent-training
---

# CDNs are stuck in the 60s. We used GRPO to fix that.

If you’ve ever watched a YouTube video or scrolled through Instagram without it buffering, you’re looking at a **Content Delivery Network (CDN)** at work. It’s a simple promise: keep a copy of the file on a server close to the user so things load fast.

But here’s the problem: **cache management is basically ancient.**

When a cache gets full, it has to decide what to kick out. Most of the internet still uses policies from the **1960s**—things like **LRU (Least Recently Used)** or **LFU (Least Frequently Used)**. These aren't "smart" algorithms; they’re just basic rules. They don’t care about file size, they don't know if a video is about to go viral, and they definitely don't handle it well when things change.
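
To see just how basic these rules are, here is a minimal LRU sketch (illustrative only, not the baseline code from this repo). Notice what it *doesn't* track: no file sizes, no popularity forecast, just "who was touched last."

```python
from collections import OrderedDict

class LRUCache:
    """Evict the entry that was touched longest ago."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def access(self, key):
        """Return True on a cache hit, False on a miss (then admit the key)."""
        if key in self.store:
            self.store.move_to_end(key)     # mark as most recently used
            return True
        if len(self.store) >= self.capacity:
            self.store.popitem(last=False)  # evict the least recently used
        self.store[key] = None
        return False
```

LFU is the same story with a frequency counter instead of a recency order. Either way, the policy is a fixed rule, not a learned strategy.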

### The "Chaos" Factor: Schema Drift

In the real world, infrastructure isn't static. A server's capacity might drop 40% because of a maintenance issue, or a cricket match might start, and suddenly 50 million people want the exact same stream at the same time.
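
In environment terms, events like these amount to mid-episode mutations of the world state. A sketch of what that injection might look like (the field names, trigger steps, and magnitudes here are hypothetical, chosen to mirror the examples above, not taken from the repo's actual environment code):

```python
import random

def apply_drift(state: dict, step: int, rng: random.Random) -> dict:
    """Hypothetical mid-episode drift injection for a cache environment."""
    if step == 50:
        # Maintenance incident: cache capacity drops by 40%.
        state["capacity"] = int(state["capacity"] * 0.6)
    if step == 100:
        # Viral surge: one object suddenly dominates the request stream.
        state["hot_object"] = rng.choice(state["catalog"])
        state["hot_share"] = 0.8
    return state
```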

We call this **Schema Drift**. Most Reinforcement Learning (RL) agents are trained in "perfect" environments where the rules never change. But when we threw standard policies (and even "smart" baselines) into our drift-heavy environment, they didn't just slow down—they **collapsed.**

* **LRU hit rate:** tanked to 10.5%
* **LFU hit rate:** basically flatlined at 3%

You can see exactly what that collapse looks like in the performance data:

![Baseline cache policy hit rates collapsing under schema drift](https://cdn-uploads.huggingface.co/production/uploads/68a4223027403b7b0e1d4897/Qn01mpxmtRcE4zsE65Y4o.png)

### How We Built It (The Tech Stack)

We didn't want to build just another "toy" environment. We wanted something that felt like real-world infrastructure.

* **The Environment:** Built using the latest **OpenEnv** framework. We baked "drifts" directly into the episodes—Step 50 might see a capacity crash, while Step 100 might trigger a viral traffic surge.
* **The Brain:** We used **Qwen 1.5B** and trained it via **GRPO (Group Relative Policy Optimization)** using the Hugging Face TRL library.
* **The Goal:** Moving past academic "hit rates" and looking at the actual dollar value. In the real world, a 1% hit-rate improvement for a company like Cloudflare means millions of dollars saved in bandwidth costs.
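
The "group relative" part of GRPO is simple to state: sample several rollouts per state, score each one, and normalize every reward against its own group instead of learning a separate value network. TRL's `GRPOTrainer` handles this internally; the sketch below is just that normalization step in isolation, not our training code:

```python
import statistics

def grpo_advantages(group_rewards, eps=1e-8):
    """Group-relative advantages: each rollout's reward is standardized
    against the mean and spread of its own sampling group."""
    mu = statistics.fmean(group_rewards)
    sigma = statistics.pstdev(group_rewards)
    return [(r - mu) / (sigma + eps) for r in group_rewards]
```

Rollouts that beat their group average get a positive advantage and are reinforced; below-average ones are pushed down, with no critic model needed.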

### Why This Matters

This isn't just about making the internet 5% faster. It’s about building systems that don't need a human to go in and manually rewrite the rules every time the traffic patterns change. Our agent doesn't just follow a rule; it learns a strategy.

It’s the difference between a static blueprint and a living system that adapts as it goes.

---

**Built for the Meta × Scaler OpenEnv Hackathon 2026.** If you're interested in RL for infrastructure, check out the model files and the environment graders in the repo!