---
license: apache-2.0
language:
- en
tags:
- reinforcement-learning
- cdn
- optimization
- agent-training
---

# CDNs are stuck in the 60s. We used GRPO to fix that.

If you’ve ever watched a YouTube video or scrolled through Instagram without it buffering, you’re looking at a **Content Delivery Network (CDN)** at work. It’s a simple promise: keep a copy of the file on a server close to the user so things load fast.

But here’s the problem: **cache management is basically ancient.**

When a cache gets full, it has to decide what to kick out. Most of the internet still uses eviction policies from the **1960s**—things like **LRU (Least Recently Used)** or **LFU (Least Frequently Used)**. These aren’t "smart" algorithms; they’re just fixed rules. They don’t care about file size, they don’t know if a video is about to go viral, and they definitely don’t handle it well when things change.

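For a sense of how simple these "ancient" policies really are, here is a minimal LRU cache in Python. This is an illustration of the classic policy, not our agent:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least recently used key when full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.store = OrderedDict()  # insertion order doubles as recency order

    def get(self, key):
        if key not in self.store:
            return None  # cache miss
        self.store.move_to_end(key)  # mark as most recently used
        return self.store[key]

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used key

cache = LRUCache(capacity=2)
cache.put("a.mp4", "bytes-a")
cache.put("b.mp4", "bytes-b")
cache.get("a.mp4")             # "a.mp4" is now the most recent entry
cache.put("c.mp4", "bytes-c")  # evicts "b.mp4", the least recently used
```

Notice what the policy never looks at: object size, popularity trends, or fetch cost. It tracks recency and nothing else, which is exactly why it breaks when traffic shifts.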
### The "Chaos" Factor: Schema Drift
In the real world, infrastructure isn’t static. A server’s capacity might drop 40% because of a maintenance issue, or a cricket match might start, and suddenly 50 million people want the exact same stream at the same time.

We call this **Schema Drift**. Most Reinforcement Learning (RL) agents are trained in "perfect" environments where the rules never change. But when we threw standard policies (and even "smart" baselines) into our drift-heavy environment, they didn’t just slow down—they **collapsed.**

* **LRU hit rate:** tanked to 10.5%
* **LFU hit rate:** basically flatlined at 3%

You can see exactly what that collapse looks like in the performance data:

![Chart showing cumulative reward and cache hit rate under drift, titled "Baseline Policies Collapsing Under Schema Drift".](image_3.png)

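Why do frequency-based policies collapse like this? LFU trusts lifetime request counts, so items that were hot before the drift keep squatting in the cache long after traffic has moved on. A toy simulation makes the mechanism visible (illustrative only; the 10.5% and 3% figures above come from the full environment, not this sketch):

```python
import random
from collections import defaultdict

random.seed(0)

def lfu_hit_rate(warmup, measured, capacity):
    """Toy LFU: evict the cached key with the lowest lifetime request count.

    Warms the cache on `warmup`, then reports the hit rate over `measured`.
    """
    cache, counts = set(), defaultdict(int)
    hits = 0
    for counting, stream in ((False, warmup), (True, measured)):
        for key in stream:
            counts[key] += 1
            if key in cache:
                hits += counting  # bool counts as 0/1
            else:
                if len(cache) >= capacity:
                    # Evict the cached key with the smallest all-time frequency.
                    cache.remove(min(cache, key=counts.__getitem__))
                cache.add(key)
    return hits / len(measured)

# Phase 1: items 0-9 are hot. Phase 2: traffic drifts to items 100-109,
# but LFU still trusts the stale counts built up during phase 1.
phase1 = [random.randrange(10) for _ in range(5000)]
phase2 = [100 + random.randrange(10) for _ in range(5000)]

stable = lfu_hit_rate([], phase1, capacity=5)
drifted = lfu_hit_rate(phase1, phase2, capacity=5)
print(f"stable traffic: {stable:.1%}, after drift: {drifted:.1%}")
```

The stale frequency table is the failure mode in miniature: the policy’s accumulated "knowledge" becomes a liability the moment the distribution shifts.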
### How We Built It (The Tech Stack)
We didn’t want to build just another "toy" environment. We wanted something that felt like real-world infrastructure.

* **The Environment:** Built using the latest **OpenEnv** framework. We baked "drifts" directly into the episodes—Step 50 might see a capacity crash, while Step 100 might trigger a viral traffic surge.
* **The Brain:** We used **Qwen 1.5B** and trained it via **GRPO (Group Relative Policy Optimization)** using the Hugging Face TRL library.
* **The Goal:** Moving past academic "hit rates" and looking at the actual dollar value. In the real world, a 1% hit-rate improvement for a company like Cloudflare means millions of dollars saved in bandwidth costs.

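The dollar figure in the last bullet is easy to sanity-check with back-of-envelope math. Every number below is an illustrative assumption, not Cloudflare’s actual traffic or pricing:

```python
# All figures are illustrative assumptions, not real CDN pricing or traffic.
monthly_egress_gb = 1_000_000_000   # assume ~1 EB/month served by the CDN
origin_cost_per_gb = 0.02           # assume $0.02/GB to fetch a miss from origin
hit_rate_gain = 0.01                # +1 percentage point cache hit rate

# Every extra hit is one origin fetch that no longer has to be paid for.
monthly_savings = monthly_egress_gb * hit_rate_gain * origin_cost_per_gb
annual_savings = 12 * monthly_savings
print(f"~${annual_savings:,.0f}/year")
```

Under these assumptions a single percentage point of hit rate is worth a few million dollars a year, before even counting the latency win for users.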
### Why This Matters
This isn’t just about making the internet 5% faster. It’s about building systems that don’t need a human to go in and manually rewrite the rules every time the traffic patterns change. Our agent doesn’t just follow a rule; it learns a strategy.

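The learning signal behind that strategy is GRPO’s group-relative advantage: sample several rollouts for the same traffic trace, score each with a reward, and normalize each score against its own group instead of training a separate value network. A minimal sketch of that normalization (the rollouts and reward values here are made up for illustration):

```python
import statistics

def group_relative_advantages(rewards, eps=1e-4):
    """GRPO-style advantage: normalize each reward against its own group.

    `rewards` holds the scores of G rollouts sampled for the same prompt
    (here: the same request trace); no learned critic is needed.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical cumulative hit-rate rewards for four eviction rollouts
# generated from the same traffic trace:
rewards = [0.42, 0.55, 0.31, 0.48]
advantages = group_relative_advantages(rewards)
# Rollouts scoring above the group mean get positive advantages (reinforced);
# those below the mean get negative advantages (suppressed).
```

These advantages then weight the policy’s token log-probabilities in a PPO-style clipped objective; skipping the critic is part of what makes GRPO cheap enough to run on a 1.5B model.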
It’s the difference between a static blueprint and a living system that adapts as it goes.

---

**Built for the Meta × Scaler OpenEnv Hackathon 2026.** If you’re interested in RL for infrastructure, check out the model files and the environment graders in the repo!