---
license: apache-2.0
language:
- en
tags:
- reinforcement-learning
- cdn
- optimization
- agent-training
---

# CDNs are stuck in the 60s. We used GRPO to fix that.

If you’ve ever watched a YouTube video or scrolled through Instagram without it buffering, you’re looking at a **Content Delivery Network (CDN)** at work. It’s a simple promise: keep a copy of the file on a server close to the user so things load fast.

But here’s the problem: **cache management is basically ancient.**

When a cache gets full, it has to decide what to kick out. Most of the internet still uses eviction policies from the **1960s**—things like **LRU (Least Recently Used)** or **LFU (Least Frequently Used)**. These aren’t "smart" algorithms; they’re just fixed rules. They don’t care about file size, they don’t know if a video is about to go viral, and they definitely don’t handle it well when things change.

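For a sense of how simple these "ancient" policies really are, here is a minimal LRU cache in Python. This is an illustration of the classic policy, not our agent:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least recently used key when full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.store = OrderedDict()  # insertion order doubles as recency order

    def get(self, key):
        if key not in self.store:
            return None  # cache miss
        self.store.move_to_end(key)  # mark as most recently used
        return self.store[key]

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used key

cache = LRUCache(capacity=2)
cache.put("a.mp4", "bytes-a")
cache.put("b.mp4", "bytes-b")
cache.get("a.mp4")             # "a.mp4" is now the most recent entry
cache.put("c.mp4", "bytes-c")  # evicts "b.mp4", the least recently used
```

Notice what the policy never looks at: object size, popularity trends, or fetch cost. It tracks recency and nothing else, which is exactly why it breaks when traffic shifts.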
### The "Chaos" Factor: Schema Drift
In the real world, infrastructure isn’t static. A server’s capacity might drop 40% because of a maintenance issue, or a cricket match might start, and suddenly 50 million people want the exact same stream at the same time.

We call this **Schema Drift**. Most Reinforcement Learning (RL) agents are trained in "perfect" environments where the rules never change. But when we threw standard policies (and even "smart" baselines) into our drift-heavy environment, they didn’t just slow down—they **collapsed.**

* **LRU hit rate:** tanked to 10.5%
* **LFU hit rate:** basically flatlined at 3%

You can see exactly what that collapse looks like in the performance data:

![Chart showing cumulative reward and cache hit rate under drift, titled "Baseline Policies Collapsing Under Schema Drift".](image_3.png)

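Why do frequency-based policies collapse like this? LFU trusts lifetime request counts, so items that were hot before the drift keep squatting in the cache long after traffic has moved on. A toy simulation makes the mechanism visible (illustrative only; the 10.5% and 3% figures above come from the full environment, not this sketch):

```python
import random
from collections import defaultdict

random.seed(0)

def lfu_hit_rate(warmup, measured, capacity):
    """Toy LFU: evict the cached key with the lowest lifetime request count.

    Warms the cache on `warmup`, then reports the hit rate over `measured`.
    """
    cache, counts = set(), defaultdict(int)
    hits = 0
    for counting, stream in ((False, warmup), (True, measured)):
        for key in stream:
            counts[key] += 1
            if key in cache:
                hits += counting  # bool counts as 0/1
            else:
                if len(cache) >= capacity:
                    # Evict the cached key with the smallest all-time frequency.
                    cache.remove(min(cache, key=counts.__getitem__))
                cache.add(key)
    return hits / len(measured)

# Phase 1: items 0-9 are hot. Phase 2: traffic drifts to items 100-109,
# but LFU still trusts the stale counts built up during phase 1.
phase1 = [random.randrange(10) for _ in range(5000)]
phase2 = [100 + random.randrange(10) for _ in range(5000)]

stable = lfu_hit_rate([], phase1, capacity=5)
drifted = lfu_hit_rate(phase1, phase2, capacity=5)
print(f"stable traffic: {stable:.1%}, after drift: {drifted:.1%}")
```

The stale frequency table is the failure mode in miniature: the policy’s accumulated "knowledge" becomes a liability the moment the distribution shifts.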
### How We Built It (The Tech Stack)
We didn’t want to build just another "toy" environment. We wanted something that felt like real-world infrastructure.

* **The Environment:** Built using the latest **OpenEnv** framework. We baked "drifts" directly into the episodes—Step 50 might see a capacity crash, while Step 100 might trigger a viral traffic surge.
* **The Brain:** We used **Qwen 1.5B** and trained it via **GRPO (Group Relative Policy Optimization)** using the Hugging Face TRL library.
* **The Goal:** Moving past academic "hit rates" and looking at the actual dollar value. In the real world, a 1% hit-rate improvement for a company like Cloudflare means millions of dollars saved in bandwidth costs.

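The dollar figure in the last bullet is easy to sanity-check with back-of-envelope math. Every number below is an illustrative assumption, not Cloudflare’s actual traffic or pricing:

```python
# All figures are illustrative assumptions, not real CDN pricing or traffic.
monthly_egress_gb = 1_000_000_000   # assume ~1 EB/month served by the CDN
origin_cost_per_gb = 0.02           # assume $0.02/GB to fetch a miss from origin
hit_rate_gain = 0.01                # +1 percentage point cache hit rate

# Every extra hit is one origin fetch that no longer has to be paid for.
monthly_savings = monthly_egress_gb * hit_rate_gain * origin_cost_per_gb
annual_savings = 12 * monthly_savings
print(f"~${annual_savings:,.0f}/year")
```

Under these assumptions a single percentage point of hit rate is worth a few million dollars a year, before even counting the latency win for users.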
### Why This Matters
This isn’t just about making the internet 5% faster. It’s about building systems that don’t need a human to go in and manually rewrite the rules every time the traffic patterns change. Our agent doesn’t just follow a rule; it learns a strategy.

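The learning signal behind that strategy is GRPO’s group-relative advantage: sample several rollouts for the same traffic trace, score each with a reward, and normalize each score against its own group instead of training a separate value network. A minimal sketch of that normalization (the rollouts and reward values here are made up for illustration):

```python
import statistics

def group_relative_advantages(rewards, eps=1e-4):
    """GRPO-style advantage: normalize each reward against its own group.

    `rewards` holds the scores of G rollouts sampled for the same prompt
    (here: the same request trace); no learned critic is needed.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical cumulative hit-rate rewards for four eviction rollouts
# generated from the same traffic trace:
rewards = [0.42, 0.55, 0.31, 0.48]
advantages = group_relative_advantages(rewards)
# Rollouts scoring above the group mean get positive advantages (reinforced);
# those below the mean get negative advantages (suppressed).
```

These advantages then weight the policy’s token log-probabilities in a PPO-style clipped objective; skipping the critic is part of what makes GRPO cheap enough to run on a 1.5B model.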
It’s the difference between a static blueprint and a living system that adapts as it goes.

---

**Built for the Meta × Scaler OpenEnv Hackathon 2026.** If you’re interested in RL for infrastructure, check out the model files and the environment graders in the repo!