# CDN Cache Optimizer: Training Agents Under Schema Drift
The "Silent" Problem
Most people don't realize that when they load a video or scroll through a feed, they aren't hitting a central server. They're hitting a CDN (Content Delivery Network) edge node that serves a locally cached copy.
The promise is a faster internet, but the reality is that cache management is stuck in the past. When a cache fills up, it has to decide what to kick out, and even giant providers like Cloudflare and AWS still rely on eviction policies from the 1960s:
- **LRU (Least Recently Used):** toss whatever hasn't been touched in a while.
- **LFU (Least Frequently Used):** toss whatever isn't popular.
These are static, "dumb" rules. They don't adapt to viral surges or shifting business goals; they just exist. And when the infrastructure itself changes (what we call schema drift), these policies fall apart completely.
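For concreteness, here is a minimal Python sketch of the two textbook policies (illustrative only, not this repo's implementation):

```python
from collections import Counter, OrderedDict

class LRUCache:
    """Least Recently Used: evict whatever hasn't been touched the longest."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()  # insertion order doubles as recency order

    def access(self, key) -> bool:
        hit = key in self.items
        if hit:
            self.items.move_to_end(key)          # mark as most recently used
        else:
            if len(self.items) >= self.capacity:
                self.items.popitem(last=False)   # evict the stalest entry
            self.items[key] = True
        return hit

class LFUCache:
    """Least Frequently Used: evict whatever has the fewest accesses."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cached = set()
        self.counts = Counter()  # access counts, including for evicted keys

    def access(self, key) -> bool:
        self.counts[key] += 1
        hit = key in self.cached
        if not hit:
            if len(self.cached) >= self.capacity:
                victim = min(self.cached, key=self.counts.__getitem__)
                self.cached.remove(victim)
            self.cached.add(key)
        return hit
```

Neither policy has any notion of *why* access patterns change, which is exactly what the drift events below exploit.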
## The Solution: Learning to Adapt
We didn't want to build another static rulebook, so we built an RL environment that trains agents to actually learn the patterns.
What makes this special is that we don't train them in a perfect, stable world. Real life is messy, so we inject three mid-episode "drift" events that simulate what actually happens in a production CDN:
- **Step 50:** capacity suddenly drops by 40% (server maintenance/issues).
- **Step 100:** a massive traffic surge (something went viral).
- **Step 150:** the reward policy changes (the goal shifts from speed to cost saving).
The agent has to figure these out on the fly without being told they're coming.
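To make that concrete, here is a hedged sketch of how such drift events could be wired into a Gym-style wrapper. Every name below (`DriftingCDNEnv`, `make_popular`, `reward_mode`) is a hypothetical stand-in, not this repo's actual API:

```python
# Illustrative only: the class and helper names are assumptions,
# not the environment shipped in this repo.
DRIFT_EVENTS = {
    50:  "capacity_drop",   # cache capacity falls by 40%
    100: "traffic_surge",   # request distribution spikes on a few hot objects
    150: "reward_swap",     # objective shifts from latency to bandwidth cost
}

class DriftingCDNEnv:
    """Wraps a base cache environment and injects mid-episode drift."""

    def __init__(self, base_env):
        self.base_env = base_env
        self.t = 0

    def step(self, action):
        self.t += 1
        event = DRIFT_EVENTS.get(self.t)
        if event == "capacity_drop":
            # a 40% drop means keeping 60% of current capacity
            self.base_env.capacity = int(self.base_env.capacity * 0.6)
        elif event == "traffic_surge":
            self.base_env.make_popular(hot_fraction=0.05)  # hypothetical helper
        elif event == "reward_swap":
            self.base_env.reward_mode = "bandwidth_cost"
        # Crucially, the event is never exposed in the observation; the
        # agent must infer the drift from its own outcomes.
        return self.base_env.step(action)
```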
## The Results (Spoiler: It's Brutal)
When we put the old-school baselines through this drift environment, the performance drop was staggering:
- **LRU:** 10.5% hit rate
- **LFU:** 3.0% hit rate
- **Smart baseline:** 0.5% hit rate

Basically, the standard policies collapsed. Our trained RL agent, however, learned to pivot and maintained significantly higher performance across the board.
## Technical Breakdown
If you're interested in the "under the hood" stuff:
- **The Framework:** built with OpenEnv.
- **The Model:** Qwen 1.5B trained with GRPO (Group Relative Policy Optimization) via the TRL library (a hedged training sketch follows this list).
- **The Innovation:** the environment isn't just a simulator; it's a "chaos generator" that patches the CDN logic in real time, forcing the agent to generalize rather than memorize a specific pattern.
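For the training side, here is a minimal sketch of what a GRPO run with TRL's `GRPOTrainer` can look like. The reward function, prompt format, and checkpoint name (`env_score`, the serialized cache prompt, `Qwen/Qwen2.5-1.5B-Instruct`) are assumptions; the repo only says "Qwen 1.5B":

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

def env_score(completion: str) -> float:
    # Hypothetical stand-in: in the real project, a completion would be
    # parsed into an eviction action and replayed in the drift environment,
    # with the resulting hit rate (or cost saving) returned as the reward.
    return float("evict" in completion.lower())

def cache_reward(completions, **kwargs):
    # GRPO scores a group of completions per prompt and normalizes rewards
    # within the group, so we just return one score per completion.
    return [env_score(c) for c in completions]

# Placeholder prompts; the real dataset would serialize the cache-state
# observations coming out of the OpenEnv environment.
dataset = Dataset.from_list(
    [{"prompt": "Cache state: <serialized>\nWhich object should be evicted?"}] * 64
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-1.5B-Instruct",  # assumed checkpoint for "Qwen 1.5B"
    reward_funcs=cache_reward,
    args=GRPOConfig(output_dir="cdn-grpo", num_generations=8),
    train_dataset=dataset,
)
trainer.train()
```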
## Why This Matters
At enterprise scale, this isn't just a technical flex: a 1% improvement in cache hit rate can save a provider like Cloudflare millions in annual bandwidth costs. We're trying to bridge the gap between "cool AI research" and "real-world infrastructure that saves money."
*Project for the Meta × Scaler OpenEnv Hackathon 2026*