umar-sharif821 committed on
Commit a9f5ced · 1 Parent(s): 75b0081

docs: add standalone Hugging Face blog writeup

Files changed (2)
  1. Blog.MD +385 -0
  2. README.md +6 -0
Blog.MD ADDED
@@ -0,0 +1,385 @@
+ # Building a CDN Cache Optimizer with OpenEnv
+
+ ## Project Links
+
+ - Hugging Face Space: https://huggingface.co/spaces/umar-sharif821/cdn-cache-env-improvedone
+ - GitHub Repository: https://github.com/umar-sharif821/cdn-cache-env-improvedone
+
+ ## Short Summary
+
+ CDN Cache Optimizer is an OpenEnv-style reinforcement learning environment for edge CDN cache decisions.
+
+ The agent observes cache pressure, request patterns, file popularity, and churn signals.
+
+ It then decides whether to bypass an incoming object, admit it into cache, or evict something else to make room.
+
+ The environment is designed around a real infrastructure problem: reducing user latency and origin fetch cost while keeping the edge cache stable.
+
+ ## Which Hackathon Theme This Fits
+
+ The strongest fit is **Theme #3.1 - World Modeling / Professional Tasks**.
+
+ CDN cache optimization is a professional infrastructure task.
+
+ The agent is not solving a toy grid world.
+
+ It is interacting with a dynamic system where state changes over time:
+
+ - cache contents change after every decision
+ - request traffic changes across an episode
+ - hit rate depends on previous cache choices
+ - origin fetches happen when the edge misses
+ - eviction decisions affect future rewards
+ - schema drift changes how CDN logs may arrive
+
+ The project also has a secondary fit with **Theme #2 - Long-Horizon Planning**.
+
+ A cache decision is not only about the current request.
+
+ If the agent evicts a useful object now, it may pay for that mistake later through future misses.
+
+ If it admits a viral object early, it may receive benefits many steps later.
+
+ That makes this a delayed-reward control problem, not just a one-step classification task.
+
+ ## Why I Chose CDN Caching
+
+ I wanted the environment to feel close to a real engineering workflow.
+
+ Content Delivery Networks serve files from edge locations close to users.
+
+ When an object is available at the edge, users get lower latency.
+
+ When an object is missing, the system falls back to origin.
+
+ That origin fetch is slower and more expensive.
+
+ The hard part is that edge storage is limited.
+
+ The cache cannot keep everything.
+
+ So the environment asks a simple but important question:
+
+ > What should the cache keep, and what should it evict?
+
+ Classic policies like LRU are useful, but they are not always enough.
+
+ They do not fully understand viral bursts, object size, future request hints, or churn cost.
+
+ That made CDN caching a good candidate for an RL environment.
+
+ ## Environment Design
+
+ At every step, the environment simulates a CDN request.
+
+ The request may hit the edge cache or miss and go to origin.
+
+ The agent then chooses one of three actions:
+
+ ```text
+ 0 = bypass incoming object
+ 1 = admit object and evict using LRU
+ 2 = admit object and evict using smart popularity-aware eviction
+ ```
+
+ The observation is normalized so training stays lightweight:
+
+ ```text
+ [cache_fill, incoming_size, incoming_popularity, hit_rate, churn_rate]
+ ```
+
+ This gives the agent enough signal to reason about:
+
+ - how full the cache is
+ - whether the incoming object is large
+ - whether the object is likely to be useful
+ - whether recent cache behavior is working
+ - whether the system is churning too much
99
+ ## Real-World Grounding
100
+
101
+ The environment explicitly models two paths:
102
+
103
+ ```text
104
+ Edge hit -> low latency
105
+ Origin fetch -> high latency
106
+ ```
107
+
108
+ The default simulation uses:
109
+
110
+ ```text
111
+ edge_latency = 5 ms
112
+ origin_latency = 100 ms
113
+ ```
114
+
115
+ This matters because the reward is connected to infrastructure value.
116
+
117
+ The agent is rewarded for reducing latency and penalized for expensive cache behavior.
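To make the two-path latency model concrete, here is a small sketch using the default values quoted above. The function names are hypothetical, and the mean-latency formula is simply the hit-rate-weighted average of the two paths.

```python
EDGE_LATENCY_MS = 5.0      # default edge latency from the writeup
ORIGIN_LATENCY_MS = 100.0  # default origin latency from the writeup


def served_latency_ms(hit: bool) -> float:
    """Latency of a single request: edge on a hit, origin on a miss."""
    return EDGE_LATENCY_MS if hit else ORIGIN_LATENCY_MS


def mean_latency_ms(hit_rate: float) -> float:
    """Expected latency for a given hit rate (weighted average of both paths)."""
    return hit_rate * EDGE_LATENCY_MS + (1.0 - hit_rate) * ORIGIN_LATENCY_MS
```

With these defaults, a 50% hit rate corresponds to a mean of 52.5 ms, so each additional point of hit rate is worth about 0.95 ms of average latency.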
+
+ ## Reward Function
+
+ The reward follows a multi-component design:
+
+ ```text
+ R = w1 * Perf - w2 * Cost
+ ```
+
+ Where:
+
+ ```text
+ Perf = (origin_latency - served_latency) / origin_latency
+ Cost = eviction_churn + admission_cost
+ ```
+
+ This reward is intentionally not just "hit rate".
+
+ A cache policy can get a good hit rate while still behaving badly.
+
+ For example, it might churn too much.
+
+ It might admit large cold objects.
+
+ It might evict popular content too aggressively.
+
+ The reward tries to balance:
+
+ - latency improvement
+ - admission discipline
+ - eviction cost
+ - cache stability
+ - useful edge hits
+
+ That is closer to what a production caching system would care about.
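The reward formula above translates almost directly into code. The sketch below follows the stated definitions of `Perf` and `Cost`; the weights `w1 = 1.0` and `w2 = 0.5` and the function name `step_reward` are assumptions, since the writeup does not give the actual values.

```python
def step_reward(
    served_latency_ms: float,
    eviction_churn: float,
    admission_cost: float,
    origin_latency_ms: float = 100.0,
    w1: float = 1.0,  # assumed weight on performance
    w2: float = 0.5,  # assumed weight on cost
) -> float:
    """R = w1 * Perf - w2 * Cost, with Perf and Cost as defined in the writeup."""
    perf = (origin_latency_ms - served_latency_ms) / origin_latency_ms
    cost = eviction_churn + admission_cost
    return w1 * perf - w2 * cost
```

Under these assumed weights, an edge hit with no churn scores 0.95, while an origin fetch with `eviction_churn = 0.2` and `admission_cost = 0.1` scores -0.15. A policy that buys hits through heavy churn is therefore penalized even when its hit rate looks good.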
+
+ ## Schema Drift Handling
+
+ One part of the project I cared about was schema drift.
+
+ Real CDN logs are messy.
+
+ The same field may appear under different names across systems or over time.
+
+ For example:
+
+ ```text
+ timestamp -> ts
+ file_id -> fid
+ size_mb -> bytes
+ hit -> cache_hit
+ region -> edge_pop
+ ```
+
+ Types may also change.
+
+ A boolean hit field may come in as `true`, `1`, `"true"`, or `"yes"`.
+
+ A size field may arrive in megabytes in one stream and bytes in another.
+
+ If the environment assumes one perfect schema, it becomes brittle.
+
+ To handle this, I added a `SchemaDriftGuard`.
+
+ It normalizes incoming CDN log rows before the agent sees them.
+
+ It handles:
+
+ - renamed fields
+ - missing fields
+ - extra fields
+ - type coercion
+ - byte-to-MB conversion
+ - structured drift reporting
+
+ The script also writes a `drift_report.json` file so the behavior can be inspected.
+
+ ## Example Drift Case
+
+ The guard can normalize rows like these:
+
+ ```python
+ {"timestamp": 1.0, "file_id": "a.jpg", "size_mb": 2.5, "region": "us-east-1", "hit": True}
+
+ {"ts": 2.0, "fid": "b.jpg", "size": 3000000, "geo": "eu-west-1", "cache_hit": 1}
+
+ {"time": 3.0, "object_id": "c.jpg", "bytes": 1500000, "pop": "ap-south-1", "is_hit": "true"}
+
+ {"ts": 4.0, "fid": "d.jpg", "size": "500000", "geo": "us-west-2"}
+ ```
+
+ All of them are converted into a canonical schema:
+
+ ```text
+ timestamp
+ file_id
+ size_mb
+ region
+ hit
+ ```
+
+ This is important for a professional task environment because production systems rarely provide perfect clean input forever.
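One way such normalization could work is sketched below. This is not the project's `SchemaDriftGuard`; it is a hypothetical `normalize_row` function whose alias table is inferred from the example rows above. It assumes that `size` and `bytes` are byte counts, that 1 MB = 1,000,000 bytes, and that a missing field becomes `None` with a note in the drift report.

```python
from typing import Any, Dict, List, Optional, Tuple

# Alias table inferred from the example rows (an assumption, not the real config).
ALIASES = {
    "timestamp": ("timestamp", "ts", "time"),
    "file_id": ("file_id", "fid", "object_id"),
    "region": ("region", "geo", "pop", "edge_pop"),
    "hit": ("hit", "cache_hit", "is_hit"),
}
SIZE_MB_KEYS = ("size_mb",)
SIZE_BYTE_KEYS = ("size", "bytes")  # assumed to carry byte counts
BYTES_PER_MB = 1_000_000            # assumed decimal megabytes
TRUTHY = {"true", "1", "yes"}
FALSY = {"false", "0", "no"}


def coerce_hit(value: Any) -> Optional[bool]:
    """Map True / 1 / "true" / "yes" style values to bool; unknown values to None."""
    if value is None:
        return None
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in TRUTHY:
        return True
    if text in FALSY:
        return False
    return None


def normalize_row(row: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return (canonical_row, drift_notes) for one raw log row."""
    out: Dict[str, Any] = {}
    notes: List[str] = []
    used = set()

    # Resolve renamed fields and record missing ones.
    for canonical, names in ALIASES.items():
        found = next((n for n in names if n in row), None)
        if found is None:
            out[canonical] = None
            notes.append(f"missing:{canonical}")
            continue
        used.add(found)
        out[canonical] = row[found]
        if found != canonical:
            notes.append(f"renamed:{found}->{canonical}")

    # Size: prefer an explicit MB field, else convert bytes to MB.
    mb_key = next((k for k in SIZE_MB_KEYS if k in row), None)
    byte_key = next((k for k in SIZE_BYTE_KEYS if k in row), None)
    if mb_key is not None:
        used.add(mb_key)
        out["size_mb"] = float(row[mb_key])
    elif byte_key is not None:
        used.add(byte_key)
        out["size_mb"] = float(row[byte_key]) / BYTES_PER_MB
        notes.append(f"converted:{byte_key}->size_mb")
    else:
        out["size_mb"] = None
        notes.append("missing:size_mb")

    # Type coercion for the remaining typed fields.
    if out["timestamp"] is not None:
        out["timestamp"] = float(out["timestamp"])
    out["hit"] = coerce_hit(out["hit"])

    # Anything left over is reported as an extra field and dropped.
    notes.extend(f"extra:{k}" for k in sorted(set(row) - used))
    return out, notes
```

Returning the notes alongside the cleaned row keeps the drift visible, and a list of such notes per row would be a natural thing to serialize into a report file like `drift_report.json`.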
+
+ ## Training Setup
+
+ The Colab script includes a minimal training loop.
+
+ It builds a small policy network:
+
+ ```text
+ Input: 5 features
+ Hidden: 64
+ Hidden: 64
+ Output: 3 actions
+ ```
+
+ The training loop uses REINFORCE.
+
+ The request stream is popularity-based, so some objects are naturally more valuable to cache.
+
+ The script compares a baseline LRU policy against the fine-tuned agent behavior.
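For readers unfamiliar with REINFORCE, the core of the update is the discounted return assigned to each step of an episode. The sketch below shows that computation in plain Python with scalar floats instead of tensors; the discount factor `gamma = 0.99` and the mean-return baseline are assumptions, not details confirmed by the writeup.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def reinforce_loss(log_probs: List[float], rewards: List[float], gamma: float = 0.99) -> float:
    """Policy-gradient surrogate loss: -sum(log pi(a_t | s_t) * (G_t - baseline)).

    The mean-return baseline is an assumed variance-reduction choice.
    """
    returns = discounted_returns(rewards, gamma)
    baseline = sum(returns) / len(returns) if returns else 0.0
    return -sum(lp * (g - baseline) for lp, g in zip(log_probs, returns))
```

Because each return sums over later rewards, an early admission that only pays off many steps later still receives credit, which is what makes the method suitable for the delayed-reward setting described earlier.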
+
+ ## Baseline vs Agent
+
+ The baseline is intentionally simple:
+
+ ```text
+ Baseline = always admit and evict with LRU
+ ```
+
+ The improved policy uses:
+
+ - learned policy behavior
+ - CDN-specific guardrails
+ - popularity-aware eviction
+ - bypass logic for bulky cold objects
+
+ This makes the demo stable and easy to understand.
+
+ The agent is evaluated against LRU using:
+
+ - total episode return
+ - cache hit rate
+ - average served latency
+ - bandwidth saved
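The "always admit and evict with LRU" baseline can be sketched with an `OrderedDict`, reporting two of the metrics listed above. The function name `run_lru_baseline`, the `(file_id, size_mb)` request format, and the rule that objects larger than the whole cache are served from origin without being admitted are assumptions made for this illustration.

```python
from collections import OrderedDict
from typing import Dict, Iterable, Tuple


def run_lru_baseline(
    requests: Iterable[Tuple[str, float]],  # (file_id, size_mb) pairs
    capacity_mb: float,
    edge_ms: float = 5.0,
    origin_ms: float = 100.0,
) -> Dict[str, float]:
    """Replay a request stream through an always-admit LRU cache."""
    cache: "OrderedDict[str, float]" = OrderedDict()  # oldest entry first
    used_mb = 0.0
    hits = 0
    total = 0
    total_latency = 0.0

    for file_id, size_mb in requests:
        total += 1
        if file_id in cache:
            hits += 1
            total_latency += edge_ms
            cache.move_to_end(file_id)  # mark as most recently used
            continue

        total_latency += origin_ms      # miss: fetch from origin
        if size_mb > capacity_mb:
            continue                    # object can never fit; serve without caching
        while used_mb + size_mb > capacity_mb:
            _, evicted_mb = cache.popitem(last=False)  # evict least recently used
            used_mb -= evicted_mb
        cache[file_id] = size_mb
        used_mb += size_mb

    return {
        "hit_rate": hits / total if total else 0.0,
        "avg_latency_ms": total_latency / total if total else 0.0,
    }
```

Running the learned policy over the same request stream and seed, then comparing the two result dictionaries, gives a like-for-like view of hit rate and average served latency.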
+
+ ## Result Plot
+
+ The project includes generated comparison plots.
+
+ The main plot compares training progress and baseline-vs-agent behavior.
+
+ If viewing this file in the Hugging Face Space repository, the plot artifact is included here:
+
+ ![Training Results](training_results_finetuned.png)
+
+ The Colab script also generates a higher-resolution chart named `training_results.png` when run.
+
+ ## What the Hugging Face Space Shows
+
+ The Space provides a live Gradio interface.
+
+ The judge can:
+
+ - choose an OpenEnv task
+ - choose a seed
+ - run the benchmark
+ - compare LRU against the agent
+ - view reward and hit-rate metrics
+ - inspect the chart
+
+ I kept the UI simple because judges should understand the project quickly.
+
+ The goal is not to hide behind a complex frontend.
+
+ The goal is to make the environment behavior visible.
+
+ ## Colab Reproducibility
+
+ The project includes a one-shot Colab script:
+
+ ```python
+ !python /content/colab_submission_script.py
+ ```
+
+ It performs the full pipeline:
+
+ - installs dependencies
+ - mounts Google Drive if available
+ - creates the CDN environment
+ - verifies schema drift handling
+ - trains the agent
+ - evaluates baseline vs agent
+ - generates plots
+ - saves artifacts
+
+ Generated artifacts:
+
+ ```text
+ policy.pt
+ training_results.png
+ drift_report.json
+ metrics.json
+ ```
+
+ This makes the submission easier to verify from a clean runtime.
+
+ ## Why This Environment Is Interesting
+
+ The environment tests more than a single decision.
+
+ It tests whether an agent can maintain a useful model of a changing system.
+
+ The agent must reason about:
+
+ - current cache state
+ - future request value
+ - latency tradeoffs
+ - object size
+ - churn
+ - schema reliability
+
+ This is why I think the project fits World Modeling well.
+
+ The system has state, feedback, delayed consequences, and imperfect operational data.
+
+ ## What I Would Improve Next
+
+ If I had more time, I would extend the environment in a few directions:
+
+ - replay real CDN traces
+ - add regional edge nodes
+ - train with PPO or DQN
+ - expose schema drift live in the Space UI
+ - add cost curves for bandwidth pricing
+ - model origin throttling
+ - add multiple cache nodes with coordination
+
+ The multi-cache version would also connect more strongly to multi-agent interaction.
+
+ Different edge nodes could cooperate or compete for limited origin bandwidth.
+
+ ## Final Reflection
+
+ This project started as a caching simulator.
+
+ It became a small end-to-end environment for infrastructure decision-making.
+
+ The most important parts are:
+
+ - OpenEnv-style interaction
+ - real CDN caching behavior
+ - multi-component reward design
+ - schema drift robustness
+ - baseline comparison
+ - visible training/evaluation artifacts
+ - live Hugging Face deployment
+
+ The project is not meant to be a perfect CDN system.
+
+ It is meant to be a useful environment where an agent can improve at a real professional task.
+
+ That is the main story behind CDN Cache Optimizer.
+
+ ## Links
+
+ - Hugging Face Space: https://huggingface.co/spaces/umar-sharif821/cdn-cache-env-improvedone
+ - GitHub Repository: https://github.com/umar-sharif821/cdn-cache-env-improvedone
README.md CHANGED
@@ -17,6 +17,12 @@ tags:
 
  Hackathon-ready OpenEnv project for **edge CDN cache admission and eviction**. It simulates the real production tradeoff between serving from a fast edge cache and falling back to slower origin fetches, while handling schema drift in CDN logs.
 
+ **Hackathon writeup:** [Blog.MD](./Blog.MD)
+
+ **Live Space:** https://huggingface.co/spaces/umar-sharif821/cdn-cache-env-improvedone
+
+ **GitHub:** https://github.com/umar-sharif821/cdn-cache-env-improvedone
+
  ---
 
  ## Why It Matters