pragunk/adaptive_cache_env
pragunk committed on Apr 3

Commit 91d9e5e (verified) · 1 Parent(s): ec6e336

Update README.md

Files changed (1)

README.md +13 -11

@@ -66,17 +66,21 @@ The environment features three programmatic workloads (tasks) designed to challe
 ## 📊 Baseline Comparisons
-To demonstrate the necessity of intelligent eviction policies, this environment provides benchmark scores comparing traditional operating system algorithms against a zero-shot LLM baseline (Llama-3 8B). The table below displays the final **Hit Rate (0.0 to 1.0)**.
-| Task (Workload) | Random Eviction | LRU | LFU | LLM Agent (Zero-Shot) |
-| :--- | :--- | :--- | :--- | :--- |
-| **Easy (Zipfian)** | 0.64 | 0.18 | 0.44 | **0.67** |
-| **Medium (Sequential)** | 0.35 | 0.00 | 0.08 | **0.16** |
-| **Hard (Shifting)** | **0.35** | 0.04 | 0.13 | 0.12 |
 **Key Insights for Researchers:**
-* **The Sequential Trap:** As proven by the Medium task, standard LRU algorithms achieve a mathematical **0.00 hit rate** when faced with sequence loops larger than the cache size. The LLM demonstrates foundational reasoning to break this loop, outperforming both LRU and LFU.
-* **The Shifting Challenge:** The Hard task proves that static frequency counters (LFU) and smaller zero-shot LLMs both struggle to adapt to sudden data shifts. This sets a clear, rigorous benchmark for future Reinforcement Learning agents to conquer.
 ---
@@ -97,9 +101,7 @@ uv sync
 ```bash
 #create .env file in root directory
-LLM_API_KEY="model api key"
-LLM_BASE_URL="model api url"
-LLM_MODEL_NAME="model name"
 ```
 ### 2. Running the Inference Agent

 ## 📊 Baseline Comparisons
+To demonstrate the necessity of intelligent eviction policies, this environment provides benchmark scores comparing traditional eviction algorithms (Random, LRU, LFU) against three configurations of an LLM agent (Llama-3 8B). The table below displays the final **Hit Rate (0.0 to 1.0)**.
+| Task (Workload) | Random | LRU | LFU | LLM (Zero-Shot) | LLM (Memory, No CoT) | LLM (Memory + CoT) |
+| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
+| **Easy (Zipfian)** | 0.64 | 0.18 | 0.44 | **0.67** | 0.43 | 0.53 |
+| **Medium (Sequential)** | **0.35** | 0.00 | 0.08 | 0.16 | 0.06 | 0.29 |
+| **Hard (Shifting)** | **0.35** | 0.04 | 0.13 | 0.12 | 0.08 | 0.16 |
+*Note: Random Eviction scores highest on the Medium and Hard tasks, but its results are non-deterministic and vary between runs, and it offers no guarantee about which items stay cached, so it is not a dependable policy for production systems.*
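The gap between LRU and Random on the Medium row can be reproduced with a minimal sketch. The trace below (12 keys looping through an 8-slot cache) is a hypothetical stand-in, not the environment's actual Medium workload, and both simulators are illustrative rather than the project's baselines.

```python
import random
from collections import OrderedDict


def lru_hit_rate(trace, capacity):
    """Simulate an LRU cache and return the fraction of accesses that hit."""
    cache = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used key
            cache[key] = True
    return hits / len(trace)


def random_hit_rate(trace, capacity, seed=0):
    """Simulate random eviction with a seeded RNG so the run is repeatable."""
    rng = random.Random(seed)
    slots = []       # cached keys, indexable for random victim selection
    members = set()  # same keys, for O(1) membership checks
    hits = 0
    for key in trace:
        if key in members:
            hits += 1
            continue
        if len(slots) >= capacity:
            victim_pos = rng.randrange(len(slots))
            members.discard(slots[victim_pos])
            slots[victim_pos] = key  # overwrite the randomly chosen victim
        else:
            slots.append(key)
        members.add(key)
    return hits / len(trace)


# Hypothetical workload: 12 distinct keys requested in a fixed loop, 100 times.
loop_trace = [i % 12 for i in range(1200)]
lru = lru_hit_rate(loop_trace, capacity=8)
rnd = random_hit_rate(loop_trace, capacity=8)
print(f"LRU hit rate:    {lru:.2f}")
print(f"Random hit rate: {rnd:.2f}")
```

On a loop longer than the cache, LRU always evicts the key that will be needed soonest, so it never hits. Random eviction sometimes keeps a key long enough to hit, but its exact score depends on the seed.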
 **Key Insights for Researchers:**
+* **The Sequential Trap (LRU Failure):** As the Medium task shows, standard LRU achieves a **0.00 hit rate** on sequential loops larger than the cache, because each item is evicted shortly before it is requested again.
+* **The Danger of Context Overload:** When the LLM was initially given a 15-step memory window without a reasoning space (`Memory, No CoT`), its hit rate *dropped* below the zero-shot baseline on every task (0.67 → 0.43, 0.16 → 0.06, 0.12 → 0.08). A likely explanation is that the dense history block drew the model's attention away from the current cache state.
+* **The Power of Chain-of-Thought (CoT):** Forcing the agent to output a JSON `"reasoning"` string before selecting an eviction index gives the model working space to analyze its own memory. This single change nearly quintupled the hit rate on the Medium task (0.06 → 0.29) and doubled it on the Hard task (0.08 → 0.16), which suggests the agent learned to "pin" items to break loops and to proactively flush obsolete data during phase shifts.
+* **The Parameter Bottleneck:** While the 8B parameter model shows that the agentic memory architecture can work, the absolute scores indicate that smaller models struggle to approximate the optimal offline policy, Belady's MIN. This environment sets a rigorous, ready-made benchmark for Reinforcement Learning models and 70B+ reasoning models to conquer.
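The reasoning-before-action format described in the Chain-of-Thought insight can be sketched as a small response validator. The key name `evict_index` and the function `parse_eviction_response` are assumptions made for illustration; only the `"reasoning"` field is named in the README, and the project's actual schema may differ.

```python
import json


def parse_eviction_response(raw_text, cache_size):
    """Validate an agent reply and return (reasoning, eviction index).

    Assumes a JSON object with a "reasoning" string and a hypothetical
    integer field "evict_index" that points at a cache slot.
    """
    try:
        payload = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc

    if not isinstance(payload, dict):
        raise ValueError("response must be a JSON object")

    reasoning = payload.get("reasoning")
    if not isinstance(reasoning, str) or not reasoning.strip():
        raise ValueError('missing or empty "reasoning" string')

    index = payload.get("evict_index")  # hypothetical key name
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(index, bool) or not isinstance(index, int):
        raise ValueError('"evict_index" must be an integer')
    if not 0 <= index < cache_size:
        raise ValueError(f'"evict_index" {index} is outside 0..{cache_size - 1}')

    return reasoning, index


example = '{"reasoning": "Slot 2 holds a key from the previous phase.", "evict_index": 2}'
reasoning, index = parse_eviction_response(example, cache_size=4)
print(index, "-", reasoning)
```

Rejecting malformed replies up front keeps an invalid index from reaching the environment; how the real agent handles such failures (retry, fallback policy) is not specified in the README.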
 ---
 ```bash
 #create .env file in root directory
+HF_TOKEN="your api key"
 ```
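As a minimal sketch of how a script might pick up the `HF_TOKEN` value from the `.env` file above, the helper below uses only the standard library. The function names `load_env_file` and `get_hf_token` are hypothetical; the project's own loader is not shown in this diff and may rely on a different mechanism.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY="value" lines from a .env file into a dict."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text().splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_hf_token(path=".env"):
    """Prefer the process environment, then fall back to the .env file."""
    token = os.environ.get("HF_TOKEN") or load_env_file(path).get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set in the environment or .env file")
    return token
```

Calling `get_hf_token()` from the repository root would return the configured token or fail early with a clear error instead of surfacing later as an authentication failure.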
 ### 2. Running the Inference Agent