Update README.md
README.md
CHANGED
@@ -66,17 +66,21 @@ The environment features three programmatic workloads (tasks) designed to challe

## 📊 Baseline Comparisons

-To demonstrate the necessity of intelligent eviction policies, this environment provides benchmark scores comparing traditional operating system algorithms against

-| Task (Workload) | Random
-| :--- | :--- | :--- | :--- | :--- |
-| **Easy (Zipfian)** | 0.64 | 0.18 | 0.44 | **0.67** |
-| **Medium (Sequential)** | 0.35 | 0.00 | 0.08 |
-| **Hard (Shifting)** | **0.35** | 0.04 | 0.13 | 0.12 |

**Key Insights for Researchers:**
-* **The Sequential Trap:** As proven by the Medium task, standard LRU algorithms achieve a mathematical **0.00 hit rate** when faced with sequence loops larger than the cache size.
-* **The

---

@@ -97,9 +101,7 @@ uv sync

```bash
#create .env file in root directory
-
-LLM_BASE_URL="model api url"
-LLM_MODEL_NAME="model name"
```

### 2. Running the Inference Agent


## 📊 Baseline Comparisons

+To demonstrate the necessity of intelligent eviction policies, this environment provides benchmark scores comparing traditional operating system algorithms against various iterations of an LLM agent (Llama-3 8B). The table below displays the final **Hit Rate (0.0 to 1.0)**.

+| Task (Workload) | Random | LRU | LFU | LLM (Zero-Shot) | LLM (Memory, No CoT) | LLM (Memory + CoT) |
+| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
+| **Easy (Zipfian)** | 0.64 | 0.18 | 0.44 | **0.67** | 0.43 | 0.53 |
+| **Medium (Sequential)** | **0.35** | 0.00 | 0.08 | 0.16 | 0.06 | 0.29 |
+| **Hard (Shifting)** | **0.35** | 0.04 | 0.13 | 0.12 | 0.08 | 0.16 |
+
+*Note: While Random Eviction occasionally scores artificially high through pure statistical variance, it is non-deterministic and mathematically unsafe for production systems.*

**Key Insights for Researchers:**
+* **The Sequential Trap (LRU Failure):** As proven by the Medium task, standard LRU algorithms achieve a mathematical **0.00 hit rate** when faced with sequence loops larger than the cache size.
+* **The Danger of Context Overload:** When the LLM was initially given a 15-step memory window without a reasoning space (`Memory, No CoT`), its performance *dropped* across all tasks. The model became overwhelmed by the dense history block, blinding it to immediate cache states.
+* **The Power of Chain-of-Thought (CoT):** By forcing the agent to output a JSON `"reasoning"` string prior to selecting an eviction index, the model gained the computational processing space needed to analyze its own memory. This single architectural change nearly quintupled its performance on the Medium task (0.06 → 0.29) and doubled its performance on the Hard task (0.08 → 0.16), proving the agent successfully learned to "pin" items to break loops and proactively flush obsolete data during phase shifts.
+* **The Parameter Bottleneck:** While the 8B parameter model successfully proves the agentic memory architecture works, the absolute scores indicate that smaller models struggle to flawlessly execute complex heuristics like Belady's MIN. This environment sets a rigorous, ready-made benchmark for Reinforcement Learning models and 70B+ reasoning models to conquer.
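
The "Sequential Trap" in the list above can be reproduced in a few lines. The sketch below is illustrative only: the looping trace, the cache size, and the helper names `lru_hit_rate` and `random_hit_rate` are assumptions made for this example, not the environment's actual Medium workload or API. It replays a loop of five keys through a four-slot cache, so LRU always evicts exactly the key that is requested next.

```python
from collections import OrderedDict
import random


def lru_hit_rate(trace, capacity):
    """Replay `trace` through an LRU cache with `capacity` slots; return the hit rate."""
    cache = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used key
            cache[key] = True
    return hits / len(trace)


def random_hit_rate(trace, capacity, seed=0):
    """Same replay, but evict a uniformly random slot on a miss."""
    rng = random.Random(seed)
    cache = []
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
        elif len(cache) >= capacity:
            cache[rng.randrange(capacity)] = key  # overwrite a random slot
        else:
            cache.append(key)
    return hits / len(trace)


# Hypothetical workload: a loop over 5 keys, one more than the cache can hold.
loop_trace = list(range(5)) * 200

print(lru_hit_rate(loop_trace, capacity=4))     # 0.0
print(random_hit_rate(loop_trace, capacity=4))  # varies with the seed
```

With a loop that fits in the cache (for example four keys and four slots), the same `lru_hit_rate` function misses only on the first pass, which is why the failure is specific to loops larger than the cache.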

---


```bash
#create .env file in root directory
+HF_TOKEN="your api key"
```
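
As a rough illustration of how a script could pick up the token from that file, here is a minimal standard-library sketch. It assumes the `.env` file holds plain `KEY="value"` lines; the helper names `load_env_file` and `get_hf_token` are hypothetical, and the project itself may load the file differently (for example through a dotenv package).

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY="value" lines, skipping blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_hf_token(path=".env"):
    """Prefer an exported HF_TOKEN; fall back to the value in the .env file."""
    token = os.environ.get("HF_TOKEN") or load_env_file(path).get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it to .env or export it")
    return token
```

Failing early with a clear error when the token is missing tends to be easier to debug than an authentication error raised later by the inference call.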

### 2. Running the Inference Agent