Update README.md
README.md
CHANGED
|
@@ -11,7 +11,6 @@ tags:
|
|
| 11 |
- agents
|
| 12 |
---
|
| 13 |
|
| 14 |
-
|
| 15 |
# 🧠 Adaptive Cache Manager (OpenEnv)
|
| 16 |
|
| 17 |
An OpenEnv-compliant reinforcement learning and agentic AI environment that simulates a high-performance operating system memory manager.
|
|
@@ -27,7 +26,7 @@ However, standard algorithms fail when traffic patterns change abruptly or fall
|
|
| 27 |
|
| 28 |
## Environment Design: Spaces & Rewards
|
| 29 |
|
| 30 |
-
The environment strictly implements the OpenEnv API via typed Pydantic models.
|
| 31 |
|
| 32 |
### Observation Space
|
| 33 |
The agent receives a lightweight, numerical snapshot of the memory system at the exact moment a cache miss occurs.
|
|
@@ -41,7 +40,7 @@ The agent must decide which slot to free up.
|
|
| 41 |
|
| 42 |
### Reward Function
|
| 43 |
The environment provides a dense, step-by-step reward signal directly correlated to system performance:
|
| 44 |
-
* **`+1.0`** for every Cache Hit
|
| 45 |
* **`-1.0`** for a Cache Miss (forcing the agent to step in and evict).
|
| 46 |
|
| 47 |
---
|
|
@@ -64,54 +63,86 @@ The environment features three programmatic workloads (tasks) designed to challe
|
|
| 64 |
|
| 65 |
---
|
| 66 |
|
| 67 |
## Setup & Execution
|
| 68 |
|
| 69 |
-
### 1. Local
|
| 70 |
-
|
| 71 |
|
| 72 |
```bash
|
| 73 |
-
#
|
| 74 |
-
|
| 75 |
-
|
| 76 |
|
| 77 |
-
|
| 78 |
-
|
| 79 |
```
|
| 80 |
|
| 81 |
-
### 2. Running the
|
| 82 |
-
The
|
|
|
|
|
|
|
| 83 |
|
| 84 |
```bash
|
| 85 |
-
# Export your
|
| 86 |
export GROQ_API_KEY="your-api-key-here"
|
| 87 |
|
| 88 |
# Run the baseline evaluation across all 3 tasks
|
| 89 |
-
python
|
| 90 |
```
|
| 91 |
|
| 92 |
### 3. Docker & Hugging Face Deployment
|
| 93 |
-
This environment is fully containerized and designed for deployment as a Hugging Face Space.
|
| 94 |
|
| 95 |
```bash
|
| 96 |
-
# Build the image
|
| 97 |
docker build -t adaptive-cache-env .
|
| 98 |
|
| 99 |
-
# Run the container (
|
| 100 |
-
docker run -
|
| 101 |
```
|
| 102 |
|
| 103 |
## Project Structure
|
| 104 |
|
| 105 |
```bash
|
| 106 |
adaptive-cache-env/
|
| 107 |
-
├── Dockerfile # Container configuration
|
| 108 |
-
├──
|
|
|
|
| 109 |
├── openenv.yaml # OpenEnv task and metadata specifications
|
| 110 |
-
├──
|
|
|
|
| 111 |
├── README.md # Project documentation
|
|
|
|
|
|
|
| 112 |
└── adaptive_cache/
|
| 113 |
    ├── __init__.py
|
| 114 |
    ├── simulator.py # Core OS-level array and memory simulation
|
| 115 |
    ├── workloads.py # Deterministic task generators (Zipfian, Sequential, etc.)
|
| 116 |
    └── env.py # OpenEnv wrapper and Pydantic models
|
|
|
|
| 117 |
```
|
|
|
|
| 11 |
- agents
|
| 12 |
---
|
| 13 |
|
|
|
|
| 14 |
# 🧠 Adaptive Cache Manager (OpenEnv)
|
| 15 |
|
| 16 |
An OpenEnv-compliant reinforcement learning and agentic AI environment that simulates a high-performance operating system memory manager.
|
|
|
|
| 26 |
|
| 27 |
## Environment Design: Spaces & Rewards
|
| 28 |
|
| 29 |
+
The environment strictly implements the OpenEnv API via typed Pydantic models and exposes standard `POST /reset` and `POST /step` web endpoints via FastAPI.
|
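A minimal client-side sketch of the two endpoints named above, using only the Python standard library. The base URL assumes the server is running locally on port 7860 (the port used in the Docker section of this README). The JSON body sent to `/step`, including the `action` and `evict_index` field names, is a hypothetical placeholder: the real schema is defined by the Pydantic models in `env.py`.

```python
import json
from urllib.request import Request

# Assumption: the FastAPI server is reachable locally on port 7860.
BASE_URL = "http://localhost:7860"


def build_post(path, payload):
    """Build (but do not send) a JSON POST request for an environment endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Start a new episode.
reset_request = build_post("/reset", {})

# Hypothetical action payload: ask the environment to evict cache slot 0.
step_request = build_post("/step", {"action": {"evict_index": 0}})
```

With the server running, each request can be sent with `urllib.request.urlopen(reset_request)` and the JSON response decoded with `json.load`.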
| 30 |
|
| 31 |
### Observation Space
|
| 32 |
The agent receives a lightweight, numerical snapshot of the memory system at the exact moment a cache miss occurs.
|
|
|
|
| 40 |
|
| 41 |
### Reward Function
|
| 42 |
The environment provides a dense, step-by-step reward signal directly correlated to system performance:
|
| 43 |
+
* **`+1.0`** for every Cache Hit.
|
| 44 |
* **`-1.0`** for a Cache Miss (forcing the agent to step in and evict).
|
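The reward rule above can be sketched in a few lines. This is an illustration only, not the environment's actual implementation; the function names `step_reward` and `hit_rate_from_rewards` are hypothetical.

```python
def step_reward(is_hit: bool) -> float:
    """Return +1.0 for a cache hit and -1.0 for a cache miss."""
    return 1.0 if is_hit else -1.0


def hit_rate_from_rewards(rewards):
    """Recover the hit rate (0.0 to 1.0) from a list of per-step rewards."""
    hits = sum(1 for r in rewards if r > 0)
    return hits / len(rewards)


# Hypothetical episode: hit, miss, hit, hit.
episode = [True, False, True, True]
rewards = [step_reward(is_hit) for is_hit in episode]
total_reward = sum(rewards)
hit_rate = hit_rate_from_rewards(rewards)
```

Because every step contributes exactly +1.0 or -1.0, the total reward over `N` steps equals `N * (2 * hit_rate - 1)`, so maximizing cumulative reward is the same as maximizing hit rate.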
| 45 |
|
| 46 |
---
|
|
|
|
| 63 |
|
| 64 |
---
|
| 65 |
|
| 66 |
+
|
| 67 |
+
## Baseline Comparisons
|
| 68 |
+
|
| 69 |
+
To demonstrate the necessity of intelligent eviction policies, this environment provides benchmark scores comparing traditional operating system algorithms against a zero-shot LLM baseline (Llama-3 8B). The table below displays the final **Hit Rate (0.0 to 1.0)**.
|
| 70 |
+
|
| 71 |
+
| Task (Workload) | Random Eviction | LRU | LFU | LLM Agent (Zero-Shot) |
|
| 72 |
+
| :--- | :--- | :--- | :--- | :--- |
|
| 73 |
+
| **Easy (Zipfian)** | 0.64 | 0.18 | 0.44 | **0.67** |
|
| 74 |
+
| **Medium (Sequential)** | **0.35** | 0.00 | 0.08 | 0.16 |
|
| 75 |
+
| **Hard (Shifting)** | **0.35** | 0.04 | 0.13 | 0.12 |
|
| 76 |
+
|
| 77 |
+
**Key Insights for Researchers:**
|
| 78 |
+
* **The Sequential Trap:** As the Medium task shows, LRU scores a **0.00 hit rate** on a sequential loop larger than the cache, because it evicts each entry before that entry is requested again. The zero-shot LLM (0.16) outperforms both LRU and LFU (0.08), although Random Eviction (0.35) still scores highest.
|
| 79 |
+
* **The Shifting Challenge:** On the Hard task, static frequency counters (LFU, 0.13) and a small zero-shot LLM (0.12) both struggle to adapt to sudden shifts in the access pattern, and both trail Random Eviction (0.35). This sets a clear benchmark for future Reinforcement Learning agents to beat.
|
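The sequential trap can be reproduced with a small standalone simulation. This sketch does not use the project's `simulator.py` or `workloads.py`; the `lru_hit_rate` helper and the loop sizes are illustrative assumptions.

```python
from collections import OrderedDict


def lru_hit_rate(requests, capacity):
    """Replay `requests` through an LRU cache and return the fraction of hits."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used key
            cache[key] = True
    return hits / len(requests)


# A loop over 5 distinct pages, repeated 20 times (100 requests in total).
loop = [page for _ in range(20) for page in range(5)]

# With room for only 4 pages, LRU always evicts the page needed next.
sequential_rate = lru_hit_rate(loop, capacity=4)

# With room for all 5 pages, only the first pass misses.
fitting_rate = lru_hit_rate(loop, capacity=5)
```

Here `sequential_rate` is 0.0 and `fitting_rate` is 0.95: growing the cache by a single slot moves LRU from missing every request to missing only the first five.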
| 80 |
+
|
| 81 |
+
---
|
| 82 |
+
|
| 83 |
## Setup & Execution
|
| 84 |
|
| 85 |
+
### 1. Local Setup (Modern `uv` package manager)
|
| 86 |
+
This project uses modern Python packaging via `pyproject.toml` and `uv.lock`.
|
| 87 |
|
| 88 |
```bash
|
| 89 |
+
# Install the ultra-fast uv package manager
|
| 90 |
+
pip install uv
|
| 91 |
+
|
| 92 |
+
# Create virtual environment and install dependencies
|
| 93 |
+
uv venv
|
| 94 |
+
source .venv/bin/activate # On Windows use: .venv\Scripts\activate
|
| 95 |
+
uv sync
|
| 96 |
+
```
|
| 97 |
|
| 98 |
+
```bash
|
| 99 |
+
# Create a .env file in the project root with the following entries
|
| 100 |
+
LLM_API_KEY="model api key"
|
| 101 |
+
LLM_BASE_URL="model api url"
|
| 102 |
+
LLM_MODEL_NAME="model name"
|
| 103 |
```
|
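One way a script could read these three settings is sketched below. How the project actually loads its `.env` file is not shown in this README, so the `load_llm_settings` helper is a hypothetical illustration that assumes the variables have already been exported into the process environment.

```python
import os

# The three variable names come from the .env example above.
REQUIRED_KEYS = ("LLM_API_KEY", "LLM_BASE_URL", "LLM_MODEL_NAME")


def load_llm_settings(env=os.environ):
    """Return the LLM settings as a dict, failing early if any is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError("Missing settings: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}


# Example with an explicit mapping instead of the real environment.
example_settings = load_llm_settings({
    "LLM_API_KEY": "model api key",
    "LLM_BASE_URL": "model api url",
    "LLM_MODEL_NAME": "model name",
})
```

Validating all three values up front gives a single clear error message instead of a failure partway through an evaluation run.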
| 104 |
|
| 105 |
+
### 2. Running the Inference Agent
|
| 106 |
+
The `inference.py` script evaluates the environment using a zero-shot LLM baseline via the official OpenAI Python SDK.
|
| 107 |
+
|
| 108 |
+
(Note: As required by the OpenEnv spec, the script reads its key from the `OPENAI_API_KEY` variable. To rerun tests at no cost during development, point the base URL at Groq's free models.)
|
| 109 |
|
| 110 |
```bash
|
| 111 |
+
# Export your API key
|
| 112 |
export GROQ_API_KEY="your-api-key-here"
|
| 113 |
|
| 114 |
# Run the baseline evaluation across all 3 tasks
|
| 115 |
+
python inference.py
|
| 116 |
```
|
| 117 |
|
| 118 |
### 3. Docker & Hugging Face Deployment
|
| 119 |
+
This environment is fully containerized, web-server enabled (FastAPI/Uvicorn), and designed for multi-mode deployment as a Hugging Face Space.
|
| 120 |
|
| 121 |
```bash
|
| 122 |
+
# Build the image locally
|
| 123 |
docker build -t adaptive-cache-env .
|
| 124 |
|
| 125 |
+
# Run the container locally (boots the FastAPI server on port 7860)
|
| 126 |
+
docker run -p 7860:7860 adaptive-cache-env
|
| 127 |
```
|
| 128 |
|
| 129 |
## Project Structure
|
| 130 |
|
| 131 |
```bash
|
| 132 |
adaptive-cache-env/
|
| 133 |
+
├── Dockerfile # Container configuration pointing to server.app
|
| 134 |
+
├── pyproject.toml # Modern build system & OpenEnv core dependencies
|
| 135 |
+
├── uv.lock # Strict dependency lockfile
|
| 136 |
├── openenv.yaml # OpenEnv task and metadata specifications
|
| 137 |
+
├── inference.py # Baseline LLM inference script
|
| 138 |
+
├── test_env.py # Deterministic grader bounds validation
|
| 139 |
├── README.md # Project documentation
|
| 140 |
+
├── server/
|
| 141 |
+
│   └── app.py # FastAPI web server and OpenEnv POST endpoints
|
| 142 |
└── adaptive_cache/
|
| 143 |
    ├── __init__.py
|
| 144 |
    ├── simulator.py # Core OS-level array and memory simulation
|
| 145 |
    ├── workloads.py # Deterministic task generators (Zipfian, Sequential, etc.)
|
| 146 |
    └── env.py # OpenEnv wrapper and Pydantic models
|
| 147 |
+
|
| 148 |
```
|