pragunk committed on
Commit
0e6ed14
·
verified ·
1 Parent(s): 1b13f5d

Update README.md

Files changed (1)
  1. README.md +52 -21
README.md CHANGED
@@ -11,7 +11,6 @@ tags:
11
  - agents
12
  ---
13
 
14
-
15
  # 🧠 Adaptive Cache Manager (OpenEnv)
16
 
17
  An OpenEnv-compliant reinforcement learning and agentic AI environment that simulates a high-performance operating system memory manager.
@@ -27,7 +26,7 @@ However, standard algorithms fail when traffic patterns change abruptly or fall
27
 
28
  ## 🛠 Environment Design: Spaces & Rewards
29
 
30
- The environment strictly implements the OpenEnv API via typed Pydantic models.
31
 
32
  ### Observation Space
33
  The agent receives a lightweight, numerical snapshot of the memory system at the exact moment a cache miss occurs.
@@ -41,7 +40,7 @@ The agent must decide which slot to free up.
41
 
42
  ### Reward Function
43
  The environment provides a dense, step-by-step reward signal directly correlated to system performance:
44
- * **`+1.0`** for every Cache Hit (including consecutive hits safely fast-forwarded without agent intervention).
45
  * **`-1.0`** for a Cache Miss (forcing the agent to step in and evict).
46
 
47
  ---
@@ -64,54 +63,86 @@ The environment features three programmatic workloads (tasks) designed to challe
64
 
65
  ---
66
 
67
  ## 🚀 Setup & Execution
68
 
69
- ### 1. Local Virtual Environment Setup
70
- Ensure you are using Python 3.10 or higher (Python 3.13 is fully supported).
71
 
72
  ```bash
73
- # Create and activate virtual environment
74
- python -m venv venv
75
- source venv/bin/activate # On Windows use: venv\Scripts\activate
76
 
77
- # Install dependencies
78
- pip install -r requirements.txt
79
  ```
80
 
81
- ### 2. Running the Baseline Agent
82
- The baseline script uses Groq's Llama-3 model to evaluate the environment via the official OpenAI Python SDK, satisfying the OpenEnv API client requirement while remaining 100% free and lightning-fast.
83
 
84
  ```bash
85
- # Export your free Groq API key (get one at console.groq.com)
86
  export GROQ_API_KEY="your-api-key-here"
87
 
88
  # Run the baseline evaluation across all 3 tasks
89
- python baseline.py
90
  ```
91
 
92
  ### 3. Docker & Hugging Face Deployment
93
- This environment is fully containerized and designed for deployment as a Hugging Face Space.
94
 
95
  ```bash
96
- # Build the image
97
  docker build -t adaptive-cache-env .
98
 
99
- # Run the container (pass your API key)
100
- docker run -e GROQ_API_KEY="your-api-key-here" adaptive-cache-env
101
  ```
102
 
103
  ## 📂 Project Structure
104
 
105
  ```bash
106
  adaptive-cache-env/
107
- ├── Dockerfile # Container configuration for HF Spaces
108
- ├── requirements.txt # Project dependencies (NumPy 2.x, Pydantic, OpenAI SDK)
109
  ├── openenv.yaml # OpenEnv task and metadata specifications
110
- ├── baseline.py # Baseline LLM inference script
111
  ├── README.md # Project documentation
112
  └── adaptive_cache/
113
      ├── __init__.py
114
      ├── simulator.py # Core OS-level array and memory simulation
115
      ├── workloads.py # Deterministic task generators (Zipfian, Sequential, etc.)
116
      └── env.py # OpenEnv wrapper and Pydantic models
 
117
  ```
 
11
  - agents
12
  ---
13
 
 
14
  # 🧠 Adaptive Cache Manager (OpenEnv)
15
 
16
  An OpenEnv-compliant reinforcement learning and agentic AI environment that simulates a high-performance operating system memory manager.
 
26
 
27
  ## 🛠 Environment Design: Spaces & Rewards
28
 
29
+ The environment strictly implements the OpenEnv API via typed Pydantic models and exposes standard `POST /reset` and `POST /step` web endpoints via FastAPI.
30
 
31
  ### Observation Space
32
  The agent receives a lightweight, numerical snapshot of the memory system at the exact moment a cache miss occurs.
 
40
 
41
  ### Reward Function
42
  The environment provides a dense, step-by-step reward signal directly correlated to system performance:
43
+ * **`+1.0`** for every Cache Hit.
44
  * **`-1.0`** for a Cache Miss (forcing the agent to step in and evict).
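
The reward rule above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the environment's actual code: `step_reward` and `episode_summary` are hypothetical names, and the only values taken from the README are `+1.0` per hit and `-1.0` per miss.

```python
from typing import Sequence, Tuple

HIT_REWARD = 1.0    # from the README: +1.0 for every cache hit
MISS_REWARD = -1.0  # from the README: -1.0 for a cache miss


def step_reward(is_hit: bool) -> float:
    """Return the per-access reward (hypothetical helper)."""
    return HIT_REWARD if is_hit else MISS_REWARD


def episode_summary(outcomes: Sequence[bool]) -> Tuple[float, float]:
    """Return (total reward, hit rate) for a sequence of hit/miss outcomes."""
    if not outcomes:
        return 0.0, 0.0
    total = sum(step_reward(hit) for hit in outcomes)
    hit_rate = sum(1 for hit in outcomes if hit) / len(outcomes)
    return total, hit_rate


# Three hits and one miss.
total, hit_rate = episode_summary([True, True, False, True])
print(total, hit_rate)
```

Because each access contributes exactly `+1.0` or `-1.0`, the total reward is a linear function of the hit rate, which is why the two track each other.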
45
 
46
  ---
 
63
 
64
  ---
65
 
66
+
67
+ ## 📊 Baseline Comparisons
68
+
69
+ To demonstrate the necessity of intelligent eviction policies, this environment provides benchmark scores comparing traditional operating system algorithms against a zero-shot LLM baseline (Llama-3 8B). The table below displays the final **Hit Rate (0.0 to 1.0)**.
70
+
71
+ | Task (Workload) | Random Eviction | LRU | LFU | LLM Agent (Zero-Shot) |
72
+ | :--- | :--- | :--- | :--- | :--- |
73
+ | **Easy (Zipfian)** | 0.64 | 0.18 | 0.44 | **0.67** |
74
+ | **Medium (Sequential)** | 0.35 | 0.00 | 0.08 | **0.16** |
75
+ | **Hard (Shifting)** | **0.35** | 0.04 | 0.13 | 0.12 |
76
+
77
+ **Key Insights for Researchers:**
78
+ * **The Sequential Trap:** As the Medium task shows, LRU achieves a **0.00 hit rate** on cyclic sequences longer than the cache, because it always evicts the entry that will be requested next. The LLM agent (0.16) breaks this pattern and outperforms both LRU (0.00) and LFU (0.08), although random eviction (0.35) still scores highest.
79
+ * **The Shifting Challenge:** On the Hard task, LFU (0.13) and the zero-shot LLM (0.12) both fall well below random eviction (0.35), showing that static frequency counters and smaller zero-shot LLMs struggle to adapt to sudden shifts in the access pattern. This leaves clear headroom for future reinforcement learning agents.
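
The Sequential Trap can be reproduced with a small standalone simulation. The sketch below is independent of this repository's `simulator.py`; `lru_hit_rate` is a hypothetical helper, and the loop length and cache sizes are illustrative values rather than the benchmark's settings.

```python
from collections import OrderedDict
from typing import Hashable, Sequence


def lru_hit_rate(requests: Sequence[Hashable], cache_size: int) -> float:
    """Simulate an LRU cache and return the fraction of requests that hit."""
    cache: "OrderedDict[Hashable, bool]" = OrderedDict()
    hits = 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            if len(cache) >= cache_size:
                cache.popitem(last=False)  # evict the least recently used key
            cache[key] = True
    return hits / len(requests) if requests else 0.0


# A loop over 5 distinct keys, repeated 20 times (100 requests).
loop = list(range(5)) * 20

print(lru_hit_rate(loop, cache_size=4))  # loop is one key longer than the cache
print(lru_hit_rate(loop, cache_size=5))  # loop fits in the cache
```

With a cache of 4 slots, every request evicts exactly the key that will be needed next, so no request ever hits. Once the cache can hold the whole loop, only the first pass misses.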
80
+
81
+ ---
82
+
83
  ## 🚀 Setup & Execution
84
 
85
+ ### 1. Local Setup (Modern `uv` package manager)
86
+ This project uses modern Python packaging via `pyproject.toml` and `uv.lock`.
87
 
88
  ```bash
89
+ # Install the ultra-fast uv package manager
90
+ pip install uv
91
+
92
+ # Create virtual environment and install dependencies
93
+ uv venv
94
+ source .venv/bin/activate # On Windows use: .venv\Scripts\activate
95
+ uv sync
96
+ ```
97
 
98
+ ```bash
99
+ # Create a .env file in the project root directory with these variables
100
+ LLM_API_KEY="model api key"
101
+ LLM_BASE_URL="model api url"
102
+ LLM_MODEL_NAME="model name"
103
  ```
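
For readers who want to check their `.env` file before running anything, the following sketch parses the three variables listed above using only the standard library. It is an assumption about how such a file could be read, not a description of how `inference.py` loads its configuration; `parse_env` is a hypothetical helper.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY="value" lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Placeholder contents mirroring the variables named in the README.
sample = '''# .env in the project root
LLM_API_KEY="model api key"
LLM_BASE_URL="model api url"
LLM_MODEL_NAME="model name"
'''

config = parse_env(sample)
missing = [k for k in ("LLM_API_KEY", "LLM_BASE_URL", "LLM_MODEL_NAME") if k not in config]
print(config, missing)
```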
104
 
105
+ ### 2. Running the Inference Agent
106
+ The `inference.py` script evaluates the environment using a zero-shot LLM baseline via the official OpenAI Python SDK.
107
+
108
+ (Note: As required by the OpenEnv spec, the script reads its key from the `OPENAI_API_KEY` variable. To run tests repeatedly at no cost during development, point the base URL at Groq's free models.)
109
 
110
  ```bash
111
+ # Export your API key
112
  export GROQ_API_KEY="your-api-key-here"
113
 
114
  # Run the baseline evaluation across all 3 tasks
115
+ python inference.py
116
  ```
117
 
118
  ### 3. Docker & Hugging Face Deployment
119
+ This environment is fully containerized, web-server enabled (FastAPI/Uvicorn), and designed for multi-mode deployment as a Hugging Face Space.
120
 
121
  ```bash
122
+ # Build the image locally
123
  docker build -t adaptive-cache-env .
124
 
125
+ # Run the container locally (boots the FastAPI server on port 7860)
126
+ docker run -p 7860:7860 adaptive-cache-env
127
  ```
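
Once the container is running, the `POST /reset` and `POST /step` endpoints can be exercised from Python. The sketch below uses only the standard library and assumes the server listens on `localhost:7860` as in the `docker run` command above. The request body for `/step` is a guess: the `action` field name and its integer value are hypothetical, so check the Pydantic models in `env.py` for the real schema.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://localhost:7860"  # port published by the docker run command


def build_post(path: str, payload: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    """Build a JSON POST request without sending it."""
    body = json.dumps(payload or {}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(path: str, payload: Optional[Dict[str, Any]] = None) -> Any:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(build_post(path, payload), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


reset_request = build_post("/reset")
step_request = build_post("/step", {"action": 0})  # hypothetical payload shape

# With the container running, these would perform the actual calls:
# observation = call("/reset")
# result = call("/step", {"action": 0})
```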
128
 
129
  ## 📂 Project Structure
130
 
131
  ```bash
132
  adaptive-cache-env/
133
+ ├── Dockerfile # Container configuration pointing to server.app
134
+ ├── pyproject.toml # Modern build system & OpenEnv core dependencies
135
+ ├── uv.lock # Strict dependency lockfile
136
  ├── openenv.yaml # OpenEnv task and metadata specifications
137
+ ├── inference.py # Baseline LLM inference script
138
+ ├── test_env.py # Deterministic grader bounds validation
139
  ├── README.md # Project documentation
140
+ ├── server/
141
+ │   └── app.py # FastAPI web server and OpenEnv POST endpoints
142
  └── adaptive_cache/
143
      ├── __init__.py
144
      ├── simulator.py # Core OS-level array and memory simulation
145
      ├── workloads.py # Deterministic task generators (Zipfian, Sequential, etc.)
146
      └── env.py # OpenEnv wrapper and Pydantic models
147
+
148
  ```