pragunk commited on
Commit
3aab892
Β·
verified Β·
1 Parent(s): 8884934

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +116 -102
README.md CHANGED
@@ -1,103 +1,117 @@
1
- # 🧠 Adaptive Cache Manager (OpenEnv)
2
-
3
- An OpenEnv-compliant reinforcement learning and agentic AI environment that simulates a high-performance operating system memory manager.
4
-
5
- Instead of relying on static, heuristic-based algorithms like LRU (Least Recently Used) or LFU (Least Frequently Used), this environment challenges frontier AI agents to dynamically learn and execute optimal cache eviction policies against complex, shifting workloads.
6
-
7
- ## 🌍 Real-World Utility & Motivation
8
- Every modern operating system, database management system (DBMS), and CDN relies heavily on cache efficiency. A 1% increase in cache hit rates can save massive amounts of compute, bandwidth, and energy.
9
-
10
- However, standard algorithms fail when traffic patterns change abruptly or fall into sequential loops. This environment isolates that specific, high-value DevOps/DBA problem. It moves away from "toy" text-parsing tasks and provides a pure, mathematically grounded testbed for reasoning models and RL agents to prove their algorithmic optimization capabilities.
11
-
12
- ---
13
-
14
- ## πŸ›  Environment Design: Spaces & Rewards
15
-
16
- The environment strictly implements the OpenEnv API via typed Pydantic models.
17
-
18
- ### Observation Space
19
- The agent receives a lightweight, numerical snapshot of the memory system at the exact moment a cache miss occurs.
20
- * `incoming_request` (int): The ID of the data item currently requested by the system.
21
- * `cache_state` (List[int]): The current items residing in the cache slots (-1 indicates an empty slot).
22
- * `idle_times` (List[int]): The number of timesteps since each specific cache slot was last accessed.
23
-
24
- ### Action Space
25
- The agent must decide which slot to free up.
26
- * `evict_index` (int): A discrete integer (0 to capacity-1) representing the index of the cache slot to overwrite.
27
-
28
- ### Reward Function
29
- The environment provides a dense, step-by-step reward signal directly correlated to system performance:
30
- * **`+1.0`** for every Cache Hit (including consecutive hits safely fast-forwarded without agent intervention).
31
- * **`-1.0`** for a Cache Miss (forcing the agent to step in and evict).
32
-
33
- ---
34
-
35
- ## πŸ† Tasks & Difficulty Progression
36
-
37
- The environment features three programmatic workloads (tasks) designed to challenge agents with distinctly different access patterns. The **Grader** for all tasks deterministically calculates the final **Hit Rate (0.0 to 1.0)**.
38
-
39
- 1. **`cache-zipfian-easy` (Easy)**
40
- * **Workload:** A Zipfian (power-law) distribution simulating standard web traffic. A few items are requested constantly; a long tail is requested rarely.
41
- * **Goal:** Outperform random eviction by pinning the most frequently requested items.
42
-
43
- 2. **`cache-sequential-medium` (Medium)**
44
- * **Workload:** A looping sequential scan (e.g., requesting items 1 through 12 in a loop for a cache of size 10).
45
- * **Goal:** Standard LRU algorithms achieve a **0% hit rate** here. The agent must break static logic and learn to pin a subset of the sequence to guarantee hits.
46
-
47
- 3. **`cache-shifting-hard` (Hard)**
48
- * **Workload:** Abruptly shifting working sets. The first half heavily favors one block of data; the second half abruptly shifts entirely to a different block.
49
- * **Goal:** Requires rapid, aggressive adaptation to flush obsolete items. Often acts as a stumbling block for zero-shot LLMs, requiring true RL or deep reasoning.
50
-
51
- ---
52
-
53
- ## πŸš€ Setup & Execution
54
-
55
- ### 1. Local Virtual Environment Setup
56
- Ensure you are using Python 3.10 or higher (Python 3.13 is fully supported).
57
-
58
- ```bash
59
- # Create and activate virtual environment
60
- python -m venv venv
61
- source venv/bin/activate # On Windows use: venv\Scripts\activate
62
-
63
- # Install dependencies
64
- pip install -r requirements.txt
65
- ```
66
-
67
- ### 2. Running the Baseline Agent
68
- The baseline script uses Groq's Llama-3 model to evaluate the environment via the official OpenAI Python SDK, satisfying the OpenEnv API client requirement while remaining 100% free and lightning-fast.
69
-
70
- ```bash
71
- # Export your free Groq API key (get one at console.groq.com)
72
- export GROQ_API_KEY="your-api-key-here"
73
-
74
- # Run the baseline evaluation across all 3 tasks
75
- python baseline.py
76
- ```
77
-
78
- ### 3. Docker & Hugging Face Deployment
79
- This environment is fully containerized and designed for deployment as a Hugging Face Space.
80
-
81
- ```bash
82
- # Build the image
83
- docker build -t adaptive-cache-env .
84
-
85
- # Run the container (pass your API key)
86
- docker run -e GROQ_API_KEY="your-api-key-here" adaptive-cache-env
87
- ```
88
-
89
- ## πŸ“‚ Project Structure
90
-
91
- ```bash
92
- adaptive-cache-env/
93
- β”œβ”€β”€ Dockerfile # Container configuration for HF Spaces
94
- β”œβ”€β”€ requirements.txt # Project dependencies (NumPy 2.x, Pydantic, OpenAI SDK)
95
- β”œβ”€β”€ openenv.yaml # OpenEnv task and metadata specifications
96
- β”œβ”€β”€ baseline.py # Baseline LLM inference script
97
- β”œβ”€β”€ README.md # Project documentation
98
- └── adaptive_cache/
99
- β”œβ”€β”€ __init__.py
100
- β”œβ”€β”€ simulator.py # Core OS-level array and memory simulation
101
- β”œβ”€β”€ workloads.py # Deterministic task generators (Zipfian, Sequential, etc.)
102
- └── env.py # OpenEnv wrapper and Pydantic models
 
 
 
 
 
 
 
 
 
 
 
 
 
 
103
  ```
 
1
+ ---
2
+ title: Adaptive Cache Manager
3
+ emoji: 🧠
4
+ colorFrom: blue
5
+ colorTo: indigo
6
+ sdk: docker
7
+ pinned: false
8
+ tags:
9
+ - openenv
10
+ - reinforcement-learning
11
+ - agents
12
+ ---
13
+
14
+
15
+ # 🧠 Adaptive Cache Manager (OpenEnv)
16
+
17
+ An OpenEnv-compliant reinforcement learning and agentic AI environment that simulates a high-performance operating system memory manager.
18
+
19
+ Instead of relying on static, heuristic-based algorithms like LRU (Least Recently Used) or LFU (Least Frequently Used), this environment challenges frontier AI agents to dynamically learn and execute optimal cache eviction policies against complex, shifting workloads.
20
+
21
+ ## 🌍 Real-World Utility & Motivation
22
+ Every modern operating system, database management system (DBMS), and CDN relies heavily on cache efficiency. A 1% increase in cache hit rates can save massive amounts of compute, bandwidth, and energy.
23
+
24
+ However, standard algorithms fail when traffic patterns change abruptly or fall into sequential loops. This environment isolates that specific, high-value DevOps/DBA problem. It moves away from "toy" text-parsing tasks and provides a pure, mathematically grounded testbed for reasoning models and RL agents to prove their algorithmic optimization capabilities.
25
+
26
+ ---
27
+
28
+ ## πŸ›  Environment Design: Spaces & Rewards
29
+
30
+ The environment strictly implements the OpenEnv API via typed Pydantic models.
31
+
32
+ ### Observation Space
33
+ The agent receives a lightweight, numerical snapshot of the memory system at the exact moment a cache miss occurs.
34
+ * `incoming_request` (int): The ID of the data item currently requested by the system.
35
+ * `cache_state` (List[int]): The current items residing in the cache slots (-1 indicates an empty slot).
36
+ * `idle_times` (List[int]): The number of timesteps since each specific cache slot was last accessed.
37
+
38
+ ### Action Space
39
+ The agent must decide which slot to free up.
40
+ * `evict_index` (int): A discrete integer (0 to capacity-1) representing the index of the cache slot to overwrite.
41
+
42
+ ### Reward Function
43
+ The environment provides a dense, step-by-step reward signal directly correlated to system performance:
44
+ * **`+1.0`** for every Cache Hit (including consecutive hits safely fast-forwarded without agent intervention).
45
+ * **`-1.0`** for a Cache Miss (forcing the agent to step in and evict).
46
+
47
+ ---
48
+
49
+ ## πŸ† Tasks & Difficulty Progression
50
+
51
+ The environment features three programmatic workloads (tasks) designed to challenge agents with distinctly different access patterns. The **Grader** for all tasks deterministically calculates the final **Hit Rate (0.0 to 1.0)**.
52
+
53
+ 1. **`cache-zipfian-easy` (Easy)**
54
+ * **Workload:** A Zipfian (power-law) distribution simulating standard web traffic. A few items are requested constantly; a long tail is requested rarely.
55
+ * **Goal:** Outperform random eviction by pinning the most frequently requested items.
56
+
57
+ 2. **`cache-sequential-medium` (Medium)**
58
+ * **Workload:** A looping sequential scan (e.g., requesting items 1 through 12 in a loop for a cache of size 10).
59
+ * **Goal:** Standard LRU algorithms achieve a **0% hit rate** here. The agent must break static logic and learn to pin a subset of the sequence to guarantee hits.
60
+
61
+ 3. **`cache-shifting-hard` (Hard)**
62
+ * **Workload:** Abruptly shifting working sets. The first half heavily favors one block of data; the second half abruptly shifts entirely to a different block.
63
+ * **Goal:** Requires rapid, aggressive adaptation to flush obsolete items. Often acts as a stumbling block for zero-shot LLMs, requiring true RL or deep reasoning.
64
+
65
+ ---
66
+
67
+ ## πŸš€ Setup & Execution
68
+
69
+ ### 1. Local Virtual Environment Setup
70
+ Ensure you are using Python 3.10 or higher (Python 3.13 is fully supported).
71
+
72
+ ```bash
73
+ # Create and activate virtual environment
74
+ python -m venv venv
75
+ source venv/bin/activate # On Windows use: venv\Scripts\activate
76
+
77
+ # Install dependencies
78
+ pip install -r requirements.txt
79
+ ```
80
+
81
+ ### 2. Running the Baseline Agent
82
+ The baseline script uses Groq's Llama-3 model to evaluate the environment via the official OpenAI Python SDK, satisfying the OpenEnv API client requirement while remaining 100% free and lightning-fast.
83
+
84
+ ```bash
85
+ # Export your free Groq API key (get one at console.groq.com)
86
+ export GROQ_API_KEY="your-api-key-here"
87
+
88
+ # Run the baseline evaluation across all 3 tasks
89
+ python baseline.py
90
+ ```
91
+
92
+ ### 3. Docker & Hugging Face Deployment
93
+ This environment is fully containerized and designed for deployment as a Hugging Face Space.
94
+
95
+ ```bash
96
+ # Build the image
97
+ docker build -t adaptive-cache-env .
98
+
99
+ # Run the container (pass your API key)
100
+ docker run -e GROQ_API_KEY="your-api-key-here" adaptive-cache-env
101
+ ```
102
+
103
+ ## πŸ“‚ Project Structure
104
+
105
+ ```bash
106
+ adaptive-cache-env/
107
+ β”œβ”€β”€ Dockerfile # Container configuration for HF Spaces
108
+ β”œβ”€β”€ requirements.txt # Project dependencies (NumPy 2.x, Pydantic, OpenAI SDK)
109
+ β”œβ”€β”€ openenv.yaml # OpenEnv task and metadata specifications
110
+ β”œβ”€β”€ baseline.py # Baseline LLM inference script
111
+ β”œβ”€β”€ README.md # Project documentation
112
+ └── adaptive_cache/
113
+ β”œβ”€β”€ __init__.py
114
+ β”œβ”€β”€ simulator.py # Core OS-level array and memory simulation
115
+ β”œβ”€β”€ workloads.py # Deterministic task generators (Zipfian, Sequential, etc.)
116
+ └── env.py # OpenEnv wrapper and Pydantic models
117
  ```