Commit 2787b1e by adityss · 1 parent: 4c1963b

feat: add GridMind-RL inference script and update documentation

Files changed (2)
  1. README.md +250 -45
  2. python/inference.py +7 -2
README.md CHANGED
@@ -4,13 +4,84 @@ GridMind-RL is an OpenEnv-compliant reinforcement learning environment simulatin
4
 
5
  An RL agent acts as the energy controller, shaping electrical load profiles by adjusting HVAC setpoints, managing thermal storage, and scheduling batch processes. The goal is to optimize operations in response to real-time electricity prices, grid carbon intensity, and utility demand-response signals.
6
 
7
- ## Architecture
8
 
9
  ```text
10
  ┌──────────────────────┐ ┌─────────────────────────────┐
11
  │ │ │ │
12
  │ LLM RL Agent │◄───────┤ GridMind-RL Server │
13
- │ (Inference Script) │ POST │ (Go OpenEnv Backend) │
14
  │ ├───────►│ Port 7860 │
15
  └──────────────────────┘ Action │ │
16
  └──────────────┬──────────────┘
@@ -24,7 +95,162 @@ An RL agent acts as the energy controller, shaping electrical load profiles by a
24
  └─────────────────────────────┘
25
  ```
26
 
27
- ## Observation Space
28
 
29
  | Name | Type | Range | Description |
30
  |------|------|-------|-------------|
@@ -40,7 +266,11 @@ An RL agent acts as the energy controller, shaping electrical load profiles by a
40
  | `step` | int | [0, 95] | Current episode timestep (15-min intervals over 24h). |
41
  | `building_id` | int | [0, 2] | ID of the building in multi-building federated mode. |
42
 
43
- ## Action Space
44
 
45
  | Name | Type | Range | Description |
46
  |------|------|-------|-------------|
@@ -50,18 +280,9 @@ An RL agent acts as the energy controller, shaping electrical load profiles by a
50
  | `load_shed_fraction` | float | [0.0, 0.5] | Fraction of non-critical load to shed (max 50%). |
51
  | `building_id` | int | [0, 2] | Select which building to apply this action to (federation). |
52
 
53
- ## Tasks
54
-
55
- GridMind-RL features 3 progressively difficult tasks:
56
 
57
- 1. **Task 1: Cost Minimization (Easy)**
58
- Minimize total energy costs by moving load to off-peak periods using thermal storage. No temperature constraints.
59
- 2. **Task 2: Temperature Management (Medium)**
60
- Minimize costs while keeping indoor temperatures strictly within 19°C – 23°C.
61
- 3. **Task 3: Full Demand Response (Hard)**
62
- Minimize cost, maintain temperature, successfully schedule batch jobs before deadlines, and shed loads when the grid stress signal exceeds 0.7.
63
-
64
- ## Reward Function
65
 
66
  The dense reward includes several components:
67
  * **Cost Savings:** Proportional to energy savings vs the baseline flat tariff policy.
@@ -73,38 +294,22 @@ The dense reward includes several components:
73
 
74
  *Exploit Detection:* The grader detects degenerate strategies (e.g. permanently shedding 40% load) and applies up to a 30% score penalty.
75
 
76
- ## Usage
77
-
78
- ### Local Docker Build
79
 
80
- ```bash
81
- docker build -t gridmind-rl .
82
- docker run -p 7860:7860 -p 7861:7861 gridmind-rl
83
- ```
84
 
85
- * Backend OpenEnv server: http://localhost:7860
86
- * Visualization Dashboard: http://localhost:7861
87
-
88
- ### Validating the Environment
89
-
90
- ```bash
91
- python python/validate.py --env-url http://localhost:7860
92
- ```
93
-
94
- ### Running Baseline Inference
95
-
96
- ```bash
97
- export API_BASE_URL=https://api-inference.huggingface.co/v1
98
- export MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
99
- export HF_TOKEN=your_token
100
 
101
- # Install dependencies
102
- pip install -r python/requirements.txt
103
 
104
- # Run inference
105
- python python/inference.py --episodes 3
106
- ```
107
 
108
- ## Extensions
109
- * **Multi-building mode:** Switch the environment to 3 buildings via `POST /reset {"num_buildings": 3}` and output action arrays for coordinated dispatch.
110
- * **Add new tasks:** Edit `env/tasks.go` and implement a new `gradeTaskX` component.
4
 
5
  An RL agent acts as the energy controller, shaping electrical load profiles by adjusting HVAC setpoints, managing thermal storage, and scheduling batch processes. The goal is to optimize operations in response to real-time electricity prices, grid carbon intensity, and utility demand-response signals.
6
 
7
+ ---
8
+
9
+ ## 🙋 Beginner? Start Here
10
+
11
+ If you're new to this project, you probably have these questions:
12
+
13
+ ### ❓ Why do I need an API?
14
+
15
+ In this project, the "brain" that makes energy decisions is an **AI language model (LLM)** — like Llama.
16
+
17
+ Instead of running the full AI model on your own computer (which requires a powerful GPU), you call it through an **API** (Application Programming Interface) exposed by a remote server that already has the model running. You send it the current building state (temperature, price, etc.) and it sends back the action to take (e.g. "charge thermal storage").
18
+
19
+ Think of it like this:
20
+ ```
21
+ Your Computer ──(asks question)──► API Server (has the AI) ──(sends answer)──► Your Computer
22
+ ```
23
+
24
+ Without a valid API key, the remote server rejects your requests, so inference won't work.
25
+
26
+ ---
27
+
28
+ ### ❓ How do I get an API key?
29
+
30
+ This project uses **Hugging Face** — a free platform that hosts AI models.
31
+
32
+ #### Step-by-step:
33
+
34
+ 1. **Create a free account** at [https://huggingface.co/join](https://huggingface.co/join)
35
+
36
+ 2. **Go to your profile → Settings → Access Tokens**
37
+ Direct link: [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
38
+
39
+ 3. Click **"New token"**, give it any name (e.g. `gridmind`), and select role **"Read"**
40
+
41
+ 4. Copy the token — it looks like: `hf_aBcDeFgHiJkLmNoPqRsTuVwXyZ`
42
+
43
+ 5. You'll paste this token in the terminal when running the project (shown below)
44
+
45
+ > **💡 It's free!** Hugging Face's inference API has a free tier that's enough to run this project.
46
+
47
+ ---
48
+
49
+ ### ❓ Why Llama? What even is Llama?
50
+
51
+ **Llama** (originally short for Large Language Model Meta AI) is a family of open-weight language models made by Meta (Facebook). Think of it as a ChatGPT-style model that your own code can call through an API.
52
+
53
+ **Why this project uses Llama specifically:**
54
+
55
+ | Reason | Explanation |
56
+ |--------|-------------|
57
+ | 🆓 Free to use | Available on Hugging Face at no cost |
58
+ | 📖 Open weights | The model weights are published under Meta's license, so you can inspect or self-host them |
59
+ | 🧠 Smart enough | Llama 3.1 8B is capable of reading sensor data and outputting valid JSON actions |
60
+ | ⚡ Fast | The 8B (8 billion parameter) version is small enough to run quickly on Hugging Face's servers |
61
+ | 🔄 OpenAI-compatible | Hugging Face serves it through the same API format as OpenAI, so the code works with many models |
62
+
63
+ The model reads the building state (temperature, electricity price, grid stress) and outputs a JSON action like:
64
+ ```json
65
+ {
66
+ "hvac_power_level": 0.4,
67
+ "thermal_charge_rate": 0.5,
68
+ "batch_job_slot": 2,
69
+ "load_shed_fraction": 0.0,
70
+ "building_id": 0
71
+ }
72
+ ```
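Because the model's reply is free text, it can help to check it before sending it to the environment. The following is a minimal sketch, not the project's actual parser: `parse_action` is a hypothetical helper, and it clamps only the two fields whose ranges appear in the Action Space table in this diff (`load_shed_fraction` and `building_id`). The other fields are passed through unchanged because their ranges are not shown here.

```python
import json

# Ranges copied from the Action Space table; only these two are visible in
# this diff, so the remaining fields are not clamped.
KNOWN_RANGES = {
    "load_shed_fraction": (0.0, 0.5),
    "building_id": (0, 2),
}
REQUIRED_FIELDS = (
    "hvac_power_level",
    "thermal_charge_rate",
    "batch_job_slot",
    "load_shed_fraction",
    "building_id",
)


def parse_action(text):
    """Parse a model reply into an action dict, clamping the known ranges."""
    action = json.loads(text)
    if not isinstance(action, dict):
        raise ValueError("action must be a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in action]
    if missing:
        raise ValueError(f"action is missing fields: {missing}")
    for name, (low, high) in KNOWN_RANGES.items():
        action[name] = min(max(action[name], low), high)
    return action


reply = (
    '{"hvac_power_level": 0.4, "thermal_charge_rate": 0.5, '
    '"batch_job_slot": 2, "load_shed_fraction": 0.9, "building_id": 0}'
)
print(parse_action(reply)["load_shed_fraction"])  # 0.5 (clamped from 0.9)
```

Clamping rather than rejecting is a design choice for this sketch: it keeps an episode running when the model slightly overshoots a limit.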
73
+
74
+ > **You can also swap Llama for any other OpenAI-compatible model** (GPT-4, Mistral, etc.) by changing the environment variables.
75
+
76
+ ---
77
+
78
+ ## 🏗️ Architecture
79
 
80
  ```text
81
  ┌──────────────────────┐ ┌─────────────────────────────┐
82
  │ │ │ │
83
  │ LLM RL Agent │◄───────┤ GridMind-RL Server │
84
+ │ (Python Script) │ POST │ (Go OpenEnv Backend) │
85
  │ ├───────►│ Port 7860 │
86
  └──────────────────────┘ Action │ │
87
  └──────────────┬──────────────┘
 
95
  └─────────────────────────────┘
96
  ```
97
 
98
+ ---
99
+
100
+ ## 🚀 How to Run the Project (Step by Step)
101
+
102
+ There are **two ways** to run this project:
103
+ - **Option A** — Using Docker (recommended, easiest)
104
+ - **Option B** — Running manually without Docker
105
+
106
+ ---
107
+
108
+ ### Option A: Docker (Recommended)
109
+
110
+ Docker packages the environment server into a container, so you don't need to install Go yourself. You still need Python on your machine to run the agent script (Steps 3 to 5).
111
+
112
+ #### Prerequisites
113
+
114
+ - Install Docker Desktop: [https://www.docker.com/products/docker-desktop](https://www.docker.com/products/docker-desktop)
115
+ - A Hugging Face API token (see above ☝️)
116
+
117
+ #### Step 1 — Build the Docker image
118
+
119
+ Open a terminal in the project folder and run:
120
+
121
+ ```bash
122
+ docker build -t gridmind-rl .
123
+ ```
124
+
125
+ This may take a few minutes the first time.
126
+
127
+ #### Step 2 — Start the environment server
128
+
129
+ ```bash
130
+ docker run -p 7860:7860 -p 7861:7861 gridmind-rl
131
+ ```
132
+
133
+ You should see the server start. Keep this terminal open.
134
+
135
+ - **Environment API:** http://localhost:7860
136
+ - **Visualization Dashboard:** http://localhost:7861
137
+
138
+ #### Step 3 — Install Python dependencies
139
+
140
+ Open a **new terminal** (keep the Docker one running) and run:
141
+
142
+ ```bash
143
+ pip install -r python/requirements.txt
144
+ ```
145
+
146
+ #### Step 4 — Set your API credentials
147
+
148
+ **On Windows (Command Prompt):**
149
+ ```cmd
150
+ set API_BASE_URL=https://router.huggingface.co/v1
151
+ set MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
152
+ set HF_TOKEN=hf_your_token_here
153
+ ```
154
+
155
+ **On Windows (PowerShell):**
156
+ ```powershell
157
+ $env:API_BASE_URL = "https://router.huggingface.co/v1"
158
+ $env:MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"
159
+ $env:HF_TOKEN = "hf_your_token_here"
160
+ ```
161
+
162
+ **On Mac/Linux:**
163
+ ```bash
164
+ export API_BASE_URL=https://router.huggingface.co/v1
165
+ export MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
166
+ export HF_TOKEN=hf_your_token_here
167
+ ```
168
+
169
+ Replace `hf_your_token_here` with your actual Hugging Face token.
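A script can read these three variables and fail early when the token is missing, which is easier to debug than a `401` later on. This is a minimal sketch under stated assumptions: `load_api_config` is a hypothetical helper (not a function from `python/inference.py`), and the defaults mirror the values shown in this commit.

```python
import os

# Defaults mirror the values shown in this commit's inference script.
DEFAULTS = {
    "API_BASE_URL": "https://router.huggingface.co/v1",
    "MODEL_NAME": "meta-llama/Llama-3.1-8B-Instruct",
}


def load_api_config(environ=None):
    """Read API settings from the environment; raise if no real token is set."""
    env = os.environ if environ is None else environ
    token = env.get("HF_TOKEN", "").strip()
    if not token or token == "hf_your_token_here":
        raise RuntimeError("HF_TOKEN is not set to a real token")
    return {
        "base_url": env.get("API_BASE_URL", DEFAULTS["API_BASE_URL"]),
        "model": env.get("MODEL_NAME", DEFAULTS["MODEL_NAME"]),
        "api_key": token,
    }


# Example with an explicit mapping instead of the real environment.
config = load_api_config({"HF_TOKEN": "hf_example"})
print(config["base_url"], config["model"])
```

The `base_url` and `api_key` entries match the keyword arguments of the `OpenAI` client that the inference script imports.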
170
+
171
+ #### Step 5 — Run the AI agent
172
+
173
+ ```bash
174
+ python python/inference.py --episodes 3
175
+ ```
176
+
177
+ The agent plays 3 episodes of each of the 3 tasks and then prints the scores.
178
+
179
+ ---
180
+
181
+ ### Option B: Manual (Without Docker)
182
+
183
+ Use this if you don't have Docker installed.
184
+
185
+ #### Prerequisites
186
+
187
+ - [Go 1.21+](https://go.dev/dl/) — for running the environment server
188
+ - [Python 3.9+](https://www.python.org/downloads/) — for the AI agent script
189
+ - A Hugging Face API token (see above ☝️)
190
+
191
+ #### Step 1 — Start the Go environment server
192
+
193
+ ```bash
194
+ go run main.go
195
+ ```
196
+
197
+ The server starts on port `7860`. Keep this terminal open.
198
+
199
+ #### Step 2 — Open a new terminal and install Python dependencies
200
+
201
+ ```bash
202
+ pip install -r python/requirements.txt
203
+ ```
204
+
205
+ #### Step 3 — Set your API credentials (same as Option A, Step 4 above)
206
+
207
+ #### Step 4 — Validate the environment is working
208
+
209
+ ```bash
210
+ python python/validate.py --env-url http://localhost:7860
211
+ ```
212
+
213
+ You should see a series of checks pass. If they do, you're good to go.
214
+
215
+ #### Step 5 — Run the AI agent
216
+
217
+ ```bash
218
+ python python/inference.py --episodes 3
219
+ ```
220
+
221
+ ---
222
+
223
+ ## 📊 What Happens When You Run It
224
+
225
+ The agent runs through **3 tasks** (Easy → Medium → Hard), each for the number of episodes you specify:
226
+
227
+ | Task | Difficulty | Goal |
228
+ |------|-----------|------|
229
+ | Task 1 | Easy | Minimize energy costs only |
230
+ | Task 2 | Medium | Minimize costs + keep temperature 19°C–23°C |
231
+ | Task 3 | Hard | Costs + temperature + batch job deadlines + grid stress response |
232
+
233
+ At the end, you'll see a score table like:
234
+ ```
235
+ ============================================================
236
+ BASELINE SCORES SUMMARY
237
+ ============================================================
238
+ Task Model Score Episodes
239
+ ------------------------------------------------------------
240
+ Task 1 meta-llama/Llama-3.1-8B-Instruct 0.7823 3
241
+ Task 2 meta-llama/Llama-3.1-8B-Instruct 0.6541 3
242
+ Task 3 meta-llama/Llama-3.1-8B-Instruct 0.5102 3
243
+ ------------------------------------------------------------
244
+ Overall 0.6489
245
+ ```
246
+
247
+ Results are also saved to `baseline_scores.json`.
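The `Overall` figure in the example table is consistent with an unweighted mean of the three task scores. Whether the script computes it exactly this way is an assumption; the arithmetic can be checked as follows:

```python
# Per-task scores copied from the example summary table.
task_scores = {"Task 1": 0.7823, "Task 2": 0.6541, "Task 3": 0.5102}

overall = sum(task_scores.values()) / len(task_scores)
print(f"Overall {overall:.4f}")  # Overall 0.6489
```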
248
+
249
+ ---
250
+
251
+ ## 📐 Observation Space
252
+
253
+ These are the sensor readings the agent sees at each step:
254
 
255
  | Name | Type | Range | Description |
256
  |------|------|-------|-------------|
 
266
  | `step` | int | [0, 95] | Current episode timestep (15-min intervals over 24h). |
267
  | `building_id` | int | [0, 2] | ID of the building in multi-building federated mode. |
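The `step` range of [0, 95] follows from 24 hours divided into 15-minute intervals (24 × 4 = 96 steps). As a small illustration, the sketch below converts a step index to a time of day. It assumes that step 0 corresponds to midnight, which this README does not state, and `step_to_clock` is a hypothetical helper.

```python
STEPS_PER_EPISODE = 96  # steps 0..95, per the table
MINUTES_PER_STEP = 15


def step_to_clock(step):
    """Convert a step index to HH:MM, assuming step 0 is midnight."""
    if not 0 <= step < STEPS_PER_EPISODE:
        raise ValueError(f"step must be in [0, 95], got {step}")
    hours, minutes = divmod(step * MINUTES_PER_STEP, 60)
    return f"{hours:02d}:{minutes:02d}"


print(step_to_clock(0), step_to_clock(95))  # 00:00 23:45
```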
268
 
269
+ ---
270
+
271
+ ## 🕹️ Action Space
272
+
273
+ These are the controls the agent outputs at each step:
274
 
275
  | Name | Type | Range | Description |
276
  |------|------|-------|-------------|
 
280
  | `load_shed_fraction` | float | [0.0, 0.5] | Fraction of non-critical load to shed (max 50%). |
281
  | `building_id` | int | [0, 2] | Select which building to apply this action to (federation). |
282
 
283
+ ---
284
 
285
+ ## 🏆 Reward Function
286
 
287
  The dense reward includes several components:
288
  * **Cost Savings:** Proportional to energy savings vs the baseline flat tariff policy.
 
294
 
295
  *Exploit Detection:* The grader detects degenerate strategies (e.g. permanently shedding 40% load) and applies up to a 30% score penalty.
296
 
297
+ ---
298
 
299
+ ## 🔧 Extensions
300
 
301
+ * **Multi-building mode:** Switch the environment to 3 buildings via `POST /reset {"num_buildings": 3}` and output action arrays for coordinated dispatch.
302
+ * **Use a different model:** Change `MODEL_NAME` to any chat model available through your OpenAI-compatible endpoint (e.g. `mistralai/Mistral-7B-Instruct-v0.3`).
303
+ * **Add new tasks:** Edit `env/tasks.go` and implement a new `gradeTaskX` component.
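The multi-building reset call from the list above can be sketched with the standard library. This is an illustration only: `build_reset_request` is a hypothetical helper, the `Content-Type` header is an assumption about what the server expects, and the shape of the server's reply is not documented in this diff, so it is left unparsed.

```python
import json
import urllib.request


def build_reset_request(env_url, num_buildings=3):
    """Build (but do not send) a POST /reset request for multi-building mode."""
    body = json.dumps({"num_buildings": num_buildings}).encode("utf-8")
    return urllib.request.Request(
        url=f"{env_url.rstrip('/')}/reset",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


request = build_reset_request("http://localhost:7860")
print(request.get_method(), request.full_url)

# To send it against a running server:
# with urllib.request.urlopen(request, timeout=10) as resp:
#     reply = json.load(resp)
```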
304
 
305
+ ---
306
 
307
+ ## Troubleshooting
308
 
309
+ | Problem | Fix |
310
+ |---------|-----|
311
+ | `Connection refused` on port 7860 | Make sure the Docker container or `go run main.go` is still running |
312
+ | `401 Unauthorized` from Hugging Face | Your `HF_TOKEN` is wrong or expired — generate a new one |
313
+ | `Model not found` error | Some gated models (including Llama) require you to accept their license on Hugging Face first. Open the model page and accept the terms |
314
+ | Python package errors | Make sure you ran `pip install -r python/requirements.txt` |
315
+ | `docker: command not found` | Install Docker Desktop from [docker.com](https://www.docker.com/products/docker-desktop) |
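For the `Connection refused` case, a quick way to see whether anything is listening on the two ports is a plain TCP check. This is a minimal sketch and `port_is_open` is a hypothetical helper; it only confirms that a port accepts connections, not that the environment behind it is healthy (use `python/validate.py` for that).

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


for name, port in (("Environment API", 7860), ("Dashboard", 7861)):
    status = "reachable" if port_is_open("localhost", port) else "not reachable"
    print(f"{name} on port {port}: {status}")
```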
python/inference.py CHANGED
@@ -5,7 +5,7 @@ Runs an LLM agent against all 3 tasks for N episodes each.
5
  Uses OpenAI-compatible API via API_BASE_URL / MODEL_NAME / HF_TOKEN environment variables.
6
 
7
  Usage:
8
- export API_BASE_URL=https://api-inference.huggingface.co/v1
9
  export MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
10
  export HF_TOKEN=hf_xxxx
11
  python python/inference.py [--episodes 3] [--env-url http://localhost:7860]
@@ -26,7 +26,7 @@ from openai import OpenAI
26
  # ── Constants ──────────────────────────────────────────────────────────────
27
 
28
  ENV_URL = os.getenv("ENV_URL", "http://localhost:7860")
29
- API_BASE_URL = os.getenv("API_BASE_URL", "https://api-inference.huggingface.co/v1")
30
  MODEL_NAME = os.getenv("MODEL_NAME", "meta-llama/Llama-3.1-8B-Instruct")
31
  HF_TOKEN = os.getenv("HF_TOKEN", "")
32
  DEFAULT_EPISODES = 3
@@ -245,6 +245,11 @@ def run_episode(env_client: GridMindEnvClient, agent: LLMAgent,
245
  action = agent.choose_action(obs, task_id)
246
  step_resp = env_client.step(action)
247

248
  obs = step_resp["observation"]
249
  total_reward += step_resp["reward"]
250
  total_steps += 1
 
5
  Uses OpenAI-compatible API via API_BASE_URL / MODEL_NAME / HF_TOKEN environment variables.
6
 
7
  Usage:
8
+ export API_BASE_URL=https://router.huggingface.co/v1
9
  export MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
10
  export HF_TOKEN=hf_xxxx
11
  python python/inference.py [--episodes 3] [--env-url http://localhost:7860]
 
26
  # ── Constants ──────────────────────────────────────────────────────────────
27
 
28
  ENV_URL = os.getenv("ENV_URL", "http://localhost:7860")
29
+ API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
30
  MODEL_NAME = os.getenv("MODEL_NAME", "meta-llama/Llama-3.1-8B-Instruct")
31
  HF_TOKEN = os.getenv("HF_TOKEN", "")
32
  DEFAULT_EPISODES = 3
 
245
  action = agent.choose_action(obs, task_id)
246
  step_resp = env_client.step(action)
247
 
248
+ if step_resp is None or "observation" not in step_resp:
249
+ print(f" [WARN] step {_step}: server returned invalid response, ending episode early")
250
+ _step += 1
251
+ break
252
+
253
  obs = step_resp["observation"]
254
  total_reward += step_resp["reward"]
255
  total_steps += 1
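The guard added in this hunk checks for `"observation"`, while the lines after it also read `step_resp["reward"]`. A slightly stricter check could cover both keys. The sketch below is one possible shape for it; `is_valid_step_response` is a hypothetical helper and not part of this commit.

```python
def is_valid_step_response(resp):
    """Check that a step response has the fields the episode loop reads."""
    return (
        isinstance(resp, dict)
        and "observation" in resp
        and "reward" in resp
    )


print(is_valid_step_response({"observation": {}, "reward": 0.1}))  # True
print(is_valid_step_response({"observation": {}}))                 # False
print(is_valid_step_response(None))                                # False
```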