ShreeshantXD committed on
Commit
574589d
·
1 Parent(s): 6d74982

docs: update README for project overview, quick start instructions, and API details

  1. README.md +158 -143
README.md CHANGED
@@ -1,221 +1,236 @@
1
- # GridMind-RL 🏢⚡🤖
2
 
3
- **An AI-powered energy management simulator** - Watch an AI agent learn to control building energy systems using real-time electricity prices, temperature control, and grid demands.
4
-
5
- > **New to AI or coding?** No problem! This guide will get you running in 10 minutes.
6
 
7
  ---
8
 
9
- ## 🚀 Quick Start (3 Steps)
10
-
11
- 1. **Get a free AI API key** from [Hugging Face](https://huggingface.co/join) (takes 2 minutes)
12
- 2. **Run the simulator**: `docker run -p 7860:7860 -p 7861:7861 ghcr.io/your-repo/gridmind-rl:latest`
13
- 3. **Watch the AI learn**: `python inference.py --episodes 1` (or `--fast-mode` for a quick heuristic run, no API calls)
14
 
15
- That's it! The AI will start making energy decisions and you'll see live results.
16
-
17
- ---
18
 
19
- ## 📖 What is GridMind-RL?
20
 
21
- Imagine you're managing a commercial building's energy use. Electricity costs change every 15 minutes, the weather fluctuates, and the power grid sometimes needs help. Your job? Keep the building comfortable while saving money and helping the grid.
22
 
23
- **GridMind-RL** is a computer simulation where an AI "brain" (like ChatGPT) learns to make these decisions. It controls:
24
- - 🏭 HVAC cooling/heating
25
- - 🔋 Thermal energy storage
26
- - Batch process scheduling
27
- - Load shedding during grid emergencies
28
-
29
- The AI learns through trial and error, getting "rewards" for good decisions (saving money, staying comfortable) and "penalties" for bad ones (wasting energy, uncomfortable temperatures).
30
 
31
  ---
32
 
33
- ## 🛠️ Setup Guide
34
-
35
- ### Prerequisites (What You Need First)
36
-
37
- - **🐳 Docker** - Download from [docker.com](https://www.docker.com/products/docker-desktop) (free)
38
- - **🐍 Python 3.9+** - Download from [python.org](https://www.python.org/downloads/) (free)
39
- - **🔑 Hugging Face API Key** - Free account at [huggingface.co](https://huggingface.co/join)
40
 
41
- ### Step 1: Get Your Free AI API Key
42
 
43
- 1. Go to [https://huggingface.co/join](https://huggingface.co/join) and create a free account
44
- 2. Click your profile → Settings → Access Tokens
45
- 3. Click "New token", name it `gridmind`, select "Read" role
46
- 4. Copy the token (starts with `hf_...`)
47
 
48
- **This is free!** No credit card needed.
 
49
 
50
- ### Step 2: Download and Run the Simulator
51
 
52
- #### Option A: Docker (Easiest - Recommended)
53
 
54
- First, build the simulator:
55
  ```bash
56
  docker build -t gridmind-rl .
 
57
  ```
58
 
59
- Then run it:
60
- ```bash
61
- docker run -p 7860:7860 -p 7861:7861 gridmind-rl
62
- ```
63
 
64
- The simulator starts on:
65
- - **API Server**: http://localhost:7860 (for the AI)
66
- - **Live Dashboard**: http://localhost:7861 (watch in your browser!)
67
 
68
- #### Option B: Manual Setup (If Docker Doesn't Work)
69
 
70
- **Install Go** (for the simulator):
71
- - Download from [go.dev/dl](https://go.dev/dl/)
72
- - Install and restart your terminal
73
 
74
- **Run the simulator**:
75
  ```bash
76
- # Start the energy environment
77
- go run main.go
78
  ```
79
 
80
- **On Windows** (if you have the pre-built executable):
81
- ```powershell
82
- # Run the compiled version (faster startup)
83
- .\grid.exe
84
- ```
85
 
86
- **Install Python tools**:
87
  ```bash
88
- # Install required packages
89
  pip install -r python/requirements.txt
90
  ```
91
 
92
- **Start the Visualization Dashboard**:
93
- Since you're running manually, the visualization dashboard needs to be started in a new terminal window:
94
- ```bash
95
- python -m uvicorn dashboard.server:app --host 0.0.0.0 --port 7861
 
96
  ```
97
 
98
- ### Step 3: Configure the AI
99
 
100
- **On Windows (PowerShell - Recommended)**:
101
- ```powershell
102
- $env:API_BASE_URL = "https://router.huggingface.co/v1"
103
- $env:MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"
104
- $env:HF_TOKEN = "hf_your_token_here" # Paste your token here
105
  ```
106
 
107
- **On Windows (Command Prompt)**:
108
- ```cmd
109
- set API_BASE_URL=https://router.huggingface.co/v1
110
- set MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
111
- set HF_TOKEN=hf_your_token_here
112
  ```
113
 
114
- **On Mac/Linux**:
 
 
 
115
  ```bash
 
116
  export API_BASE_URL=https://router.huggingface.co/v1
117
  export MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
118
- export HF_TOKEN=hf_your_token_here
 
119
  ```
120
 
121
- ### Step 4: Watch the AI Learn!
122
 
123
- ```bash
124
- # Run 3 learning episodes (takes ~5 minutes)
125
- python inference.py --episodes 3
126
- ```
127
 
128
- You'll see output like:
129
- ```
130
- Episode 1/3 - Task 1 (Easy): Learning to save energy...
131
- AI Decision: Lowering HVAC to save $2.50
132
- Score: 0.85
133
 
134
- Episode 2/3 - Task 2 (Medium): Balancing cost + comfort...
135
- AI Decision: Using thermal storage during cheap hours
136
- Score: 0.72
137
- ```
 
 
 
138
 
139
  ---
140
 
141
- ## 📊 What the AI Learns
 
 
142
 
143
- The AI progresses through **3 difficulty levels**:
 
 
 
 
 
 
 
 
 
 
144
 
145
- | Level | Challenge | What It Learns |
146
- |-------|-----------|----------------|
147
- | **Easy** | Save money | Basic energy cost optimization |
148
- | **Medium** | Stay comfortable | Keep building 68-74°F (19-23°C) |
149
- | **Hard** | Handle emergencies | Respond to grid stress + meet production deadlines |
150
 
151
- **Scoring**: 1.0 = Perfect, 0.0 = Random guessing. Good scores are 0.6+.
152
 
153
  ---
154
 
155
- ## 🎮 Interactive Dashboard
156
 
157
- While the AI runs, open http://localhost:7861 in your browser to see:
158
- - 📈 Live energy usage charts
159
- - 🌡️ Temperature trends
160
- - 💰 Cost savings over time
161
- - ⚡ Grid stress responses
162
 
163
- ---
 
 
 
 
164
 
165
- ## 🔧 Troubleshooting
166
 
167
- | Problem | Solution |
168
- |---------|----------|
169
- | `docker: command not found` | Install Docker Desktop from [docker.com](https://www.docker.com/products/docker-desktop) |
170
- | `401 Unauthorized` | Your Hugging Face token is wrong - get a new one at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens) |
171
- | `Connection refused` | Make sure the simulator is running (Docker or `go run main.go`) |
172
- | Python errors | Run `pip install -r python/requirements.txt` |
173
- | Model not found | Some models need you to accept terms on Hugging Face first |
 
174
 
175
  ---
176
 
177
- ## 🧠 Technical Details
178
-
179
- ### What the AI Sees (Sensors)
180
- - Current temperature, electricity price, grid stress level
181
- - Battery charge level, time of day, pending work deadlines
182
- - Running energy costs and carbon emissions
183
 
184
- ### What the AI Controls (Actions)
185
- - HVAC power level (0-100%)
186
- - Battery charge/discharge rate
187
- - When to run batch processes
188
- - How much load to shed during emergencies
189
 
190
- ### Reward System
191
- - **Bonus**: Saving money, staying comfortable, helping the grid
192
- - **Penalty**: Wasting energy, temperature extremes, missing deadlines
193
 
194
- ---
195
 
196
- ## 🚀 Advanced Usage
197
 
198
- **Try different AI models**:
199
- ```powershell
200
- $env:MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.3" # Faster but less accurate
 
 
 
 
201
  ```
202
 
203
- **Run longer training**:
204
- ```bash
205
- python inference.py --episodes 10 --llm-every 4 # Scale LLM calls via --llm-every; use --fast-mode for tests
206
- ```
207
 
208
- **Test the environment manually**:
209
- ```bash
210
- python python/validate.py --env-url http://localhost:7860
 
 
 
 
 
 
 
 
 
 
211
  ```
212
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
213
  ---
214
 
215
- ## 📚 Learn More
 
 
 
 
 
 
 
 
 
 
 
 
216
 
217
- - **Reinforcement Learning**: How AI learns through trial and error
218
- - **Energy Management**: Real-world smart grid technologies
219
- - **Hugging Face**: Free platform for AI models and datasets
220
 
221
- **Happy learning!** 🎉 The AI will surprise you with how well it learns to manage energy.
 
1
+ # GridMind-RL
2
 
3
+ **OpenEnv-style environment** for reinforcement learning and LLM agents on **building energy management**: HVAC, thermal storage, demand response, batch job scheduling, and load shedding under time-varying electricity prices and grid stress.
 
 
4
 
5
  ---
6
 
7
+ ## Project overview
 
 
 
 
8
 
9
+ GridMind-RL simulates a **24-hour** control horizon at **15-minute resolution** (96 steps per episode). The agent observes prices, temperature, storage, process load, grid stress, carbon intensity, and batch job deadlines; it acts with continuous and discrete controls aligned with real **demand response** and **industrial/commercial** load-shaping problems.
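The step arithmetic above (24 hours at 15 minutes per step gives 96 steps) can be sketched in a few lines of Python. This is an illustration only: the helper name `step_to_clock` is hypothetical, and the README does not state the time of day at which an episode starts, so midnight is an assumption.

```python
STEP_MINUTES = 15
STEPS_PER_EPISODE = 24 * 60 // STEP_MINUTES  # 96 steps per 24-hour episode


def step_to_clock(step, start_minute=0):
    """Map a step index (0-95) to an HH:MM clock time.

    Assumption: the episode starts at midnight (start_minute=0); the
    README does not say which time of day step 0 corresponds to.
    """
    if not 0 <= step < STEPS_PER_EPISODE:
        raise ValueError(f"step must be in 0..{STEPS_PER_EPISODE - 1}")
    total = (start_minute + step * STEP_MINUTES) % (24 * 60)
    return f"{total // 60:02d}:{total % 60:02d}"


print(STEPS_PER_EPISODE)   # 96
print(step_to_clock(0))    # 00:00
print(step_to_clock(95))   # 23:45
```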
 
 
10
 
11
+ **Why it matters:** Optimizing flexible loads against **time-of-use pricing** and **grid signals** reduces cost and emissions while respecting comfort and process constraints—an active area for RL and LLM-based control research.
12
 
13
+ **Strengths for judges**
14
 
15
+ | Area | Detail |
16
+ |------|--------|
17
+ | Spec | `openenv.yaml` documents server port, schemas, tasks, and endpoints |
18
+ | API | REST: reset, step, state, grade, health, ping, replay, tasks, metrics |
19
+ | Tasks | Three levels (easy / medium / hard) with deterministic episode grading |
20
+ | Baseline | Root `inference.py` + OpenAI-compatible LLM client and heuristic fallback |
21
+ | Ops | Multi-stage **Docker** image: Go environment + Python dashboard + deps |
22
 
23
  ---
24
 
25
+ ## Quick start (copy-paste)
 
 
 
 
 
 
26
 
27
+ **Minimal flow** (API on port **7860** only): start the container in one terminal and keep it running, then run `python` in a **second** terminal from the repo root, after installing dependencies with `pip install -r python/requirements.txt`:
28
 
29
+ ```bash
30
+ docker build -t gridmind-rl .
31
+ docker run -p 7860:7860 gridmind-rl
 
32
 
33
+ python inference.py --fast-mode --episodes 1
34
+ ```
35
 
36
+ ### 1. Build and run (Docker)
37
 
38
+ From the **repository root**:
39
 
 
40
  ```bash
41
  docker build -t gridmind-rl .
42
+ docker run --rm -p 7860:7860 -p 7861:7861 --name gridmind gridmind-rl
43
  ```
44
 
45
+ - **7860** — Environment API (OpenEnv / agent traffic)
46
+ - **7861** — Web dashboard (optional)
 
 
47
 
48
+ **Windows (PowerShell):** the same commands work in a terminal with Docker Desktop running.
 
 
49
 
50
+ ### 2. Validate the API (optional)
51
 
52
+ With the container running, run the validator from the repo root (requires host Python with `requests` installed):
 
 
53
 
 
54
  ```bash
55
+ pip install requests
56
+ python python/validate.py --env-url http://localhost:7860
57
  ```
58
 
59
+ ### 3. Run baseline inference
60
+
61
+ On the **host** (not inside the container unless you set `--env-url` to the env server):
 
 
62
 
 
63
  ```bash
 
64
  pip install -r python/requirements.txt
65
  ```
66
 
67
+ **Windows PowerShell:**
68
+
69
+ ```powershell
70
+ $env:ENV_URL="http://localhost:7860"
71
+ python inference.py --fast-mode --episodes 1
72
  ```
73
 
74
+ **Windows Command Prompt (cmd):**
75
 
76
+ ```bat
77
+ set ENV_URL=http://localhost:7860
78
+ python inference.py --fast-mode --episodes 1
 
 
79
  ```
80
 
81
+ **Linux / macOS:**
82
+
83
+ ```bash
84
+ export ENV_URL=http://localhost:7860
85
+ python inference.py --fast-mode --episodes 1
86
  ```
87
 
88
+ You can run the same entrypoint directly with `python python/inference.py` (e.g. `python python/inference.py --fast-mode`); flags match the root `inference.py` wrapper.
89
+
90
+ **LLM baseline** (requires Hugging Face or other OpenAI-compatible API credentials):
91
+
92
  ```bash
93
+ export ENV_URL=http://localhost:7860
94
  export API_BASE_URL=https://router.huggingface.co/v1
95
  export MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
96
+ export HF_TOKEN=your_token_here
97
+ python inference.py --episodes 1 --llm-every 4
98
  ```
99
 
100
+ Results are written to `baseline_scores.json` by default (use `--output` to change the path).
101
 
102
+ ---
 
 
 
103
 
104
+ ## Tasks
 
 
 
 
105
 
106
+ | ID | Difficulty | Name | Objective |
107
+ |----|------------|------|-----------|
108
+ | 1 | Easy | Cost minimization | Minimize total energy cost over the episode. No temperature or batch-job objectives in the grade. |
109
+ | 2 | Medium | Constrained temperature | Minimize cost while keeping indoor temperature within **±2 °C** of setpoint (19–23 °C) for graded temperature compliance. |
110
+ | 3 | Hard | Full demand response | Minimize cost, maintain temperature, respond to **grid stress** (e.g. shed load when stress is high), complete **batch jobs** on time, and reduce **carbon** vs a baseline policy in the composite score. |
111
+
112
+ The episode **grade** is returned by `GET /grade` after the episode completes (or after a partial run if you stop stepping early). Sub-scores are task-dependent and documented in code (`env/tasks.go`).
113
 
114
  ---
115
 
116
+ ## HTTP API
117
+
118
+ Base URL: `http://<host>:7860` (default in container: port **7860**).
119
 
120
+ | Method | Path | Purpose |
121
+ |--------|------|---------|
122
+ | GET | `/health` | Liveness; JSON `status`, `version` |
123
+ | GET | `/ping` | Lightweight liveness; JSON `status` |
124
+ | POST | `/reset` | Start episode: body e.g. `{"task_id": 1, "seed": 42, "num_buildings": 1}` |
125
+ | POST | `/step` | Advance one step: JSON action or array of actions (multi-building) |
126
+ | GET | `/state` | Full snapshot: buildings, downsampled price/carbon curves, step, task, etc. |
127
+ | GET | `/grade` | Episode score in `[0, 1]`, sub-scores, exploit flags |
128
+ | GET | `/replay` | Step replay list |
129
+ | GET | `/tasks` | Task metadata and grader weights |
130
+ | GET | `/metrics` | Prometheus-style text metrics |
131
 
132
+ **Action JSON fields** (single building): `hvac_power_level`, `thermal_charge_rate`, `batch_job_slot`, `load_shed_fraction`, optional `building_id`.
 
 
 
 
133
 
134
+ Schemas and primary endpoints: **`openenv.yaml`** at repo root (see Notes for additional endpoints like `/metrics`).
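As a rough illustration of the request shapes listed above, the following Python sketch builds (without sending) the `POST /reset` and `POST /step` requests using only the standard library. The field names and the reset body come from the README. The action values, the `Content-Type` header, and the helper name `post_json` are assumptions; check `openenv.yaml` for the actual schemas and value ranges.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # default port from the README


def post_json(path, payload, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for the environment API."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        # Assumption: the server accepts a standard JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Reset body taken from the README example.
reset_request = post_json("/reset", {"task_id": 1, "seed": 42, "num_buildings": 1})

# Field names come from the README; the values are placeholders, not
# recommended settings, and their valid ranges are not stated there.
action = {
    "hvac_power_level": 0.5,
    "thermal_charge_rate": 0.0,
    "batch_job_slot": 0,
    "load_shed_fraction": 0.0,
}
step_request = post_json("/step", action)

print(reset_request.get_method(), reset_request.full_url)
print(step_request.get_method(), step_request.full_url)
```

With the server running, each request could be sent with `urllib.request.urlopen(reset_request)` and the response body decoded with `json.loads`.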
135
 
136
  ---
137
 
138
+ ## Evaluation modes (`inference.py`)
139
 
140
+ There is **no** `--judge-mode` flag in this repository. Use the modes below.
 
 
 
 
141
 
142
+ | Mode | Command pattern | Behavior |
143
+ |------|-----------------|----------|
144
+ | **Fast (heuristic)** | `python inference.py --fast-mode` | No LLM calls; deterministic given env seed; fastest for CI or smoke tests. |
145
+ | **Default LLM** | `python inference.py` | Uses OpenAI-compatible API (`API_BASE_URL`, `MODEL_NAME`, `HF_TOKEN`); default `--llm-every 4` reuses each LLM action for 4 steps to limit API cost. |
146
+ | **Recommended for automated evaluation / judging** | `python inference.py --fast-mode --episodes 1` | Recommended when automated pipelines need **reproducibility** and **no external API** dependency. |
147
 
148
+ Other useful flags:
149
 
150
+ | Flag | Default | Meaning |
151
+ |------|---------|---------|
152
+ | `--episodes` | `1` | Episodes per task (tasks 1–3 run in sequence) |
153
+ | `--env-url` | `ENV_URL` or `http://localhost:7860` | Environment base URL |
154
+ | `--llm-every` | `4` | Steps per LLM call (ignored in `--fast-mode`) |
155
+ | `--max-steps` | full episode | Stop after N steps; grade reflects **partial** episode |
156
+ | `--output` | `baseline_scores.json` | Results path |
157
+ | `--verbose` | off | Extra step logs |
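The effect of `--llm-every` on API usage can be pictured with a small sketch. This is not the code in `inference.py`; it only illustrates the action-reuse idea the table describes, with a hypothetical `query_policy` callable standing in for the LLM call.

```python
def run_with_action_reuse(num_steps, llm_every, query_policy):
    """Reuse each policy decision for `llm_every` consecutive steps.

    `query_policy` is a hypothetical stand-in for an LLM call; it receives
    the step index and returns an action. Returns the per-step actions and
    the number of times the policy was queried.
    """
    actions = []
    cached_action = None
    calls = 0
    for step in range(num_steps):
        if step % llm_every == 0:
            cached_action = query_policy(step)  # fresh decision
            calls += 1
        actions.append(cached_action)  # otherwise reuse the last decision
    return actions, calls


actions, calls = run_with_action_reuse(96, 4, lambda step: {"decided_at_step": step})
print(calls)         # 24
print(len(actions))  # 96
```

Under these assumptions, a full 96-step episode with the default `--llm-every 4` needs 24 LLM calls instead of 96.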
158
 
159
  ---
160
 
161
+ ## Logging format (baseline)
 
 
 
 
 
162
 
163
+ For each episode the script prints, in order:
 
 
 
 
164
 
165
+ 1. **`[START]`**: printed at the beginning of the episode (after `reset`)
166
+ 2. **`[STEP1]` … `[STEP96]`**: one line per successful `POST /step`; a full episode has **96** steps unless `--max-steps` or an early error stops the loop
167
+ 3. **`[END]`**: printed after `GET /grade` for that episode
168
 
169
+ Additional lines (banners, task headers, `[OK]` / `[WARN]`) may appear; parsers should match the bracketed markers above.
170
 
171
+ Example shape:
172
 
173
+ ```text
174
+ [START]
175
+ [STEP1]
176
+ [STEP2]
177
+ ...
178
+ [STEP96]
179
+ [END]
180
  ```
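A parser for the markers described above might look like the following minimal sketch. It assumes each marker appears at the start of a line and ignores whatever follows it, since the README does not specify the rest of the line. The function name `parse_episode_markers` and the sample log are made up for illustration.

```python
import re

# Matches [START], [END], or [STEPn] at the start of a line. Anything after
# the marker is ignored because its format is not specified in the README.
MARKER = re.compile(r"^\[(START|END|STEP(\d+))\]")


def parse_episode_markers(log_text):
    """Return one list of step numbers per [START] ... [END] episode."""
    episodes = []
    current = None
    for line in log_text.splitlines():
        match = MARKER.match(line.strip())
        if match is None:
            continue  # banners, task headers, [OK] / [WARN] lines
        if match.group(1) == "START":
            current = []
        elif match.group(1) == "END":
            if current is not None:
                episodes.append(current)
            current = None
        elif current is not None:
            current.append(int(match.group(2)))
    return episodes


# Made-up sample log, shortened to three steps.
sample = "\n".join(
    ["[OK] server reachable", "[START]", "[STEP1]", "[STEP2]",
     "[WARN] retry", "[STEP3]", "[END]"]
)
print(parse_episode_markers(sample))  # [[1, 2, 3]]
```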
181
 
182
+ ---
 
 
 
183
 
184
+ ## Architecture
185
+
186
+ ```text
187
+ ┌─────────────────────────────────────────────────────────────┐
188
+ │ Client: python inference.py (LLM or heuristic) │
189
+ │ │ HTTP (reset / step / grade) │
190
+ │ ▼ │
191
+ │ ┌──────────────────┐ ┌─────────────────────────────┐ │
192
+ │ │ gridmind-server │ │ Dashboard (optional) │ │
193
+ │ │ Go :7860 │◄────│ FastAPI + static UI :7861 │ │
194
+ │ │ env/* simulation│ │ proxies /api → :7860 │ │
195
+ │ └──────────────────┘ └─────────────────────────────┘ │
196
+ └─────────────────────────────────────────────────────────────┘
197
  ```
198
 
199
+ - **Core:** `main.go` + `env/` (physics, rewards, tasks, grading)
200
+ - **Baseline:** `inference.py` (root) → `python/inference.py`
201
+ - **Dashboard:** `dashboard/server.py`, `dashboard/static/`
202
+ - **Spec:** `openenv.yaml`
203
+
204
+ ---
205
+
206
+ ## Docker (detailed)
207
+
208
+ | Step | Command |
209
+ |------|---------|
210
+ | Build | `docker build -t gridmind-rl .` |
211
+ | Run (foreground) | `docker run --rm -p 7860:7860 -p 7861:7861 --name gridmind gridmind-rl` |
212
+ | Run (background) | `docker run -d --rm -p 7860:7860 -p 7861:7861 --name gridmind gridmind-rl` |
213
+ | Stop (background) | `docker stop gridmind` |
214
+ | Inference **inside** container | `docker exec -it gridmind python /app/inference.py --fast-mode --env-url http://127.0.0.1:7860` |
215
+
216
+ The image runs **supervisord** as a non-root user with two programs: Go server (`PORT=7860`) and uvicorn dashboard (`7861`).
217
+
218
  ---
219
 
220
+ ## Notes for judges and operators
221
+
222
+ | Topic | Detail |
223
+ |-------|--------|
224
+ | **Ports** | **7860** = environment API; **7861** = dashboard. Some hosts only expose one public port—API is the required one for OpenEnv-style evaluation. |
225
+ | **Episode length** | **96 steps** = 24 h at 15 min/step. Observation `step` is **0–95** for a full episode. |
226
+ | **`openenv.yaml`** | Lists main endpoints; **`/metrics`** exists at runtime but may not appear in the YAML block—treat as an extra ops endpoint. |
227
+ | **Reproducibility** | Env is seed-controlled. LLM outputs may still vary by provider even at `temperature=0`. |
228
+ | **`--max-steps`** | Produces a **partial** episode; final `GET /grade` reflects that partial trajectory. |
229
+ | **Manual run (no Docker)** | Install Go 1.21+ and run `go run .` from the repo root (default port 7860); then install the Python deps and run `python inference.py` as above. |
230
+ | **Runtime** | The baseline completes within typical hackathon limits (<20 minutes). |
231
+
232
+ ---
233
 
234
+ ## License
 
 
235
 
236
+ See `LICENSE` in the repository.