Commit · 574589d
Parent(s): 6d74982
docs: update README for project overview, quick start instructions, and API details
README.md
CHANGED
|
@@ -1,221 +1,236 @@
|
|
| 1 |
-
# GridMind-RL
|
| 2 |
|
| 3 |
-
**
|
| 4 |
-
|
| 5 |
-
> **New to AI or coding?** No problem! This guide will get you running in 10 minutes.
|
| 6 |
|
| 7 |
---
|
| 8 |
|
| 9 |
-
##
|
| 10 |
-
|
| 11 |
-
1. **Get a free AI API key** from [Hugging Face](https://huggingface.co/join) (takes 2 minutes)
|
| 12 |
-
2. **Run the simulator**: `docker run -p 7860:7860 -p 7861:7861 ghcr.io/your-repo/gridmind-rl:latest`
|
| 13 |
-
3. **Watch the AI learn**: `python inference.py --episodes 1` (or `--fast-mode` for a quick heuristic run, no API calls)
|
| 14 |
|
| 15 |
-
|
| 16 |
-
|
| 17 |
-
---
|
| 18 |
|
| 19 |
-
|
| 20 |
|
| 21 |
-
|
| 22 |
|
| 23 |
-
|
| 24 |
-
-
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
|
| 30 |
|
| 31 |
---
|
| 32 |
|
| 33 |
-
##
|
| 34 |
-
|
| 35 |
-
### Prerequisites (What You Need First)
|
| 36 |
-
|
| 37 |
-
- **🐳 Docker** - Download from [docker.com](https://www.docker.com/products/docker-desktop) (free)
|
| 38 |
-
- **🐍 Python 3.9+** - Download from [python.org](https://www.python.org/downloads/) (free)
|
| 39 |
-
- **🔑 Hugging Face API Key** - Free account at [huggingface.co](https://huggingface.co/join)
|
| 40 |
|
| 41 |
-
|
| 42 |
|
| 43 |
-
|
| 44 |
-
|
| 45 |
-
|
| 46 |
-
4. Copy the token (starts with `hf_...`)
|
| 47 |
|
| 48 |
-
|
|
|
|
| 49 |
|
| 50 |
-
###
|
| 51 |
|
| 52 |
-
|
| 53 |
|
| 54 |
-
First, build the simulator:
|
| 55 |
```bash
|
| 56 |
docker build -t gridmind-rl .
|
|
|
|
| 57 |
```
|
| 58 |
|
| 59 |
-
|
| 60 |
-
|
| 61 |
-
docker run -p 7860:7860 -p 7861:7861 gridmind-rl
|
| 62 |
-
```
|
| 63 |
|
| 64 |
-
|
| 65 |
-
- **API Server**: http://localhost:7860 (for the AI)
|
| 66 |
-
- **Live Dashboard**: http://localhost:7861 (watch in your browser!)
|
| 67 |
|
| 68 |
-
###
|
| 69 |
|
| 70 |
-
|
| 71 |
-
- Download from [go.dev/dl](https://go.dev/dl/)
|
| 72 |
-
- Install and restart your terminal
|
| 73 |
|
| 74 |
-
**Run the simulator**:
|
| 75 |
```bash
|
| 76 |
-
|
| 77 |
-
|
| 78 |
```
|
| 79 |
|
| 80 |
-
|
| 81 |
-
|
| 82 |
-
|
| 83 |
-
.\grid.exe
|
| 84 |
-
```
|
| 85 |
|
| 86 |
-
**Install Python tools**:
|
| 87 |
```bash
|
| 88 |
-
# Install required packages
|
| 89 |
pip install -r python/requirements.txt
|
| 90 |
```
|
| 91 |
|
| 92 |
-
**
|
| 93 |
-
|
| 94 |
-
```
|
| 95 |
-
|
|
|
|
| 96 |
```
|
| 97 |
|
| 98 |
-
|
| 99 |
|
| 100 |
-
|
| 101 |
-
|
| 102 |
-
|
| 103 |
-
$env:MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"
|
| 104 |
-
$env:HF_TOKEN = "hf_your_token_here" # Paste your token here
|
| 105 |
```
|
| 106 |
|
| 107 |
-
**
|
| 108 |
-
|
| 109 |
-
|
| 110 |
-
|
| 111 |
-
|
| 112 |
```
|
| 113 |
|
| 114 |
-
|
| 115 |
```bash
|
|
|
|
| 116 |
export API_BASE_URL=https://router.huggingface.co/v1
|
| 117 |
export MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
|
| 118 |
-
export HF_TOKEN=
|
|
|
|
| 119 |
```
|
| 120 |
|
| 121 |
-
|
| 122 |
|
| 123 |
-
|
| 124 |
-
# Run 3 learning episodes (takes ~5 minutes)
|
| 125 |
-
python inference.py --episodes 3
|
| 126 |
-
```
|
| 127 |
|
| 128 |
-
|
| 129 |
-
```
|
| 130 |
-
Episode 1/3 - Task 1 (Easy): Learning to save energy...
|
| 131 |
-
AI Decision: Lowering HVAC to save $2.50
|
| 132 |
-
Score: 0.85
|
| 133 |
|
| 134 |
-
|
| 135 |
-
|
| 136 |
-
|
| 137 |
-
|
| 138 |
|
| 139 |
---
|
| 140 |
|
| 141 |
-
##
|
|
|
|
|
|
|
| 142 |
|
| 143 |
-
|
| 144 |
|
| 145 |
-
|
| 146 |
-
|-------|-----------|----------------|
|
| 147 |
-
| **Easy** | Save money | Basic energy cost optimization |
|
| 148 |
-
| **Medium** | Stay comfortable | Keep building 68-74°F (19-23°C) |
|
| 149 |
-
| **Hard** | Handle emergencies | Respond to grid stress + meet production deadlines |
|
| 150 |
|
| 151 |
-
**
|
| 152 |
|
| 153 |
---
|
| 154 |
|
| 155 |
-
##
|
| 156 |
|
| 157 |
-
|
| 158 |
-
- 📈 Live energy usage charts
|
| 159 |
-
- 🌡️ Temperature trends
|
| 160 |
-
- 💰 Cost savings over time
|
| 161 |
-
- ⚡ Grid stress responses
|
| 162 |
|
| 163 |
-
|
| 164 |
|
| 165 |
-
|
| 166 |
|
| 167 |
-
|
|
| 168 |
-
|---------
|
| 169 |
-
| `
|
| 170 |
-
| `
|
| 171 |
-
| `
|
| 172 |
-
|
|
| 173 |
-
|
|
|
|
|
| 174 |
|
| 175 |
---
|
| 176 |
|
| 177 |
-
##
|
| 178 |
-
|
| 179 |
-
### What the AI Sees (Sensors)
|
| 180 |
-
- Current temperature, electricity price, grid stress level
|
| 181 |
-
- Battery charge level, time of day, pending work deadlines
|
| 182 |
-
- Running energy costs and carbon emissions
|
| 183 |
|
| 184 |
-
|
| 185 |
-
- HVAC power level (0-100%)
|
| 186 |
-
- Battery charge/discharge rate
|
| 187 |
-
- When to run batch processes
|
| 188 |
-
- How much load to shed during emergencies
|
| 189 |
|
| 190 |
-
|
| 191 |
-
|
| 192 |
-
|
| 193 |
|
| 194 |
-
|
| 195 |
|
| 196 |
-
|
| 197 |
|
| 198 |
-
|
| 199 |
-
|
| 200 |
-
|
| 201 |
```
|
| 202 |
|
| 203 |
-
|
| 204 |
-
```bash
|
| 205 |
-
python inference.py --episodes 10 --llm-every 4 # Scale LLM calls via --llm-every; use --fast-mode for tests
|
| 206 |
-
```
|
| 207 |
|
| 208 |
-
|
| 209 |
-
|
| 210 |
-
|
| 211 |
```
|
| 212 |
|
| 213 |
---
|
| 214 |
|
| 215 |
-
##
|
| 216 |
|
| 217 |
-
|
| 218 |
-
- **Energy Management**: Real-world smart grid technologies
|
| 219 |
-
- **Hugging Face**: Free platform for AI models and datasets
|
| 220 |
|
| 221 |
-
|
|
|
|
| 1 |
+
# GridMind-RL
|
| 2 |
|
| 3 |
+
**OpenEnv-style environment** for reinforcement learning and LLM agents on **building energy management**: HVAC, thermal storage, demand response, batch job scheduling, and load shedding under time-varying electricity prices and grid stress.
|
|
|
|
|
|
|
| 4 |
|
| 5 |
---
|
| 6 |
|
| 7 |
+
## Project overview
|
| 8 |
|
| 9 |
+
GridMind-RL simulates a **24-hour** control horizon at **15-minute resolution** (96 steps per episode). The agent observes prices, temperature, storage, process load, grid stress, carbon intensity, and batch job deadlines; it acts with continuous and discrete controls aligned with real **demand response** and **industrial/commercial** load-shaping problems.
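
The step count follows from the horizon and resolution: 24 h × 60 min ÷ 15 min = 96. As a minimal sketch, the helper below converts a step index to a clock time. It assumes that step 0 corresponds to 00:00, which the README does not state; the constant and function names are illustrative and not taken from the repository.

```python
# Illustrative helper; names are hypothetical, not from the GridMind-RL codebase.
HORIZON_HOURS = 24
STEP_MINUTES = 15
STEPS_PER_EPISODE = HORIZON_HOURS * 60 // STEP_MINUTES  # 24 * 60 / 15 = 96


def step_to_clock(step: int) -> str:
    """Map a 0-based step index to HH:MM, assuming step 0 starts at 00:00."""
    if not 0 <= step < STEPS_PER_EPISODE:
        raise ValueError(f"step must be in [0, {STEPS_PER_EPISODE - 1}]")
    minutes = step * STEP_MINUTES
    return f"{minutes // 60:02d}:{minutes % 60:02d}"
```

Under that assumption, the last step of a full episode (index 95) begins at 23:45.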
|
|
|
|
|
|
|
| 10 |
|
| 11 |
+
**Why it matters:** Optimizing flexible loads against **time-of-use pricing** and **grid signals** reduces cost and emissions while respecting comfort and process constraints—an active area for RL and LLM-based control research.
|
| 12 |
|
| 13 |
+
**Strengths for judges**
|
| 14 |
|
| 15 |
+
| Area | Detail |
|
| 16 |
+
|------|--------|
|
| 17 |
+
| Spec | `openenv.yaml` documents server port, schemas, tasks, and endpoints |
|
| 18 |
+
| API | REST: reset, step, state, grade, health, ping, replay, tasks, metrics |
|
| 19 |
+
| Tasks | Three levels (easy / medium / hard) with deterministic episode grading |
|
| 20 |
+
| Baseline | Root `inference.py` + OpenAI-compatible LLM client and heuristic fallback |
|
| 21 |
+
| Ops | Multi-stage **Docker** image: Go environment + Python dashboard + deps |
|
| 22 |
|
| 23 |
---
|
| 24 |
|
| 25 |
+
## Quick start (copy-paste)
|
| 26 |
|
| 27 |
+
**Minimal flow** (API on port **7860** only). Keep the container running, then run the `python` command in a **second** terminal from the repo root, after installing dependencies with `pip install -r python/requirements.txt`:
|
| 28 |
|
| 29 |
+
```bash
|
| 30 |
+
docker build -t gridmind-rl .
|
| 31 |
+
docker run -p 7860:7860 gridmind-rl
|
|
|
|
| 32 |
|
| 33 |
+
python inference.py --fast-mode --episodes 1
|
| 34 |
+
```
|
| 35 |
|
| 36 |
+
### 1. Build and run (Docker)
|
| 37 |
|
| 38 |
+
From the **repository root**:
|
| 39 |
|
|
|
|
| 40 |
```bash
|
| 41 |
docker build -t gridmind-rl .
|
| 42 |
+
docker run --rm -p 7860:7860 -p 7861:7861 --name gridmind gridmind-rl
|
| 43 |
```
|
| 44 |
|
| 45 |
+
- **7860** — Environment API (OpenEnv / agent traffic)
|
| 46 |
+
- **7861** — Web dashboard (optional)
|
|
|
|
|
|
|
| 47 |
|
| 48 |
+
**Windows (PowerShell)** — same commands in a terminal with Docker Desktop running.
|
|
|
|
|
|
|
| 49 |
|
| 50 |
+
### 2. Validate the API (optional)
|
| 51 |
|
| 52 |
+
With the container running, run the validator from the repo root using a host Python that has `requests` installed:
|
|
|
|
|
|
|
| 53 |
|
|
|
|
| 54 |
```bash
|
| 55 |
+
pip install requests
|
| 56 |
+
python python/validate.py --env-url http://localhost:7860
|
| 57 |
```
|
| 58 |
|
| 59 |
+
### 3. Run baseline inference
|
| 60 |
+
|
| 61 |
+
Run these commands on the **host** (or inside the container, with `--env-url` pointing at the env server):
|
|
|
|
|
|
|
| 62 |
|
|
|
|
| 63 |
```bash
|
|
|
|
| 64 |
pip install -r python/requirements.txt
|
| 65 |
```
|
| 66 |
|
| 67 |
+
**Windows — PowerShell:**
|
| 68 |
+
|
| 69 |
+
```powershell
|
| 70 |
+
$env:ENV_URL="http://localhost:7860"
|
| 71 |
+
python inference.py --fast-mode --episodes 1
|
| 72 |
```
|
| 73 |
|
| 74 |
+
**Windows — Command Prompt (cmd):**
|
| 75 |
|
| 76 |
+
```bat
|
| 77 |
+
set ENV_URL=http://localhost:7860
|
| 78 |
+
python inference.py --fast-mode --episodes 1
|
|
|
|
|
|
|
| 79 |
```
|
| 80 |
|
| 81 |
+
**Linux / macOS:**
|
| 82 |
+
|
| 83 |
+
```bash
|
| 84 |
+
export ENV_URL=http://localhost:7860
|
| 85 |
+
python inference.py --fast-mode --episodes 1
|
| 86 |
```
|
| 87 |
|
| 88 |
+
You can run the same entrypoint directly with `python python/inference.py` (e.g. `python python/inference.py --fast-mode`); flags match the root `inference.py` wrapper.
|
| 89 |
+
|
| 90 |
+
**LLM baseline** (requires Hugging Face or other OpenAI-compatible API credentials):
|
| 91 |
+
|
| 92 |
```bash
|
| 93 |
+
export ENV_URL=http://localhost:7860
|
| 94 |
export API_BASE_URL=https://router.huggingface.co/v1
|
| 95 |
export MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
|
| 96 |
+
export HF_TOKEN=your_token_here
|
| 97 |
+
python inference.py --episodes 1 --llm-every 4
|
| 98 |
```
|
| 99 |
|
| 100 |
+
Results are written to `baseline_scores.json` by default (`--output` to change).
|
| 101 |
|
| 102 |
+
---
|
|
|
|
|
|
|
|
|
|
| 103 |
|
| 104 |
+
## Tasks
|
| 105 |
|
| 106 |
+
| ID | Difficulty | Name | Objective |
|
| 107 |
+
|----|------------|------|-----------|
|
| 108 |
+
| 1 | Easy | Cost minimization | Minimize total energy cost over the episode. No temperature or batch-job objectives in the grade. |
|
| 109 |
+
| 2 | Medium | Constrained temperature | Minimize cost while keeping indoor temperature within **±2 °C** of setpoint (19–23 °C) for graded temperature compliance. |
|
| 110 |
+
| 3 | Hard | Full demand response | Minimize cost, maintain temperature, respond to **grid stress** (e.g. shed load when stress is high), complete **batch jobs** on time, and reduce **carbon** vs a baseline policy in the composite score. |
|
| 111 |
+
|
| 112 |
+
Episode **grade** is returned by `GET /grade` after the episode completes (or after a partial run if you stopped stepping early). Sub-scores are task-dependent and documented in code (`env/tasks.go`).
|
| 113 |
|
| 114 |
---
|
| 115 |
|
| 116 |
+
## HTTP API
|
| 117 |
+
|
| 118 |
+
Base URL: `http://<host>:7860` (default in container: port **7860**).
|
| 119 |
|
| 120 |
+
| Method | Path | Purpose |
|
| 121 |
+
|--------|------|---------|
|
| 122 |
+
| GET | `/health` | Liveness; JSON `status`, `version` |
|
| 123 |
+
| GET | `/ping` | Lightweight liveness; JSON `status` |
|
| 124 |
+
| POST | `/reset` | Start episode: body e.g. `{"task_id": 1, "seed": 42, "num_buildings": 1}` |
|
| 125 |
+
| POST | `/step` | Advance one step: JSON action or array of actions (multi-building) |
|
| 126 |
+
| GET | `/state` | Full snapshot: buildings, downsampled price/carbon curves, step, task, etc. |
|
| 127 |
+
| GET | `/grade` | Episode score in `[0, 1]`, sub-scores, exploit flags |
|
| 128 |
+
| GET | `/replay` | Step replay list |
|
| 129 |
+
| GET | `/tasks` | Task metadata and grader weights |
|
| 130 |
+
| GET | `/metrics` | Prometheus-style text metrics |
|
| 131 |
|
| 132 |
+
**Action JSON fields** (single building): `hvac_power_level`, `thermal_charge_rate`, `batch_job_slot`, `load_shed_fraction`, optional `building_id`.
|
| 133 |
|
| 134 |
+
Schemas and primary endpoints: **`openenv.yaml`** at repo root (see Notes for additional endpoints like `/metrics`).
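
A minimal client sketch for the `reset` → `step` → `grade` flow, using only the Python standard library. The field names and the `/reset` body come from the table above; the helper names (`make_action`, `call`, `run_short_episode`), the placeholder action values, and the assumption that `/reset`, `/step`, and `/grade` all return JSON are illustrative and should be checked against `openenv.yaml`.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # default API port from this README


def make_action(hvac_power_level, thermal_charge_rate, batch_job_slot,
                load_shed_fraction, building_id=None):
    """Build one action dict using the field names listed in the README."""
    action = {
        "hvac_power_level": hvac_power_level,
        "thermal_charge_rate": thermal_charge_rate,
        "batch_job_slot": batch_job_slot,
        "load_shed_fraction": load_shed_fraction,
    }
    if building_id is not None:  # optional field
        action["building_id"] = building_id
    return action


def call(method, path, payload=None, base_url=BASE_URL):
    """Send a request with an optional JSON body and decode a JSON response."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        base_url + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_short_episode(steps=4):
    """Reset task 1, repeat one fixed action, then fetch the (partial) grade."""
    call("POST", "/reset", {"task_id": 1, "seed": 42, "num_buildings": 1})
    # Placeholder values: valid ranges and units are assumptions here.
    action = make_action(0.5, 0.0, 0, 0.0)
    for _ in range(steps):
        call("POST", "/step", action)
    return call("GET", "/grade")
```

With the container running, `print(run_short_episode())` should show the grade payload for a four-step partial episode. `/metrics` is left out on purpose because it returns text rather than JSON.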
|
| 135 |
|
| 136 |
---
|
| 137 |
|
| 138 |
+
## Evaluation modes (`inference.py`)
|
| 139 |
|
| 140 |
+
There is **no** `--judge-mode` flag in this repository. Use the modes below.
|
| 141 |
|
| 142 |
+
| Mode | Command pattern | Behavior |
|
| 143 |
+
|------|-----------------|----------|
|
| 144 |
+
| **Fast (heuristic)** | `python inference.py --fast-mode` | No LLM calls; deterministic given env seed; fastest for CI or smoke tests. |
|
| 145 |
+
| **Default LLM** | `python inference.py` | Uses OpenAI-compatible API (`API_BASE_URL`, `MODEL_NAME`, `HF_TOKEN`); default `--llm-every 4` reuses each LLM action for 4 steps to limit API cost. |
|
| 146 |
+
| **Recommended for automated evaluation / judging** | `python inference.py --fast-mode --episodes 1` | Use when automated pipelines need **reproducibility** and **no external API** dependency. |
|
| 147 |
|
| 148 |
+
Other useful flags:
|
| 149 |
|
| 150 |
+
| Flag | Default | Meaning |
|
| 151 |
+
|------|---------|---------|
|
| 152 |
+
| `--episodes` | `1` | Episodes per task (tasks 1–3 run in sequence) |
|
| 153 |
+
| `--env-url` | `ENV_URL` or `http://localhost:7860` | Environment base URL |
|
| 154 |
+
| `--llm-every` | `4` | Steps per LLM call (ignored in `--fast-mode`) |
|
| 155 |
+
| `--max-steps` | full episode | Stop after N steps; grade reflects **partial** episode |
|
| 156 |
+
| `--output` | `baseline_scores.json` | Results path |
|
| 157 |
+
| `--verbose` | off | Extra step logs |
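
To illustrate what `--llm-every` implies for API usage: with 96 steps and `--llm-every 4`, a full episode needs ceil(96 / 4) = 24 LLM calls. The loop below is a sketch of that action-reuse pattern, not the code in `inference.py`; `run_with_action_reuse` and `choose_action` are hypothetical names.

```python
def run_with_action_reuse(total_steps, llm_every, choose_action):
    """Query choose_action every `llm_every` steps and reuse its result in between.

    Returns the per-step action list and the number of choose_action calls.
    """
    actions = []
    calls = 0
    current = None
    for step in range(total_steps):
        if step % llm_every == 0:  # time to ask the (expensive) policy again
            current = choose_action(step)
            calls += 1
        actions.append(current)
    return actions, calls
```

For example, `run_with_action_reuse(96, 4, policy)` invokes `policy` 24 times, while `llm_every=1` would invoke it on every step.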
|
| 158 |
|
| 159 |
---
|
| 160 |
|
| 161 |
+
## Logging format (baseline)
|
| 162 |
|
| 163 |
+
For each episode the script prints, in order:
|
| 164 |
|
| 165 |
+
1. **`[START]`**: episode beginning (after `reset`)
|
| 166 |
+
2. **`[STEP1]` … `[STEP96]`**: one line per successful `POST /step`. A full episode emits all **96** markers unless `--max-steps` or an early error stops the loop
|
| 167 |
+
3. **`[END]`**: printed after `GET /grade` for that episode
|
| 168 |
|
| 169 |
+
Additional lines (banners, task headers, `[OK]` / `[WARN]`) may appear; parsers should match the bracketed markers above.
|
| 170 |
|
| 171 |
+
Example shape:
|
| 172 |
|
| 173 |
+
```text
|
| 174 |
+
[START]
|
| 175 |
+
[STEP1]
|
| 176 |
+
[STEP2]
|
| 177 |
+
...
|
| 178 |
+
[STEP96]
|
| 179 |
+
[END]
|
| 180 |
```
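
A parser for these markers could look like the sketch below. It extracts only the bracketed marker from each line and skips everything else (banners, `[OK]`, `[WARN]`). Whether the real log lines carry extra fields after the marker is not specified here, so the sketch ignores any trailing text; `parse_markers` is a hypothetical name.

```python
import re

# Matches [START], [END], or [STEP<n>] anywhere in a line.
MARKER = re.compile(r"\[(START|END|STEP(\d+))\]")


def parse_markers(lines):
    """Return (kind, step) tuples; step is None for START and END."""
    events = []
    for line in lines:
        match = MARKER.search(line)
        if match is None:
            continue  # banner, task header, [OK] / [WARN], etc.
        if match.group(2) is not None:
            events.append(("STEP", int(match.group(2))))
        else:
            events.append((match.group(1), None))
    return events
```

A caller can then check, for instance, that the number of `STEP` events between a `START` and the following `END` is 96 for a full episode.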
|
| 181 |
|
| 182 |
+
---
|
|
|
|
|
|
|
|
|
|
| 183 |
|
| 184 |
+
## Architecture
|
| 185 |
+
|
| 186 |
+
```text
|
| 187 |
+
┌─────────────────────────────────────────────────────────────┐
|
| 188 |
+
│ Client: python inference.py (LLM or heuristic) │
|
| 189 |
+
│ │ HTTP (reset / step / grade) │
|
| 190 |
+
│ ▼ │
|
| 191 |
+
│ ┌──────────────────┐ ┌─────────────────────────────┐ │
|
| 192 |
+
│ │ gridmind-server │ │ Dashboard (optional) │ │
|
| 193 |
+
│ │ Go :7860 │◄────│ FastAPI + static UI :7861 │ │
|
| 194 |
+
│ │ env/* simulation│ │ proxies /api → :7860 │ │
|
| 195 |
+
│ └──────────────────┘ └─────────────────────────────┘ │
|
| 196 |
+
└─────────────────────────────────────────────────────────────┘
|
| 197 |
```
|
| 198 |
|
| 199 |
+
- **Core:** `main.go` + `env/` (physics, rewards, tasks, grading)
|
| 200 |
+
- **Baseline:** `inference.py` (root) → `python/inference.py`
|
| 201 |
+
- **Dashboard:** `dashboard/server.py`, `dashboard/static/`
|
| 202 |
+
- **Spec:** `openenv.yaml`
|
| 203 |
+
|
| 204 |
+
---
|
| 205 |
+
|
| 206 |
+
## Docker (detailed)
|
| 207 |
+
|
| 208 |
+
| Step | Command |
|
| 209 |
+
|------|---------|
|
| 210 |
+
| Build | `docker build -t gridmind-rl .` |
|
| 211 |
+
| Run (foreground) | `docker run --rm -p 7860:7860 -p 7861:7861 --name gridmind gridmind-rl` |
|
| 212 |
+
| Run (background) | `docker run -d --rm -p 7860:7860 -p 7861:7861 --name gridmind gridmind-rl` |
|
| 213 |
+
| Stop (background) | `docker stop gridmind` |
|
| 214 |
+
| Inference **inside** container | `docker exec -it gridmind python /app/inference.py --fast-mode --env-url http://127.0.0.1:7860` |
|
| 215 |
+
|
| 216 |
+
The image runs **supervisord** as a non-root user with two programs: the Go server (`PORT=7860`) and the uvicorn dashboard (port `7861`).
|
| 217 |
+
|
| 218 |
---
|
| 219 |
|
| 220 |
+
## Notes for judges and operators
|
| 221 |
+
|
| 222 |
+
| Topic | Detail |
|
| 223 |
+
|-------|--------|
|
| 224 |
+
| **Ports** | **7860** = environment API; **7861** = dashboard. Some hosts only expose one public port—API is the required one for OpenEnv-style evaluation. |
|
| 225 |
+
| **Episode length** | **96 steps** = 24 h at 15 min/step. Observation `step` is **0–95** for a full episode. |
|
| 226 |
+
| **`openenv.yaml`** | Lists main endpoints; **`/metrics`** exists at runtime but may not appear in the YAML block—treat as an extra ops endpoint. |
|
| 227 |
+
| **Reproducibility** | Env is seed-controlled. LLM outputs may still vary by provider even at `temperature=0`. |
|
| 228 |
+
| **`--max-steps`** | Produces a **partial** episode; final `GET /grade` reflects that partial trajectory. |
|
| 229 |
+
| **Manual run (no Docker)** | Install Go 1.21+ and run `go run .` from the repo root (default port 7860); then install the Python deps and run `python inference.py` as above. |
|
| 230 |
+
| **Runtime** | The baseline completes within typical hackathon limits (<20 minutes). |
|
| 231 |
+
|
| 232 |
+
---
|
| 233 |
|
| 234 |
+
## License
|
|
|
|
|
|
|
| 235 |
|
| 236 |
+
See `LICENSE` in the repository.
|