feat: initialize project with documentation, inference script, and environment configuration

Files changed:
- .gitignore               +20 -0
- README.md                +296 -213
- python/inference.py      +23 -10
- python/requirements.txt  +1 -0
.gitignore
CHANGED

@@ -0,0 +1,20 @@
+# Secrets
+.env
+
+# Python
+__pycache__/
+*.pyc
+*.pyo
+.pytest_cache/
+
+# Go
+*.exe
+gridmind-out.exe
+
+# IDE
+.vscode/
+.idea/
+
+# OS
+.DS_Store
+Thumbs.db
README.md
CHANGED

@@ -1,325 +1,408 @@
-# GridMind-RL
-
-**
-
-**
-
-|------|--------|
-| Spec | `openenv.yaml` documents server port, schemas, tasks, and endpoints |
-| API | REST: reset, step, state, grade, health, ping, replay, tasks, metrics |
-| Tasks | Three levels (easy / medium / hard) with deterministic episode grading |
-| Baseline | Root `inference.py` + OpenAI-compatible LLM client and heuristic fallback |
-| Ops | Multi-stage **Docker** image: Go environment + Python dashboard + deps |
 
 ---
 
-##
-
-docker build -t gridmind-rl .
-docker run -p 7860:7860 gridmind-rl
-
-```
-
-docker build -t gridmind-rl .
-docker run --rm -p 7860:7860 -p 7861:7861 --name gridmind gridmind-rl
-```
-
-- **7861** — Web dashboard (optional); http://localhost:7861
-
 ```bash
-
 ```
 
-###
-On the **host** (not inside the container unless you set `--env-url` to the env server):
 
 ```bash
 ```
 
-**
-
-```
 ```
 
-set ENV_URL=http://localhost:7860
-python inference.py --fast-mode --episodes 1
-```
 
-**
 
 ```bash
-python inference.py --fast-mode --episodes 1
 ```
 
 ```
-> **Note:** The
 
-#
 
 ---
 
-##
-
-|-----------
-| **
-| **Stability Reward** | `stability_penalty` | All tasks | Positive reward for smooth, stable control; penalty for rapid HVAC/storage oscillation. |
-| **Deadline Penalty** | `deadline_penalty` | Task 2, 3 | Penalty per missed batch job deadline (-1.5 each). Positive bonus for keeping jobs on track. |
-| **Carbon Reward** | `carbon_reward` | Task 3 | Baseline bonus for low-carbon operation, reduced by carbon-heavy consumption. Extra bonus during clean grid periods. |
 
 ---
 
-## HTTP API
-
-Base URL: `http://
 
 | Method | Path | Purpose |
 |--------|------|---------|
-| GET | `/health` |
-| GET | `/ping` | Lightweight liveness
-| POST | `/reset` | Start
-| POST | `/step` |
-| GET | `/state` | Full
-| GET | `/grade` | Episode score
-| GET | `/replay` |
-| GET | `/tasks` |
-| GET | `/metrics` | Prometheus-
 
 ---
 
-##
-
-Additional lines (banners, task headers, `[OK]` / `[WARN]`) may appear; parsers should match the bracketed markers above.
-
-Example shape:
-
-```text
-[START]
-[STEP1]
-[STEP2]
-...
-[STEP96]
-[END]
-```
 
 ---
 
-##
-
-```text
-┌─────────────────────────────────────────────────────────────┐
-│ Client: python inference.py (LLM or heuristic)              │
-│        │ HTTP (reset / step / grade)                        │
-│        ▼                                                    │
-│ ┌──────────────────┐      ┌─────────────────────────────┐   │
-│ │ gridmind-server  │      │ Dashboard (optional)        │   │
-│ │ Go :7860         │◄────│ FastAPI + static UI :7861    │   │
-│ │  env/* simulation│      │ proxies /api → :7860        │   │
-│ └──────────────────┘      └─────────────────────────────┘   │
-└─────────────────────────────────────────────────────────────┘
-```
-
-- **Baseline:** `inference.py` (root) → `python/inference.py`
-- **Dashboard:** `dashboard/server.py`, `dashboard/static/`
-- **Spec:** `openenv.yaml`
 
---
 
-##
-
-| Run (background) | `docker run -d --rm -p 7860:7860 -p 7861:7861 --name gridmind gridmind-rl` |
-| Stop (background) | `docker stop gridmind` |
-| Inference **inside** container | `docker exec -it gridmind python /app/inference.py --fast-mode --env-url http://127.0.0.1:7860` |
 
-|-------|--------|
-| **Ports** | **7860** = environment API; **7861** = dashboard. Some hosts only expose one public port—API is the required one for OpenEnv-style evaluation. |
-| **Episode length** | **96 steps** = 24 h at 15 min/step. Observation `step` is **0–95** for a full episode. |
-| **`openenv.yaml`** | Lists main endpoints; **`/metrics`** exists at runtime but may not appear in the YAML block—treat as an extra ops endpoint. |
-| **Reproducibility** | Env is seed-controlled. LLM outputs may still vary by provider even at `temperature=0`. |
-| **`--max-steps`** | Produces a **partial** episode; final `GET /grade` reflects that partial trajectory. |
-| **Manual run (no Docker)** | Install Go 1.21+, `go run .` from repo root (default port 7860); install Python deps and run `python inference.py` as above. |
-| **Runtime** | The baseline completes within typical hackathon limits (<20 minutes). |
 
 ---
 
-##
 
 ```bash
-#
-curl -X POST http://localhost:7860/reset \
-  -H "Content-Type: application/json" \
-  -d '{"task_id": 3, "seed": 42, "num_buildings": 1}'
-# → {"observations":[{"indoor_temperature":21.3,...}],"episode":1,"task_id":3,"seed":42}
-
-  -d '{"hvac_power_level": 0.6, "thermal_charge_rate": 0.3, "batch_job_slot": 1, "load_shed_fraction": 0.1}'
-# → {"observation":{...},"reward":2.15,"done":false,"info":{"reward_components":{...},...}}
-
-# Get current state
-curl http://localhost:7860/state
-# → {"buildings":[...],"price_curve_episode":[...],"step":1,"task_id":3,...}
-
-#
-curl http://localhost:7860/grade
-# → {"task_id":3,"score":0.3115,"sub_scores":{"cost":0.25,"temperature":0.14,...},...}
-
-# → [{"id":1,"name":"Cost Minimization","difficulty":"easy",...},...]
 ```
 
 ---
 
-##
-
-- **SDK:** Docker
-- **Hardware:** CPU Basic (2 vCPU, 16GB)
-
-# Clone and push
-git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/gridmind-rl
-git push hf main
-```
 
-curl https://YOUR_USERNAME-gridmind-rl.hf.space/health
-# → {"status":"ok","version":"1.0.0"}
 
-curl -X POST https://YOUR_USERNAME-gridmind-rl.hf.space/reset \
-  -H "Content-Type: application/json" \
-  -d '{"task_id":1,"seed":42}'
 ```
 
 ---
 
-## License
 
 See `LICENSE` in the repository.
+# ⚡ GridMind-RL
+
+**A real-world RL environment for building energy management** — control HVAC systems, thermal storage, batch job scheduling, and demand response under stochastic electricity prices and grid stress events.
+
+Built on the [OpenEnv](https://github.com/meta-pytorch/OpenEnv) specification. Containerized. Ready for Hugging Face Spaces.
+
+---
+
+## 🎯 Why GridMind-RL?
+
+Optimizing building energy use is a **real problem** that utilities, building managers, and industrial operators face every day. An agent must balance:
+
+- **Cost** — buy electricity when it's cheap, avoid peak pricing
+- **Comfort** — keep indoor temperature within comfortable bounds
+- **Grid compliance** — shed load when the grid signals demand-response events
+- **Scheduling** — complete batch processing jobs before their deadlines
+- **Carbon** — minimize carbon emissions by timing consumption to clean-grid periods
+
+This isn't a toy or a game. It's a simulation of decisions that **humans actually make** in industrial energy management, packaged as an RL environment where agents can learn to do it better.
 
 ---
 
+## 📐 Observation Space
+
+Each timestep (15 minutes of simulated time), the agent receives:
+
+| Field | Type | Range | Description |
+|-------|------|-------|-------------|
+| `indoor_temperature` | float | 10–40 °C | Current building temperature |
+| `thermal_storage_level` | float | 0.0–1.0 | Thermal tank fill level (0=empty, 1=full) |
+| `process_demand` | float | ≥ 0 kW | Current industrial power demand |
+| `current_price` | float | > 0 $/kWh | Real-time electricity price |
+| `grid_stress_signal` | float | 0.0–1.0 | Utility demand-response urgency (>0.7 = critical) |
+| `carbon_intensity` | float | ≥ 0 gCO₂/kWh | Grid carbon intensity |
+| `hour_of_day` | int | 0–23 | Current hour |
+| `batch_queue` | int[] | — | Deadline slots of pending batch jobs |
+| `cumulative_cost` | float | ≥ 0 $ | Total energy cost so far this episode |
+| `step` | int | 0–95 | Current timestep (96 steps = 24 hours) |
+| `building_id` | int | 0+ | Building index in multi-building mode |
+
+## 🕹️ Action Space
+
+Each timestep, the agent sends:
+
+| Field | Type | Range | Description |
+|-------|------|-------|-------------|
+| `hvac_power_level` | float | 0.0–1.0 | Fraction of max HVAC power (0=off, 1=full) |
+| `thermal_charge_rate` | float | -1.0–1.0 | Charge (+) or discharge (-) thermal storage |
+| `batch_job_slot` | int | 0–4 | Schedule next batch job: 0=now, 1–4=defer |
+| `load_shed_fraction` | float | 0.0–0.5 | Fraction of non-critical load to shed |
+| `building_id` | int | 0+ | Which building this action targets |
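The action bounds in the table above can be enforced client-side before sending each action. A minimal sketch, assuming only the documented ranges; the `GridAction` dataclass and `clamp` helper are illustrative, not part of the repo:

```python
from dataclasses import dataclass


def clamp(x: float, lo: float, hi: float) -> float:
    """Clip a value into [lo, hi]."""
    return max(lo, min(hi, x))


@dataclass
class GridAction:
    hvac_power_level: float = 0.0     # 0.0-1.0, fraction of max HVAC power
    thermal_charge_rate: float = 0.0  # -1.0-1.0, charge (+) / discharge (-)
    batch_job_slot: int = 0           # 0-4, 0 = run now, 1-4 = defer
    load_shed_fraction: float = 0.0   # 0.0-0.5, non-critical load to shed
    building_id: int = 0              # building index (multi-building mode)

    def validated(self) -> dict:
        """Return a JSON-ready payload with every field forced into range."""
        return {
            "hvac_power_level": clamp(self.hvac_power_level, 0.0, 1.0),
            "thermal_charge_rate": clamp(self.thermal_charge_rate, -1.0, 1.0),
            "batch_job_slot": int(clamp(self.batch_job_slot, 0, 4)),
            "load_shed_fraction": clamp(self.load_shed_fraction, 0.0, 0.5),
            "building_id": max(0, self.building_id),
        }


# Out-of-range values are clipped rather than rejected.
payload = GridAction(hvac_power_level=1.7, load_shed_fraction=0.9).validated()
print(payload["hvac_power_level"], payload["load_shed_fraction"])  # → 1.0 0.5
```

Clamping locally keeps a learning agent from wasting steps on invalid payloads, whatever the server does with out-of-range values.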
 
+## 💰 Reward Structure
+
+The environment provides a **dense, multi-component reward** every step — not just a binary win/lose at the end. Each step returns a scalar `reward` (the sum) plus a detailed `reward_components` breakdown:
+
+| Component | Key | Description |
+|-----------|-----|-------------|
+| Cost Savings | `cost_savings` | Rewards reducing energy spend vs baseline |
+| Temperature | `temp_constraint` | Gaussian bonus near setpoint, penalty outside bounds |
+| Grid Response | `grid_response` | Bonus for shedding load during grid stress |
+| Efficiency | `efficiency_bonus` | Thermal storage arbitrage + balanced usage |
+| Stability | `stability_penalty` | Rewards smooth control, penalizes oscillation |
+| Deadlines | `deadline_penalty` | Penalty for missed batch jobs |
+| Carbon | `carbon_reward` | Bonus for low-carbon operation |
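Since the scalar `reward` is stated to be the sum of the components, a client can sanity-check each step response. A sketch under that assumption; the response below is hypothetical, with component values chosen to sum to 2.15:

```python
def check_reward(step_response: dict, tol: float = 1e-6) -> bool:
    """Verify the scalar reward equals the sum of its reward_components."""
    components = step_response["info"]["reward_components"]
    return abs(step_response["reward"] - sum(components.values())) < tol


# Hypothetical /step response shaped like the documented one (reward, done, info).
resp = {
    "reward": 2.15,
    "done": False,
    "info": {"reward_components": {
        "cost_savings": 1.0, "temp_constraint": 0.5, "grid_response": 0.3,
        "efficiency_bonus": 0.2, "stability_penalty": 0.1,
        "deadline_penalty": 0.0, "carbon_reward": 0.05,
    }},
}
print(check_reward(resp))  # → True
```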
 
+---
+
+## 📋 Tasks (3 difficulty levels)
+
+Each task defines a concrete objective with a **deterministic programmatic grader** that scores performance from **0.0 to 1.0**.
+
+| ID | Difficulty | Name | What the Agent Must Do | Grader Weights |
+|----|:----------:|------|------------------------|----------------|
+| 1 | 🟢 Easy | **Cost Minimization** | Minimize total energy cost over 24 hours. No temperature or scheduling constraints. | cost: 100% |
+| 2 | 🟡 Medium | **Constrained Temperature** | Minimize cost **and** keep temperature within 19–23°C at all times. | cost: 60%, temperature: 40% |
+| 3 | 🔴 Hard | **Full Demand Response** | Minimize cost, maintain temperature, respond to grid stress, complete batch jobs on time, minimize carbon. | cost: 28%, temperature: 20%, grid: 20%, batch: 12%, carbon: 20% |
+
+**Graders are deterministic**: given the same seed, the same actions always produce the same score.
+
+---
+## 🚀 Getting Started (Step by Step)
+
+### Prerequisites
+
+- **Docker** — [Install Docker Desktop](https://www.docker.com/products/docker-desktop/)
+- **Python 3.9+** — [Download Python](https://www.python.org/downloads/)
+- **Git** — [Download Git](https://git-scm.com/downloads)
+
+### Step 1: Clone the Repository
+
+```bash
+git clone https://github.com/LO-Kyu/gridmind.git
+cd gridmind
+```
+
+### Step 2: Build and Start the Environment Server
+
+```bash
+docker build -t gridmind-rl .
+docker run --rm -d -p 7860:7860 --name gridmind gridmind-rl
+```
+
+This starts the GridMind-RL environment server on port **7860**. Verify it's running:
+
+```bash
+# Linux/macOS
+curl http://localhost:7860/health
+
+# Windows (PowerShell)
+Invoke-RestMethod -Uri http://localhost:7860/health
+```
+
+You should see: `{"status":"ok","version":"1.0.0"}`
+
+### Step 3: Install Python Dependencies
+
+Open a **new terminal** (keep Docker running) and install:
+
+```bash
+pip install -r python/requirements.txt
+```
+
+### Step 4: Get a Free API Key
+
+The inference script uses an LLM to make decisions. You need a **free** API key:
+
+1. Go to [openrouter.ai/keys](https://openrouter.ai/keys)
+2. Sign in with Google or GitHub (free)
+3. Click **"Create Key"** and copy it
+
+### Step 5: Configure Your API Key
+
+Open the `.env` file in the project root and paste your key:
+
+```env
+API_BASE_URL=https://openrouter.ai/api/v1
+MODEL_NAME=meta-llama/llama-3.1-8b-instruct:free
+OPENAI_API_KEY=sk-or-v1-paste-your-actual-key-here
+ENV_URL=http://localhost:7860
+```
+
+> **Note:** The model `meta-llama/llama-3.1-8b-instruct:free` is **completely free** on OpenRouter. No credit card needed.
+
+### Step 6: Run the Baseline Inference
+
+```bash
+# Run LLM agent on all 3 tasks
+python inference.py --episodes 1
+
+# Or run without LLM (fast heuristic mode — no API key needed)
+python inference.py --fast-mode --episodes 1
+```
+
+The script will:
+1. Connect to the environment server
+2. Run the agent on Task 1 (easy), Task 2 (medium), Task 3 (hard)
+3. Print `[START]`, `[STEP1]`...`[STEP96]`, `[END]` for each episode
+4. Save results to `baseline_scores.json`
+
+### Step 7: Stop the Server (When Done)
+
+```bash
+docker stop gridmind
+```
 
 ---
 
+## 📊 Baseline Scores
+
+Produced by running `python inference.py --fast-mode --episodes 1` (heuristic policy):
+
+| Task | Difficulty | Score | Details |
+|------|:----------:|:-----:|---------|
+| 1 — Cost Minimization | 🟢 Easy | **0.7063** | cost: 0.706 |
+| 2 — Temperature Management | 🟡 Medium | **0.6333** | cost: 0.701, temperature: 0.531 |
+| 3 — Full Demand Response | 🔴 Hard | **0.5966** | cost: 0.670, temp: 0.573, grid: 0.214, batch: 1.000, carbon: 0.657 |
+| **Overall Average** | | **0.6454** | |
+
+Scores are in the **0.0–1.0** range. Higher is better.
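The Task 3 row can be cross-checked against the grader weights in the Tasks table (cost 28%, temperature 20%, grid 20%, batch 12%, carbon 20%). A quick sanity check, assuming the final score is a plain weighted average of the sub-scores (an assumption about the grader, not confirmed here):

```python
# Grader weights for Task 3, from the Tasks table.
WEIGHTS = {"cost": 0.28, "temperature": 0.20, "grid": 0.20,
           "batch": 0.12, "carbon": 0.20}

# Heuristic-baseline sub-scores for Task 3, from the table above.
SUB_SCORES = {"cost": 0.670, "temperature": 0.573, "grid": 0.214,
              "batch": 1.000, "carbon": 0.657}


def weighted_score(weights: dict, subs: dict) -> float:
    """Combine 0.0-1.0 sub-scores into one 0.0-1.0 overall score."""
    return sum(weights[k] * subs[k] for k in weights)


print(round(weighted_score(WEIGHTS, SUB_SCORES), 4))  # ≈ 0.5964, close to the reported 0.5966
```

The small residual is consistent with the sub-scores being rounded to three decimals in the table.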
 
 ---
 
+## 🔌 HTTP API Reference
+
+Base URL: `http://localhost:7860`
+
 | Method | Path | Purpose |
 |--------|------|---------|
+| `GET` | `/health` | Health check → `{"status":"ok","version":"1.0.0"}` |
+| `GET` | `/ping` | Lightweight liveness check |
+| `POST` | `/reset` | Start a new episode. Body: `{"task_id": 1, "seed": 42}` |
+| `POST` | `/step` | Take one action. Body: action JSON (see Action Space above) |
+| `GET` | `/state` | Full environment state snapshot |
+| `GET` | `/grade` | Episode score (0.0–1.0) with sub-scores |
+| `GET` | `/replay` | Full step-by-step replay of the episode |
+| `GET` | `/tasks` | List all task definitions and grader weights |
+| `GET` | `/metrics` | Prometheus-format operational metrics |
+
+### Example API Calls
+
+```bash
+# Reset to Task 1 (easy) with seed 42
+curl -X POST http://localhost:7860/reset \
+  -H "Content-Type: application/json" \
+  -d '{"task_id": 1, "seed": 42}'
+
+# Take one step
+curl -X POST http://localhost:7860/step \
+  -H "Content-Type: application/json" \
+  -d '{"hvac_power_level": 0.5, "thermal_charge_rate": 0.1, "batch_job_slot": 1, "load_shed_fraction": 0.0}'
+
+# Check score after episode
+curl http://localhost:7860/grade
+```
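The same reset/step/grade flow can be driven from Python. A sketch of the client loop with the HTTP transport injected as callables so the control flow is clear; `run_episode` is illustrative (the real baseline lives in `inference.py`), and the response fields used are the ones documented above:

```python
from typing import Callable


def run_episode(
    post: Callable[[str, dict], dict],  # e.g. lambda p, b: requests.post(ENV_URL + p, json=b).json()
    get: Callable[[str], dict],         # e.g. lambda p: requests.get(ENV_URL + p).json()
    policy: Callable[[dict], dict],     # maps an observation to an action payload
    task_id: int = 1,
    seed: int = 42,
    max_steps: int = 96,
) -> float:
    """Reset the environment, step until done (or max_steps), return the grade."""
    obs = post("/reset", {"task_id": task_id, "seed": seed})["observations"][0]
    for _ in range(max_steps):
        result = post("/step", policy(obs))
        obs = result["observation"]
        if result["done"]:
            break
    return get("/grade")["score"]
```

Injecting the transport also makes the loop testable offline against a stubbed environment, with `requests`-backed lambdas swapped in for a live server.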
 
+---
+
+## 🏗️ Architecture
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│  inference.py (LLM Agent or Heuristic)                          │
+│      │                                                          │
+│      │ HTTP: POST /reset, /step · GET /grade, /state            │
+│      ▼                                                          │
+│  ┌───────────────────────────────────────────────────────────┐  │
+│  │                    Docker Container                       │  │
+│  │                                                           │  │
+│  │  ┌─────────────────────┐   ┌───────────────────────────┐  │  │
+│  │  │  Go Environment     │   │  Python Dashboard         │  │  │
+│  │  │  Server (:7860)     │   │  FastAPI + UI (:7861)     │  │  │
+│  │  │                     │   │                           │  │  │
+│  │  │  • Physics engine   │   │  • Proxies /api → :7860   │  │  │
+│  │  │  • Reward function  │◄──│  • Real-time charts       │  │  │
+│  │  │  • Task graders     │   │  • State visualization    │  │  │
+│  │  └─────────────────────┘   └───────────────────────────┘  │  │
+│  │                                                           │  │
+│  │  Isolated · Reproducible · Non-root user                  │  │
+│  └───────────────────────────────────────────────────────────┘  │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+### Project Structure
+
+```
+gridmind/
+├── inference.py            ← Hackathon entrypoint (root)
+├── openenv.yaml            ← OpenEnv spec manifest
+├── Dockerfile              ← Multi-stage build (Go + Python)
+├── .env                    ← API credentials (git-ignored)
+├── baseline_scores.json    ← Produced by inference.py
+│
+├── main.go                 ← HTTP server (routes, middleware, metrics)
+├── env/                    ← Core environment logic (Go)
+│   ├── environment.go      ← Simulation: physics, thermal dynamics
+│   ├── models.go           ← All data types (Observation, Action, etc.)
+│   ├── rewards.go          ← 7-component dense reward function
+│   └── tasks.go            ← 3 task definitions + deterministic graders
+│
+├── python/                 ← Python support layer
+│   ├── inference.py        ← Full LLM agent + heuristic fallback
+│   ├── models.py           ← Typed Pydantic models (mirrors Go structs)
+│   ├── validate.py         ← OpenEnv spec validation suite
+│   └── requirements.txt    ← Python dependencies
+│
+├── tests/                  ← Automated tests
+│   ├── environment_test.go ← Go unit tests (determinism, bounds, etc.)
+│   └── test_graders.py     ← Python grader tests (pytest)
+│
+└── dashboard/              ← Optional web dashboard
+    ├── server.py           ← FastAPI server
+    └── static/             ← Frontend assets
+```
 
 ---
 
+## 🐳 Docker
+
+| Action | Command |
+|--------|---------|
+| **Build** | `docker build -t gridmind-rl .` |
+| **Run (foreground)** | `docker run --rm -p 7860:7860 --name gridmind gridmind-rl` |
+| **Run (background)** | `docker run --rm -d -p 7860:7860 --name gridmind gridmind-rl` |
+| **Stop** | `docker stop gridmind` |
+| **Run inference inside container** | `docker exec -it gridmind python /app/inference.py --fast-mode` |
+
+The Dockerfile uses a **multi-stage build**:
+1. **Stage 1** — Go 1.21 Alpine: compiles the environment server binary
+2. **Stage 2** — Python 3.11 slim: runs the Go binary + Python dashboard via Supervisor
 
 ---
 
+## ☁️ Hugging Face Space Deployment
+
+### 1. Create a New Space
+
+Go to [huggingface.co/new-space](https://huggingface.co/new-space):
+- **SDK:** Docker
+- **Hardware:** CPU Basic (2 vCPU, 16 GB — free tier)
+
+### 2. Push to HF
+
+```bash
+git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/gridmind-rl
+git push hf main
+```
+
+### 3. Verify
+
+```bash
+curl https://YOUR_USERNAME-gridmind-rl.hf.space/health
+# → {"status":"ok","version":"1.0.0"}
+
+curl -X POST https://YOUR_USERNAME-gridmind-rl.hf.space/reset \
+  -H "Content-Type: application/json" \
+  -d '{"task_id":1,"seed":42}'
+```
+
+> **Note:** HF Spaces exposes port **7860** publicly. The dashboard (7861) is for local development only.
 
 ---
 
+## 🧪 Testing
+
+### Run Go Unit Tests
+
+```bash
+cd gridmind
+go test ./tests/ -v
+```
+
+### Run Python Grader Tests (requires server running)
+
+```bash
+pytest tests/test_graders.py -v
+```
+
+### Run Full OpenEnv Validation
+
+```bash
+python python/validate.py --env-url http://localhost:7860
+```
 
 ---
 
+## 📝 Inference Script Reference
+
+The `inference.py` script at the project root is the **hackathon entrypoint**.
+
+### Environment Variables
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `API_BASE_URL` | `https://openrouter.ai/api/v1` | LLM API endpoint |
+| `MODEL_NAME` | `meta-llama/llama-3.3-70b-instruct:free` | Model to use |
+| `OPENAI_API_KEY` | — | API key (any OpenAI-compatible provider) |
+| `ENV_URL` | `http://localhost:7860` | Environment server URL |
+
+### Command-Line Flags
+
+| Flag | Default | Description |
+|------|---------|-------------|
+| `--episodes N` | 1 | Episodes per task (tasks 1–3 run in sequence) |
+| `--fast-mode` | off | Use heuristic policy only (no LLM, fully reproducible) |
+| `--llm-every N` | 4 | Reuse each LLM action for N steps (reduces API calls) |
+| `--max-steps N` | 96 | Stop early after N steps |
+| `--env-url URL` | from env | Override environment URL |
+| `--output FILE` | `baseline_scores.json` | Output results file |
+| `--verbose` | off | Print detailed step logs |
+
+### Stdout Log Format
+
+Each episode emits structured markers for automated evaluation:
+
+```
+[START]
+[STEP1]
+[STEP2]
+...
+[STEP96]
+[END]
+```
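A harness can verify these markers with a small parser. A sketch based only on the marker format above; `episode_complete` is an illustrative name, and non-marker lines (banners, `[OK]`/`[WARN]` output) are simply skipped:

```python
import re

# Matches exactly [START], [END], or [STEPn]; captures n for step lines.
MARKER = re.compile(r"^\[(START|STEP(\d+)|END)\]$")


def episode_complete(stdout: str, expected_steps: int = 96) -> bool:
    """True if stdout contains [START], [STEP1]..[STEPn] in order, then [END]."""
    steps = []
    saw_start = saw_end = False
    for line in stdout.splitlines():
        m = MARKER.match(line.strip())
        if not m:
            continue  # ignore banners, task headers, [OK]/[WARN] lines, etc.
        if m.group(1) == "START":
            saw_start = True
        elif m.group(1) == "END":
            saw_end = True
        else:
            steps.append(int(m.group(2)))
    return saw_start and saw_end and steps == list(range(1, expected_steps + 1))


log = "[START]\n" + "".join(f"[STEP{i}]\n" for i in range(1, 97)) + "[END]\n"
print(episode_complete(log))  # → True
```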
 
+---
+
+## 📎 OpenEnv Spec Compliance
+
+| Requirement | Status |
+|-------------|--------|
+| `openenv.yaml` with metadata | ✅ |
+| Typed Pydantic models (Observation, Action, Reward) | ✅ |
+| `step(action)` → observation, reward, done, info | ✅ |
+| `reset()` → initial observation | ✅ |
+| `state()` → current state | ✅ |
+| 3 tasks with programmatic graders (0.0–1.0) | ✅ |
+| Dense reward function (not binary) | ✅ |
+| Baseline inference using OpenAI client | ✅ |
+| Working Dockerfile | ✅ |
+| Deterministic with seed | ✅ |
+| Exploit detection | ✅ |
 
 ---
 
+## 📄 License
 
 See `LICENSE` in the repository.
python/inference.py
CHANGED

@@ -4,17 +4,23 @@ GridMind-RL Baseline Inference Script
 Runs an LLM agent against all 3 tasks for N episodes each.
 Uses the OpenAI Python client pointed at any OpenAI-compatible endpoint.
 
-Required environment variables:
-    API_BASE_URL
-    MODEL_NAME
-    OPENAI_API_KEY
+Required environment variables (set in .env or shell):
+    API_BASE_URL    — The API endpoint for the LLM (default: OpenRouter)
+    MODEL_NAME      — The model identifier to use for inference
+    OPENAI_API_KEY  — API key for authentication (works with any provider)
 
 Usage:
-    export MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
-    export OPENAI_API_KEY=hf_xxxx   # or HF_TOKEN=hf_xxxx
+    # Option 1: Use .env file (recommended — just paste your key)
     python inference.py
+
+    # Option 2: Set env vars manually
+    export API_BASE_URL=https://openrouter.ai/api/v1
+    export MODEL_NAME=meta-llama/llama-3.1-8b-instruct:free
+    export OPENAI_API_KEY=sk-or-v1-xxxx
+    python inference.py
+
+    # Option 3: Fast mode (no LLM, heuristic only)
+    python inference.py --fast-mode --episodes 1
 """
 
 from __future__ import annotations

@@ -29,11 +35,18 @@ from typing import Any
 import requests
 from openai import OpenAI
 
+# ── Load .env file (if present) ────────────────────────────────────────────
+try:
+    from dotenv import load_dotenv
+    load_dotenv()  # reads .env from current directory or project root
+except ImportError:
+    pass  # python-dotenv not installed — env vars must be set manually
+
 # ── Constants ──────────────────────────────────────────────────────────────
 
 ENV_URL = os.getenv("ENV_URL", "http://localhost:7860")
-MODEL_NAME = os.getenv("MODEL_NAME", "meta-llama/
-API_BASE_URL = os.getenv("API_BASE_URL", "https://
+MODEL_NAME = os.getenv("MODEL_NAME", "meta-llama/llama-3.3-70b-instruct:free")
+API_BASE_URL = os.getenv("API_BASE_URL", "https://openrouter.ai/api/v1")
 # Accept OPENAI_API_KEY (hackathon standard) or HF_TOKEN (HuggingFace convention)
 OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "") or os.getenv("HF_TOKEN", "")
 DEFAULT_EPISODES = 1
python/requirements.txt
CHANGED

@@ -5,3 +5,4 @@ pydantic>=2.0.0
 requests>=2.31.0
 httpx>=0.24.0
 pytest>=7.0.0
+python-dotenv>=1.0.0