adityss committed on
Commit
b4281fc
·
1 Parent(s): 5569b4d

feat: initialize project with documentation, inference script, and environment configuration

Files changed (4)
  1. .gitignore +20 -0
  2. README.md +296 -213
  3. python/inference.py +23 -10
  4. python/requirements.txt +1 -0
.gitignore CHANGED
@@ -0,0 +1,20 @@
1
+ # Secrets
2
+ .env
3
+
4
+ # Python
5
+ __pycache__/
6
+ *.pyc
7
+ *.pyo
8
+ .pytest_cache/
9
+
10
+ # Go
11
+ *.exe
12
+ gridmind-out.exe
13
+
14
+ # IDE
15
+ .vscode/
16
+ .idea/
17
+
18
+ # OS
19
+ .DS_Store
20
+ Thumbs.db
README.md CHANGED
@@ -1,325 +1,408 @@
1
- # GridMind-RL
2
 
3
- **OpenEnv-style environment** for reinforcement learning and LLM agents on **building energy management**: HVAC, thermal storage, demand response, batch job scheduling, and load shedding under time-varying electricity prices and grid stress.
4
 
5
- ---
6
 
7
- ## Project overview
8
 
9
- GridMind-RL simulates a **24-hour** control horizon at **15-minute resolution** (96 steps per episode). The agent observes prices, temperature, storage, process load, grid stress, carbon intensity, and batch job deadlines; it acts with continuous and discrete controls aligned with real **demand response** and **industrial/commercial** load-shaping problems.
10
 
11
- **Why it matters:** Optimizing flexible loads against **time-of-use pricing** and **grid signals** reduces cost and emissions while respecting comfort and process constraints—an active area for RL and LLM-based control research.
12
 
13
- **Strengths for judges**
 
 
 
 
14
 
15
- | Area | Detail |
16
- |------|--------|
17
- | Spec | `openenv.yaml` documents server port, schemas, tasks, and endpoints |
18
- | API | REST: reset, step, state, grade, health, ping, replay, tasks, metrics |
19
- | Tasks | Three levels (easy / medium / hard) with deterministic episode grading |
20
- | Baseline | Root `inference.py` + OpenAI-compatible LLM client and heuristic fallback |
21
- | Ops | Multi-stage **Docker** image: Go environment + Python dashboard + deps |
22
 
23
  ---
24
 
25
- ## Quick start (copy-paste)
26
 
27
- **Minimal flow** (API on **7860** only; keep Docker running, then run `python` in a **second** terminal from the repo root with `pip install -r python/requirements.txt` already done):
28
 
29
- ```bash
30
- docker build -t gridmind-rl .
31
- docker run -p 7860:7860 gridmind-rl
32
 
33
- python inference.py --fast-mode --episodes 1
34
- ```
35
 
36
- ### 1. Build and run (Docker)
 
 
 
 
37
 
38
- From the **repository root**:
39
 
40
- ```bash
41
- docker build -t gridmind-rl .
42
- docker run --rm -p 7860:7860 -p 7861:7861 --name gridmind gridmind-rl
43
- ```
44
 
45
- - **7860** Environment API (OpenEnv / agent traffic); http://localhost:7860
46
- - **7861** — Web dashboard (optional); http://localhost:7861
47
 
48
- **Windows (PowerShell)** — same commands in a terminal with Docker Desktop running.
49
 
50
- ### 2. Validate the API (optional)
 
 
51
 
52
- With the container running, from the repo root (host Python with `requests`):
53
 
54
  ```bash
55
- pip install requests
56
- python python/validate.py --env-url http://localhost:7860
57
  ```
58
 
59
- ### 3. Run baseline inference
60
-
61
- On the **host** (not inside the container unless you set `--env-url` to the env server):
62
 
63
  ```bash
64
- pip install -r python/requirements.txt
 
65
  ```
66
 
67
- **Windows PowerShell:**
68
 
69
- ```powershell
70
- $env:ENV_URL="http://localhost:7860"
71
- python inference.py --fast-mode --episodes 1
 
 
 
72
  ```
73
 
74
- **Windows Command Prompt (cmd):**
75
 
76
- ```bat
77
- set ENV_URL=http://localhost:7860
78
- python inference.py --fast-mode --episodes 1
79
- ```
80
 
81
- **Linux / macOS:**
82
 
83
  ```bash
84
- export ENV_URL=http://localhost:7860
85
- python inference.py --fast-mode --episodes 1
86
  ```
87
 
88
- You can run the same entrypoint directly with `python python/inference.py` (e.g. `python python/inference.py --fast-mode`); flags match the root `inference.py` wrapper.
89
 
90
- **LLM baseline** (requires any OpenAI-compatible API credentials HuggingFace, Groq, etc.):
91
 
92
- ```bash
93
- export ENV_URL=http://localhost:7860
94
- export API_BASE_URL=https://router.huggingface.co/v1
95
- export MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
96
- export OPENAI_API_KEY=your_token_here # or HF_TOKEN=your_token_here
97
- python inference.py --episodes 1 --llm-every 4
 
 
 
 
 
 
 
98
  ```
99
 
100
- > **Note:** The script accepts either `OPENAI_API_KEY` (hackathon standard) or `HF_TOKEN` (HuggingFace convention). You do **not** need a paid OpenAI key — any OpenAI-compatible provider works.
101
 
102
- Results are written to `baseline_scores.json` by default (`--output` to change).
103
 
104
- ---
 
 
105
 
106
- ## Tasks
 
 
107
 
108
- | ID | Difficulty | Name | Objective |
109
- |----|------------|------|-----------|
110
- | 1 | Easy | Cost minimization | Minimize total energy cost over the episode. No temperature or batch-job objectives in the grade. |
111
- | 2 | Medium | Constrained temperature | Minimize cost while keeping indoor temperature within **±2 °C** of setpoint (19–23 °C) for graded temperature compliance. |
112
- | 3 | Hard | Full demand response | Minimize cost, maintain temperature, respond to **grid stress** (e.g. shed load when stress is high), complete **batch jobs** on time, and reduce **carbon** vs a baseline policy in the composite score. |
113
 
114
- Episode **grade** is returned by `GET /grade` after the episode completes (or after a partial run if you stopped stepping early). Sub-scores are task-dependent and documented in code (`env/tasks.go`).
 
 
 
 
115
 
116
  ---
117
 
118
- ## Reward Structure
119
 
120
- The environment uses a **dense, multi-component reward** signal. Each step returns a scalar `reward` (the sum) and a detailed `reward_components` breakdown in `info`:
121
 
122
- | Component | Key | Active | Description |
123
- |-----------|-----|--------|-------------|
124
- | **Cost Savings** | `cost_savings` | All tasks | Positive baseline (1.5) minus relative step cost. Smart agents that reduce energy use earn higher rewards. |
125
- | **Temperature Constraint** | `temp_constraint` | Task 2, 3 | Gaussian bonus for staying near setpoint (21°C). Max +1.5 at setpoint, degrades toward bounds, penalty outside [19°C, 23°C]. |
126
- | **Grid Demand Response** | `grid_response` | Task 3 | Bonus for shedding load during high grid stress (>0.7), readiness bonus during moderate stress, penalty for unnecessary shedding. |
127
- | **Efficiency Bonus** | `efficiency_bonus` | All tasks | Rewards thermal storage arbitrage (charge during cheap prices, discharge during expensive) and maintaining useful storage levels. |
128
- | **Stability Reward** | `stability_penalty` | All tasks | Positive reward for smooth, stable control; penalty for rapid HVAC/storage oscillation. |
129
- | **Deadline Penalty** | `deadline_penalty` | Task 2, 3 | Penalty per missed batch job deadline (-1.5 each). Positive bonus for keeping jobs on track. |
130
- | **Carbon Reward** | `carbon_reward` | Task 3 | Baseline bonus for low-carbon operation, reduced by carbon-heavy consumption. Extra bonus during clean grid periods. |
131
 
132
- **Grading weights (Task 3):** cost 28%, temperature 20%, grid_response 20%, batch_deadline 12%, carbon 20%.
133
 
134
  ---
135
 
136
- ## HTTP API
137
 
138
- Base URL: `http://<host>:7860` (default in container: port **7860**).
139
 
140
  | Method | Path | Purpose |
141
  |--------|------|---------|
142
- | GET | `/health` | Liveness; JSON `status`, `version` |
143
- | GET | `/ping` | Lightweight liveness; JSON `status` |
144
- | POST | `/reset` | Start episode: body e.g. `{"task_id": 1, "seed": 42, "num_buildings": 1}` |
145
- | POST | `/step` | Advance one step: JSON action or array of actions (multi-building) |
146
- | GET | `/state` | Full snapshot: buildings, downsampled price/carbon curves, step, task, etc. |
147
- | GET | `/grade` | Episode score in `[0, 1]`, sub-scores, exploit flags |
148
- | GET | `/replay` | Step replay list |
149
- | GET | `/tasks` | Task metadata and grader weights |
150
- | GET | `/metrics` | Prometheus-style text metrics |
 
 
151
 
152
- **Action JSON fields** (single building): `hvac_power_level`, `thermal_charge_rate`, `batch_job_slot`, `load_shed_fraction`, optional `building_id`.
 
 
 
 
153
 
154
- Schemas and primary endpoints: **`openenv.yaml`** at repo root (see Notes for additional endpoints like `/metrics`).
 
 
 
155
 
156
- ---
 
 
157
 
158
- ## Evaluation modes (`inference.py`)
159
 
160
- There is **no** `--judge-mode` flag in this repository. Use the modes below.
161
 
162
- | Mode | Command pattern | Behavior |
163
- |------|-----------------|----------|
164
- | **Fast (heuristic)** | `python inference.py --fast-mode` | No LLM calls; deterministic given env seed; fastest for CI or smoke tests. |
165
- | **Default LLM** | `python inference.py` | Uses OpenAI Python client (`API_BASE_URL`, `MODEL_NAME`, `OPENAI_API_KEY` or `HF_TOKEN`); default `--llm-every 4` reuses each LLM action for 4 steps to limit API cost. |
166
- | **Recommended for automated evaluation / judging** | `python inference.py --fast-mode --episodes 1` | Recommended when automated pipelines need **reproducibility** and **no external API** dependency. |
167
 
168
- Other useful flags:
169
 
170
- | Flag | Default | Meaning |
171
- |------|---------|---------|
172
- | `--episodes` | `1` | Episodes per task (tasks 1–3 run in sequence) |
173
- | `--env-url` | `ENV_URL` or `http://localhost:7860` | Environment base URL |
174
- | `--llm-every` | `4` | Steps per LLM call (ignored in `--fast-mode`) |
175
- | `--max-steps` | full episode | Stop after N steps; grade reflects **partial** episode |
176
- | `--output` | `baseline_scores.json` | Results path |
177
- | `--verbose` | off | Extra step logs |
178
 
179
  ---
180
 
181
- ## Logging format (baseline)
182
 
183
- For each episode the script prints, in order:
 
 
 
 
 
 
184
 
185
- 1. **`[START]`** episode beginning (after `reset`)
186
- 2. **`[STEP1]` … `[STEP96]`** (full episode) one line per successful `POST /step`; a full episode has **96** steps (`[STEP1]` through `[STEP96]`) unless `--max-steps` or an early error stops the loop
187
- 3. **`[END]`** — after `GET /grade` for that episode
188
-
189
- Additional lines (banners, task headers, `[OK]` / `[WARN]`) may appear; parsers should match the bracketed markers above.
190
-
191
- Example shape:
192
-
193
- ```text
194
- [START]
195
- [STEP1]
196
- [STEP2]
197
- ...
198
- [STEP96]
199
- [END]
200
- ```
201
 
202
  ---
203
 
204
- ## Architecture
205
-
206
- ```text
207
- ┌─────────────────────────────────────────────────────────────┐
208
- │ Client: python inference.py (LLM or heuristic) │
209
- │ │ HTTP (reset / step / grade) │
210
- │ ▼ │
211
- │ ┌──────────────────┐ ┌─────────────────────────────┐ │
212
- │ │ gridmind-server │ │ Dashboard (optional) │ │
213
- │ │ Go :7860 │◄────│ FastAPI + static UI :7861 │ │
214
- │ │ env/* simulation│ │ proxies /api → :7860 │ │
215
- │ └──────────────────┘ └─────────────────────────────┘ │
216
- └─────────────────────────────────────────────────────────────┘
217
- ```
218
 
219
- - **Core:** `main.go` + `env/` (physics, rewards, tasks, grading)
220
- - **Baseline:** `inference.py` (root) → `python/inference.py`
221
- - **Dashboard:** `dashboard/server.py`, `dashboard/static/`
222
- - **Spec:** `openenv.yaml`
223
 
224
- ---
 
 
225
 
226
- ## Docker (detailed)
227
 
228
- | Step | Command |
229
- |------|---------|
230
- | Build | `docker build -t gridmind-rl .` |
231
- | Run (foreground) | `docker run --rm -p 7860:7860 -p 7861:7861 --name gridmind gridmind-rl` |
232
- | Run (background) | `docker run -d --rm -p 7860:7860 -p 7861:7861 --name gridmind gridmind-rl` |
233
- | Stop (background) | `docker stop gridmind` |
234
- | Inference **inside** container | `docker exec -it gridmind python /app/inference.py --fast-mode --env-url http://127.0.0.1:7860` |
235
 
236
- The image runs **supervisord** as a non-root user with two programs: Go server (`PORT=7860`) and uvicorn dashboard (`7861`).
237
 
238
- ---
 
 
239
 
240
- ## Notes for judges and operators
 
 
 
241
 
242
- | Topic | Detail |
243
- |-------|--------|
244
- | **Ports** | **7860** = environment API; **7861** = dashboard. Some hosts only expose one public port—API is the required one for OpenEnv-style evaluation. |
245
- | **Episode length** | **96 steps** = 24 h at 15 min/step. Observation `step` is **0–95** for a full episode. |
246
- | **`openenv.yaml`** | Lists main endpoints; **`/metrics`** exists at runtime but may not appear in the YAML block—treat as an extra ops endpoint. |
247
- | **Reproducibility** | Env is seed-controlled. LLM outputs may still vary by provider even at `temperature=0`. |
248
- | **`--max-steps`** | Produces a **partial** episode; final `GET /grade` reflects that partial trajectory. |
249
- | **Manual run (no Docker)** | Install Go 1.21+, `go run .` from repo root (default port 7860); install Python deps and run `python inference.py` as above. |
250
- | **Runtime** | The baseline completes within typical hackathon limits (<20 minutes). |
251
 
252
  ---
253
 
254
- ## Example API Calls
255
 
256
- With the container running (`docker run -p 7860:7860 gridmind-rl`):
257
 
258
  ```bash
259
- # Health check
260
- curl http://localhost:7860/health
261
- # → {"status":"ok","version":"1.0.0"}
262
 
263
- # Reset to Task 3 (hard) with seed 42
264
- curl -X POST http://localhost:7860/reset \
265
- -H "Content-Type: application/json" \
266
- -d '{"task_id": 3, "seed": 42, "num_buildings": 1}'
267
- # → {"observations":[{"indoor_temperature":21.3,...}],"episode":1,"task_id":3,"seed":42}
268
 
269
- # Take one step
270
- curl -X POST http://localhost:7860/step \
271
- -H "Content-Type: application/json" \
272
- -d '{"hvac_power_level": 0.6, "thermal_charge_rate": 0.3, "batch_job_slot": 1, "load_shed_fraction": 0.1}'
273
- # → {"observation":{...},"reward":2.15,"done":false,"info":{"reward_components":{...},...}}
274
-
275
- # Get current state
276
- curl http://localhost:7860/state
277
- # → {"buildings":[...],"price_curve_episode":[...],"step":1,"task_id":3,...}
278
 
279
- # Grade after episode ends (run 96 steps first)
280
- curl http://localhost:7860/grade
281
- # → {"task_id":3,"score":0.3115,"sub_scores":{"cost":0.25,"temperature":0.14,...},...}
282
 
283
- # List all tasks
284
- curl http://localhost:7860/tasks
285
- # → [{"id":1,"name":"Cost Minimization","difficulty":"easy",...},...]
286
  ```
287
 
288
  ---
289
 
290
- ## Hugging Face Space Deployment
291
 
292
- ### 1. Create a new HF Space
293
 
294
- Go to [huggingface.co/new-space](https://huggingface.co/new-space) and create a **Docker** space. Select:
295
- - **SDK:** Docker
296
- - **Hardware:** CPU Basic (2 vCPU, 16GB)
297
 
298
- ### 2. Push code to HF
 
 
 
 
 
299
 
300
- ```bash
301
- # Clone and push
302
- git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/gridmind-rl
303
- git push hf main
304
- ```
305
 
306
- ### 3. Verify deployment
 
 
 
 
 
 
 
 
307
 
308
- Once the Space builds, verify at your Space URL:
309
 
310
- ```bash
311
- curl https://YOUR_USERNAME-gridmind-rl.hf.space/health
312
- # → {"status":"ok","version":"1.0.0"}
313
 
314
- curl -X POST https://YOUR_USERNAME-gridmind-rl.hf.space/reset \
315
- -H "Content-Type: application/json" \
316
- -d '{"task_id":1,"seed":42}'
317
  ```
318
 
319
- > **Note:** HF Spaces only exposes **one port** publicly. Port **7860** (the OpenEnv API) is the primary port and will be the public endpoint. The dashboard on port 7861 is for local development only.
320
 
321
  ---
322
 
323
- ## License
324
 
325
  See `LICENSE` in the repository.
 
1
+ # GridMind-RL
2
 
3
+ **A real-world RL environment for building energy management** — control HVAC systems, thermal storage, batch job scheduling, and demand response under stochastic electricity prices and grid stress events.
4
 
5
+ Built on the [OpenEnv](https://github.com/meta-pytorch/OpenEnv) specification. Containerized. Ready for Hugging Face Spaces.
6
 
7
+ ---
8
 
9
+ ## 🎯 Why GridMind-RL?
10
 
11
+ Optimizing building energy use is a **real problem** that utilities, building managers, and industrial operators face every day. An agent must balance:
12
 
13
+ - **Cost** — buy electricity when it's cheap, avoid peak pricing
14
+ - **Comfort** — keep indoor temperature within comfortable bounds
15
+ - **Grid compliance** — shed load when the grid signals demand-response events
16
+ - **Scheduling** — complete batch processing jobs before their deadlines
17
+ - **Carbon** — minimize carbon emissions by timing consumption to clean-grid periods
18
 
19
+ This isn't a toy or a game. It's a simulation of decisions that **humans actually make** in industrial energy management, packaged as an RL environment where agents can learn to do it better.
20
 
21
  ---
22
 
23
+ ## 📐 Observation Space
24
+
25
+ Each timestep (15 minutes of simulated time), the agent receives:
26
+
27
+ | Field | Type | Range | Description |
28
+ |-------|------|-------|-------------|
29
+ | `indoor_temperature` | float | 10–40 °C | Current building temperature |
30
+ | `thermal_storage_level` | float | 0.0–1.0 | Thermal tank fill level (0=empty, 1=full) |
31
+ | `process_demand` | float | ≥ 0 kW | Current industrial power demand |
32
+ | `current_price` | float | > 0 $/kWh | Real-time electricity price |
33
+ | `grid_stress_signal` | float | 0.0–1.0 | Utility demand-response urgency (>0.7 = critical) |
34
+ | `carbon_intensity` | float | ≥ 0 gCO₂/kWh | Grid carbon intensity |
35
+ | `hour_of_day` | int | 0–23 | Current hour |
36
+ | `batch_queue` | int[] | — | Deadline slots of pending batch jobs |
37
+ | `cumulative_cost` | float | ≥ 0 $ | Total energy cost so far this episode |
38
+ | `step` | int | 0–95 | Current timestep (96 steps = 24 hours) |
39
+ | `building_id` | int | 0+ | Building index in multi-building mode |
40
+
41
+ ## 🕹️ Action Space
42
+
43
+ Each timestep, the agent sends:
44
+
45
+ | Field | Type | Range | Description |
46
+ |-------|------|-------|-------------|
47
+ | `hvac_power_level` | float | 0.0–1.0 | Fraction of max HVAC power (0=off, 1=full) |
48
+ | `thermal_charge_rate` | float | -1.0 to 1.0 | Charge (+) or discharge (-) thermal storage |
49
+ | `batch_job_slot` | int | 0–4 | Schedule next batch job: 0=now, 1–4=defer |
50
+ | `load_shed_fraction` | float | 0.0–0.5 | Fraction of non-critical load to shed |
51
+ | `building_id` | int | 0+ | Which building this action targets |
52
+
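The two tables above are enough to write a first policy. Below is a minimal rule-based sketch: the `0.7` stress threshold comes from the observation table, while the `0.20 $/kWh` price cutoff is an invented example value, not an environment constant.

```python
# Illustrative only: map documented observation fields to documented action
# fields, clamped to the ranges in the tables above.

def clamp(x: float, lo: float, hi: float) -> float:
    """Keep a value inside the documented action range."""
    return max(lo, min(hi, x))

def heuristic_action(obs: dict) -> dict:
    cheap = obs["current_price"] < 0.20          # assumed cutoff, not from the env
    stressed = obs["grid_stress_signal"] > 0.7   # documented critical threshold
    return {
        "hvac_power_level": clamp(0.3 if stressed else 0.6, 0.0, 1.0),
        "thermal_charge_rate": clamp(0.8 if cheap else -0.4, -1.0, 1.0),
        "batch_job_slot": 0 if cheap else 2,     # run now when cheap, else defer
        "load_shed_fraction": clamp(0.3 if stressed else 0.0, 0.0, 0.5),
    }

obs = {"current_price": 0.12, "grid_stress_signal": 0.9}
print(heuristic_action(obs))
```

The returned dict is exactly the JSON body you would `POST` to `/step`.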
53
+ ## 💰 Reward Structure
54
+
55
+ The environment provides a **dense, multi-component reward** every step — not just a binary win/lose at the end. Each step returns a scalar `reward` (the sum) plus a detailed `reward_components` breakdown:
56
+
57
+ | Component | Key | Description |
58
+ |-----------|-----|-------------|
59
+ | Cost Savings | `cost_savings` | Rewards reducing energy spend vs baseline |
60
+ | Temperature | `temp_constraint` | Gaussian bonus near setpoint, penalty outside bounds |
61
+ | Grid Response | `grid_response` | Bonus for shedding load during grid stress |
62
+ | Efficiency | `efficiency_bonus` | Thermal storage arbitrage + balanced usage |
63
+ | Stability | `stability_penalty` | Rewards smooth control, penalizes oscillation |
64
+ | Deadlines | `deadline_penalty` | Penalty for missed batch jobs |
65
+ | Carbon | `carbon_reward` | Bonus for low-carbon operation |
66
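As a sanity check on a `/step` response, the scalar `reward` should equal the sum of the `reward_components` values. A sketch with invented sample numbers (real values come from the server):

```python
import math

# Invented sample response shaped like a POST /step reply.
step_response = {
    "reward": 2.15,
    "info": {"reward_components": {
        "cost_savings": 1.10, "temp_constraint": 0.90, "grid_response": 0.00,
        "efficiency_bonus": 0.25, "stability_penalty": -0.10,
        "deadline_penalty": 0.00, "carbon_reward": 0.00,
    }},
}

total = sum(step_response["info"]["reward_components"].values())
assert math.isclose(total, step_response["reward"])  # scalar = sum of components
```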
 
67
+ ---
68
 
69
+ ## 📋 Tasks (3 difficulty levels)
 
 
70
 
71
+ Each task defines a concrete objective with a **deterministic programmatic grader** that scores performance from **0.0 to 1.0**.
 
72
 
73
+ | ID | Difficulty | Name | What the Agent Must Do | Grader Weights |
74
+ |----|:----------:|------|------------------------|----------------|
75
+ | 1 | 🟢 Easy | **Cost Minimization** | Minimize total energy cost over 24 hours. No temperature or scheduling constraints. | cost: 100% |
76
+ | 2 | 🟡 Medium | **Constrained Temperature** | Minimize cost **and** keep temperature within 19–23°C at all times. | cost: 60%, temperature: 40% |
77
+ | 3 | 🔴 Hard | **Full Demand Response** | Minimize cost, maintain temperature, respond to grid stress, complete batch jobs on time, minimize carbon. | cost: 28%, temperature: 20%, grid: 20%, batch: 12%, carbon: 20% |
78
 
79
+ **Graders are deterministic**: given the same seed, the same actions always produce the same score.
80
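With the weights above, the Task 3 composite can be reproduced from its sub-scores. A sketch using the rounded sub-scores from the Baseline Scores table below, so the result only approximately matches the reported 0.5966:

```python
# Task 3 grader weights from the table above.
WEIGHTS = {"cost": 0.28, "temperature": 0.20, "grid": 0.20,
           "batch": 0.12, "carbon": 0.20}

def composite(sub_scores: dict) -> float:
    """Weighted sum of sub-scores, each in [0, 1]."""
    return sum(WEIGHTS[k] * sub_scores[k] for k in WEIGHTS)

score = composite({"cost": 0.67, "temperature": 0.573, "grid": 0.214,
                   "batch": 1.0, "carbon": 0.657})
print(round(score, 4))  # close to the reported 0.5966; inputs are rounded
```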
 
81
+ ---
 
 
 
82
 
83
+ ## 🚀 Getting Started (Step by Step)
 
84
 
85
+ ### Prerequisites
86
 
87
+ - **Docker** — [Install Docker Desktop](https://www.docker.com/products/docker-desktop/)
88
+ - **Python 3.9+** — [Download Python](https://www.python.org/downloads/)
89
+ - **Git** — [Download Git](https://git-scm.com/downloads)
90
 
91
+ ### Step 1: Clone the Repository
92
 
93
  ```bash
94
+ git clone https://github.com/LO-Kyu/gridmind.git
95
+ cd gridmind
96
  ```
97
 
98
+ ### Step 2: Build and Start the Environment Server
 
 
99
 
100
  ```bash
101
+ docker build -t gridmind-rl .
102
+ docker run --rm -d -p 7860:7860 --name gridmind gridmind-rl
103
  ```
104
 
105
+ This starts the GridMind-RL environment server on port **7860**. Verify it's running:
106
 
107
+ ```bash
108
+ # Linux/macOS
109
+ curl http://localhost:7860/health
110
+
111
+ # Windows (PowerShell)
112
+ Invoke-RestMethod -Uri http://localhost:7860/health
113
  ```
114
 
115
+ You should see: `{"status":"ok","version":"1.0.0"}`
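If you script the startup, you can poll `/health` until the container is ready. A sketch: the timeout and interval values are arbitrary, and the `get` callable is injected so the helper works with any HTTP client:

```python
import time

def wait_for_health(get, url="http://localhost:7860/health",
                    timeout=30.0, interval=1.0):
    """Poll until /health reports ok, or raise after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if get(url).get("status") == "ok":
                return True
        except Exception:
            pass  # server not up yet; retry
        time.sleep(interval)
    raise TimeoutError(f"{url} not healthy after {timeout}s")
```

With `requests`, call it as `wait_for_health(lambda u: requests.get(u, timeout=2).json())`.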
116
 
117
+ ### Step 3: Install Python Dependencies
 
 
 
118
 
119
+ Open a **new terminal** (keep Docker running) and install:
120
 
121
  ```bash
122
+ pip install -r python/requirements.txt
 
123
  ```
124
 
125
+ ### Step 4: Get a Free API Key
126
 
127
+ The inference script uses an LLM to make decisions. You need a **free** API key:
128
 
129
+ 1. Go to [openrouter.ai/keys](https://openrouter.ai/keys)
130
+ 2. Sign in with Google or GitHub (free)
131
+ 3. Click **"Create Key"** and copy it
132
+
133
+ ### Step 5: Configure Your API Key
134
+
135
+ Open the `.env` file in the project root and paste your key:
136
+
137
+ ```env
138
+ API_BASE_URL=https://openrouter.ai/api/v1
139
+ MODEL_NAME=meta-llama/llama-3.1-8b-instruct:free
140
+ OPENAI_API_KEY=sk-or-v1-paste-your-actual-key-here
141
+ ENV_URL=http://localhost:7860
142
  ```
143
 
144
+ > **Note:** The model `meta-llama/llama-3.1-8b-instruct:free` is **completely free** on OpenRouter. No credit card needed.
145
 
146
+ ### Step 6: Run the Baseline Inference
147
 
148
+ ```bash
149
+ # Run LLM agent on all 3 tasks
150
+ python inference.py --episodes 1
151
 
152
+ # Or run without LLM (fast heuristic mode — no API key needed)
153
+ python inference.py --fast-mode --episodes 1
154
+ ```
155
 
156
+ The script will:
157
+ 1. Connect to the environment server
158
+ 2. Run the agent on Task 1 (easy), Task 2 (medium), Task 3 (hard)
159
+ 3. Print `[START]`, `[STEP1]`...`[STEP96]`, `[END]` for each episode
160
+ 4. Save results to `baseline_scores.json`
161
 
162
+ ### Step 7: Stop the Server (When Done)
163
+
164
+ ```bash
165
+ docker stop gridmind
166
+ ```
167
 
168
  ---
169
 
170
+ ## 📊 Baseline Scores
171
 
172
+ Produced by running `python inference.py --fast-mode --episodes 1` (heuristic policy):
173
 
174
+ | Task | Difficulty | Score | Details |
175
+ |------|:----------:|:-----:|---------|
176
+ | 1 Cost Minimization | 🟢 Easy | **0.7063** | cost: 0.706 |
177
+ | 2 Temperature Management | 🟡 Medium | **0.6333** | cost: 0.701, temperature: 0.531 |
178
+ | 3 — Full Demand Response | 🔴 Hard | **0.5966** | cost: 0.670, temp: 0.573, grid: 0.214, batch: 1.000, carbon: 0.657 |
179
+ | **Overall Average** | | **0.6454** | |
 
 
 
180
 
181
+ Scores are in the **0.0–1.0** range. Higher is better.
182
 
183
  ---
184
 
185
+ ## 🔌 HTTP API Reference
186
 
187
+ Base URL: `http://localhost:7860`
188
 
189
  | Method | Path | Purpose |
190
  |--------|------|---------|
191
+ | `GET` | `/health` | Health check `{"status":"ok","version":"1.0.0"}` |
192
+ | `GET` | `/ping` | Lightweight liveness check |
193
+ | `POST` | `/reset` | Start a new episode. Body: `{"task_id": 1, "seed": 42}` |
194
+ | `POST` | `/step` | Take one action. Body: action JSON (see Action Space above) |
195
+ | `GET` | `/state` | Full environment state snapshot |
196
+ | `GET` | `/grade` | Episode score (0.0–1.0) with sub-scores |
197
+ | `GET` | `/replay` | Full step-by-step replay of the episode |
198
+ | `GET` | `/tasks` | List all task definitions and grader weights |
199
+ | `GET` | `/metrics` | Prometheus-format operational metrics |
200
+
201
+ ### Example API Calls
202
 
203
+ ```bash
204
+ # Reset to Task 1 (easy) with seed 42
205
+ curl -X POST http://localhost:7860/reset \
206
+ -H "Content-Type: application/json" \
207
+ -d '{"task_id": 1, "seed": 42}'
208
 
209
+ # Take one step
210
+ curl -X POST http://localhost:7860/step \
211
+ -H "Content-Type: application/json" \
212
+ -d '{"hvac_power_level": 0.5, "thermal_charge_rate": 0.1, "batch_job_slot": 1, "load_shed_fraction": 0.0}'
213
 
214
+ # Check score after episode
215
+ curl http://localhost:7860/grade
216
+ ```
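The same flow in Python: a sketch of one full episode using the endpoints above. The fixed action is an arbitrary example, and the HTTP client is injectable (defaulting to `requests`) so the loop can be exercised without a live server:

```python
# Sketch of the reset → step → grade flow; payload shapes follow the API table.
ACTION = {"hvac_power_level": 0.5, "thermal_charge_rate": 0.1,
          "batch_job_slot": 1, "load_shed_fraction": 0.0}

def run_episode(base_url: str, task_id: int = 1, seed: int = 42, http=None) -> dict:
    """Run one episode (at most 96 steps) and return the /grade JSON."""
    if http is None:
        import requests  # real client; injectable for testing
        http = requests
    http.post(f"{base_url}/reset",
              json={"task_id": task_id, "seed": seed}).raise_for_status()
    for _ in range(96):  # 96 steps = 24 h at 15-minute resolution
        resp = http.post(f"{base_url}/step", json=ACTION)
        resp.raise_for_status()
        if resp.json().get("done"):
            break
    return http.get(f"{base_url}/grade").json()
```

Against a local container: `scores = run_episode("http://localhost:7860", task_id=3)`.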
217
 
218
+ ---
219
 
220
+ ## 🏗️ Architecture
221
 
222
+ ```
+ ┌──────────────────────────────────────────────────────────┐
+ │  inference.py (LLM Agent or Heuristic)                   │
+ │        │  HTTP: POST /reset, /step · GET /grade, /state  │
+ │        ▼                                                 │
+ │  ┌────────────────────┐   ┌────────────────────────┐     │
+ │  │ Go Environment     │   │ Python Dashboard       │     │
+ │  │ Server (:7860)     │   │ FastAPI + UI (:7861)   │     │
+ │  │                    │   │                        │     │
+ │  │ • Physics engine   │   │ • Proxies /api → :7860 │     │
+ │  │ • Reward function  │◄──│ • Real-time charts     │     │
+ │  │ • Task graders     │   │ • State visualization  │     │
+ │  └────────────────────┘   └────────────────────────┘     │
+ │                                                          │
+ │  Docker container: isolated · reproducible · non-root    │
+ └──────────────────────────────────────────────────────────┘
+ ```
244
 
245
+ ### Project Structure
246
 
247
+ ```
248
+ gridmind/
249
+ ├── inference.py ← Hackathon entrypoint (root)
250
+ ├── openenv.yaml ← OpenEnv spec manifest
251
+ ├── Dockerfile ← Multi-stage build (Go + Python)
252
+ ├── .env ← API credentials (git-ignored)
253
+ ├── baseline_scores.json ← Produced by inference.py
254
+
255
+ ├── main.go ← HTTP server (routes, middleware, metrics)
256
+ ├── env/ ← Core environment logic (Go)
257
+ │ ├── environment.go ← Simulation: physics, thermal dynamics
258
+ │ ├── models.go ← All data types (Observation, Action, etc.)
259
+ │ ├── rewards.go ← 7-component dense reward function
260
+ │ └── tasks.go ← 3 task definitions + deterministic graders
261
+
262
+ ├── python/ ← Python support layer
263
+ │ ├── inference.py ← Full LLM agent + heuristic fallback
264
+ │ ├── models.py ← Typed Pydantic models (mirrors Go structs)
265
+ │ ├── validate.py ← OpenEnv spec validation suite
266
+ │ └── requirements.txt ← Python dependencies
267
+
268
+ ├── tests/ ← Automated tests
269
+ │ ├── environment_test.go ← Go unit tests (determinism, bounds, etc.)
270
+ │ └── test_graders.py ← Python grader tests (pytest)
271
+
272
+ └── dashboard/ ← Optional web dashboard
273
+ ├── server.py ← FastAPI server
274
+ └── static/ ← Frontend assets
275
+ ```
276
 
277
  ---
278
 
279
+ ## 🐳 Docker
280
 
281
+ | Action | Command |
282
+ |--------|---------|
283
+ | **Build** | `docker build -t gridmind-rl .` |
284
+ | **Run (foreground)** | `docker run --rm -p 7860:7860 --name gridmind gridmind-rl` |
285
+ | **Run (background)** | `docker run --rm -d -p 7860:7860 --name gridmind gridmind-rl` |
286
+ | **Stop** | `docker stop gridmind` |
287
+ | **Run inference inside container** | `docker exec -it gridmind python /app/inference.py --fast-mode` |
288
 
289
+ The Dockerfile uses a **multi-stage build**:
290
+ 1. **Stage 1** — Go 1.21 Alpine: compiles the environment server binary
291
+ 2. **Stage 2** — Python 3.11 slim: runs the Go binary + Python dashboard via Supervisor
292
 
293
  ---
294
 
295
+ ## ☁️ Hugging Face Space Deployment
296
 
297
+ ### 1. Create a New Space
 
 
 
298
 
299
+ Go to [huggingface.co/new-space](https://huggingface.co/new-space):
300
+ - **SDK:** Docker
301
+ - **Hardware:** CPU Basic (2 vCPU, 16 GB — free tier)
302
 
303
+ ### 2. Push to HF
304
 
305
+ ```bash
306
+ git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/gridmind-rl
307
+ git push hf main
308
+ ```
 
 
 
309
 
310
+ ### 3. Verify
311
 
312
+ ```bash
313
+ curl https://YOUR_USERNAME-gridmind-rl.hf.space/health
314
+ # → {"status":"ok","version":"1.0.0"}
315
 
316
+ curl -X POST https://YOUR_USERNAME-gridmind-rl.hf.space/reset \
317
+ -H "Content-Type: application/json" \
318
+ -d '{"task_id":1,"seed":42}'
319
+ ```
320
 
321
+ > **Note:** HF Spaces exposes port **7860** publicly. The dashboard (7861) is for local development only.
322
 
323
  ---
324
 
325
+ ## 🧪 Testing
326
 
327
+ ### Run Go Unit Tests
328
 
329
  ```bash
330
+ cd gridmind
331
+ go test ./tests/ -v
332
+ ```
333
 
334
+ ### Run Python Grader Tests (requires server running)
 
 
 
 
335
 
336
+ ```bash
337
+ pytest tests/test_graders.py -v
338
+ ```
339
 
340
+ ### Run Full OpenEnv Validation
 
 
341
 
342
+ ```bash
343
+ python python/validate.py --env-url http://localhost:7860
 
344
  ```
345
 
346
  ---
347
 
348
+ ## 📝 Inference Script Reference
349
 
350
+ The `inference.py` script at the project root is the **hackathon entrypoint**.
351
 
352
+ ### Environment Variables
 
 
353
 
354
+ | Variable | Default | Description |
355
+ |----------|---------|-------------|
356
+ | `API_BASE_URL` | `https://openrouter.ai/api/v1` | LLM API endpoint |
357
+ | `MODEL_NAME` | `meta-llama/llama-3.3-70b-instruct:free` | Model to use |
358
+ | `OPENAI_API_KEY` | — | API key (any OpenAI-compatible provider) |
359
+ | `ENV_URL` | `http://localhost:7860` | Environment server URL |
360
 
361
+ ### Command-Line Flags
 
 
 
 
362
 
363
+ | Flag | Default | Description |
364
+ |------|---------|-------------|
365
+ | `--episodes N` | 1 | Episodes per task (tasks 1–3 run in sequence) |
366
+ | `--fast-mode` | off | Use heuristic policy only (no LLM, fully reproducible) |
367
+ | `--llm-every N` | 4 | Reuse each LLM action for N steps (reduces API calls) |
368
+ | `--max-steps N` | 96 | Stop early after N steps |
369
+ | `--env-url URL` | from env | Override environment URL |
370
+ | `--output FILE` | `baseline_scores.json` | Output results file |
371
+ | `--verbose` | off | Print detailed step logs |
372
 
373
+ ### Stdout Log Format
374
 
375
+ Each episode emits structured markers for automated evaluation:
 
 
376
 
 
 
 
377
  ```
378
+ [START]
379
+ [STEP1]
380
+ [STEP2]
381
+ ...
382
+ [STEP96]
383
+ [END]
384
+ ```
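A pipeline consuming this output only needs the bracketed markers. A sketch of a parser, assuming one marker per line as shown above:

```python
import re

def summarize_log(text: str) -> dict:
    """Count episode markers in baseline stdout; banner/[OK]/[WARN] lines are ignored."""
    steps = re.findall(r"^\[STEP(\d+)\]", text, flags=re.M)
    return {
        "started": bool(re.search(r"^\[START\]", text, flags=re.M)),
        "ended": bool(re.search(r"^\[END\]", text, flags=re.M)),
        "steps": len(steps),
        "complete": len(steps) == 96,  # full 24 h episode at 15-min resolution
    }

sample = "[START]\n" + "".join(f"[STEP{i}]\n" for i in range(1, 97)) + "[END]\n"
print(summarize_log(sample))
```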
385
+
386
+ ---
387
 
388
+ ## 📎 OpenEnv Spec Compliance
389
+
390
+ | Requirement | Status |
391
+ |-------------|--------|
392
+ | `openenv.yaml` with metadata | ✅ |
393
+ | Typed Pydantic models (Observation, Action, Reward) | ✅ |
394
+ | `step(action)` → observation, reward, done, info | ✅ |
395
+ | `reset()` → initial observation | ✅ |
396
+ | `state()` → current state | ✅ |
397
+ | 3 tasks with programmatic graders (0.0–1.0) | ✅ |
398
+ | Dense reward function (not binary) | ✅ |
399
+ | Baseline inference using OpenAI client | ✅ |
400
+ | Working Dockerfile | ✅ |
401
+ | Deterministic with seed | ✅ |
402
+ | Exploit detection | ✅ |
403
 
404
  ---
405
 
406
+ ## 📄 License
407
 
408
  See `LICENSE` in the repository.
python/inference.py CHANGED
@@ -4,17 +4,23 @@ GridMind-RL Baseline Inference Script
4
  Runs an LLM agent against all 3 tasks for N episodes each.
5
  Uses the OpenAI Python client pointed at any OpenAI-compatible endpoint.
6
 
7
- Required environment variables:
8
- API_BASE_URL — The API endpoint for the LLM (default: HuggingFace router)
9
- MODEL_NAME — The model identifier to use for inference
10
- OPENAI_API_KEY or HF_TOKEN — API key for authentication (any provider)
11
 
12
  Usage:
13
- export API_BASE_URL=https://router.huggingface.co/v1
14
- export MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
15
- export OPENAI_API_KEY=hf_xxxx # or HF_TOKEN=hf_xxxx
16
  python inference.py
17
- # or: python inference.py --fast-mode --episodes 1
18
  """
19
 
20
  from __future__ import annotations
@@ -29,11 +35,18 @@ from typing import Any
29
  import requests
30
  from openai import OpenAI
31
32
  # ── Constants ──────────────────────────────────────────────────────────────
33
 
34
  ENV_URL = os.getenv("ENV_URL", "http://localhost:7860")
35
- MODEL_NAME = os.getenv("MODEL_NAME", "meta-llama/Llama-3.1-8B-Instruct")
36
- API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
37
  # Accept OPENAI_API_KEY (hackathon standard) or HF_TOKEN (HuggingFace convention)
38
  OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "") or os.getenv("HF_TOKEN", "")
39
  DEFAULT_EPISODES = 1
 
4
  Runs an LLM agent against all 3 tasks for N episodes each.
5
  Uses the OpenAI Python client pointed at any OpenAI-compatible endpoint.
6
 
7
+ Required environment variables (set in .env or shell):
8
+ API_BASE_URL — The API endpoint for the LLM (default: OpenRouter)
9
+ MODEL_NAME — The model identifier to use for inference
10
+ OPENAI_API_KEY — API key for authentication (works with any provider)
11
 
12
  Usage:
13
+ # Option 1: Use .env file (recommended — just paste your key)
 
 
14
  python inference.py
15
+
16
+ # Option 2: Set env vars manually
17
+ export API_BASE_URL=https://openrouter.ai/api/v1
18
+ export MODEL_NAME=meta-llama/llama-3.1-8b-instruct:free
19
+ export OPENAI_API_KEY=sk-or-v1-xxxx
20
+ python inference.py
21
+
22
+ # Option 3: Fast mode (no LLM, heuristic only)
23
+ python inference.py --fast-mode --episodes 1
24
  """
25
 
26
  from __future__ import annotations
 
35
  import requests
36
  from openai import OpenAI
37
 
38
+ # ── Load .env file (if present) ────────────────────────────────────────────
39
+ try:
40
+ from dotenv import load_dotenv
41
+ load_dotenv() # reads .env from current directory or project root
42
+ except ImportError:
43
+ pass # python-dotenv not installed — env vars must be set manually
44
+
45
  # ── Constants ──────────────────────────────────────────────────────────────
46
 
47
  ENV_URL = os.getenv("ENV_URL", "http://localhost:7860")
48
+ MODEL_NAME = os.getenv("MODEL_NAME", "meta-llama/llama-3.3-70b-instruct:free")
49
+ API_BASE_URL = os.getenv("API_BASE_URL", "https://openrouter.ai/api/v1")
50
  # Accept OPENAI_API_KEY (hackathon standard) or HF_TOKEN (HuggingFace convention)
51
  OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "") or os.getenv("HF_TOKEN", "")
52
  DEFAULT_EPISODES = 1
python/requirements.txt CHANGED
@@ -5,3 +5,4 @@ pydantic>=2.0.0
5
  requests>=2.31.0
6
  httpx>=0.24.0
7
  pytest>=7.0.0
 
 
5
  requests>=2.31.0
6
  httpx>=0.24.0
7
  pytest>=7.0.0
8
+ python-dotenv>=1.0.0