adityss committed on
Commit
4c68447
·
1 Parent(s): 5c4340e

Refactor project structure and remove unused files; update inference script for clarity and compliance with environment variable handling.

.gitignore CHANGED
@@ -1,20 +1,63 @@
1
  # Secrets
2
  .env
 
3
 
4
  # Python
5
  __pycache__/
6
  *.pyc
7
  *.pyo
8
  .pytest_cache/
9
 
10
  # Go
11
  *.exe
12
  gridmind-out.exe
 
 
13
 
14
  # IDE
15
  .vscode/
16
  .idea/
 
 
 
17
 
18
  # OS
19
  .DS_Store
20
  Thumbs.db
1
  # Secrets
2
  .env
3
+ !.env.example
4
 
5
  # Python
6
  __pycache__/
7
  *.pyc
8
  *.pyo
9
+ *.so
10
+ .Python
11
+ env/
12
+ venv/
13
+ ENV/
14
+ build/
15
+ develop-eggs/
16
+ dist/
17
+ downloads/
18
+ eggs/
19
+ .eggs/
20
+ lib/
21
+ lib64/
22
+ parts/
23
+ sdist/
24
+ var/
25
+ wheels/
26
+ *.egg-info/
27
+ .installed.cfg
28
+ *.egg
29
  .pytest_cache/
30
 
31
  # Go
32
  *.exe
33
  gridmind-out.exe
34
+ gridmind-test.exe
35
+ gridmind-server
36
 
37
  # IDE
38
  .vscode/
39
  .idea/
40
+ *.swp
41
+ *.swo
42
+ *~
43
 
44
  # OS
45
  .DS_Store
46
  Thumbs.db
47
+
48
+ # Build and test artifacts
49
+ build/
50
+ dist/
51
+ *.log
52
+ *.orig
53
+ test_output.json
54
+ test_suite_results.json
55
+
56
+ # Old architecture (removed)
57
+ server/
58
+ embed/
59
+ gridmind_rl.egg-info/
60
+ HACKATHON_COMPLIANCE_REPORT.md
61
+ STRICT_QA_COMPLIANCE_REPORT.md
62
+ qa_strict_tests.py
63
+ qa_test_suite.py
README.md CHANGED
@@ -1,626 +1,240 @@
1
- # 🏢 GridMind-RL — Energy Management Reinforcement Learning Environment
2
 
3
- **A real-world RL environment for intelligent building energy optimization.** Control HVAC systems, thermal storage, batch job scheduling, and demand-response under stochastic electricity prices and grid stress events.
4
 
5
- Built on the [OpenEnv](https://github.com/meta-pytorch/OpenEnv) specification. Containerized. Ready for Hugging Face Spaces deployment.
 
 
 
 
6
 
7
  ---
8
 
9
- ## 📖 Overview & Motivation
10
 
11
- Building energy management is a **real-world optimization problem** facing utilities, facility operators, and industrial sites globally. Traditional rule-based controls waste billions in energy costs and miss opportunities for grid participation.
12
 
13
- **GridMind-RL** simulates decisions that facility operators must make daily:
 
 
 
 
 
 
14
 
15
- - **Cost Optimization** Buy electricity when prices are low, avoid peak surcharges
16
- - **Comfort & Safety** — Maintain indoor temperature within acceptable ranges while managing thermal inertia
17
- - **Grid Participation** — Respond to demand-response signals and grid stress events
18
- - **Batch Scheduling** — Coordinate industrial process timings to meet deadlines and minimize energy cost
19
- - **Carbon Minimization** — Shift consumption to periods when grid carbon intensity is low
20
-
21
- **Why this matters:** An RL agent trained in this environment can learn strategies that would be difficult or impossible for humans to hand-craft. The combination of continuous control (HVAC power, thermal storage), discrete decisions (batch scheduling), and multiple simultaneous objectives (cost, comfort, grid, deadlines, carbon) creates a realistic, challenging benchmark.
22
-
23
- **Episode Length:** 96 steps = 24 hours at 15-minute resolution. A complete episode requires strategic decision-making across a full day-night cycle.
24
-
25
- ---
26
-
27
- ## Observation Space
28
-
29
- At each timestep, the environment provides the following observations. **Episode length: 96 steps** (15-minute intervals = 24 hours).
30
-
31
- | Field | Data Type | Range / Values | Description |
32
- |-------|-----------|-----------------|-------------|
33
- | `indoor_temperature` | float | 10–40 °C | Current building interior temperature |
34
- | `thermal_storage_level` | float | 0.0–1.0 | Thermal tank charge state (0 = empty, 1 = full) |
35
- | `process_demand` | float | ≥ 0 kW | Current industrial batch process power draw |
36
- | `current_price` | float | > 0 $/kWh | Real-time spot electricity price |
37
- | `grid_stress_signal` | float | 0.0–1.0 | Utility demand-response urgency (0.7+ = critical) |
38
- | `carbon_intensity` | float | ≥ 0 gCO₂/kWh | Current grid carbon intensity |
39
- | `hour_of_day` | int | 0–23 | Time-of-day context |
40
- | `batch_queue` | int array | — | Pending batch jobs with deadline slots |
41
- | `cumulative_cost` | float | ≥ 0 $ | Energy cost accumulated in current episode so far |
42
- | `step` | int | 0–95 | Current timestep (96 total = 24 hours) |
43
- | `building_id` | int | 0+ | Building identifier (for multi-building scenarios) |
44
-
45
- **Observation Properties:**
46
- - Observations are **deterministic** given the seed — same seed produces identical sequences
47
- - All fields are **normalized or bounded** for stable learning
48
- - Prices follow realistic time-of-use patterns; carbon intensity varies with grid mix
49
- - Batch queue starts empty; jobs appear stochastically based on the task/seed
50
-
51
- ---
52
-
53
- ## 🎮 Action Space
54
-
55
- At each step, the agent sends an action controlling four independent subsystems:
56
-
57
- | Field | Data Type | Range | Description |
58
- |-------|-----------|-------|-------------|
59
- | `hvac_power_level` | float | 0.0–1.0 | HVAC system power (0 = off, 1 = full) |
60
- | `thermal_charge_rate` | float | -1.0–1.0 | Thermal storage control (+charge, -discharge) |
61
- | `batch_job_slot` | int | 0–4 | Schedule next batch job: 0=immediate, 1–4=defer |
62
- | `load_shed_fraction` | float | 0.0–0.5 | Non-critical load reduction (0–50%) for demand-response |
63
- | `building_id` | int | 0+ | Building identifier (routing) |
64
-
65
- **Action Space Properties:**
66
- - **Continuous** (HVAC, thermal charging, load shedding) + **discrete** (batch scheduling) → hybrid control
67
- - Actions are applied every 15-minute step
68
- - Load shedding is capped at 50% to ensure safety/habitability
69
- - Batch scheduling decisions affect energy cost and deadline compliance
70
 
71
  ---
72
 
73
- ## 💡 Reward Function
74
-
75
- The environment provides **dense rewards every step** (not sparse, not binary). Each step returns:
76
- - A scalar reward (sum of components)
77
- - A dictionary of 7 weighted sub-components for transparency
78
 
79
- | Component | Purpose | Possible Values |
80
- |-----------|---------|-----------------|
81
- | **cost_savings** | Minimize energy bill | Negative (cost increases) to positive (savings vs baseline) |
82
- | **temp_constraint** | Maintain comfort | Gaussian bonus near 21°C, penalty outside 19–23°C bounds |
83
- | **grid_response** | Shift load during stress | Bonus proportional to shed fraction when grid signal > 0.7 |
84
- | **efficiency_bonus** | Exploit thermal storage | Reward charge/discharge timing and thermal arbitrage |
85
- | **stability_penalty** | Smooth control | Small penalty for rapid oscillations in HVAC/storage |
86
- | **deadline_penalty** | Meet job deadlines | Large penalty if batch job finishes after deadline |
87
- | **carbon_reward** | Low-carbon consumption | Bonus for consuming during low-carbon grid periods |
88
 
89
- **Example Reward Calculation:**
90
- If an agent takes a well-timed action during high-price, high-stress period:
91
- - Large positive `cost_savings` (avoided expensive hour)
92
- - Positive `grid_response` (shed load successfully)
93
- - Possible positive `carbon_reward` (if grid is clean)
94
- - **Total step reward** = weighted sum of all components
95
-
96
- This multi-objective reward structure encourages **learning tradeoffs** between cost, comfort, grid support, and carbon efficiency.
 
 
97
 
98
- ---
 
 
 
 
99
 
100
  ---
101
 
102
- ## 📋 Tasks & Difficulty Levels
103
 
104
- Three independent tasks with **deterministic programmatic graders**. Scores range **0.0–1.0**; higher is better.
105
 
106
- ### Task 1 Cost Minimization (🟢 Easy)
107
 
108
- **Objective:** Minimize total energy cost in 24 hours with no other constraints.
109
 
110
- **Difficulty Rationale:** Only one objective (cost) to optimize; temperature and grid constraints are relaxed.
 
 
 
 
 
 
111
 
112
- **Grader Metrics:**
113
- - **Cost score (100%)** — Compares total episode energy cost to a deterministic baseline. Higher savings → higher score.
114
 
115
- **Baseline Score:** **0.7063**
 
 
 
 
 
 
 
 
116
 
117
  ---
118
 
119
- ### Task 2 — Constrained Temperature Control (🟡 Medium)
120
-
121
- **Objective:** Minimize cost while maintaining indoor temperature between **19–23°C** throughout the episode.
122
-
123
- **Difficulty Rationale:** Introduces a hard constraint (temperature bounds). Agent must use thermal storage strategically to meet both cost and comfort goals.
124
 
125
- **Grader Metrics:**
126
- - **Cost score (60%)** — Total energy cost vs baseline
127
- - **Temperature score (40%)** Fraction of steps within bounds (hard penalty for violations)
 
 
128
 
129
- **Notes:** A naive agent might achieve low cost by disabling HVAC, but then temperatures drift out of bounds (0 score). Trade-off learning is required.
130
-
131
- **Baseline Score:** **0.6333**
132
 
133
  ---
134
 
135
- ### Task 3 — Full Demand Response (🔴 Hard)
136
-
137
- **Objective:** Minimize cost, maintain temperature, respond to grid events, complete batch jobs on time, and minimize carbon emissions. This is a **multi-objective constraint satisfaction** problem.
138
-
139
- **Difficulty Rationale:** Most realistic. Agent must balance five competing objectives simultaneously; any single failure is costly.
140
-
141
- **Grader Metrics:**
142
- - **Cost score (28%)** — Energy cost
143
- - **Temperature score (20%)** — Time within comfort bounds
144
- - **Grid response score (20%)** — Load shed during demand-response events (signal > 0.7)
145
- - **Batch deadline score (12%)** — Fraction of jobs completed before deadline
146
- - **Carbon reward score (20%)** — Shift load to low-carbon periods
147
-
148
- **Baseline Breakdown:**
149
- - Cost: 0.670, Temperature: 0.573, Grid: 0.214, Batch: 1.000, Carbon: 0.657
150
- - **Overall: 0.5966**
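As a quick arithmetic check, the Task 3 grader weights and baseline sub-scores listed above can be combined in a few lines of Python. This is only a sketch: it assumes the overall score is a plain weighted sum of the five sub-scores, which the README implies but does not state, and the dictionary keys are illustrative names rather than identifiers from the grader.

```python
from typing import Dict

# Weights and baseline sub-scores copied from the Task 3 section above.
# The key names are hypothetical labels, not identifiers from the grader.
TASK3_WEIGHTS: Dict[str, float] = {
    "cost": 0.28,
    "temperature": 0.20,
    "grid_response": 0.20,
    "batch_deadline": 0.12,
    "carbon": 0.20,
}
BASELINE_SUB_SCORES: Dict[str, float] = {
    "cost": 0.670,
    "temperature": 0.573,
    "grid_response": 0.214,
    "batch_deadline": 1.000,
    "carbon": 0.657,
}


def weighted_score(sub_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine sub-scores as a weighted sum (assumed grader behaviour)."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    return sum(weights[name] * sub_scores[name] for name in weights)


if __name__ == "__main__":
    print(round(weighted_score(BASELINE_SUB_SCORES, TASK3_WEIGHTS), 4))
```

Under that assumption the rounded sub-scores give roughly 0.5964, which agrees with the reported 0.5966 to within the rounding of the three-decimal sub-scores.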
151
-
152
- **Challenge:** Grid response score (~0.21) shows that the baseline heuristic rarely sheds load opportunistically. Learning agents should discover that quick load shedding during high-price, high-stress periods yields significant cost savings.
153
-
154
- **Grader Determinism:** Same seed always produces identical evaluations. Episodes are seeded internally; reproducible batches of evaluations can be generated for benchmark comparisons.
155
-
156
- ---
157
-
158
- ## 🚀 Setup & Usage
159
-
160
- ### Prerequisites
161
-
162
- - **Docker** — [Download Docker Desktop](https://www.docker.com/products/docker-desktop/)
163
- - **Python 3.10+** — [Download Python](https://www.python.org/downloads/)
164
- - **Git** — [Download Git](https://git-scm.com/downloads)
165
-
166
- ### Quick Start (5 minutes)
167
-
168
- #### 1. Clone the Repository
169
-
170
- ```bash
171
- git clone https://github.com/LO-Kyu/gridmind-rl.git
172
- cd gridmind-rl
173
- ```
174
 
175
- #### 2. Build and Start the Environment Server
176
 
177
  ```bash
178
  docker build -t gridmind-rl .
179
- docker run --rm -d -p 7860:7860 -p 7861:7861 --name gridmind gridmind-rl
180
  ```
181
 
182
- Verify the server is running:
183
 
 
184
  ```bash
185
- # Check health endpoint
186
- curl http://localhost:7860/health
187
- # Expected: {"status":"ok","version":"1.0.0"}
188
  ```
189
 
190
- #### 3. Install Python Dependencies
191
-
192
- Open a **new terminal** and install:
193
-
194
  ```bash
195
- pip install -r python/requirements.txt
196
- ```
197
-
198
- #### 4. Run Inference (No LLM — Fast)
199
-
200
- Run a fast, deterministic baseline using heuristic policy:
201
 
202
- ```bash
203
  python inference.py --fast-mode --episodes 1
204
- ```
205
-
206
- Expected output (sample):
207
- ```
208
- [START] task=Cost_Minimization env=gridmind model=heuristic
209
- [STEP1] step=1 action={...} reward=10.5 done=false
210
- [STEP2] step=2 action={...} reward=12.3 done=false
211
- ...
212
- [STEP96] step=96 action={...} reward=8.9 done=true
213
- [END] success=true steps=96 rewards=[10.5, 12.3, ..., 8.9]
214
- ```
215
-
216
- Results saved to: `baseline_scores.json`
217
-
218
- #### 5. (Optional) Run with LLM
219
-
220
- To use an LLM agent for decision-making:
221
-
222
- 1. Get a **free API key** from [openrouter.ai/keys](https://openrouter.ai/keys) (no credit card needed)
223
- 2. Create `.env` file (copy from `.env.example`):
224
- ```bash
225
- cp .env.example .env
226
- ```
227
- 3. Edit `.env` and add your API key:
228
- ```env
229
- HF_TOKEN=sk-or-v1-your-key-here
230
- # or
231
- OPENAI_API_KEY=sk-or-v1-your-key-here
232
- ```
233
- 4. Run with LLM:
234
- ```bash
235
- python inference.py --episodes 1
236
- ```
237
-
238
- #### 6. Stop the Server (When Done)
239
-
240
- ```bash
241
- docker stop gridmind
242
- ```
243
 
244
- ---
245
-
246
- ### Inference Script Reference
247
-
248
- The `inference.py` script (project root) is the **hackathon submission entrypoint**.
249
-
250
- **Environment Variables:**
251
-
252
- | Variable | Default | Description |
253
- |----------|---------|-------------|
254
- | `HF_TOKEN` | (required for submission) | API key for LLM provider or HF Spaces |
255
- | `OPENAI_API_KEY` | (optional fallback) | Alternative OpenAI-compatible key |
256
- | `API_BASE_URL` | `https://openrouter.ai/api/v1` | LLM endpoint URL |
257
- | `MODEL_NAME` | `meta-llama/llama-3.3-70b-instruct:free` | Model identifier |
258
- | `ENV_URL` | `http://localhost:7860` | Environment server address |
259
-
260
- **Command-Line Flags:**
261
-
262
- | Flag | Default | Description |
263
- |------|---------|-------------|
264
- | `--episodes N` | 1 | Episodes per task (runs tasks 1, 2, 3 in sequence) |
265
- | `--fast-mode` | off | Don't call LLM; use heuristic policy only (reproducible, no API calls) |
266
- | `--llm-every N` | 4 | Reuse each LLM decision for N steps (reduces API calls) |
267
- | `--max-steps N` | 96 | Stop episode early after N steps |
268
- | `--env-url URL` | from env var | Override environment server URL |
269
- | `--output FILE` | `baseline_scores.json` | Output results filename |
270
- | `--verbose` | off | Print detailed logs for each step |
271
-
272
- **Examples:**
273
-
274
- ```bash
275
- # Run all 3 tasks with LLM (1 episode each)
276
  python inference.py --episodes 1
277
-
278
- # Reproduce baseline fast (no LLM)
279
- python inference.py --fast-mode --episodes 1
280
-
281
- # Only Task 2, heuristic, verbose output
282
- python inference.py --fast-mode --episodes 1 --verbose
283
-
284
- # Run 5 episodes per task with custom environment
285
- python inference.py --episodes 5 --env-url http://my-server:7860
286
  ```
287
 
288
- ---
289
-
290
- ### HTTP API Reference
291
-
292
- **Base URL:** `http://localhost:7860`
293
-
294
- | Endpoint | Method | Purpose | Example Body |
295
- |----------|--------|---------|---------------|
296
- | `/health` | GET | Liveness check | — |
297
- | `/ping` | GET | Lightweight ping | — |
298
- | `/reset` | POST | Reset episode for a task | `{"task_id": 1, "seed": 42}` |
299
- | `/step` | POST | Apply action, get next observation | `{"hvac_power_level": 0.5, "thermal_charge_rate": 0.1, ...}` |
300
- | `/state` | GET | Current full state snapshot | — |
301
- | `/grade` | GET | Episode score (0.0–1.0) with sub-scores | — |
302
- | `/replay` | GET | Full step-by-step trajectory | — |
303
- | `/tasks` | GET | Task definitions and grader weights | — |
304
- | `/metrics` | GET | Prometheus-format metrics | — |
305
-
306
- **Example Workflow:**
307
 
308
- ```bash
309
- # 1. Reset to Task 1 with seed 42
310
- curl -X POST http://localhost:7860/reset \
311
- -H "Content-Type: application/json" \
312
- -d '{"task_id": 1, "seed": 42}'
313
-
314
- # 2. Get initial observation
315
- curl http://localhost:7860/state
316
-
317
- # 3. Take an action
318
- curl -X POST http://localhost:7860/step \
319
- -H "Content-Type: application/json" \
320
- -d '{
321
- "hvac_power_level": 0.5,
322
- "thermal_charge_rate": 0.1,
323
- "batch_job_slot": 1,
324
- "load_shed_fraction": 0.0
325
- }'
326
-
327
- # 4. Check final score after episode completes
328
- curl http://localhost:7860/grade
329
- ```
330
 
331
  ---
332
 
333
- ## 📊 Baseline Performance Scores
334
-
335
- The baseline is a **heuristic policy** (rule-based, no LLM) representing a reasonable but non-optimized control strategy. Your RL agent should aim to exceed these scores.
336
-
337
- **Baseline Run:** `python inference.py --fast-mode --episodes 1`
338
-
339
- ### Summary Scores
340
-
341
- | Task | Difficulty | Score | Model |
342
- |------|:----------:|:-----:|-------|
343
- | Task 1 — Cost Minimization | 🟢 Easy | **0.7063** | Heuristic |
344
- | Task 2 — Temperature Control | 🟡 Medium | **0.6333** | Heuristic |
345
- | Task 3 — Full Demand Response | 🔴 Hard | **0.5966** | Heuristic |
346
- | **Overall Average** | — | **0.6454** | Heuristic |
347
 
348
- ### Detailed Breakdown
349
 
350
- #### Task 1 Results
351
- - **Task:** Cost minimization (96 steps × 15 min = 24 hours)
352
- - **Score:** 0.7063
353
- - **Sub-score:** Cost = 0.706
354
- - **Interpretation:** Heuristic achieves ~70% of optimal cost reduction vs baseline
355
-
356
- #### Task 2 Results
357
- - **Task:** Minimize cost while maintaining temperature 19–23°C
358
- - **Score:** 0.6333
359
- - **Sub-scores:**
360
- - Cost: 0.701
361
- - Temperature constraint: 0.531 (agent violated comfort bounds ~47% of the time)
362
- - **Interpretation:** Temperature management is challenging for the heuristic. Tighter thermal control could improve this score significantly.
363
-
364
- #### Task 3 Results (Most Interesting)
365
- - **Task:** Multi-objective: cost, temperature, grid response, batch deadlines, carbon
366
- - **Score:** 0.5966
367
- - **Sub-scores:**
368
- - Cost: 0.670
369
- - Temperature: 0.573 (similar temperature control challenge as Task 2)
370
- - **Grid response: 0.214** ← Heuristic rarely participates in demand-response
371
- - Batch deadline: 1.000 (heuristic always completes jobs on time)
372
- - Carbon: 0.657
373
-
374
- **Key Insight:** The heuristic's low grid response score (0.21) suggests that learned agents have significant room for improvement by:
375
- 1. Recognizing high-price + high-stress periods
376
- 2. Proactively shedding load to reduce cost
377
- 3. Using thermal storage to recover comfort afterward
378
-
379
- This multi-objective setting is where RL agents typically exceed heuristic baselines.
380
-
381
- ### Reproducibility & Evaluation
382
-
383
- - **Deterministic:** Baseline scores are **deterministic** — same seed always produces identical actions and rewards
384
- - **Seeding:** Each task uses a fixed base seed (1100, 1200, 1300) for reproducible evaluation
385
- - **Your Submissions:** Your agent will be evaluated on the same seed distribution; compare your scores directly to baseline
386
 
387
  ---
388
 
389
- ## 🏗️ Architecture
390
 
391
- ```
392
- ┌─────────────────────────────────────────────────────────────────┐
393
- │ inference.py (LLM Agent or Heuristic) │
394
- │ │ │
395
- │ │ HTTP: POST /reset, /step · GET /grade, /state │
396
- │ ▼ │
397
- │ ┌───────────────────────────────────────────────────────────┐ │
398
- │ │ Docker Container │ │
399
- │ │ │ │
400
- │ │ ┌─────────────────────┐ ┌───────────────────────────┐ │ │
401
- │ │ │ Go Environment │ │ Python Dashboard │ │ │
402
- │ │ │ Server (:7860) │ │ FastAPI + UI (:7861) │ │ │
403
- │ │ │ │ │ │ │ │
404
- │ │ │ • Physics engine │ │ • Proxies /api → :7860 │ │ │
405
- │ │ │ • Reward function │◄──│ • Real-time charts │ │ │
406
- │ │ │ • Task graders │ │ • State visualization │ │ │
407
- │ │ └─────────────────────┘ └───────────────────────────┘ │ │
408
- │ │ │ │
409
- │ │ Isolated · Reproducible · Non-root user │ │
410
- │ └───────────────────────────────────────────────────────────┘ │
411
- └─────────────────────────────────────────────────────────────────┘
412
- ```
413
 
414
- ### Project Structure
 
 
 
 
415
 
416
- ```
417
- gridmind/
418
- ├── inference.py ← Hackathon entrypoint (root)
419
- ├── openenv.yaml ← OpenEnv spec manifest
420
- ├── Dockerfile ← Multi-stage build (Go + Python)
421
- ├── .env ← API credentials (git-ignored)
422
- ├── baseline_scores.json ← Produced by inference.py
423
-
424
- ├── main.go ← HTTP server (routes, middleware, metrics)
425
- ├── env/ ← Core environment logic (Go)
426
- │ ├── environment.go ← Simulation: physics, thermal dynamics
427
- │ ├── models.go ← All data types (Observation, Action, etc.)
428
- │ ├── rewards.go ← 7-component dense reward function
429
- │ └── tasks.go ← 3 task definitions + deterministic graders
430
-
431
- ├── python/ ← Python support layer
432
- │ ├── inference.py ← Full LLM agent + heuristic fallback
433
- │ ├── models.py ← Typed Pydantic models (mirrors Go structs)
434
- │ ├── validate.py ← OpenEnv spec validation suite
435
- │ └── requirements.txt ← Python dependencies
436
-
437
- ├── tests/ ← Automated tests
438
- │ ├── environment_test.go ← Go unit tests (determinism, bounds, etc.)
439
- │ └── test_graders.py ← Python grader tests (pytest)
440
-
441
- └── dashboard/ ← Optional web dashboard
442
- ├── server.py ← FastAPI server
443
- └── static/ ← Frontend assets
444
- ```
445
 
446
  ---
447
 
448
- ## 🐳 Docker
449
-
450
- | Action | Command |
451
- |--------|---------|
452
- | **Build** | `docker build -t gridmind-rl .` |
453
- | **Run (foreground)** | `docker run --rm -p 7860:7860 -p 7861:7861 --name gridmind gridmind-rl` |
454
- | **Run (background)** | `docker run --rm -d -p 7860:7860 -p 7861:7861 --name gridmind gridmind-rl` |
455
- | **Stop** | `docker stop gridmind` |
456
- | **Run inference inside container** | `docker exec -it gridmind python /app/inference.py --fast-mode` |
457
-
458
- The Dockerfile uses a **multi-stage build**:
459
- 1. **Stage 1** — Go 1.21 Alpine: compiles the environment server binary
460
- 2. **Stage 2** — Python 3.11 slim: runs the Go binary + Python dashboard via Supervisor
461
-
462
- ---
463
-
464
- ## ☁️ Hugging Face Space Deployment
465
-
466
- ### 1. Create a New Space
467
-
468
- Go to [huggingface.co/new-space](https://huggingface.co/new-space):
469
- - **SDK:** Docker
470
- - **Hardware:** CPU Basic (2 vCPU, 16 GB — free tier)
471
-
472
- ### 2. Push to HF
473
 
474
- ```bash
475
- git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/gridmind-rl
476
- git push hf main
477
  ```
478
-
479
- ### 3. Verify
480
-
481
- ```bash
482
- curl https://YOUR_USERNAME-gridmind-rl.hf.space/health
483
- # → {"status":"ok","version":"1.0.0"}
484
-
485
- curl -X POST https://YOUR_USERNAME-gridmind-rl.hf.space/reset \
486
- -H "Content-Type: application/json" \
487
- -d '{"task_id":1,"seed":42}'
488
  ```
489
 
490
- > **Note:** HF Spaces exposes port **7860** publicly. The dashboard (7861) is for local development only.
491
-
492
  ---
493
 
494
- ## 🧪 Testing
495
 
496
- ### Run Go Unit Tests
497
 
498
  ```bash
499
- cd gridmind
500
- go test ./tests/ -v
501
- ```
502
-
503
- ### Run Python Grader Tests (requires server running)
504
 
505
- ```bash
506
  pytest tests/test_graders.py -v
507
  ```
508
 
509
- ### Run Full OpenEnv Validation
510
 
511
  ```bash
512
- python python/validate.py --env-url http://localhost:7860
513
- ```
514
-
515
- ---
516
-
517
- ## 📝 Inference Script Reference
518
-
519
- The `inference.py` script at the project root is the **hackathon entrypoint**.
520
-
521
- ### Environment Variables
522
-
523
- | Variable | Default | Description |
524
- |----------|---------|-------------|
525
- | `API_BASE_URL` | `https://openrouter.ai/api/v1` | LLM API endpoint |
526
- | `MODEL_NAME` | `meta-llama/llama-3.1-8b-instruct:free` | Model to use |
527
- | `OPENAI_API_KEY` | — | API key (any OpenAI-compatible provider) |
528
- | `ENV_URL` | `http://localhost:7860` | Environment server URL |
529
-
530
- ### Command-Line Flags
531
-
532
- | Flag | Default | Description |
533
- |------|---------|-------------|
534
- | `--episodes N` | 1 | Episodes per task (tasks 1–3 run in sequence) |
535
- | `--fast-mode` | off | Use heuristic policy only (no LLM, fully reproducible) |
536
- | `--llm-every N` | 4 | Reuse each LLM action for N steps (reduces API calls) |
537
- | `--max-steps N` | 96 | Stop early after N steps |
538
- | `--env-url URL` | from env | Override environment URL |
539
- | `--output FILE` | `baseline_scores.json` | Output results file |
540
- | `--verbose` | off | Print detailed step logs |
541
-
542
- ### Stdout Log Format
543
-
544
- Each episode emits structured markers for automated evaluation:
545
-
546
- ```
547
- [START]
548
- [STEP1]
549
- [STEP2]
550
- ...
551
- [STEP96]
552
- [END]
553
  ```
554
 
555
  ---
556
 
557
- ## ✅ OpenEnv Specification Compliance
558
-
559
- GridMind-RL fully implements the OpenEnv specification for standardized RL environments. All components are present and tested:
560
 
561
- | Requirement | Status | Notes |
562
- |-------------|:------:|-------|
563
- | Manifest (`openenv.yaml`) | ✅ | All metadata, schema definitions, and version info |
564
- | Observation Schema | ✅ | 11-field object: temperature, storage, price, grid signal, carbon, hour, batch queue, cost, step, building_id |
565
- | Action Schema | ✅ | 5-field object: HVAC, thermal rate, batch slot, load shed, building_id |
566
- | HTTP Endpoints | ✅ | `/reset`, `/step`, `/state`, `/grade`, `/replay`, `/tasks`, `/health`, `/metrics` |
567
- | Determinism | ✅ | Seeded episode generation; identical seeds produce identical trajectories |
568
- | Typed Models | ✅ | Pydantic models (Python) mirror Go structs exactly |
569
- | Dense Rewards | ✅ | 7-component reward breakdown every step |
570
- | Graders | ✅ | 3 tasks with programmatic, deterministic graders (0.0–1.0 range) |
571
- | Exploit Detection | ✅ | Built into grading pipeline to flag unrealistic scores |
572
 
573
  ---
574
 
575
- ## FAQ
576
-
577
- **Q: Can I use a different model?**
578
- A: Yes. Set `MODEL_NAME` environment variable to any OpenAI-compatible model. The default (`meta-llama/llama-3.3-70b-instruct:free`) is free on OpenRouter with no credit card.
579
-
580
- **Q: How do I avoid rate limiting?**
581
- A: (1) Use `--fast-mode` for local testing (no API calls), (2) Set `--llm-every 4` to reuse decisions, (3) Use a paid API tier for submission, or (4) Train & submit an offline policy.
582
-
583
- **Q: Will my API key be exposed in submissions?**
584
- A: No. Store your API key in `.env` (git-ignored). On HF Spaces, set secrets via the Space settings UI; keys are never committed to the repo.
585
-
586
- **Q: What's the difference between `HF_TOKEN` and `OPENAI_API_KEY`?**
587
- A: `HF_TOKEN` is used in HF Space deployments and external evaluations. `OPENAI_API_KEY` is a fallback for local development. The code tries `HF_TOKEN` first, then `OPENAI_API_KEY`. At least one must be set.
588
-
589
- **Q: Can I submit an offline/trained policy?**
590
- A: Yes. Modify `python/inference.py` to use your trained agent instead of LLM calls. Ensure you still output the required `[START]`, `[STEP]`, `[END]` format.
591
-
592
- **Q: What if my submission times out?**
593
- A: Each episode is 96 steps. The environment runs 3 episodes (one per task). Optimize for latency: reduce LLM calls (use `--llm-every`), use a faster model, or submit a heuristic/trained offline policy.
594
-
595
- ---
596
-
597
- ## 🎯 Submission Checklist
598
-
599
- Before submitting, verify:
600
-
601
- - [ ] Clone repo, build Docker, run `docker run -p 7860:7860 -p 7861:7861 gridmind-rl`
602
- - [ ] Run `python inference.py --fast-mode --episodes 1` locally — should produce `baseline_scores.json`
603
- - [ ] Check `[START]`, `[STEP]`, `[END]` markers in stdout
604
- - [ ] Set `HF_TOKEN` or `OPENAI_API_KEY` in `.env` for LLM runs
605
- - [ ] Test with LLM: `python inference.py --episodes 1`
606
- - [ ] Verify Dockerfile builds without errors: `docker build -t gridmind-rl .`
607
- - [ ] Create HF Space (Docker SDK, CPU Basic)
608
- - [ ] Push repo to HF Space: `git push hf main`
609
- - [ ] Set secrets in HF Space UI: `HF_TOKEN`, `API_BASE_URL` (optional), `MODEL_NAME` (optional)
610
- - [ ] Verify Space is running: `curl https://YOUR_USERNAME-gridmind-rl.hf.space/health`
611
- - [ ] Submit Space URL to hackathon organizers
612
-
613
- ---
614
-
615
- ## 📚 Additional Resources
616
-
617
- - **OpenEnv Spec:** https://github.com/meta-pytorch/OpenEnv
618
- - **OpenRouter Free Models:** https://openrouter.ai/keys
619
- - **HF Spaces Docs:** https://huggingface.co/docs/hub/spaces
620
- - **GridMind Repository:** https://github.com/LO-Kyu/gridmind-rl
621
-
622
- ---
623
-
624
- ## 📄 License
625
-
626
- See `LICENSE` in the repository.
 
1
+ # GridMind-RL
2
 
3
+ **Industrial building energy management reinforcement learning environment**
4
 
5
+ [![OpenEnv Compatible](https://img.shields.io/badge/OpenEnv-Compatible-blue)](https://openenv.org/)
6
+ [![Go 1.21](https://img.shields.io/badge/Go-1.21-00ADD8)](https://golang.org/)
7
+ [![Python 3.11](https://img.shields.io/badge/Python-3.11+-3776ab)](https://www.python.org/)
8
+ [![Docker Ready](https://img.shields.io/badge/Docker-Ready-2496ED)](https://www.docker.com/)
9
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
10
 
11
  ---
12
 
13
+ ## Overview
14
 
15
+ GridMind-RL is a reinforcement learning environment for training and evaluating intelligent control policies in industrial building energy management. The environment simulates realistic HVAC control, thermal storage management, batch job scheduling, and demand response scenarios under stochastic electricity pricing and grid stress events.
16
 
17
+ **Key challenges posed by the environment:**
18
+ - **Cost minimization**: Navigate complex electricity pricing curves across 24-hour periods
19
+ - **Comfort maintenance**: Keep indoor temperature within comfort bounds while optimizing cost
20
+ - **Grid responsiveness**: Respond to grid stress signals with intelligent load shedding
21
+ - **Carbon reduction**: Shift consumption toward periods of low grid carbon intensity
22
+ - **Batch scheduling**: Schedule compute-intensive batch jobs optimally
23
+ - **Storage management**: Efficiently use thermal storage for load shifting
24
 
25
+ This environment is ideal for training deep reinforcement learning agents, testing heuristic policies, and benchmarking control algorithms. It provides dense reward signals enabling efficient policy learning.
26
 
27
  ---
28
 
29
+ ## Architecture
 
 
 
 
30
 
31
+ GridMind-RL consists of three tightly integrated components:
 
 
 
 
 
 
 
 
32
 
33
+ ```
34
+ Agent (python/inference.py)
35
+ ↓ HTTP POST /step, /reset, /grade
36
+ ↓
37
+ Go Environment Server (main.go), port 7860
38
+ ↓
39
+ Physics Engine (env/environment.go) + Rewards (env/rewards.go) + Tasks (env/tasks.go)
40
+ ↓
41
+ Web Dashboard (dashboard/server.py), port 7861
42
+ ```
43
 
44
+ **Design philosophy:**
45
+ - **Separation of concerns**: Physics engine (Go) decoupled from policy layer (Python)
46
+ - **OpenEnv compliance**: Standardized REST API enables any language agent
47
+ - **Deterministic simulation**: Seeded RNG for reproducible experiments
48
+ - **Dense rewards**: 7-component reward for effective learning
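Because the agent talks to the Go server over plain HTTP, a client needs nothing beyond the standard library. The sketch below builds the requests for `/reset`, `/step`, and `/grade`. The base URL `http://localhost:7860` and the `{"task_id": 1, "seed": 42}` reset body come from the earlier README text in this diff. Treating `/grade` as a GET is likewise taken from the earlier endpoint table and may not match the refactored server. The helper names `build_request` and `call` are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Default address from the earlier README; adjust if the server runs elsewhere.
ENV_URL = "http://localhost:7860"


def build_request(
    endpoint: str,
    payload: Optional[Dict[str, Any]] = None,
    base_url: str = ENV_URL,
) -> urllib.request.Request:
    """Build a GET request (no payload) or a JSON POST request (with payload)."""
    url = f"{base_url.rstrip('/')}/{endpoint.lstrip('/')}"
    if payload is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(request: urllib.request.Request, timeout: float = 10.0) -> Any:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only meaningful while the environment server is up.
    print(call(build_request("/reset", {"task_id": 1, "seed": 42})))
    print(call(build_request("/grade")))
```

Keeping request construction separate from sending makes the payloads easy to inspect or unit-test without a live server.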
49
 
50
  ---
51
 
52
+ ## Environment Specification
53
 
54
+ ### Observation Space (11 fields)
55
 
56
+ | Field | Type | Range | Description |
57
+ |-------|------|-------|-------------|
58
+ | `indoor_temperature` | float | [15-27] °C | Building indoor temperature |
59
+ | `thermal_storage_level` | float | [0-1] | Thermal storage charge (0=empty, 1=full) |
60
+ | `process_demand` | float | [5-50] kW | Baseline demand |
61
+ | `current_price` | float | [0.03-0.25] $/kWh | Electricity price |
62
+ | `grid_stress_signal` | float | [0-1] | Grid stress (>0.7 = critical) |
63
+ | `carbon_intensity` | float | [50-800] gCO2/kWh | Grid carbon intensity |
64
+ | `hour_of_day` | int | [0-23] | Time of day |
65
+ | `batch_queue` | list | Up to 10 items | Batch job deadlines |
66
+ | `cumulative_cost` | float | [0-1000] $ | Total cost this episode |
67
+ | `step` | int | [0-95] | Current step (96 steps = 24 hours) |
68
+ | `building_id` | int | {0} | Building identifier |
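
As a rough illustration of how an agent might sanity-check an observation against the ranges in the table above, here is a minimal Python sketch. The `OBSERVATION_RANGES` dict and the `out_of_range` helper are hypothetical names, not part of the repository; only the field names and bounds are taken from the table.

```python
# Hypothetical helper: bounds copied from the observation table above.
OBSERVATION_RANGES = {
    "indoor_temperature": (15.0, 27.0),      # degrees Celsius
    "thermal_storage_level": (0.0, 1.0),
    "process_demand": (5.0, 50.0),           # kW
    "current_price": (0.03, 0.25),           # $/kWh
    "grid_stress_signal": (0.0, 1.0),
    "carbon_intensity": (50.0, 800.0),       # gCO2/kWh
    "hour_of_day": (0, 23),
    "cumulative_cost": (0.0, 1000.0),        # $
    "step": (0, 95),
}


def out_of_range(observation):
    """Return names of numeric fields that are missing or outside the documented range."""
    problems = []
    for name, (low, high) in OBSERVATION_RANGES.items():
        value = observation.get(name)
        if value is None or not (low <= value <= high):
            problems.append(name)
    return problems


# Made-up observation for illustration; only the temperature is out of bounds.
sample = {
    "indoor_temperature": 29.5,
    "thermal_storage_level": 0.4,
    "process_demand": 12.0,
    "current_price": 0.11,
    "grid_stress_signal": 0.2,
    "carbon_intensity": 310.0,
    "hour_of_day": 14,
    "batch_queue": [],
    "cumulative_cost": 3.2,
    "step": 56,
    "building_id": 0,
}
print(out_of_range(sample))  # ['indoor_temperature']
```

A check like this is mainly useful in tests or when debugging an agent; whether the server itself ever emits values outside these ranges is not stated here.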

+ ### Action Space (5 fields)

+ | Field | Type | Range | Description |
+ |-------|------|-------|-------------|
+ | `hvac_power_level` | float | [0, 1] | HVAC power (0=off, 1=max) |
+ | `thermal_charge_rate` | float | [-1, 1] | Storage charge (positive) / discharge (negative) rate |
+ | `batch_job_slot` | int | [0, 4] | Batch job scheduling slot |
+ | `load_shed_fraction` | float | [0, 0.5] | Load shedding fraction |
+ | `building_id` | int | {0} | Building identifier |
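
An agent, especially one driven by an LLM, may propose values outside these ranges. The sketch below clips a proposed action to the documented bounds before it is sent. `clamp_action` is a hypothetical helper, and the defaults used for missing fields are assumptions; how the server treats out-of-range values is not specified here.

```python
def clamp(value, low, high):
    """Clip value into the closed interval [low, high]."""
    return max(low, min(high, value))


def clamp_action(action):
    """Return a copy of the action with every field forced into its documented range."""
    return {
        "hvac_power_level": clamp(float(action.get("hvac_power_level", 0.0)), 0.0, 1.0),
        "thermal_charge_rate": clamp(float(action.get("thermal_charge_rate", 0.0)), -1.0, 1.0),
        "batch_job_slot": clamp(int(action.get("batch_job_slot", 0)), 0, 4),
        "load_shed_fraction": clamp(float(action.get("load_shed_fraction", 0.0)), 0.0, 0.5),
        "building_id": 0,  # the table lists a single building, id 0
    }


proposed = {
    "hvac_power_level": 1.4,
    "thermal_charge_rate": -2,
    "batch_job_slot": 7,
    "load_shed_fraction": 0.9,
}
print(clamp_action(proposed))
```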

+ ### Reward Function (7 Components)

+ | Component | Description |
+ |-----------|-------------|
+ | **Cost Savings** | Negative of the electricity cost incurred |
+ | **Temperature Constraint** | Penalty if the indoor temperature is outside 19–23 °C |
+ | **Grid Response** | Bonus for load shedding during grid stress |
+ | **Deadline Penalty** | Penalty for missed batch deadlines |
+ | **Efficiency Bonus** | Bonus for off-peak storage charging |
+ | **Stability Penalty** | Penalty for rapid control changes |
+ | **Carbon Reward** | Bonus for consuming during low-carbon periods |
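
Two of these components can be written down compactly. The sketch below is illustrative only: the function names are hypothetical, the 19–23 °C band and the 0.7 stress threshold come from the tables in this README, and the linear shapes are assumptions rather than the formulas used in `env/rewards.go`.

```python
COMFORT_LOW_C = 19.0
COMFORT_HIGH_C = 23.0
GRID_STRESS_CRITICAL = 0.7


def comfort_violation(temp_c):
    """Degrees Celsius by which the temperature lies outside the comfort band (0 inside it)."""
    return max(0.0, COMFORT_LOW_C - temp_c, temp_c - COMFORT_HIGH_C)


def grid_response_bonus(grid_stress_signal, load_shed_fraction):
    """Assumed shape: bonus equal to the shed fraction, granted only under critical stress."""
    if grid_stress_signal > GRID_STRESS_CRITICAL:
        return load_shed_fraction
    return 0.0


print(comfort_violation(25.0))         # 2.0
print(grid_response_bonus(0.8, 0.3))   # 0.3
print(grid_response_bonus(0.5, 0.3))   # 0.0
```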

  ---

+ ## Tasks

+ | Task | Difficulty | Objective | Baseline Score |
+ |------|-----------|-----------|-----------------|
+ | Task 1 | Easy | Minimize cost only | **0.708** |
+ | Task 2 | Medium | Minimize cost + maintain comfort | **0.633** |
+ | Task 3 | Hard | Full demand response + scheduling | **0.598** |

+ - **Task 1 (Easy)**: Cost minimization, no constraints
+ - **Task 2 (Medium)**: Cost + temperature comfort (19–23 °C)
+ - **Task 3 (Hard)**: Cost + comfort + grid response + batch scheduling + carbon

  ---

+ ## Quickstart

+ ### Docker (Recommended)

  ```bash
  docker build -t gridmind-rl .
+ docker run -p 7860:7860 -p 7861:7861 gridmind-rl
  ```

+ ### Local Development

+ **Terminal 1: Start Go server**
  ```bash
+ go run main.go
  ```

+ **Terminal 2: Run agent**
  ```bash
+ export HF_TOKEN="your_api_key"
+ export API_BASE_URL="https://openrouter.ai/api/v1"
+ export MODEL_NAME="meta-llama/llama-3.3-70b-instruct:free"

+ # Heuristic policy (no LLM)
  python inference.py --fast-mode --episodes 1

+ # LLM agent
  python inference.py --episodes 1
  ```

+ ### Environment Variables

+ | Variable | Required | Default | Description |
+ |----------|----------|---------|-------------|
+ | `HF_TOKEN` | Yes | (none) | LLM API key |
+ | `API_BASE_URL` | No | `https://openrouter.ai/api/v1` | LLM endpoint |
+ | `MODEL_NAME` | No | `meta-llama/llama-3.3-70b-instruct:free` | Model ID |
+ | `ENV_URL` | No | `http://localhost:7860` | Environment server URL |
+ | `OPENAI_API_KEY` | No | (none) | Alternative to `HF_TOKEN` |
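
For readers wiring up their own agent, the table can be turned into a small configuration loader. This is a minimal sketch under stated assumptions: `load_config` is a hypothetical function (not the code in `inference.py`), and preferring `HF_TOKEN` over `OPENAI_API_KEY` when both are set is a guess.

```python
import os

# Defaults copied from the environment-variable table above.
DEFAULTS = {
    "API_BASE_URL": "https://openrouter.ai/api/v1",
    "MODEL_NAME": "meta-llama/llama-3.3-70b-instruct:free",
    "ENV_URL": "http://localhost:7860",
}


def load_config(environ=None):
    """Build a config dict from environment variables, applying the documented defaults."""
    env = os.environ if environ is None else environ
    # Assumption: HF_TOKEN takes precedence, OPENAI_API_KEY is the fallback.
    api_key = env.get("HF_TOKEN") or env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("Set HF_TOKEN (or OPENAI_API_KEY) before running the LLM agent")
    config = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    config["api_key"] = api_key
    return config


# Passing a dict instead of os.environ keeps the example reproducible.
print(load_config({"HF_TOKEN": "example-key"})["ENV_URL"])  # http://localhost:7860
```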
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

  ---

+ ## API Reference

+ All endpoints are served on port 7860 (OpenEnv standard).

+ | Method | Endpoint | Description |
+ |--------|----------|-------------|
+ | `GET` | `/health` | Health check |
+ | `GET` | `/ping` | Liveness probe |
+ | `POST` | `/reset` | Start a new episode |
+ | `POST` | `/step` | Apply an action and advance one step |
+ | `GET` | `/state` | Get current state |
+ | `GET` | `/grade` | Grade episode (0.0-1.0 score) |
+ | `GET` | `/tasks` | Available tasks |
+ | `GET` | `/metrics` | System metrics |
+ | `GET` | `/replay` | Episode history |
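
A minimal client for these endpoints can be sketched with the Python standard library. The helper names (`build_request`, `call`, `run_episode`) are hypothetical. The `/reset` and `/step` payload shapes follow the curl examples in the previous README removed by this commit, and the `done` field in the step response is an assumption, so check both against the running server.

```python
import json
import urllib.request

ENV_URL = "http://localhost:7860"  # default from the environment-variable table


def build_request(method, path, payload=None, base_url=ENV_URL):
    """Create a urllib Request; a JSON body is attached only when a payload is given."""
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(base_url + path, data=data, headers=headers, method=method)


def call(method, path, payload=None):
    """Send the request to the environment server and decode the JSON response."""
    with urllib.request.urlopen(build_request(method, path, payload), timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def run_episode(task_id=1, seed=42, max_steps=96):
    """Reset, apply one fixed action every step, then fetch the grade."""
    call("POST", "/reset", {"task_id": task_id, "seed": seed})
    action = {
        "hvac_power_level": 0.5,
        "thermal_charge_rate": 0.1,
        "batch_job_slot": 1,
        "load_shed_fraction": 0.0,
    }
    for _ in range(max_steps):
        result = call("POST", "/step", action)
        if result.get("done"):  # assumed response field
            break
    return call("GET", "/grade")
```

With the Go server running, `run_episode()` should return the grading payload for a constant-action policy, which is a useful smoke test before plugging in a real agent.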
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

  ---

+ ## Baseline Performance

+ Reference scores for the heuristic policy (rule-based, deterministic):

+ | Task | Score | Policy |
+ |------|-------|--------|
+ | Task 1 | 0.708 | Simple load-shifting heuristic |
+ | Task 2 | 0.633 | Temperature-aware heuristic |
+ | Task 3 | 0.598 | Full demand response heuristic |

+ These scores are the reference that LLM and RL agents should aim to exceed.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

  ---

+ ## Project Structure

  ```
+ gridmind-rl/
+ ├── main.go                  # HTTP server & OpenEnv API
+ ├── inference.py             # Agent entry point
+ ├── openenv.yaml             # OpenEnv spec
+ ├── Dockerfile               # Container build
+ ├── env/
+ │   ├── environment.go       # Physics simulation
+ │   ├── models.go            # Data models
+ │   ├── rewards.go           # Reward computation
+ │   └── tasks.go             # Task grading
+ ├── python/
+ │   ├── inference.py         # LLM agent
+ │   ├── models.py            # Pydantic models
+ │   └── requirements.txt
+ ├── dashboard/
+ │   ├── server.py            # Web server (port 7861)
+ │   └── static/              # Frontend assets
+ ├── data/
+ │   ├── price_curves.json    # Price data
+ │   └── generate_prices.py   # Price generator
+ ├── tests/
+ │   ├── test_graders.py      # Python tests
+ │   └── environment_test.go  # Go tests
+ ├── baseline_scores.json     # Reference scores
+ ├── .env.example             # Environment template
+ └── LICENSE                  # MIT License
  ```

  ---

+ ## Development

+ ### Running Tests

  ```bash
+ # Go tests
+ go test ./tests/... -v

+ # Python tests (requires server running on 7860)
  pytest tests/test_graders.py -v
  ```

+ ### Rebuilding Price Data

  ```bash
+ python data/generate_prices.py
  ```

  ---

+ ## License

+ MIT License. See the [LICENSE](LICENSE) file.

  ---

+ **Questions?** Open an issue on GitHub.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
__pycache__/inference.cpython-311.pyc DELETED
Binary file (691 Bytes)
 
dashboard/__pycache__/server.cpython-311.pyc DELETED
Binary file (5.56 kB)
 
dashboard/__pycache__/server.cpython-314.pyc DELETED
Binary file (5.09 kB)
 
data/generate_prices.py CHANGED
@@ -1,4 +1,9 @@
- """Generate 30 days of realistic ISO New England-style hourly price data."""
+ """
+ One-time script used to generate data/price_curves.json
+ Generate 30 days of realistic ISO New England-style hourly price data.
+ Run: python data/generate_prices.py
+ Output: data/price_curves.json
+ """
  import json
  import math
  import random
gridmind_rl.egg-info/PKG-INFO DELETED
@@ -1,667 +0,0 @@
1
- Metadata-Version: 2.4
2
- Name: gridmind-rl
3
- Version: 1.0.0
4
- Summary: GridMind-RL: Industrial Load-Shaping and Demand-Response RL Environment. Control HVAC, thermal storage, and batch job scheduling under stochastic electricity prices and grid stress events.
5
- Author: LOKyu Team
6
- License: MIT
7
- Project-URL: Homepage, https://github.com/meta-pytorch/OpenEnv
8
- Project-URL: Repository, https://github.com/meta-pytorch/OpenEnv
9
- Project-URL: Documentation, https://github.com/meta-pytorch/OpenEnv
10
- Keywords: reinforcement-learning,openenv,energy-management,demand-response
11
- Classifier: Development Status :: 4 - Beta
12
- Classifier: Environment :: GPU
13
- Classifier: Intended Audience :: Science/Research
14
- Classifier: License :: OSI Approved :: MIT License
15
- Classifier: Natural Language :: English
16
- Classifier: Operating System :: OS Independent
17
- Classifier: Programming Language :: Python :: 3
18
- Classifier: Programming Language :: Python :: 3.9
19
- Classifier: Programming Language :: Python :: 3.10
20
- Classifier: Programming Language :: Python :: 3.11
21
- Classifier: Programming Language :: Python :: 3.12
22
- Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
23
- Requires-Python: >=3.9
24
- Description-Content-Type: text/markdown
25
- License-File: LICENSE
26
- Requires-Dist: openai>=1.0.0
27
- Requires-Dist: openenv-core>=0.2.0
28
- Requires-Dist: fastapi>=0.100.0
29
- Requires-Dist: uvicorn>=0.23.0
30
- Requires-Dist: pydantic>=2.0.0
31
- Requires-Dist: requests>=2.31.0
32
- Requires-Dist: httpx>=0.24.0
33
- Requires-Dist: pytest>=7.0.0
34
- Requires-Dist: python-dotenv>=1.0.0
35
- Provides-Extra: dev
36
- Requires-Dist: pytest>=7.0.0; extra == "dev"
37
- Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
38
- Requires-Dist: black>=23.0.0; extra == "dev"
39
- Requires-Dist: ruff>=0.1.0; extra == "dev"
40
- Dynamic: license-file
41
-
42
- # 🏢 GridMind-RL — Energy Management Reinforcement Learning Environment
43
-
44
- **A real-world RL environment for intelligent building energy optimization.** Control HVAC systems, thermal storage, batch job scheduling, and demand-response under stochastic electricity prices and grid stress events.
45
-
46
- Built on the [OpenEnv](https://github.com/meta-pytorch/OpenEnv) specification. Containerized. Ready for Hugging Face Spaces deployment.
47
-
48
- ---
49
-
50
- ## 📖 Overview & Motivation
51
-
52
- Building energy management is a **real-world optimization problem** facing utilities, facility operators, and industrial sites globally. Traditional rule-based controls waste billions in energy costs and miss opportunities for grid participation.
53
-
54
- **GridMind-RL** simulates decisions that facility operators must make daily:
55
-
56
- - **Cost Optimization** — Buy electricity when prices are low, avoid peak surcharges
57
- - **Comfort & Safety** — Maintain indoor temperature within acceptable ranges while managing thermal inertia
58
- - **Grid Participation** — Respond to demand-response signals and grid stress events
59
- - **Batch Scheduling** — Coordinate industrial process timings to meet deadlines and minimize energy cost
60
- - **Carbon Minimization** — Shift consumption to periods when grid carbon intensity is low
61
-
62
- **Why this matters:** An RL agent trained in this environment can learn strategies that would be difficult or impossible for humans to hand-craft. The combination of continuous control (HVAC power, thermal storage), discrete decisions (batch scheduling), and multiple simultaneous objectives (cost, comfort, grid, deadlines, carbon) creates a realistic, challenging benchmark.
63
-
64
- **Episode Length:** 96 steps = 24 hours at 15-minute resolution. A complete episode requires strategic decision-making across a full day-night cycle.
65
-
66
- ---
67
-
68
- ## Observation Space
69
-
70
- At each timestep, the environment provides the following observations. **Episode length: 96 steps** (15-minute intervals = 24 hours).
71
-
72
- | Field | Data Type | Range / Values | Description |
73
- |-------|-----------|-----------------|-------------|
74
- | `indoor_temperature` | float | 10–40 °C | Current building interior temperature |
75
- | `thermal_storage_level` | float | 0.0–1.0 | Thermal tank charge state (0 = empty, 1 = full) |
76
- | `process_demand` | float | ≥ 0 kW | Current industrial batch process power draw |
77
- | `current_price` | float | > 0 $/kWh | Real-time spot electricity price |
78
- | `grid_stress_signal` | float | 0.0–1.0 | Utility demand-response urgency (0.7+ = critical) |
79
- | `carbon_intensity` | float | ≥ 0 gCO₂/kWh | Current grid carbon intensity |
80
- | `hour_of_day` | int | 0–23 | Time-of-day context |
81
- | `batch_queue` | int array | — | Pending batch jobs with deadline slots |
82
- | `cumulative_cost` | float | ≥ 0 $ | Energy cost accumulated in current episode so far |
83
- | `step` | int | 0–95 | Current timestep (96 total = 24 hours) |
84
- | `building_id` | int | 0+ | Building identifier (for multi-building scenarios) |
85
-
86
- **Observation Properties:**
87
- - Observations are **deterministic** given the seed — same seed produces identical sequences
88
- - All fields are **normalized or bounded** for stable learning
89
- - Prices follow realistic time-of-use patterns; carbon intensity varies with grid mix
90
- - Batch queue starts empty; jobs appear stochastically based on the task/seed
91
-
92
- ---
93
-
94
- ## 🎮 Action Space
95
-
96
- At each step, the agent sends an action controlling four independent subsystems:
97
-
98
- | Field | Data Type | Range | Description |
99
- |-------|-----------|-------|-------------|
100
- | `hvac_power_level` | float | 0.0–1.0 | HVAC system power (0 = off, 1 = full) |
101
- | `thermal_charge_rate` | float | -1.0–1.0 | Thermal storage control (+charge, -discharge) |
102
- | `batch_job_slot` | int | 0–4 | Schedule next batch job: 0=immediate, 1–4=defer |
103
- | `load_shed_fraction` | float | 0.0–0.5 | Non-critical load reduction (0–50%) for demand-response |
104
- | `building_id` | int | 0+ | Building identifier (routing) |
105
-
106
- **Action Space Properties:**
107
- - **Continuous** (HVAC, thermal charging, load shedding) + **discrete** (batch scheduling) → hybrid control
108
- - Actions are applied every 15-minute step
109
- - Load shedding is capped at 50% to ensure safety/habitability
110
- - Batch scheduling decisions affect energy cost and deadline compliance
111
-
112
- ---
113
-
114
- ## 💡 Reward Function
115
-
116
- The environment provides **dense rewards every step** (not sparse, not binary). Each step returns:
117
- - A scalar reward (sum of components)
118
- - A dictionary of 7 weighted sub-components for transparency
119
-
120
- | Component | Purpose | Possible Values |
121
- |-----------|---------|-----------------|
122
- | **cost_savings** | Minimize energy bill | Negative (cost increases) to positive (savings vs baseline) |
123
- | **temp_constraint** | Maintain comfort | Gaussian bonus near 21°C, penalty outside 19–23°C bounds |
124
- | **grid_response** | Shift load during stress | Bonus proportional to shed fraction when grid signal > 0.7 |
125
- | **efficiency_bonus** | Exploit thermal storage | Reward charge/discharge timing and thermal arbitrage |
126
- | **stability_penalty** | Smooth control | Small penalty for rapid oscillations in HVAC/storage |
127
- | **deadline_penalty** | Meet job deadlines | Large penalty if batch job finishes after deadline |
128
- | **carbon_reward** | Low-carbon consumption | Bonus for consuming during low-carbon grid periods |
129
-
130
- **Example Reward Calculation:**
131
- If an agent takes a well-timed action during high-price, high-stress period:
132
- - Large positive `cost_savings` (avoided expensive hour)
133
- - Positive `grid_response` (shed load successfully)
134
- - Possible positive `carbon_reward` (if grid is clean)
135
- - **Total step reward** = weighted sum of all components
136
-
137
- This multi-objective reward structure encourages **learning tradeoffs** between cost, comfort, grid support, and carbon efficiency.
138
-
139
- ---
140
-
141
- ---
142
-
143
- ## 📋 Tasks & Difficulty Levels
144
-
145
- Three independent tasks with **deterministic programmatic graders**. Scores range **0.0–1.0**; higher is better.
146
-
147
- ### Task 1 — Cost Minimization (🟢 Easy)
148
-
149
- **Objective:** Minimize total energy cost in 24 hours with no other constraints.
150
-
151
- **Difficulty Rationale:** Only one objective (cost) to optimize; temperature and grid constraints are relaxed.
152
-
153
- **Grader Metrics:**
154
- - **Cost score (100%)** — Compares total episode energy cost to a deterministic baseline. Higher savings → higher score.
155
-
156
- **Baseline Score:** **0.7063**
157
-
158
- ---
159
-
160
- ### Task 2 — Constrained Temperature Control (🟡 Medium)
161
-
162
- **Objective:** Minimize cost while maintaining indoor temperature between **19–23°C** throughout the episode.
163
-
164
- **Difficulty Rationale:** Introduces a hard constraint (temperature bounds). Agent must use thermal storage strategically to meet both cost and comfort goals.
165
-
166
- **Grader Metrics:**
167
- - **Cost score (60%)** — Total energy cost vs baseline
168
- - **Temperature score (40%)** — Fraction of steps within bounds (hard penalty for violations)
169
-
170
- **Notes:** A naive agent might achieve low cost by disabling HVAC, but then temperatures drift out of bounds (0 score). Trade-off learning is required.
171
-
172
- **Baseline Score:** **0.6333**
173
-
174
- ---
175
-
176
- ### Task 3 — Full Demand Response (🔴 Hard)
177
-
178
- **Objective:** Minimize cost, maintain temperature, respond to grid events, complete batch jobs on time, and minimize carbon emissions. This is a **multi-objective constraint satisfaction** problem.
179
-
180
- **Difficulty Rationale:** Most realistic. Agent must balance five competing objectives simultaneously; any single failure is costly.
181
-
182
- **Grader Metrics:**
183
- - **Cost score (28%)** — Energy cost
184
- - **Temperature score (20%)** — Time within comfort bounds
185
- - **Grid response score (20%)** — Load shed during demand-response events (signal > 0.7)
186
- - **Batch deadline score (12%)** — Fraction of jobs completed before deadline
187
- - **Carbon reward score (20%)** — Shift load to low-carbon periods
188
-
189
- **Baseline Breakdown:**
190
- - Cost: 0.670, Temperature: 0.573, Grid: 0.214, Batch: 1.000, Carbon: 0.657
191
- - **Overall: 0.5966**
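
The overall Task 3 score can be approximately reproduced from the grader weights and baseline sub-scores listed in this section. The sketch below assumes the grader combines sub-scores as a plain weighted sum; `weighted_score` is a hypothetical helper, not the code in `env/tasks.go`.

```python
# Weights and sub-scores copied from the Task 3 grader metrics and baseline breakdown.
TASK3_WEIGHTS = {
    "cost": 0.28,
    "temperature": 0.20,
    "grid_response": 0.20,
    "batch_deadline": 0.12,
    "carbon": 0.20,
}
BASELINE_SUBSCORES = {
    "cost": 0.670,
    "temperature": 0.573,
    "grid_response": 0.214,
    "batch_deadline": 1.000,
    "carbon": 0.657,
}


def weighted_score(subscores, weights):
    """Weighted sum of sub-scores; the weights are expected to sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * subscores[name] for name in weights)


overall = weighted_score(BASELINE_SUBSCORES, TASK3_WEIGHTS)
print(round(overall, 4))  # 0.5964
```

The result, 0.5964, differs from the reported 0.5966 only in the fourth decimal place, which is consistent with the sub-scores having been rounded to three decimals before being listed.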
192
-
193
- **Challenge:** Grid response score (~0.21) shows that the baseline heuristic rarely sheds load opportunistically. Learning agents should discover that quick load shedding during high-price, high-stress periods yields significant cost savings.
194
-
195
- **Grader Determinism:** Same seed always produces identical evaluations. Episodes are seeded internally; reproducible batches of evaluations can be generated for benchmark comparisons.
196
-
197
- ---
198
-
199
- ## 🚀 Setup & Usage
200
-
201
- ### Prerequisites
202
-
203
- - **Docker** — [Download Docker Desktop](https://www.docker.com/products/docker-desktop/)
204
- - **Python 3.10+** — [Download Python](https://www.python.org/downloads/)
205
- - **Git** — [Download Git](https://git-scm.com/downloads)
206
-
207
- ### Quick Start (5 minutes)
208
-
209
- #### 1. Clone the Repository
210
-
211
- ```bash
212
- git clone https://github.com/LO-Kyu/gridmind-rl.git
213
- cd gridmind-rl
214
- ```
215
-
216
- #### 2. Build and Start the Environment Server
217
-
218
- ```bash
219
- docker build -t gridmind-rl .
220
- docker run --rm -d -p 7860:7860 -p 7861:7861 --name gridmind gridmind-rl
221
- ```
222
-
223
- Verify the server is running:
224
-
225
- ```bash
226
- # Check health endpoint
227
- curl http://localhost:7860/health
228
- # Expected: {"status":"ok","version":"1.0.0"}
229
- ```
230
-
231
- #### 3. Install Python Dependencies
232
-
233
- Open a **new terminal** and install:
234
-
235
- ```bash
236
- pip install -r python/requirements.txt
237
- ```
238
-
239
- #### 4. Run Inference (No LLM — Fast)
240
-
241
- Run a fast, deterministic baseline using heuristic policy:
242
-
243
- ```bash
244
- python inference.py --fast-mode --episodes 1
245
- ```
246
-
247
- Expected output (sample):
248
- ```
249
- [START] task=Cost_Minimization env=gridmind model=heuristic
250
- [STEP1] step=1 action={...} reward=10.5 done=false
251
- [STEP2] step=2 action={...} reward=12.3 done=false
252
- ...
253
- [STEP96] step=96 action={...} reward=8.9 done=true
254
- [END] success=true steps=96 rewards=[10.5, 12.3, ..., 8.9]
255
- ```
256
-
257
- Results saved to: `baseline_scores.json`
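
Because the log lines follow a regular pattern, they can be parsed with a short regular expression. The sketch below is based only on the sample output shown above; `parse_step_line` is a hypothetical helper, and the real `action=` serialization may differ from the elided `{...}` in the sample.

```python
import re

# Pattern inferred from the sample lines, e.g.
# "[STEP96] step=96 action={...} reward=8.9 done=true"
STEP_LINE = re.compile(
    r"^\[STEP(?P<index>\d+)\]\s+step=(?P<step>\d+)\s+action=(?P<action>.*)"
    r"\s+reward=(?P<reward>-?\d+(?:\.\d+)?)\s+done=(?P<done>true|false)$"
)


def parse_step_line(line):
    """Return step number, reward and done flag for a [STEPn] line, or None for other lines."""
    match = STEP_LINE.match(line.strip())
    if match is None:
        return None
    return {
        "step": int(match.group("step")),
        "reward": float(match.group("reward")),
        "done": match.group("done") == "true",
    }


print(parse_step_line("[STEP96] step=96 action={...} reward=8.9 done=true"))
print(parse_step_line("[START] task=Cost_Minimization env=gridmind model=heuristic"))
```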
258
-
259
- #### 5. (Optional) Run with LLM
260
-
261
- To use an LLM agent for decision-making:
262
-
263
- 1. Get a **free API key** from [openrouter.ai/keys](https://openrouter.ai/keys) (no credit card needed)
264
- 2. Create `.env` file (copy from `.env.example`):
265
- ```bash
266
- cp .env.example .env
267
- ```
268
- 3. Edit `.env` and add your API key:
269
- ```env
270
- HF_TOKEN=sk-or-v1-your-key-here
271
- # or
272
- OPENAI_API_KEY=sk-or-v1-your-key-here
273
- ```
274
- 4. Run with LLM:
275
- ```bash
276
- python inference.py --episodes 1
277
- ```
278
-
279
- #### 6. Stop the Server (When Done)
280
-
281
- ```bash
282
- docker stop gridmind
283
- ```
284
-
285
- ---
286
-
287
- ### Inference Script Reference
288
-
289
- The `inference.py` script (project root) is the **hackathon submission entrypoint**.
290
-
291
- **Environment Variables:**
292
-
293
- | Variable | Default | Description |
294
- |----------|---------|-------------|
295
- | `HF_TOKEN` | (required for submission) | API key for LLM provider or HF Spaces |
296
- | `OPENAI_API_KEY` | (optional fallback) | Alternative OpenAI-compatible key |
297
- | `API_BASE_URL` | `https://openrouter.ai/api/v1` | LLM endpoint URL |
298
- | `MODEL_NAME` | `meta-llama/llama-3.3-70b-instruct:free` | Model identifier |
299
- | `ENV_URL` | `http://localhost:7860` | Environment server address |
300
-
301
- **Command-Line Flags:**
302
-
303
- | Flag | Default | Description |
304
- |------|---------|-------------|
305
- | `--episodes N` | 1 | Episodes per task (runs tasks 1, 2, 3 in sequence) |
306
- | `--fast-mode` | off | Don't call LLM; use heuristic policy only (reproducible, no API calls) |
307
- | `--llm-every N` | 4 | Reuse each LLM decision for N steps (reduces API calls) |
308
- | `--max-steps N` | 96 | Stop episode early after N steps |
309
- | `--env-url URL` | from env var | Override environment server URL |
310
- | `--output FILE` | `baseline_scores.json` | Output results filename |
311
- | `--verbose` | off | Print detailed logs for each step |
312
-
313
- **Examples:**
314
-
315
- ```bash
316
- # Run all 3 tasks with LLM (1 episode each)
317
- python inference.py --episodes 1
318
-
319
- # Reproduce baseline fast (no LLM)
320
- python inference.py --fast-mode --episodes 1
321
-
322
- # Only Task 2, heuristic, verbose output
323
- python inference.py --fast-mode --episodes 1 --verbose
324
-
325
- # Run 5 episodes per task with custom environment
326
- python inference.py --episodes 5 --env-url http://my-server:7860
327
- ```
328
-
329
- ---
330
-
331
- ### HTTP API Reference
332
-
333
- **Base URL:** `http://localhost:7860`
334
-
335
- | Endpoint | Method | Purpose | Example Body |
336
- |----------|--------|---------|---------------|
337
- | `/health` | GET | Liveness check | — |
338
- | `/ping` | GET | Lightweight ping | — |
339
- | `/reset` | POST | Reset episode for a task | `{"task_id": 1, "seed": 42}` |
340
- | `/step` | POST | Apply action, get next observation | `{"hvac_power_level": 0.5, "thermal_charge_rate": 0.1, ...}` |
341
- | `/state` | GET | Current full state snapshot | — |
342
- | `/grade` | GET | Episode score (0.0–1.0) with sub-scores | — |
343
- | `/replay` | GET | Full step-by-step trajectory | — |
344
- | `/tasks` | GET | Task definitions and grader weights | — |
345
- | `/metrics` | GET | Prometheus-format metrics | — |
346
-
347
- **Example Workflow:**
348
-
349
- ```bash
350
- # 1. Reset to Task 1 with seed 42
351
- curl -X POST http://localhost:7860/reset \
352
- -H "Content-Type: application/json" \
353
- -d '{"task_id": 1, "seed": 42}'
354
-
355
- # 2. Get initial observation
356
- curl http://localhost:7860/state
357
-
358
- # 3. Take an action
359
- curl -X POST http://localhost:7860/step \
360
- -H "Content-Type: application/json" \
361
- -d '{
362
- "hvac_power_level": 0.5,
363
- "thermal_charge_rate": 0.1,
364
- "batch_job_slot": 1,
365
- "load_shed_fraction": 0.0
366
- }'
367
-
368
- # 4. Check final score after episode completes
369
- curl http://localhost:7860/grade
370
- ```
371
-
372
- ---
373
-
374
- ## 📊 Baseline Performance Scores
375
-
376
- The baseline is a **heuristic policy** (rule-based, no LLM) representing a reasonable but non-optimized control strategy. Your RL agent should aim to exceed these scores.
377
-
378
- **Baseline Run:** `python inference.py --fast-mode --episodes 1`
379
-
380
- ### Summary Scores
381
-
382
- | Task | Difficulty | Score | Model |
383
- |------|:----------:|:-----:|-------|
384
- | Task 1 — Cost Minimization | 🟢 Easy | **0.7063** | Heuristic |
385
- | Task 2 — Temperature Control | 🟡 Medium | **0.6333** | Heuristic |
386
- | Task 3 — Full Demand Response | 🔴 Hard | **0.5966** | Heuristic |
387
- | **Overall Average** | — | **0.6454** | Heuristic |
388
-
389
- ### Detailed Breakdown
390
-
391
- #### Task 1 Results
392
- - **Task:** Cost minimization (96 steps × 15 min = 24 hours)
393
- - **Score:** 0.7063
394
- - **Sub-score:** Cost = 0.706
395
- - **Interpretation:** Heuristic achieves ~70% of optimal cost reduction vs baseline
396
-
397
- #### Task 2 Results
398
- - **Task:** Minimize cost while maintaining temperature 19–23°C
399
- - **Score:** 0.6333
400
- - **Sub-scores:**
401
- - Cost: 0.701
402
- - Temperature constraint: 0.531 (agent violated comfort bounds ~47% of the time)
403
- - **Interpretation:** Temperature management is challenging for the heuristic. Tighter thermal control could improve this score significantly.
404
-
405
- #### Task 3 Results (Most Interesting)
406
- - **Task:** Multi-objective: cost, temperature, grid response, batch deadlines, carbon
407
- - **Score:** 0.5966
408
- - **Sub-scores:**
409
- - Cost: 0.670
410
- - Temperature: 0.573 (similar temperature control challenge as Task 2)
411
- - **Grid response: 0.214** ← Heuristic rarely participates in demand-response
412
- - Batch deadline: 1.000 (heuristic always completes jobs on time)
413
- - Carbon: 0.657
414
-
415
- **Key Insight:** The heuristic's low grid response score (0.21) suggests that learned agents have significant room for improvement by:
416
- 1. Recognizing high-price + high-stress periods
417
- 2. Proactively shedding load to reduce cost
418
- 3. Using thermal storage to recover comfort afterward
419
-
420
- This multi-objective setting is where RL agents typically exceed heuristic baselines.
421
-
422
- ### Reproducibility & Evaluation
423
-
424
- - **Deterministic:** Baseline scores are **deterministic** — same seed always produces identical actions and rewards
425
- - **Seeding:** Each task uses a fixed base seed (1100, 1200, 1300) for reproducible evaluation
426
- - **Your Submissions:** Your agent will be evaluated on the same seed distribution; compare your scores directly to baseline
427
-
428
- ---
429
-
430
- ## 🏗️ Architecture
431
-
432
- ```
433
- ┌─────────────────────────────────────────────────────────────────┐
434
- │ inference.py (LLM Agent or Heuristic) │
435
- │ │ │
436
- │ │ HTTP: POST /reset, /step · GET /grade, /state │
437
- │ ▼ │
438
- │ ┌───────────────────────────────────────────────────────────┐ │
439
- │ │ Docker Container │ │
440
- │ │ │ │
441
- │ │ ┌─────────────────────┐ ┌───────────────────────────┐ │ │
442
- │ │ │ Go Environment │ │ Python Dashboard │ │ │
443
- │ │ │ Server (:7860) │ │ FastAPI + UI (:7861) │ │ │
444
- │ │ │ │ │ │ │ │
445
- │ │ │ • Physics engine │ │ • Proxies /api → :7860 │ │ │
446
- │ │ │ • Reward function │◄──│ • Real-time charts │ │ │
447
- │ │ │ • Task graders │ │ • State visualization │ │ │
448
- │ │ └─────────────────────┘ └───────────────────────────┘ │ │
449
- │ │ │ │
450
- │ │ Isolated · Reproducible · Non-root user │ │
451
- │ └───────────────────────────────────────────────────────────┘ │
452
- └─────────────────────────────────────────────────────────────────┘
453
- ```
454
-
455
- ### Project Structure
456
-
457
- ```
458
- gridmind/
459
- ├── inference.py ← Hackathon entrypoint (root)
460
- ├── openenv.yaml ← OpenEnv spec manifest
461
- ├── Dockerfile ← Multi-stage build (Go + Python)
462
- ├── .env ← API credentials (git-ignored)
463
- ├── baseline_scores.json ← Produced by inference.py
464
-
465
- ├── main.go ← HTTP server (routes, middleware, metrics)
466
- ├── env/ ← Core environment logic (Go)
467
- │ ├── environment.go ← Simulation: physics, thermal dynamics
468
- │ ├── models.go ← All data types (Observation, Action, etc.)
469
- │ ├── rewards.go ← 7-component dense reward function
470
- │ └── tasks.go ← 3 task definitions + deterministic graders
471
-
472
- ├── python/ ← Python support layer
473
- │ ├── inference.py ← Full LLM agent + heuristic fallback
474
- │ ├── models.py ← Typed Pydantic models (mirrors Go structs)
475
- │ ├── validate.py ← OpenEnv spec validation suite
476
- │ └── requirements.txt ← Python dependencies
477
-
478
- ├── tests/ ← Automated tests
479
- │ ├── environment_test.go ← Go unit tests (determinism, bounds, etc.)
480
- │ └── test_graders.py ← Python grader tests (pytest)
481
-
482
- └── dashboard/ ← Optional web dashboard
483
- ├── server.py ← FastAPI server
484
- └── static/ ← Frontend assets
485
- ```
486
-
487
- ---
488
-
489
- ## 🐳 Docker
490
-
491
- | Action | Command |
492
- |--------|---------|
493
- | **Build** | `docker build -t gridmind-rl .` |
494
- | **Run (foreground)** | `docker run --rm -p 7860:7860 -p 7861:7861 --name gridmind gridmind-rl` |
495
- | **Run (background)** | `docker run --rm -d -p 7860:7860 -p 7861:7861 --name gridmind gridmind-rl` |
496
- | **Stop** | `docker stop gridmind` |
497
- | **Run inference inside container** | `docker exec -it gridmind python /app/inference.py --fast-mode` |
498
-
499
- The Dockerfile uses a **multi-stage build**:
500
- 1. **Stage 1** — Go 1.21 Alpine: compiles the environment server binary
501
- 2. **Stage 2** — Python 3.11 slim: runs the Go binary + Python dashboard via Supervisor
502
-
503
- ---
504
-
505
- ## ☁️ Hugging Face Space Deployment
506
-
507
- ### 1. Create a New Space
508
-
509
- Go to [huggingface.co/new-space](https://huggingface.co/new-space):
510
- - **SDK:** Docker
511
- - **Hardware:** CPU Basic (2 vCPU, 16 GB — free tier)
512
-
513
- ### 2. Push to HF
514
-
515
- ```bash
516
- git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/gridmind-rl
517
- git push hf main
518
- ```
519
-
520
- ### 3. Verify
521
-
522
- ```bash
523
- curl https://YOUR_USERNAME-gridmind-rl.hf.space/health
524
- # → {"status":"ok","version":"1.0.0"}
525
-
526
- curl -X POST https://YOUR_USERNAME-gridmind-rl.hf.space/reset \
527
- -H "Content-Type: application/json" \
528
- -d '{"task_id":1,"seed":42}'
529
- ```
530
-
531
- > **Note:** HF Spaces exposes port **7860** publicly. The dashboard (7861) is for local development only.
532
-
533
- ---
534
-
535
- ## 🧪 Testing
536
-
537
- ### Run Go Unit Tests
538
-
539
- ```bash
540
- cd gridmind
541
- go test ./tests/ -v
542
- ```
543
-
544
- ### Run Python Grader Tests (requires server running)
545
-
546
- ```bash
547
- pytest tests/test_graders.py -v
548
- ```
549
-
550
- ### Run Full OpenEnv Validation
551
-
552
- ```bash
553
- python python/validate.py --env-url http://localhost:7860
554
- ```
555
-
556
- ---
557
-
558
- ## 📝 Inference Script Reference
559
-
560
- The `inference.py` script at the project root is the **hackathon entrypoint**.
561
-
562
- ### Environment Variables
563
-
564
- | Variable | Default | Description |
565
- |----------|---------|-------------|
566
- | `API_BASE_URL` | `https://openrouter.ai/api/v1` | LLM API endpoint |
567
- | `MODEL_NAME` | `meta-llama/llama-3.1-8b-instruct:free` | Model to use |
568
- | `OPENAI_API_KEY` | — | API key (any OpenAI-compatible provider) |
569
- | `ENV_URL` | `http://localhost:7860` | Environment server URL |
570
-
571
- ### Command-Line Flags
572
-
573
- | Flag | Default | Description |
574
- |------|---------|-------------|
575
- | `--episodes N` | 1 | Episodes per task (tasks 1–3 run in sequence) |
576
- | `--fast-mode` | off | Use heuristic policy only (no LLM, fully reproducible) |
577
- | `--llm-every N` | 4 | Reuse each LLM action for N steps (reduces API calls) |
578
- | `--max-steps N` | 96 | Stop early after N steps |
579
- | `--env-url URL` | from env | Override environment URL |
580
- | `--output FILE` | `baseline_scores.json` | Output results file |
581
- | `--verbose` | off | Print detailed step logs |
582
-
583
- ### Stdout Log Format
584
-
585
- Each episode emits structured markers for automated evaluation:
586
-
587
- ```
588
- [START]
589
- [STEP1]
590
- [STEP2]
591
- ...
592
- [STEP96]
593
- [END]
594
- ```
595
-
596
- ---
597
-
598
- ## ✅ OpenEnv Specification Compliance
599
-
600
- GridMind-RL fully implements the OpenEnv specification for standardized RL environments. All components are present and tested:
601
-
602
- | Requirement | Status | Notes |
603
- |-------------|:------:|-------|
604
- | Manifest (`openenv.yaml`) | ✅ | All metadata, schema definitions, and version info |
605
- | Observation Schema | ✅ | 11-field object: temperature, storage, price, grid signal, carbon, hour, batch queue, cost, step, building_id |
606
- | Action Schema | ✅ | 5-field object: HVAC, thermal rate, batch slot, load shed, building_id |
607
- | HTTP Endpoints | ✅ | `/reset`, `/step`, `/state`, `/grade`, `/replay`, `/tasks`, `/health`, `/metrics` |
608
- | Determinism | ✅ | Seeded episode generation; identical seeds produce identical trajectories |
609
- | Typed Models | ✅ | Pydantic models (Python) mirror Go structs exactly |
610
- | Dense Rewards | ✅ | 7-component reward breakdown every step |
611
- | Graders | ✅ | 3 tasks with programmatic, deterministic graders (0.0–1.0 range) |
612
- | Exploit Detection | ✅ | Built into grading pipeline to flag unrealistic scores |
613
-
614
- ---
615
-
616
- ## ❓ FAQ
617
-
618
- **Q: Can I use a different model?**
619
- A: Yes. Set `MODEL_NAME` environment variable to any OpenAI-compatible model. The default (`meta-llama/llama-3.3-70b-instruct:free`) is free on OpenRouter with no credit card.
620
-
621
- **Q: How do I avoid rate limiting?**
622
- A: (1) Use `--fast-mode` for local testing (no API calls), (2) Set `--llm-every 4` to reuse decisions, (3) Use a paid API tier for submission, or (4) Train & submit an offline policy.
623
-
624
- **Q: Will my API key be exposed in submissions?**
625
- A: No. Store your API key in `.env` (git-ignored). On HF Spaces, set secrets via the Space settings UI; keys are never committed to the repo.
626
-
627
- **Q: What's the difference between `HF_TOKEN` and `OPENAI_API_KEY`?**
628
- A: `HF_TOKEN` is used in HF Space deployments and external evaluations. `OPENAI_API_KEY` is a fallback for local development. The code tries `HF_TOKEN` first, then `OPENAI_API_KEY`. At least one must be set.
629
-
630
- **Q: Can I submit an offline/trained policy?**
631
- A: Yes. Modify `python/inference.py` to use your trained agent instead of LLM calls. Ensure you still output the required `[START]`, `[STEP]`, `[END]` format.
632
-
633
- **Q: What if my submission times out?**
634
- A: Each episode is 96 steps. The environment runs 3 episodes (one per task). Optimize for latency: reduce LLM calls (use `--llm-every`), use a faster model, or submit a heuristic/trained offline policy.
635
-
636
- ---
637
-
638
- ## 🎯 Submission Checklist
639
-
640
- Before submitting, verify:
641
-
642
- - [ ] Clone repo, build Docker, run `docker run -p 7860:7860 -p 7861:7861 gridmind-rl`
643
- - [ ] Run `python inference.py --fast-mode --episodes 1` locally — should produce `baseline_scores.json`
644
- - [ ] Check `[START]`, `[STEP]`, `[END]` markers in stdout
645
- - [ ] Set `HF_TOKEN` or `OPENAI_API_KEY` in `.env` for LLM runs
646
- - [ ] Test with LLM: `python inference.py --episodes 1`
647
- - [ ] Verify Dockerfile builds without errors: `docker build -t gridmind-rl .`
648
- - [ ] Create HF Space (Docker SDK, CPU Basic)
649
- - [ ] Push repo to HF Space: `git push hf main`
650
- - [ ] Set secrets in HF Space UI: `HF_TOKEN`, `API_BASE_URL` (optional), `MODEL_NAME` (optional)
651
- - [ ] Verify Space is running: `curl https://YOUR_USERNAME-gridmind-rl.hf.space/health`
652
- - [ ] Submit Space URL to hackathon organizers
653
-
654
- ---
655
-
656
- ## 📚 Additional Resources
657
-
658
- - **OpenEnv Spec:** https://github.com/meta-pytorch/OpenEnv
659
- - **OpenRouter Free Models:** https://openrouter.ai/keys
660
- - **HF Spaces Docs:** https://huggingface.co/docs/hub/spaces
661
- - **GridMind Repository:** https://github.com/LO-Kyu/gridmind-rl
662
-
663
- ---
664
-
665
- ## 📄 License
666
-
667
- See `LICENSE` in the repository.
gridmind_rl.egg-info/SOURCES.txt DELETED
@@ -1,16 +0,0 @@
1
- LICENSE
2
- README.md
3
- pyproject.toml
4
- gridmind_rl.egg-info/PKG-INFO
5
- gridmind_rl.egg-info/SOURCES.txt
6
- gridmind_rl.egg-info/dependency_links.txt
7
- gridmind_rl.egg-info/entry_points.txt
8
- gridmind_rl.egg-info/requires.txt
9
- gridmind_rl.egg-info/top_level.txt
10
- python/__init__.py
11
- python/inference.py
12
- python/models.py
13
- python/validate.py
14
- server/__init__.py
15
- server/app.py
16
- tests/test_graders.py
gridmind_rl.egg-info/dependency_links.txt DELETED
@@ -1 +0,0 @@
1
-
gridmind_rl.egg-info/entry_points.txt DELETED
@@ -1,2 +0,0 @@
1
- [console_scripts]
2
- gridmind-server = server.app:main
gridmind_rl.egg-info/requires.txt DELETED
@@ -1,15 +0,0 @@
1
- openai>=1.0.0
2
- openenv-core>=0.2.0
3
- fastapi>=0.100.0
4
- uvicorn>=0.23.0
5
- pydantic>=2.0.0
6
- requests>=2.31.0
7
- httpx>=0.24.0
8
- pytest>=7.0.0
9
- python-dotenv>=1.0.0
10
-
11
- [dev]
12
- pytest>=7.0.0
13
- pytest-cov>=4.0.0
14
- black>=23.0.0
15
- ruff>=0.1.0
gridmind_rl.egg-info/top_level.txt DELETED
@@ -1,2 +0,0 @@
1
- python
2
- server
inference.py CHANGED
@@ -1,13 +1,15 @@
1
  """
2
- Hackathon entrypoint: run from repo root with:
3
  python inference.py
4
 
5
  Reads environment variables:
6
  - API_BASE_URL (default: https://openrouter.ai/api/v1)
7
  - MODEL_NAME (default: meta-llama/llama-3.3-70b-instruct:free)
8
- - HF_TOKEN (mandatory, no default)
9
 
10
- Emits hackathon-compliant stdout format:
11
  [START] task=<name> env=gridmind model=<model>
12
  [STEP] step=<n> action=<json> reward=<0.00> done=<true|false> error=<msg|null>
13
  [END] success=<true|false> steps=<n> rewards=<r1,r2,...>
 
1
  """
2
+ GridMind-RL Agent Entry Point
3
+
4
+ Run from repo root with:
5
  python inference.py
6
 
7
  Reads environment variables:
8
  - API_BASE_URL (default: https://openrouter.ai/api/v1)
9
  - MODEL_NAME (default: meta-llama/llama-3.3-70b-instruct:free)
10
+ - HF_TOKEN (required, or OPENAI_API_KEY for testing)
11
 
12
+ Emits standard output format:
13
  [START] task=<name> env=gridmind model=<model>
14
  [STEP] step=<n> action=<json> reward=<0.00> done=<true|false> error=<msg|null>
15
  [END] success=<true|false> steps=<n> rewards=<r1,r2,...>
python/__pycache__/inference.cpython-311.pyc DELETED
Binary file (23.5 kB)
 
python/inference.py CHANGED
@@ -48,10 +48,10 @@ ENV_URL = os.getenv("ENV_URL", "http://localhost:7860")
48
  MODEL_NAME = os.getenv("MODEL_NAME", "meta-llama/llama-3.3-70b-instruct:free")
49
  API_BASE_URL = os.getenv("API_BASE_URL", "https://openrouter.ai/api/v1")
50
 
51
- # ── Hackathon Spec Compliance: HF_TOKEN → OpenAI API Key ──────────────────
52
- # Per hackathon spec, the LLM API credential is read from HF_TOKEN environment variable
53
  # and passed directly to the OpenAI client for initialization.
54
- # Primary: HF_TOKEN (hackathon spec requirement)
55
  # Fallback: OPENAI_API_KEY (for local testing/development)
56
  HF_TOKEN = os.getenv("HF_TOKEN")
57
  OPENAI_API_KEY = os.getenv("OPENAI_API_KEY") or HF_TOKEN
 
48
  MODEL_NAME = os.getenv("MODEL_NAME", "meta-llama/llama-3.3-70b-instruct:free")
49
  API_BASE_URL = os.getenv("API_BASE_URL", "https://openrouter.ai/api/v1")
50
 
51
+ # ── Environment Variable Handling ─────────────────────────────────────────
52
+ # The LLM API credential is read from HF_TOKEN or OPENAI_API_KEY environment variables
53
  # and passed directly to the OpenAI client for initialization.
54
+ # Primary: HF_TOKEN
55
  # Fallback: OPENAI_API_KEY (for local testing/development)
56
  HF_TOKEN = os.getenv("HF_TOKEN")
57
  OPENAI_API_KEY = os.getenv("OPENAI_API_KEY") or HF_TOKEN
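
Review note on the hunk above: the comments name `HF_TOKEN` as primary and `OPENAI_API_KEY` as fallback, but `os.getenv("OPENAI_API_KEY") or HF_TOKEN` picks `OPENAI_API_KEY` first whenever both are set. A minimal sketch of a lookup that matches the documented order follows. The helper name `resolve_api_key` is hypothetical and does not appear in this commit.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the LLM credential, preferring HF_TOKEN over OPENAI_API_KEY.

    Hypothetical helper for illustration only. Empty strings are treated
    as unset, so an exported-but-blank HF_TOKEN falls through to the
    fallback variable.
    """
    # `or` skips both missing keys (None) and empty strings.
    return env.get("HF_TOKEN") or env.get("OPENAI_API_KEY") or None


if __name__ == "__main__":
    key = resolve_api_key()
    # Report only whether a key was found; never print the secret itself.
    print("credential found" if key else "no credential set")
```

Taking the environment as a parameter lets the precedence be checked with a plain dict, without touching the real process environment.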
server/__init__.py DELETED
File without changes
test_suite_results.json DELETED
@@ -1,307 +0,0 @@
1
- {
2
- "passed": 50,
3
- "failed": 0,
4
- "warnings": 0,
5
- "details": [
6
- {
7
- "id": "B1",
8
- "description": "API_BASE_URL has default",
9
- "status": "PASS",
10
- "details": "https://openrouter.ai/api/v1"
11
- },
12
- {
13
- "id": "B2",
14
- "description": "MODEL_NAME has default",
15
- "status": "PASS",
16
- "details": "meta-llama/llama-3.3-70b-instruct:free"
17
- },
18
- {
19
- "id": "B3",
20
- "description": "HF_TOKEN is mandatory (raises ValueError)",
21
- "status": "PASS",
22
- "details": "checked in python/inference.py line 57-58"
23
- },
24
- {
25
- "id": "B4",
26
- "description": "OpenAI client initialized",
27
- "status": "PASS",
28
- "details": "base_url and api_key from env vars"
29
- },
30
- {
31
- "id": "B5",
32
- "description": "HF_TOKEN is PRIMARY key",
33
- "status": "PASS",
34
- "details": "OPENAI_API_KEY is fallback only"
35
- },
36
- {
37
- "id": "C1",
38
- "description": "[START] format exists",
39
- "status": "PASS",
40
- "details": ""
41
- },
42
- {
43
- "id": "C2",
44
- "description": "[STEP] format exists",
45
- "status": "PASS",
46
- "details": ""
47
- },
48
- {
49
- "id": "C3",
50
- "description": "[END] format exists",
51
- "status": "PASS",
52
- "details": ""
53
- },
54
- {
55
- "id": "C4",
56
- "description": "Reward formatted to 2 decimal places",
57
- "status": "PASS",
58
- "details": ""
59
- },
60
- {
61
- "id": "C5",
62
- "description": "Lowercase booleans (true/false)",
63
- "status": "PASS",
64
- "details": ""
65
- },
66
- {
67
- "id": "C6",
68
- "description": "error field uses null",
69
- "status": "PASS",
70
- "details": ""
71
- },
72
- {
73
- "id": "D1",
74
- "description": "OpenEnv spec structure valid, name=gridmind-rl",
75
- "status": "PASS",
76
- "details": ""
77
- },
78
- {
79
- "id": "D1b",
80
- "description": "Port is 7860: 7860",
81
- "status": "PASS",
82
- "details": ""
83
- },
84
- {
85
- "id": "D1c",
86
- "description": "Has 3 tasks: 3",
87
- "status": "PASS",
88
- "details": ""
89
- },
90
- {
91
- "id": "D2",
92
- "description": "All OpenEnv endpoints declared: 6/6",
93
- "status": "PASS",
94
- "details": ""
95
- },
96
- {
97
- "id": "E1",
98
- "description": "Exactly 3 tasks: 3",
99
- "status": "PASS",
100
- "details": ""
101
- },
102
- {
103
- "id": "E1.1",
104
- "description": "Task 1 difficulty is easy",
105
- "status": "PASS",
106
- "details": "expected easy"
107
- },
108
- {
109
- "id": "E1.2",
110
- "description": "Task 2 difficulty is medium",
111
- "status": "PASS",
112
- "details": "expected medium"
113
- },
114
- {
115
- "id": "E1.3",
116
- "description": "Task 3 difficulty is hard",
117
- "status": "PASS",
118
- "details": "expected hard"
119
- },
120
- {
121
- "id": "E2",
122
- "description": "Task 1 grader exists",
123
- "status": "PASS",
124
- "details": ""
125
- },
126
- {
127
- "id": "E2",
128
- "description": "Task 2 grader exists",
129
- "status": "PASS",
130
- "details": ""
131
- },
132
- {
133
- "id": "E2",
134
- "description": "Task 3 grader exists",
135
- "status": "PASS",
136
- "details": ""
137
- },
138
- {
139
- "id": "E5",
140
- "description": "Exploit detection exists",
141
- "status": "PASS",
142
- "details": ""
143
- },
144
- {
145
- "id": "E6.1",
146
- "description": "Task 1 weights sum: 1.00",
147
- "status": "PASS",
148
- "details": ""
149
- },
150
- {
151
- "id": "E6.2",
152
- "description": "Task 2 weights sum: 1.00",
153
- "status": "PASS",
154
- "details": ""
155
- },
156
- {
157
- "id": "E6.3",
158
- "description": "Task 3 weights sum: 1.00",
159
- "status": "PASS",
160
- "details": ""
161
- },
162
- {
163
- "id": "F1",
164
- "description": "All 7 reward components exist: 7/7",
165
- "status": "PASS",
166
- "details": ""
167
- },
168
- {
169
- "id": "F2",
170
- "description": "Reward computed every step",
171
- "status": "PASS",
172
- "details": ""
173
- },
174
- {
175
- "id": "F3",
176
- "description": "Penalties for bad behaviors",
177
- "status": "PASS",
178
- "details": ""
179
- },
180
- {
181
- "id": "F4",
182
- "description": "Reward aggregated properly",
183
- "status": "PASS",
184
- "details": ""
185
- },
186
- {
187
- "id": "G1",
188
- "description": "Multi-stage build (Go builder + Python runtime)",
189
- "status": "PASS",
190
- "details": ""
191
- },
192
- {
193
- "id": "G2",
194
- "description": "Go server compiled",
195
- "status": "PASS",
196
- "details": ""
197
- },
198
- {
199
- "id": "G3",
200
- "description": "supervisord manages processes",
201
- "status": "PASS",
202
- "details": ""
203
- },
204
- {
205
- "id": "G4",
206
- "description": "Go server on port 7860",
207
- "status": "PASS",
208
- "details": ""
209
- },
210
- {
211
- "id": "G5",
212
- "description": "Dashboard on port 7861",
213
- "status": "PASS",
214
- "details": ""
215
- },
216
- {
217
- "id": "G7",
218
- "description": "Both ports exposed",
219
- "status": "PASS",
220
- "details": ""
221
- },
222
- {
223
- "id": "G8",
224
- "description": "Dockerfile syntax valid",
225
- "status": "PASS",
226
- "details": ""
227
- },
228
- {
229
- "id": "H1",
230
- "description": "README has Overview/Motivation",
231
- "status": "PASS",
232
- "details": ""
233
- },
234
- {
235
- "id": "H2",
236
- "description": "README documents Observation Space",
237
- "status": "PASS",
238
- "details": ""
239
- },
240
- {
241
- "id": "H3",
242
- "description": "README documents Action Space",
243
- "status": "PASS",
244
- "details": ""
245
- },
246
- {
247
- "id": "H4",
248
- "description": "README has task descriptions",
249
- "status": "PASS",
250
- "details": ""
251
- },
252
- {
253
- "id": "H5",
254
- "description": "README has setup/usage",
255
- "status": "PASS",
256
- "details": ""
257
- },
258
- {
259
- "id": "H6",
260
- "description": "README mentions baseline",
261
- "status": "PASS",
262
- "details": ""
263
- },
264
- {
265
- "id": "H7",
266
- "description": "README mentions OpenEnv/HF",
267
- "status": "PASS",
268
- "details": ""
269
- },
270
- {
271
- "id": "I1",
272
- "description": "All 3 task scores present: 3",
273
- "status": "PASS",
274
- "details": ""
275
- },
276
- {
277
- "id": "I2",
278
- "description": "All scores in [0.0, 1.0]",
279
- "status": "PASS",
280
- "details": ""
281
- },
282
- {
283
- "id": "I3",
284
- "description": "Has model and api_base fields",
285
- "status": "PASS",
286
- "details": ""
287
- },
288
- {
289
- "id": "J1",
290
- "description": "requirements.txt exists and includes required packages",
291
- "status": "PASS",
292
- "details": ""
293
- },
294
- {
295
- "id": "J2",
296
- "description": "openai package is importable",
297
- "status": "PASS",
298
- "details": ""
299
- },
300
- {
301
- "id": "J4",
302
- "description": "requests package is importable",
303
- "status": "PASS",
304
- "details": ""
305
- }
306
- ]
307
- }