docs: add HF blog post draft for community posting
HF_BLOG_POST.md (ADDED, +94 -0)
@@ -0,0 +1,94 @@
---
title: "GridMind-RL: Training LLMs to Manage Industrial Buildings Under Faults and Grid Stress"
description: An OpenEnv-compatible RL environment where LLMs learn to control HVAC, thermal storage, and batch scheduling across multi-building industrial facilities.
---

**Industrial buildings can waste an estimated 20–30% of their energy when control systems cannot handle real-time pricing, equipment faults, and grid stress simultaneously.** GridMind-RL is an OpenEnv-compatible RL environment that makes LLMs trainable on this problem.

## The Problem

Industrial buildings consume ~40% of global electricity. Most still use naive "always-on" HVAC policies. The capability gap is clear:

- LLMs can understand complex pricing curves, fault alerts, and natural language instructions
- But no environment exists to train them on real building energy management
- Existing RL environments are mostly grid-worlds or toy games, not genuine industrial problems

GridMind-RL closes this gap by simulating a complete building energy system where agents must:

- Navigate 24-hour price volatility (off-peak vs peak: 4¢ to 32¢/kWh)
- Maintain comfort (19–23°C) while minimizing cost
- Respond to grid stress emergencies
- Handle equipment faults (chiller failure, sensor malfunction, grid outages, tariff spikes)
- Parse and follow natural language objective cards

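To make the cost and comfort trade-off concrete, here is a minimal sketch of the per-step arithmetic, assuming 15-minute steps and the 19–23°C band quoted above. The function names and the idea of feeding in a building load in kW are illustrative assumptions, not the environment's actual API.

```python
STEP_HOURS = 0.25  # one 15-minute step (assumed step length)
COMFORT_MIN_C, COMFORT_MAX_C = 19.0, 23.0  # comfort band quoted in the post


def step_cost_cents(power_kw: float, price_cents_per_kwh: float) -> float:
    """Energy cost of a single step, in cents (hypothetical helper)."""
    return power_kw * STEP_HOURS * price_cents_per_kwh


def comfort_violation_c(temp_c: float) -> float:
    """Degrees outside the comfort band; 0.0 when the temperature is inside it."""
    if temp_c < COMFORT_MIN_C:
        return COMFORT_MIN_C - temp_c
    if temp_c > COMFORT_MAX_C:
        return temp_c - COMFORT_MAX_C
    return 0.0


# The same 100 kW load costs 8x more per step at the 32¢ peak than at the 4¢ off-peak price.
off_peak = step_cost_cents(100.0, 4.0)
peak = step_cost_cents(100.0, 32.0)
```

Under these assumptions, shifting load out of peak hours is where most of the savings come from, as long as the temperature stays inside the band.
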
## The Environment

GridMind-RL is a 96-step episode (24 simulated hours at 15-minute resolution) with:

| Field | Value |
|-------|-------|
| **Observation** | 13 fields, including temperature, storage, price, stress, carbon, faults, HVAC efficiency, instruction card |
| **Actions** | HVAC level (0–1), thermal charge (−1 to 1), batch slot (0–4), load shed (0–0.5) |
| **Reward** | 9-component weighted sum: cost, temperature, grid, deadline, efficiency, stability, carbon, instruction, fault_mitigation |
| **Tasks** | 4 types: cost minimization, temperature management, demand response, instruction following |

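The action ranges and the reward structure in the table can be sketched in a few lines. This is an illustration only: the field names, the clamping behavior, and the weights are assumptions, since the post lists the ranges and component names but not the real schema or weight values.

```python
from dataclasses import dataclass


def _clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


@dataclass(frozen=True)
class Action:
    # Field names are hypothetical; the ranges come from the table above.
    hvac_level: float      # 0 to 1
    thermal_charge: float  # -1 to 1
    batch_slot: int        # 0 to 4
    load_shed: float       # 0 to 0.5

    def clamped(self) -> "Action":
        """Return a copy with every field forced into its documented range."""
        return Action(
            hvac_level=_clamp(self.hvac_level, 0.0, 1.0),
            thermal_charge=_clamp(self.thermal_charge, -1.0, 1.0),
            batch_slot=int(_clamp(round(self.batch_slot), 0, 4)),
            load_shed=_clamp(self.load_shed, 0.0, 0.5),
        )


# Component names from the table; the weights passed in are placeholders.
REWARD_COMPONENTS = (
    "cost", "temperature", "grid", "deadline", "efficiency",
    "stability", "carbon", "instruction", "fault_mitigation",
)


def weighted_reward(components: dict, weights: dict) -> float:
    """Weighted sum over the nine named reward components."""
    return sum(weights[name] * components[name] for name in REWARD_COMPONENTS)
```

Clamping an LLM's raw output before sending it is a cheap guard against out-of-range values; whether the server also validates actions is not stated in the post.
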
### Five Hackathon Themes in One Environment

**Track 1: Multi-Agent Interactions.** A coordinator LLM reads `/feeder` to see fleet-wide demand across 3 buildings, then sets per-building price multipliers via `/coordinate` to orchestrate behavior.

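As a rough illustration of what a coordinator could compute between reading `/feeder` and posting to `/coordinate`, the sketch below scales each building's multiplier by its demand relative to the fleet average. The rule, the bounds, and the building names are all assumptions; the post does not specify the payloads or the policy, and in GridMind-RL the decision is made by an LLM rather than a fixed formula.

```python
from typing import Dict


def price_multipliers(
    demand_kw: Dict[str, float], low: float = 0.5, high: float = 2.0
) -> Dict[str, float]:
    """Hypothetical rule: multiplier = building demand / fleet mean, clipped to [low, high]."""
    if not demand_kw:
        return {}
    mean_demand = sum(demand_kw.values()) / len(demand_kw)
    if mean_demand <= 0:
        # No load on the feeder: leave prices unchanged.
        return {name: 1.0 for name in demand_kw}
    return {
        name: max(low, min(high, demand / mean_demand))
        for name, demand in demand_kw.items()
    }


# Example with three made-up buildings: the heaviest consumer sees the highest price.
multipliers = price_multipliers({"b1": 100.0, "b2": 200.0, "b3": 300.0})
```
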
**Track 2: Long-Horizon Planning & Instruction Following.** Task 4 presents a natural language objective card like "Keep total energy cost under $2.50 while maintaining 19–23°C." Agents must plan across all 96 steps.

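An LLM agent reads the objective card directly, but a non-LLM baseline or a checker needs the numbers pulled out. Here is a minimal sketch that extracts the budget and temperature band from a card phrased like the example above. The regular expressions and the returned keys are assumptions; real cards may be worded differently.

```python
import re
from typing import Optional, Tuple, TypedDict


class Objective(TypedDict):
    budget_usd: Optional[float]
    temp_band_c: Optional[Tuple[float, float]]


def parse_objective_card(card: str) -> Objective:
    """Pull a dollar budget and a 'low–high°C' band out of free text, if present."""
    budget = re.search(r"\$(\d+(?:\.\d+)?)", card)
    # Accept either an en dash or a plain hyphen between the two temperatures.
    band = re.search(r"(\d+(?:\.\d+)?)\s*[–-]\s*(\d+(?:\.\d+)?)\s*°C", card)
    return {
        "budget_usd": float(budget.group(1)) if budget else None,
        "temp_band_c": (float(band.group(1)), float(band.group(2))) if band else None,
    }


card = "Keep total energy cost under $2.50 while maintaining 19–23°C."
objective = parse_objective_card(card)
```
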
**Track 3: World Modeling.** The `/simulate` endpoint lets agents ask "what if?" before acting. When HVAC efficiency is low or faults are active, the agent simulates the proposed action and revises if the predicted reward is poor.

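The simulate-then-revise pattern can be sketched as follows. The `simulate` argument stands in for a call to `/simulate` and is passed as a plain callable, because the post does not document the request or response format. The efficiency threshold, the reward cutoff, and the candidate list are likewise assumptions.

```python
from typing import Callable, Dict, Iterable

Action = Dict[str, float]


def should_simulate(hvac_efficiency: float, faults_active: bool,
                    efficiency_floor: float = 0.7) -> bool:
    """Only spend a simulation call when conditions look risky (assumed threshold)."""
    return faults_active or hvac_efficiency < efficiency_floor


def choose_action(proposed: Action, alternatives: Iterable[Action],
                  simulate: Callable[[Action], float], min_reward: float) -> Action:
    """Keep the proposed action if its predicted reward is acceptable,
    otherwise return the best-scoring action among proposed and alternatives."""
    best_action, best_reward = proposed, simulate(proposed)
    if best_reward >= min_reward:
        return best_action
    for candidate in alternatives:
        reward = simulate(candidate)
        if reward > best_reward:
            best_action, best_reward = candidate, reward
    return best_action


def fake_simulate(action: Action) -> float:
    """Toy stand-in for the world model: reward peaks at hvac_level = 0.4."""
    return -abs(action["hvac_level"] - 0.4)


revised = choose_action(
    proposed={"hvac_level": 1.0},
    alternatives=[{"hvac_level": 0.7}, {"hvac_level": 0.4}, {"hvac_level": 0.1}],
    simulate=fake_simulate,
    min_reward=-0.2,
)
```

Gating on `should_simulate` keeps the number of extra environment calls low during normal operation.
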
**Track 4: Fault Handling.** Four fault types inject unpredictability:
- **Chiller failure**: HVAC drops to 20% capacity
- **Grid outage**: Price ×3, stress = 1.0
- **Sensor fault**: Temperature readings jitter ±5°C
- **Tariff spike**: Emergency 4× price surge

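The four fault effects listed above can be written as a small state transformation. This is a sketch under stated assumptions: the `PlantState` fields, the fault name strings, a nominal HVAC capacity of 1.0, and uniform sensor noise are illustrative choices, not the environment's internals.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PlantState:
    # Hypothetical subset of the environment state.
    hvac_capacity: float    # fraction of nominal capacity, 1.0 = healthy
    price: float            # current energy price
    grid_stress: float      # 0.0 to 1.0
    measured_temp_c: float  # what the sensor reports


def apply_fault(state: PlantState, fault: str, rng: random.Random) -> PlantState:
    """Return a new state with the named fault's effect applied."""
    if fault == "chiller_failure":
        return replace(state, hvac_capacity=0.2)           # drops to 20% capacity
    if fault == "grid_outage":
        return replace(state, price=state.price * 3, grid_stress=1.0)
    if fault == "sensor_fault":
        jitter = rng.uniform(-5.0, 5.0)                     # readings jitter within ±5°C
        return replace(state, measured_temp_c=state.measured_temp_c + jitter)
    if fault == "tariff_spike":
        return replace(state, price=state.price * 4)       # emergency 4x surge
    raise ValueError(f"unknown fault: {fault}")


healthy = PlantState(hvac_capacity=1.0, price=10.0, grid_stress=0.3, measured_temp_c=21.0)
outage = apply_fault(healthy, "grid_outage", random.Random(0))
```
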
**Track 5: Self-Improvement.** Curriculum learning auto-advances the agent from task 1 to task 4 when performance thresholds are met.

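One simple way to implement this kind of auto-advance is to track a rolling mean of recent episode scores and move to the next task once it clears a threshold. The sketch below assumes that approach; the threshold values, the window size, and the class name are placeholders, since the post does not give the actual criteria.

```python
from collections import deque
from typing import Dict


class Curriculum:
    """Advance from task 1 to task 4 when the rolling mean score clears a threshold."""

    LAST_TASK = 4

    def __init__(self, thresholds: Dict[int, float], window: int = 3):
        self.thresholds = thresholds  # {task_id: mean score required to advance}
        self.window = window
        self.task = 1
        self.recent = deque(maxlen=window)

    def record(self, score: float) -> int:
        """Log one episode score and return the task to run next."""
        self.recent.append(score)
        window_full = len(self.recent) == self.window
        if (window_full and self.task < self.LAST_TASK
                and sum(self.recent) / self.window >= self.thresholds[self.task]):
            self.task += 1
            self.recent.clear()  # start a fresh window on the new task
        return self.task


# Placeholder thresholds for illustration only.
curriculum = Curriculum({1: 0.6, 2: 0.6, 3: 0.6}, window=2)
```

Clearing the window after each promotion prevents scores earned on an easier task from counting toward the next threshold.
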
## Results

Heuristic baseline scores (fixed policy, no learning) across all 4 tasks:

| Policy | Task 1 | Task 2 | Task 3 | Task 4 |
|--------|--------|--------|--------|--------|
| **Heuristic Baseline** | 0.506 | 0.459 | 0.600 | 0.492 |

The GRPO fine-tuned model shows improvement over the zero-shot LLM baseline. The training curve below shows the learning trajectory:



## Training

GridMind-RL uses GRPO (Group Relative Policy Optimization) via Hugging Face TRL with Unsloth 4-bit LoRA fine-tuning of Qwen2.5-0.5B-Instruct. The training script connects to the live environment over HTTP, runs 8-step rollouts, and uses the `/grade` endpoint (episode-level score 0.0–1.0) as the primary reward signal.

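For readers new to GRPO, the core idea is that each completion's reward is compared against the other completions sampled for the same prompt, rather than against a learned value function. The sketch below shows that group-relative normalization in plain Python. It is a simplified illustration, not TRL's implementation, and it assumes the rewards are `/grade`-style scores between 0.0 and 1.0.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one sampled group: (r - group mean) / (group std + eps).

    Uses the population standard deviation; library implementations may differ in detail.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for the same prompt, scored by the environment (made-up values).
advantages = group_relative_advantages([0.2, 0.4, 0.6, 0.8])
```

Completions that score above their group's average get a positive advantage and are reinforced. If every completion in a group scores the same, all advantages are zero and that group contributes no learning signal.
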
```bash
# Training runs against the live environment
python scripts/train_unsloth.py --steps 500 --output-csv results/training_log.csv
```

Or run the Colab notebook: [gridmind_grpo_colab.ipynb](https://colab.research.google.com/)

## How to Try It

```bash
# Quick health check
curl https://lo-kyu-gridmind.hf.space/health

# Run a heuristic baseline
python inference.py --fast-mode --task 3 --episodes 5

# Run the LLM agent
python inference.py --task 3 --episodes 5
```

Live environment: [https://lo-kyu-gridmind.hf.space](https://lo-kyu-gridmind.hf.space)
Dashboard: [https://lo-kyu-gridmind.hf.space/dashboard](https://lo-kyu-gridmind.hf.space/dashboard)

Code: [github.com/LO-Kyu/gridmind](https://github.com/LO-Kyu/gridmind)

---

*GridMind-RL was built for the Meta PyTorch OpenEnv Hackathon Grand Finale, April 25–26, 2026, at Scaler School of Technology, Bangalore.*