Refactor project structure and remove unused files; update inference script for clarity and compliance with environment variable handling.
- .gitignore +43 -0
- README.md +154 -540
- __pycache__/inference.cpython-311.pyc +0 -0
- dashboard/__pycache__/server.cpython-311.pyc +0 -0
- dashboard/__pycache__/server.cpython-314.pyc +0 -0
- data/generate_prices.py +6 -1
- gridmind_rl.egg-info/PKG-INFO +0 -667
- gridmind_rl.egg-info/SOURCES.txt +0 -16
- gridmind_rl.egg-info/dependency_links.txt +0 -1
- gridmind_rl.egg-info/entry_points.txt +0 -2
- gridmind_rl.egg-info/requires.txt +0 -15
- gridmind_rl.egg-info/top_level.txt +0 -2
- inference.py +5 -3
- python/__pycache__/inference.cpython-311.pyc +0 -0
- python/inference.py +3 -3
- server/__init__.py +0 -0
- test_suite_results.json +0 -307
.gitignore
CHANGED
|
@@ -1,20 +1,63 @@
|
|
| 1 |
# Secrets
|
| 2 |
.env
|
|
|
|
| 3 |
|
| 4 |
# Python
|
| 5 |
__pycache__/
|
| 6 |
*.pyc
|
| 7 |
*.pyo
|
| 8 |
.pytest_cache/
|
| 9 |
|
| 10 |
# Go
|
| 11 |
*.exe
|
| 12 |
gridmind-out.exe
|
|
|
|
|
|
|
| 13 |
|
| 14 |
# IDE
|
| 15 |
.vscode/
|
| 16 |
.idea/
|
|
|
|
|
|
|
|
|
|
| 17 |
|
| 18 |
# OS
|
| 19 |
.DS_Store
|
| 20 |
Thumbs.db
|
| 1 |
# Secrets
|
| 2 |
.env
|
| 3 |
+
!.env.example
|
| 4 |
|
| 5 |
# Python
|
| 6 |
__pycache__/
|
| 7 |
*.pyc
|
| 8 |
*.pyo
|
| 9 |
+
*.so
|
| 10 |
+
.Python
|
| 11 |
+
env/
|
| 12 |
+
venv/
|
| 13 |
+
ENV/
|
| 14 |
+
build/
|
| 15 |
+
develop-eggs/
|
| 16 |
+
dist/
|
| 17 |
+
downloads/
|
| 18 |
+
eggs/
|
| 19 |
+
.eggs/
|
| 20 |
+
lib/
|
| 21 |
+
lib64/
|
| 22 |
+
parts/
|
| 23 |
+
sdist/
|
| 24 |
+
var/
|
| 25 |
+
wheels/
|
| 26 |
+
*.egg-info/
|
| 27 |
+
.installed.cfg
|
| 28 |
+
*.egg
|
| 29 |
.pytest_cache/
|
| 30 |
|
| 31 |
# Go
|
| 32 |
*.exe
|
| 33 |
gridmind-out.exe
|
| 34 |
+
gridmind-test.exe
|
| 35 |
+
gridmind-server
|
| 36 |
|
| 37 |
# IDE
|
| 38 |
.vscode/
|
| 39 |
.idea/
|
| 40 |
+
*.swp
|
| 41 |
+
*.swo
|
| 42 |
+
*~
|
| 43 |
|
| 44 |
# OS
|
| 45 |
.DS_Store
|
| 46 |
Thumbs.db
|
| 47 |
+
|
| 48 |
+
# Build and test artifacts
|
| 49 |
+
build/
|
| 50 |
+
dist/
|
| 51 |
+
*.log
|
| 52 |
+
*.orig
|
| 53 |
+
test_output.json
|
| 54 |
+
test_suite_results.json
|
| 55 |
+
|
| 56 |
+
# Old architecture (removed)
|
| 57 |
+
server/
|
| 58 |
+
embed/
|
| 59 |
+
gridmind_rl.egg-info/
|
| 60 |
+
HACKATHON_COMPLIANCE_REPORT.md
|
| 61 |
+
STRICT_QA_COMPLIANCE_REPORT.md
|
| 62 |
+
qa_strict_tests.py
|
| 63 |
+
qa_test_suite.py
|
README.md
CHANGED
|
@@ -1,626 +1,240 @@
|
|
| 1 |
-
#
|
| 2 |
|
| 3 |
-
**
|
| 4 |
|
| 5 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 6 |
|
| 7 |
---
|
| 8 |
|
| 9 |
-
##
|
| 10 |
|
| 11 |
-
|
| 12 |
|
| 13 |
-
**
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 14 |
|
| 15 |
-
|
| 16 |
-
- **Comfort & Safety** — Maintain indoor temperature within acceptable ranges while managing thermal inertia
|
| 17 |
-
- **Grid Participation** — Respond to demand-response signals and grid stress events
|
| 18 |
-
- **Batch Scheduling** — Coordinate industrial process timings to meet deadlines and minimize energy cost
|
| 19 |
-
- **Carbon Minimization** — Shift consumption to periods when grid carbon intensity is low
|
| 20 |
-
|
| 21 |
-
**Why this matters:** An RL agent trained in this environment can learn strategies that would be difficult or impossible for humans to hand-craft. The combination of continuous control (HVAC power, thermal storage), discrete decisions (batch scheduling), and multiple simultaneous objectives (cost, comfort, grid, deadlines, carbon) creates a realistic, challenging benchmark.
|
| 22 |
-
|
| 23 |
-
**Episode Length:** 96 steps = 24 hours at 15-minute resolution. A complete episode requires strategic decision-making across a full day-night cycle.
|
| 24 |
-
|
| 25 |
-
---
|
| 26 |
-
|
| 27 |
-
## Observation Space
|
| 28 |
-
|
| 29 |
-
At each timestep, the environment provides the following observations. **Episode length: 96 steps** (15-minute intervals = 24 hours).
|
| 30 |
-
|
| 31 |
-
| Field | Data Type | Range / Values | Description |
|
| 32 |
-
|-------|-----------|-----------------|-------------|
|
| 33 |
-
| `indoor_temperature` | float | 10–40 °C | Current building interior temperature |
|
| 34 |
-
| `thermal_storage_level` | float | 0.0–1.0 | Thermal tank charge state (0 = empty, 1 = full) |
|
| 35 |
-
| `process_demand` | float | ≥ 0 kW | Current industrial batch process power draw |
|
| 36 |
-
| `current_price` | float | > 0 $/kWh | Real-time spot electricity price |
|
| 37 |
-
| `grid_stress_signal` | float | 0.0–1.0 | Utility demand-response urgency (0.7+ = critical) |
|
| 38 |
-
| `carbon_intensity` | float | ≥ 0 gCO₂/kWh | Current grid carbon intensity |
|
| 39 |
-
| `hour_of_day` | int | 0–23 | Time-of-day context |
|
| 40 |
-
| `batch_queue` | int array | — | Pending batch jobs with deadline slots |
|
| 41 |
-
| `cumulative_cost` | float | ≥ 0 $ | Energy cost accumulated in current episode so far |
|
| 42 |
-
| `step` | int | 0–95 | Current timestep (96 total = 24 hours) |
|
| 43 |
-
| `building_id` | int | 0+ | Building identifier (for multi-building scenarios) |
|
| 44 |
-
|
| 45 |
-
**Observation Properties:**
|
| 46 |
-
- Observations are **deterministic** given the seed — same seed produces identical sequences
|
| 47 |
-
- All fields are **normalized or bounded** for stable learning
|
| 48 |
-
- Prices follow realistic time-of-use patterns; carbon intensity varies with grid mix
|
| 49 |
-
- Batch queue starts empty; jobs appear stochastically based on the task/seed
|
| 50 |
-
|
| 51 |
-
---
|
| 52 |
-
|
| 53 |
-
## 🎮 Action Space
|
| 54 |
-
|
| 55 |
-
At each step, the agent sends an action controlling four independent subsystems:
|
| 56 |
-
|
| 57 |
-
| Field | Data Type | Range | Description |
|
| 58 |
-
|-------|-----------|-------|-------------|
|
| 59 |
-
| `hvac_power_level` | float | 0.0–1.0 | HVAC system power (0 = off, 1 = full) |
|
| 60 |
-
| `thermal_charge_rate` | float | -1.0–1.0 | Thermal storage control (+charge, -discharge) |
|
| 61 |
-
| `batch_job_slot` | int | 0–4 | Schedule next batch job: 0=immediate, 1–4=defer |
|
| 62 |
-
| `load_shed_fraction` | float | 0.0–0.5 | Non-critical load reduction (0–50%) for demand-response |
|
| 63 |
-
| `building_id` | int | 0+ | Building identifier (routing) |
|
| 64 |
-
|
| 65 |
-
**Action Space Properties:**
|
| 66 |
-
- **Continuous** (HVAC, thermal charging, load shedding) + **discrete** (batch scheduling) → hybrid control
|
| 67 |
-
- Actions are applied every 15-minute step
|
| 68 |
-
- Load shedding is capped at 50% to ensure safety/habitability
|
| 69 |
-
- Batch scheduling decisions affect energy cost and deadline compliance
|
| 70 |
|
| 71 |
---
|
| 72 |
|
| 73 |
-
##
|
| 74 |
-
|
| 75 |
-
The environment provides **dense rewards every step** (not sparse, not binary). Each step returns:
|
| 76 |
-
- A scalar reward (sum of components)
|
| 77 |
-
- A dictionary of 7 weighted sub-components for transparency
|
| 78 |
|
| 79 |
-
|
| 80 |
-
|-----------|---------|-----------------|
|
| 81 |
-
| **cost_savings** | Minimize energy bill | Negative (cost increases) to positive (savings vs baseline) |
|
| 82 |
-
| **temp_constraint** | Maintain comfort | Gaussian bonus near 21°C, penalty outside 19–23°C bounds |
|
| 83 |
-
| **grid_response** | Shift load during stress | Bonus proportional to shed fraction when grid signal > 0.7 |
|
| 84 |
-
| **efficiency_bonus** | Exploit thermal storage | Reward charge/discharge timing and thermal arbitrage |
|
| 85 |
-
| **stability_penalty** | Smooth control | Small penalty for rapid oscillations in HVAC/storage |
|
| 86 |
-
| **deadline_penalty** | Meet job deadlines | Large penalty if batch job finishes after deadline |
|
| 87 |
-
| **carbon_reward** | Low-carbon consumption | Bonus for consuming during low-carbon grid periods |
|
| 88 |
|
| 89 |
-
|
| 90 |
-
|
| 91 |
-
|
| 92 |
-
|
| 93 |
-
|
| 94 |
-
|
| 95 |
-
|
| 96 |
-
|
|
|
|
|
|
|
| 97 |
|
| 98 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 99 |
|
| 100 |
---
|
| 101 |
|
| 102 |
-
##
|
| 103 |
|
| 104 |
-
|
| 105 |
|
| 106 |
-
|
|
| 107 |
|
| 108 |
-
|
| 109 |
|
| 110 |
-
|
|
| 111 |
|
| 112 |
-
|
| 113 |
-
- **Cost score (100%)** — Compares total episode energy cost to a deterministic baseline. Higher savings → higher score.
|
| 114 |
|
| 115 |
-
|
|
| 116 |
|
| 117 |
---
|
| 118 |
|
| 119 |
-
##
|
| 120 |
-
|
| 121 |
-
**Objective:** Minimize cost while maintaining indoor temperature between **19–23°C** throughout the episode.
|
| 122 |
-
|
| 123 |
-
**Difficulty Rationale:** Introduces a hard constraint (temperature bounds). Agent must use thermal storage strategically to meet both cost and comfort goals.
|
| 124 |
|
| 125 |
-
|
| 126 |
-
-
|
| 127 |
-
|
|
|
|
|
|
|
| 128 |
|
| 129 |
-
**
|
| 130 |
-
|
| 131 |
-
**
|
| 132 |
|
| 133 |
---
|
| 134 |
|
| 135 |
-
##
|
| 136 |
-
|
| 137 |
-
**Objective:** Minimize cost, maintain temperature, respond to grid events, complete batch jobs on time, and minimize carbon emissions. This is a **multi-objective constraint satisfaction** problem.
|
| 138 |
-
|
| 139 |
-
**Difficulty Rationale:** Most realistic. Agent must balance five competing objectives simultaneously; any single failure is costly.
|
| 140 |
-
|
| 141 |
-
**Grader Metrics:**
|
| 142 |
-
- **Cost score (28%)** — Energy cost
|
| 143 |
-
- **Temperature score (20%)** — Time within comfort bounds
|
| 144 |
-
- **Grid response score (20%)** — Load shed during demand-response events (signal > 0.7)
|
| 145 |
-
- **Batch deadline score (12%)** — Fraction of jobs completed before deadline
|
| 146 |
-
- **Carbon reward score (20%)** — Shift load to low-carbon periods
|
| 147 |
-
|
| 148 |
-
**Baseline Breakdown:**
|
| 149 |
-
- Cost: 0.670, Temperature: 0.573, Grid: 0.214, Batch: 1.000, Carbon: 0.657
|
| 150 |
-
- **Overall: 0.5966**
|
| 151 |
-
|
| 152 |
-
**Challenge:** Grid response score (~0.21) shows that the baseline heuristic rarely sheds load opportunistically. Learning agents should discover that quick load shedding during high-price, high-stress periods yields significant cost savings.
|
| 153 |
-
|
| 154 |
-
**Grader Determinism:** Same seed always produces identical evaluations. Episodes are seeded internally; reproducible batches of evaluations can be generated for benchmark comparisons.
|
| 155 |
-
|
| 156 |
-
---
|
| 157 |
-
|
| 158 |
-
## 🚀 Setup & Usage
|
| 159 |
-
|
| 160 |
-
### Prerequisites
|
| 161 |
-
|
| 162 |
-
- **Docker** — [Download Docker Desktop](https://www.docker.com/products/docker-desktop/)
|
| 163 |
-
- **Python 3.10+** — [Download Python](https://www.python.org/downloads/)
|
| 164 |
-
- **Git** — [Download Git](https://git-scm.com/downloads)
|
| 165 |
-
|
| 166 |
-
### Quick Start (5 minutes)
|
| 167 |
-
|
| 168 |
-
#### 1. Clone the Repository
|
| 169 |
-
|
| 170 |
-
```bash
|
| 171 |
-
git clone https://github.com/LO-Kyu/gridmind-rl.git
|
| 172 |
-
cd gridmind-rl
|
| 173 |
-
```
|
| 174 |
|
| 175 |
-
###
|
| 176 |
|
| 177 |
```bash
|
| 178 |
docker build -t gridmind-rl .
|
| 179 |
-
docker run -
|
| 180 |
```
|
| 181 |
|
| 182 |
-
|
| 183 |
|
|
|
|
| 184 |
```bash
|
| 185 |
-
|
| 186 |
-
curl http://localhost:7860/health
|
| 187 |
-
# Expected: {"status":"ok","version":"1.0.0"}
|
| 188 |
```
|
| 189 |
|
| 190 |
-
|
| 191 |
-
|
| 192 |
-
Open a **new terminal** and install:
|
| 193 |
-
|
| 194 |
```bash
|
| 195 |
-
|
| 196 |
-
|
| 197 |
-
|
| 198 |
-
#### 4. Run Inference (No LLM — Fast)
|
| 199 |
-
|
| 200 |
-
Run a fast, deterministic baseline using heuristic policy:
|
| 201 |
|
| 202 |
-
|
| 203 |
python inference.py --fast-mode --episodes 1
|
| 204 |
-
```
|
| 205 |
-
|
| 206 |
-
Expected output (sample):
|
| 207 |
-
```
|
| 208 |
-
[START] task=Cost_Minimization env=gridmind model=heuristic
|
| 209 |
-
[STEP1] step=1 action={...} reward=10.5 done=false
|
| 210 |
-
[STEP2] step=2 action={...} reward=12.3 done=false
|
| 211 |
-
...
|
| 212 |
-
[STEP96] step=96 action={...} reward=8.9 done=true
|
| 213 |
-
[END] success=true steps=96 rewards=[10.5, 12.3, ..., 8.9]
|
| 214 |
-
```
|
| 215 |
-
|
| 216 |
-
Results saved to: `baseline_scores.json`
|
| 217 |
-
|
| 218 |
-
#### 5. (Optional) Run with LLM
|
| 219 |
-
|
| 220 |
-
To use an LLM agent for decision-making:
|
| 221 |
-
|
| 222 |
-
1. Get a **free API key** from [openrouter.ai/keys](https://openrouter.ai/keys) (no credit card needed)
|
| 223 |
-
2. Create `.env` file (copy from `.env.example`):
|
| 224 |
-
```bash
|
| 225 |
-
cp .env.example .env
|
| 226 |
-
```
|
| 227 |
-
3. Edit `.env` and add your API key:
|
| 228 |
-
```env
|
| 229 |
-
HF_TOKEN=sk-or-v1-your-key-here
|
| 230 |
-
# or
|
| 231 |
-
OPENAI_API_KEY=sk-or-v1-your-key-here
|
| 232 |
-
```
|
| 233 |
-
4. Run with LLM:
|
| 234 |
-
```bash
|
| 235 |
-
python inference.py --episodes 1
|
| 236 |
-
```
|
| 237 |
-
|
| 238 |
-
#### 6. Stop the Server (When Done)
|
| 239 |
-
|
| 240 |
-
```bash
|
| 241 |
-
docker stop gridmind
|
| 242 |
-
```
|
| 243 |
|
| 244 |
-
|
| 245 |
-
|
| 246 |
-
### Inference Script Reference
|
| 247 |
-
|
| 248 |
-
The `inference.py` script (project root) is the **hackathon submission entrypoint**.
|
| 249 |
-
|
| 250 |
-
**Environment Variables:**
|
| 251 |
-
|
| 252 |
-
| Variable | Default | Description |
|
| 253 |
-
|----------|---------|-------------|
|
| 254 |
-
| `HF_TOKEN` | (required for submission) | API key for LLM provider or HF Spaces |
|
| 255 |
-
| `OPENAI_API_KEY` | (optional fallback) | Alternative OpenAI-compatible key |
|
| 256 |
-
| `API_BASE_URL` | `https://openrouter.ai/api/v1` | LLM endpoint URL |
|
| 257 |
-
| `MODEL_NAME` | `meta-llama/llama-3.3-70b-instruct:free` | Model identifier |
|
| 258 |
-
| `ENV_URL` | `http://localhost:7860` | Environment server address |
|
| 259 |
-
|
| 260 |
-
**Command-Line Flags:**
|
| 261 |
-
|
| 262 |
-
| Flag | Default | Description |
|
| 263 |
-
|------|---------|-------------|
|
| 264 |
-
| `--episodes N` | 1 | Episodes per task (runs tasks 1, 2, 3 in sequence) |
|
| 265 |
-
| `--fast-mode` | off | Don't call LLM; use heuristic policy only (reproducible, no API calls) |
|
| 266 |
-
| `--llm-every N` | 4 | Reuse each LLM decision for N steps (reduces API calls) |
|
| 267 |
-
| `--max-steps N` | 96 | Stop episode early after N steps |
|
| 268 |
-
| `--env-url URL` | from env var | Override environment server URL |
|
| 269 |
-
| `--output FILE` | `baseline_scores.json` | Output results filename |
|
| 270 |
-
| `--verbose` | off | Print detailed logs for each step |
|
| 271 |
-
|
| 272 |
-
**Examples:**
|
| 273 |
-
|
| 274 |
-
```bash
|
| 275 |
-
# Run all 3 tasks with LLM (1 episode each)
|
| 276 |
python inference.py --episodes 1
|
| 277 |
-
|
| 278 |
-
# Reproduce baseline fast (no LLM)
|
| 279 |
-
python inference.py --fast-mode --episodes 1
|
| 280 |
-
|
| 281 |
-
# Only Task 2, heuristic, verbose output
|
| 282 |
-
python inference.py --fast-mode --episodes 1 --verbose
|
| 283 |
-
|
| 284 |
-
# Run 5 episodes per task with custom environment
|
| 285 |
-
python inference.py --episodes 5 --env-url http://my-server:7860
|
| 286 |
```
|
| 287 |
|
| 288 |
-
|
| 289 |
-
|
| 290 |
-
### HTTP API Reference
|
| 291 |
-
|
| 292 |
-
**Base URL:** `http://localhost:7860`
|
| 293 |
-
|
| 294 |
-
| Endpoint | Method | Purpose | Example Body |
|
| 295 |
-
|----------|--------|---------|---------------|
|
| 296 |
-
| `/health` | GET | Liveness check | — |
|
| 297 |
-
| `/ping` | GET | Lightweight ping | — |
|
| 298 |
-
| `/reset` | POST | Reset episode for a task | `{"task_id": 1, "seed": 42}` |
|
| 299 |
-
| `/step` | POST | Apply action, get next observation | `{"hvac_power_level": 0.5, "thermal_charge_rate": 0.1, ...}` |
|
| 300 |
-
| `/state` | GET | Current full state snapshot | — |
|
| 301 |
-
| `/grade` | GET | Episode score (0.0–1.0) with sub-scores | — |
|
| 302 |
-
| `/replay` | GET | Full step-by-step trajectory | — |
|
| 303 |
-
| `/tasks` | GET | Task definitions and grader weights | — |
|
| 304 |
-
| `/metrics` | GET | Prometheus-format metrics | — |
|
| 305 |
-
|
| 306 |
-
**Example Workflow:**
|
| 307 |
|
| 308 |
-
|
| 309 |
-
|
| 310 |
-
|
| 311 |
-
|
| 312 |
-
|
| 313 |
-
|
| 314 |
-
|
| 315 |
-
curl http://localhost:7860/state
|
| 316 |
-
|
| 317 |
-
# 3. Take an action
|
| 318 |
-
curl -X POST http://localhost:7860/step \
|
| 319 |
-
-H "Content-Type: application/json" \
|
| 320 |
-
-d '{
|
| 321 |
-
"hvac_power_level": 0.5,
|
| 322 |
-
"thermal_charge_rate": 0.1,
|
| 323 |
-
"batch_job_slot": 1,
|
| 324 |
-
"load_shed_fraction": 0.0
|
| 325 |
-
}'
|
| 326 |
-
|
| 327 |
-
# 4. Check final score after episode completes
|
| 328 |
-
curl http://localhost:7860/grade
|
| 329 |
-
```
|
| 330 |
|
| 331 |
---
|
| 332 |
|
| 333 |
-
##
|
| 334 |
-
|
| 335 |
-
The baseline is a **heuristic policy** (rule-based, no LLM) representing a reasonable but non-optimized control strategy. Your RL agent should aim to exceed these scores.
|
| 336 |
-
|
| 337 |
-
**Baseline Run:** `python inference.py --fast-mode --episodes 1`
|
| 338 |
-
|
| 339 |
-
### Summary Scores
|
| 340 |
-
|
| 341 |
-
| Task | Difficulty | Score | Model |
|
| 342 |
-
|------|:----------:|:-----:|-------|
|
| 343 |
-
| Task 1 — Cost Minimization | 🟢 Easy | **0.7063** | Heuristic |
|
| 344 |
-
| Task 2 — Temperature Control | 🟡 Medium | **0.6333** | Heuristic |
|
| 345 |
-
| Task 3 — Full Demand Response | 🔴 Hard | **0.5966** | Heuristic |
|
| 346 |
-
| **Overall Average** | — | **0.6454** | Heuristic |
|
| 347 |
|
| 348 |
-
|
| 349 |
|
| 350 |
-
|
| 351 |
-
-
|
| 352 |
-
|
| 353 |
-
|
| 354 |
-
|
| 355 |
-
|
| 356 |
-
|
| 357 |
-
|
| 358 |
-
|
| 359 |
-
|
| 360 |
-
|
| 361 |
-
- Temperature constraint: 0.531 (agent violated comfort bounds ~47% of the time)
|
| 362 |
-
- **Interpretation:** Temperature management is challenging for the heuristic. Tighter thermal control could improve this score significantly.
|
| 363 |
-
|
| 364 |
-
#### Task 3 Results (Most Interesting)
|
| 365 |
-
- **Task:** Multi-objective: cost, temperature, grid response, batch deadlines, carbon
|
| 366 |
-
- **Score:** 0.5966
|
| 367 |
-
- **Sub-scores:**
|
| 368 |
-
- Cost: 0.670
|
| 369 |
-
- Temperature: 0.573 (similar temperature control challenge as Task 2)
|
| 370 |
-
- **Grid response: 0.214** ← Heuristic rarely participates in demand-response
|
| 371 |
-
- Batch deadline: 1.000 (heuristic always completes jobs on time)
|
| 372 |
-
- Carbon: 0.657
|
| 373 |
-
|
| 374 |
-
**Key Insight:** The heuristic's low grid response score (0.21) suggests that learned agents have significant room for improvement by:
|
| 375 |
-
1. Recognizing high-price + high-stress periods
|
| 376 |
-
2. Proactively shedding load to reduce cost
|
| 377 |
-
3. Using thermal storage to recover comfort afterward
|
| 378 |
-
|
| 379 |
-
This multi-objective setting is where RL agents typically exceed heuristic baselines.
|
| 380 |
-
|
| 381 |
-
### Reproducibility & Evaluation
|
| 382 |
-
|
| 383 |
-
- **Deterministic:** Baseline scores are **deterministic** — same seed always produces identical actions and rewards
|
| 384 |
-
- **Seeding:** Each task uses a fixed base seed (1100, 1200, 1300) for reproducible evaluation
|
| 385 |
-
- **Your Submissions:** Your agent will be evaluated on the same seed distribution; compare your scores directly to baseline
|
| 386 |
|
| 387 |
---
|
| 388 |
|
| 389 |
-
##
|
| 390 |
|
| 391 |
-
|
| 392 |
-
┌─────────────────────────────────────────────────────────────────┐
|
| 393 |
-
│ inference.py (LLM Agent or Heuristic) │
|
| 394 |
-
│ │ │
|
| 395 |
-
│ │ HTTP: POST /reset, /step · GET /grade, /state │
|
| 396 |
-
│ ▼ │
|
| 397 |
-
│ ┌───────────────────────────────────────────────────────────┐ │
|
| 398 |
-
│ │ Docker Container │ │
|
| 399 |
-
│ │ │ │
|
| 400 |
-
│ │ ┌─────────────────────┐ ┌───────────────────────────┐ │ │
|
| 401 |
-
│ │ │ Go Environment │ │ Python Dashboard │ │ │
|
| 402 |
-
│ │ │ Server (:7860) │ │ FastAPI + UI (:7861) │ │ │
|
| 403 |
-
│ │ │ │ │ │ │ │
|
| 404 |
-
│ │ │ • Physics engine │ │ • Proxies /api → :7860 │ │ │
|
| 405 |
-
│ │ │ • Reward function │◄──│ • Real-time charts │ │ │
|
| 406 |
-
│ │ │ • Task graders │ │ • State visualization │ │ │
|
| 407 |
-
│ │ └─────────────────────┘ └───────────────────────────┘ │ │
|
| 408 |
-
│ │ │ │
|
| 409 |
-
│ │ Isolated · Reproducible · Non-root user │ │
|
| 410 |
-
│ └───────────────────────────────────────────────────────────┘ │
|
| 411 |
-
└─────────────────────────────────────────────────────────────────┘
|
| 412 |
-
```
|
| 413 |
|
| 414 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 415 |
|
| 416 |
-
|
| 417 |
-
gridmind/
|
| 418 |
-
├── inference.py ← Hackathon entrypoint (root)
|
| 419 |
-
├── openenv.yaml ← OpenEnv spec manifest
|
| 420 |
-
├── Dockerfile ← Multi-stage build (Go + Python)
|
| 421 |
-
├── .env ← API credentials (git-ignored)
|
| 422 |
-
├── baseline_scores.json ← Produced by inference.py
|
| 423 |
-
│
|
| 424 |
-
├── main.go ← HTTP server (routes, middleware, metrics)
|
| 425 |
-
├── env/ ← Core environment logic (Go)
|
| 426 |
-
│ ├── environment.go ← Simulation: physics, thermal dynamics
|
| 427 |
-
│ ├── models.go ← All data types (Observation, Action, etc.)
|
| 428 |
-
│ ├── rewards.go ← 7-component dense reward function
|
| 429 |
-
│ └── tasks.go ← 3 task definitions + deterministic graders
|
| 430 |
-
│
|
| 431 |
-
├── python/ ← Python support layer
|
| 432 |
-
│ ├── inference.py ← Full LLM agent + heuristic fallback
|
| 433 |
-
│ ├── models.py ← Typed Pydantic models (mirrors Go structs)
|
| 434 |
-
│ ├── validate.py ← OpenEnv spec validation suite
|
| 435 |
-
│ └── requirements.txt ← Python dependencies
|
| 436 |
-
│
|
| 437 |
-
├── tests/ ← Automated tests
|
| 438 |
-
│ ├── environment_test.go ← Go unit tests (determinism, bounds, etc.)
|
| 439 |
-
│ └── test_graders.py ← Python grader tests (pytest)
|
| 440 |
-
│
|
| 441 |
-
└── dashboard/ ← Optional web dashboard
|
| 442 |
-
├── server.py ← FastAPI server
|
| 443 |
-
└── static/ ← Frontend assets
|
| 444 |
-
```
|
| 445 |
|
| 446 |
---
|
| 447 |
|
| 448 |
-
##
|
| 449 |
-
|
| 450 |
-
| Action | Command |
|
| 451 |
-
|--------|---------|
|
| 452 |
-
| **Build** | `docker build -t gridmind-rl .` |
|
| 453 |
-
| **Run (foreground)** | `docker run --rm -p 7860:7860 -p 7861:7861 --name gridmind gridmind-rl` |
|
| 454 |
-
| **Run (background)** | `docker run --rm -d -p 7860:7860 -p 7861:7861 --name gridmind gridmind-rl` |
|
| 455 |
-
| **Stop** | `docker stop gridmind` |
|
| 456 |
-
| **Run inference inside container** | `docker exec -it gridmind python /app/inference.py --fast-mode` |
|
| 457 |
-
|
| 458 |
-
The Dockerfile uses a **multi-stage build**:
|
| 459 |
-
1. **Stage 1** — Go 1.21 Alpine: compiles the environment server binary
|
| 460 |
-
2. **Stage 2** — Python 3.11 slim: runs the Go binary + Python dashboard via Supervisor
|
| 461 |
-
|
| 462 |
-
---
|
| 463 |
-
|
| 464 |
-
## ☁️ Hugging Face Space Deployment
|
| 465 |
-
|
| 466 |
-
### 1. Create a New Space
|
| 467 |
-
|
| 468 |
-
Go to [huggingface.co/new-space](https://huggingface.co/new-space):
|
| 469 |
-
- **SDK:** Docker
|
| 470 |
-
- **Hardware:** CPU Basic (2 vCPU, 16 GB — free tier)
|
| 471 |
-
|
| 472 |
-
### 2. Push to HF
|
| 473 |
|
| 474 |
-
```bash
|
| 475 |
-
git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/gridmind-rl
|
| 476 |
-
git push hf main
|
| 477 |
```
|
| 478 |
-
|
| 479 |
-
|
| 480 |
-
|
| 481 |
-
|
| 482 |
-
|
| 483 |
-
|
| 484 |
-
|
| 485 |
-
|
| 486 |
-
|
| 487 |
-
|
| 488 |
```
|
| 489 |
|
| 490 |
-
> **Note:** HF Spaces exposes port **7860** publicly. The dashboard (7861) is for local development only.
|
| 491 |
-
|
| 492 |
---
|
| 493 |
|
| 494 |
-
##
|
| 495 |
|
| 496 |
-
###
|
| 497 |
|
| 498 |
```bash
|
| 499 |
-
|
| 500 |
-
go test ./tests/ -v
|
| 501 |
-
```
|
| 502 |
-
|
| 503 |
-
### Run Python Grader Tests (requires server running)
|
| 504 |
|
| 505 |
-
|
| 506 |
pytest tests/test_graders.py -v
|
| 507 |
```
|
| 508 |
|
| 509 |
-
###
|
| 510 |
|
| 511 |
```bash
|
| 512 |
-
python
|
| 513 |
-
```
|
| 514 |
-
|
| 515 |
-
---
|
| 516 |
-
|
| 517 |
-
## 📝 Inference Script Reference
|
| 518 |
-
|
| 519 |
-
The `inference.py` script at the project root is the **hackathon entrypoint**.
|
| 520 |
-
|
| 521 |
-
### Environment Variables
|
| 522 |
-
|
| 523 |
-
| Variable | Default | Description |
|
| 524 |
-
|----------|---------|-------------|
|
| 525 |
-
| `API_BASE_URL` | `https://openrouter.ai/api/v1` | LLM API endpoint |
|
| 526 |
-
| `MODEL_NAME` | `meta-llama/llama-3.1-8b-instruct:free` | Model to use |
|
| 527 |
-
| `OPENAI_API_KEY` | — | API key (any OpenAI-compatible provider) |
|
| 528 |
-
| `ENV_URL` | `http://localhost:7860` | Environment server URL |
|
| 529 |
-
|
| 530 |
-
### Command-Line Flags
|
| 531 |
-
|
| 532 |
-
| Flag | Default | Description |
|
| 533 |
-
|------|---------|-------------|
|
| 534 |
-
| `--episodes N` | 1 | Episodes per task (tasks 1–3 run in sequence) |
|
| 535 |
-
| `--fast-mode` | off | Use heuristic policy only (no LLM, fully reproducible) |
|
| 536 |
-
| `--llm-every N` | 4 | Reuse each LLM action for N steps (reduces API calls) |
|
| 537 |
-
| `--max-steps N` | 96 | Stop early after N steps |
|
| 538 |
-
| `--env-url URL` | from env | Override environment URL |
|
| 539 |
-
| `--output FILE` | `baseline_scores.json` | Output results file |
|
| 540 |
-
| `--verbose` | off | Print detailed step logs |
|
| 541 |
-
|
| 542 |
-
### Stdout Log Format
|
| 543 |
-
|
| 544 |
-
Each episode emits structured markers for automated evaluation:
|
| 545 |
-
|
| 546 |
-
```
|
| 547 |
-
[START]
|
| 548 |
-
[STEP1]
|
| 549 |
-
[STEP2]
|
| 550 |
-
...
|
| 551 |
-
[STEP96]
|
| 552 |
-
[END]
|
| 553 |
```
|
| 554 |
|
| 555 |
---
|
| 556 |
|
| 557 |
-
##
|
| 558 |
-
|
| 559 |
-
GridMind-RL fully implements the OpenEnv specification for standardized RL environments. All components are present and tested:
|
| 560 |
|
| 561 |
-
|
| 562 |
-
|-------------|:------:|-------|
|
| 563 |
-
| Manifest (`openenv.yaml`) | ✅ | All metadata, schema definitions, and version info |
|
| 564 |
-
| Observation Schema | ✅ | 11-field object: temperature, storage, price, grid signal, carbon, hour, batch queue, cost, step, building_id |
|
| 565 |
-
| Action Schema | ✅ | 5-field object: HVAC, thermal rate, batch slot, load shed, building_id |
|
| 566 |
-
| HTTP Endpoints | ✅ | `/reset`, `/step`, `/state`, `/grade`, `/replay`, `/tasks`, `/health`, `/metrics` |
|
| 567 |
-
| Determinism | ✅ | Seeded episode generation; identical seeds produce identical trajectories |
|
| 568 |
-
| Typed Models | ✅ | Pydantic models (Python) mirror Go structs exactly |
|
| 569 |
-
| Dense Rewards | ✅ | 7-component reward breakdown every step |
|
| 570 |
-
| Graders | ✅ | 3 tasks with programmatic, deterministic graders (0.0–1.0 range) |
|
| 571 |
-
| Exploit Detection | ✅ | Built into grading pipeline to flag unrealistic scores |
|
| 572 |
|
| 573 |
---
|
| 574 |
|
| 575 |
-
|
| 576 |
-
|
| 577 |
-
**Q: Can I use a different model?**
|
| 578 |
-
A: Yes. Set `MODEL_NAME` environment variable to any OpenAI-compatible model. The default (`meta-llama/llama-3.3-70b-instruct:free`) is free on OpenRouter with no credit card.
|
| 579 |
-
|
| 580 |
-
**Q: How do I avoid rate limiting?**
|
| 581 |
-
A: (1) Use `--fast-mode` for local testing (no API calls), (2) Set `--llm-every 4` to reuse decisions, (3) Use a paid API tier for submission, or (4) Train & submit an offline policy.
|
| 582 |
-
|
| 583 |
-
**Q: Will my API key be exposed in submissions?**
|
| 584 |
-
A: No. Store your API key in `.env` (git-ignored). On HF Spaces, set secrets via the Space settings UI; keys are never committed to the repo.
|
| 585 |
-
|
| 586 |
-
**Q: What's the difference between `HF_TOKEN` and `OPENAI_API_KEY`?**
|
| 587 |
-
A: `HF_TOKEN` is used in HF Space deployments and external evaluations. `OPENAI_API_KEY` is a fallback for local development. The code tries `HF_TOKEN` first, then `OPENAI_API_KEY`. At least one must be set.
|
| 588 |
-
|
| 589 |
-
**Q: Can I submit an offline/trained policy?**
|
| 590 |
-
A: Yes. Modify `python/inference.py` to use your trained agent instead of LLM calls. Ensure you still output the required `[START]`, `[STEP]`, `[END]` format.
|
| 591 |
-
|
| 592 |
-
**Q: What if my submission times out?**
|
| 593 |
-
A: Each episode is 96 steps. The environment runs 3 episodes (one per task). Optimize for latency: reduce LLM calls (use `--llm-every`), use a faster model, or submit a heuristic/trained offline policy.
|
| 594 |
-
|
| 595 |
-
---
|
| 596 |
-
|
| 597 |
-
## 🎯 Submission Checklist
|
| 598 |
-
|
| 599 |
-
Before submitting, verify:
|
| 600 |
-
|
| 601 |
-
- [ ] Clone repo, build Docker, run `docker run -p 7860:7860 -p 7861:7861 gridmind-rl`
|
| 602 |
-
- [ ] Run `python inference.py --fast-mode --episodes 1` locally — should produce `baseline_scores.json`
|
| 603 |
-
- [ ] Check `[START]`, `[STEP]`, `[END]` markers in stdout
|
| 604 |
-
- [ ] Set `HF_TOKEN` or `OPENAI_API_KEY` in `.env` for LLM runs
|
| 605 |
-
- [ ] Test with LLM: `python inference.py --episodes 1`
|
| 606 |
-
- [ ] Verify Dockerfile builds without errors: `docker build -t gridmind-rl .`
|
| 607 |
-
- [ ] Create HF Space (Docker SDK, CPU Basic)
|
| 608 |
-
- [ ] Push repo to HF Space: `git push hf main`
|
| 609 |
-
- [ ] Set secrets in HF Space UI: `HF_TOKEN`, `API_BASE_URL` (optional), `MODEL_NAME` (optional)
|
| 610 |
-
- [ ] Verify Space is running: `curl https://YOUR_USERNAME-gridmind-rl.hf.space/health`
|
| 611 |
-
- [ ] Submit Space URL to hackathon organizers
|
| 612 |
-
|
| 613 |
-
---
|
| 614 |
-
|
| 615 |
-
## 📚 Additional Resources
|
| 616 |
-
|
| 617 |
-
- **OpenEnv Spec:** https://github.com/meta-pytorch/OpenEnv
|
| 618 |
-
- **OpenRouter Free Models:** https://openrouter.ai/keys
|
| 619 |
-
- **HF Spaces Docs:** https://huggingface.co/docs/hub/spaces
|
| 620 |
-
- **GridMind Repository:** https://github.com/LO-Kyu/gridmind-rl
|
| 621 |
-
|
| 622 |
-
---
|
| 623 |
-
|
| 624 |
-
## 📄 License
|
| 625 |
-
|
| 626 |
-
See `LICENSE` in the repository.
|
|
|
|
| 1 |
+
# GridMind-RL
|
| 2 |
|
| 3 |
+
**Industrial building energy management reinforcement learning environment**
|
| 4 |
|
| 5 |
+
[](https://openenv.org/)
|
| 6 |
+
[](https://golang.org/)
|
| 7 |
+
[](https://www.python.org/)
|
| 8 |
+
[](https://www.docker.com/)
|
| 9 |
+
[](LICENSE)
|
| 10 |
|
| 11 |
---
|
| 12 |
|
| 13 |
+
## Overview
|
| 14 |
|
| 15 |
+
GridMind-RL is a reinforcement learning environment for training and evaluating intelligent control policies in industrial building energy management. The environment simulates realistic HVAC control, thermal storage management, batch job scheduling, and demand response scenarios under stochastic electricity pricing and grid stress events.
|
| 16 |
|
| 17 |
+
**Key challenges posed by the environment:**
|
| 18 |
+
- **Cost minimization**: Navigate complex electricity pricing curves across 24-hour periods
|
| 19 |
+
- **Comfort maintenance**: Keep indoor temperature within comfort bounds while optimizing cost
|
| 20 |
+
- **Grid responsiveness**: Respond to grid stress signals with intelligent load shedding
|
| 21 |
+
- **Carbon reduction**: Reduce emissions by shifting consumption to periods of low grid carbon intensity
|
| 22 |
+
- **Batch scheduling**: Schedule compute-intensive batch jobs optimally
|
| 23 |
+
- **Storage management**: Efficiently use thermal storage for load shifting
|
| 24 |
|
| 25 |
+
This environment is ideal for training deep reinforcement learning agents, testing heuristic policies, and benchmarking control algorithms. It provides dense reward signals enabling efficient policy learning.
|
| 26 |
|
| 27 |
---
|
| 28 |
|
| 29 |
+
## Architecture
|
| 30 |
|
| 31 |
+
GridMind-RL consists of three tightly integrated components:
|
| 32 |
|
| 33 |
+
```
|
| 34 |
+
Agent (python/inference.py)
|
| 35 |
+
| HTTP: POST /reset, POST /step, GET /grade
|
| 36 |
+
v
|
| 37 |
+
Go Environment Server (main.go), port 7860
|
| 38 |
+
v
|
| 39 |
+
Physics Engine (env/environment.go) + Rewards (env/rewards.go) + Tasks (env/tasks.go)
|
| 40 |
+
v
|
| 41 |
+
Web Dashboard (dashboard/server.py), port 7861
|
| 42 |
+
```
|
| 43 |
|
| 44 |
+
**Design philosophy:**
|
| 45 |
+
- **Separation of concerns**: Physics engine (Go) decoupled from policy layer (Python)
|
| 46 |
+
- **OpenEnv compliance**: Standardized REST API enables any language agent
|
| 47 |
+
- **Deterministic simulation**: Seeded RNG for reproducible experiments
|
| 48 |
+
- **Dense rewards**: 7-component reward for effective learning
|
| 49 |
|
| 50 |
---
|
| 51 |
|
| 52 |
+
## Environment Specification
|
| 53 |
|
| 54 |
+
### Observation Space (11 fields)
|
| 55 |
|
| 56 |
+
| Field | Type | Range | Description |
|
| 57 |
+
|-------|------|-------|-------------|
|
| 58 |
+
| `indoor_temperature` | float | [15-27] °C | Building indoor temperature |
|
| 59 |
+
| `thermal_storage_level` | float | [0-1] | Thermal storage charge (0=empty, 1=full) |
|
| 60 |
+
| `process_demand` | float | [5-50] kW | Baseline demand |
|
| 61 |
+
| `current_price` | float | [0.03-0.25] $/kWh | Electricity price |
|
| 62 |
+
| `grid_stress_signal` | float | [0-1] | Grid stress (>0.7 = critical) |
|
| 63 |
+
| `carbon_intensity` | float | [50-800] gCO2/kWh | Grid carbon intensity |
|
| 64 |
+
| `hour_of_day` | int | [0-23] | Time of day |
|
| 65 |
+
| `batch_queue` | list | Up to 10 items | Batch job deadlines |
|
| 66 |
+
| `cumulative_cost` | float | [0-1000] $ | Total cost this episode |
|
| 67 |
+
| `step` | int | [0-95] | Current step (96 steps of 15 minutes = 24 hours) |
|
| 68 |
+
| `building_id` | int | {0} | Building identifier |
|
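The relation between `step` and wall-clock time can be sketched as follows. This is a minimal illustration, assuming that step 0 corresponds to 00:00 and that every step covers 15 minutes; `step_to_clock` is a hypothetical helper, not part of the repository.

```python
# Hypothetical helper: maps an episode step to (hour, minute).
# Assumes the episode starts at midnight and each step is 15 minutes.
STEPS_PER_EPISODE = 96                            # steps 0-95, per the table
MINUTES_PER_STEP = 24 * 60 // STEPS_PER_EPISODE   # 1440 / 96 = 15


def step_to_clock(step: int) -> tuple:
    """Return (hour_of_day, minute) for a step index in [0, 95]."""
    if not 0 <= step < STEPS_PER_EPISODE:
        raise ValueError(f"step must be in [0, {STEPS_PER_EPISODE - 1}]")
    minutes = step * MINUTES_PER_STEP
    return minutes // 60, minutes % 60


hour, minute = step_to_clock(95)   # last step of the day
```

Under these assumptions, `hour_of_day` would equal `step // 4`.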
| 69 |
|
| 70 |
+
### Action Space (5 fields)
|
| 71 |
|
| 72 |
+
| Field | Type | Range | Description |
|
| 73 |
+
|-------|------|-------|-------------|
|
| 74 |
+
| `hvac_power_level` | float | [0-1] | HVAC power (0=off, 1=max) |
|
| 75 |
+
| `thermal_charge_rate` | float | [-1, 1] | Storage rate (positive = charge, negative = discharge) |
|
| 76 |
+
| `batch_job_slot` | int | [0-4] | Batch job scheduling slot (0 = run immediately, 1-4 = defer) |
|
| 77 |
+
| `load_shed_fraction` | float | [0-0.5] | Load shedding fraction |
|
| 78 |
+
| `building_id` | int | {0} | Building identifier |
|
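A policy, and an LLM policy in particular, can emit values outside the ranges in the table. The sketch below clamps a raw action to the documented bounds before it is sent. The field names and ranges come from the table; `clamp_action` and its default values are hypothetical, and whether the server itself clamps or rejects out-of-range values is not stated here.

```python
# Hypothetical client-side guard; bounds copied from the action table.
FLOAT_BOUNDS = {
    "hvac_power_level": (0.0, 1.0),
    "thermal_charge_rate": (-1.0, 1.0),
    "load_shed_fraction": (0.0, 0.5),
}
BATCH_SLOT_BOUNDS = (0, 4)


def clamp_action(raw: dict) -> dict:
    """Return a copy of `raw` with every field forced into its range."""
    action = {}
    for name, (low, high) in FLOAT_BOUNDS.items():
        value = float(raw.get(name, 0.0))        # missing field -> 0.0 (assumption)
        action[name] = min(high, max(low, value))
    slot_low, slot_high = BATCH_SLOT_BOUNDS
    slot = int(raw.get("batch_job_slot", 0))
    action["batch_job_slot"] = min(slot_high, max(slot_low, slot))
    action["building_id"] = int(raw.get("building_id", 0))
    return action


safe = clamp_action({"hvac_power_level": 1.7, "thermal_charge_rate": -3,
                     "batch_job_slot": 9, "load_shed_fraction": 0.9})
```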
| 79 |
|
| 80 |
+
### Reward Function (7 Components)
|
| 81 |
|
| 82 |
+
| Component | Description |
|
| 83 |
+
|-----------|-------------|
|
| 84 |
+
| **Cost Savings** | Energy cost relative to baseline (negative when cost rises, positive for savings) |
|
| 85 |
+
| **Temperature Constraint** | Penalty if T outside [19-23] °C |
|
| 86 |
+
| **Grid Response** | Bonus for load shedding during stress |
|
| 87 |
+
| **Deadline Penalty** | Penalty for missed batch deadlines |
|
| 88 |
+
| **Efficiency Bonus** | Bonus for off-peak charging |
|
| 89 |
+
| **Stability Penalty** | Penalty for rapid control changes |
|
| 90 |
+
| **Carbon Reward** | Bonus for consuming during low-carbon grid periods |
|
| 91 |
|
| 92 |
---
|
| 93 |
|
| 94 |
+
## Tasks
|
| 95 |
|
| 96 |
+
| Task | Difficulty | Objective | Baseline Score |
|
| 97 |
+
|------|-----------|-----------|-----------------|
|
| 98 |
+
| Task 1 | Easy | Minimize cost only | **0.708** |
|
| 99 |
+
| Task 2 | Medium | Minimize cost + maintain comfort | **0.633** |
|
| 100 |
+
| Task 3 | Hard | Full demand response + scheduling | **0.598** |
|
| 101 |
|
| 102 |
+
**Task 1 (Easy)**: Cost minimization, no constraints
|
| 103 |
+
**Task 2 (Medium)**: Cost + temperature comfort (19-23 °C)
|
| 104 |
+
**Task 3 (Hard)**: Cost + comfort + grid response + batch scheduling + carbon
|
| 105 |
|
| 106 |
---
|
| 107 |
|
| 108 |
+
## Quickstart
|
| 109 |
|
| 110 |
+
### Docker (Recommended)
|
| 111 |
|
| 112 |
```bash
|
| 113 |
docker build -t gridmind-rl .
|
| 114 |
+
docker run -p 7860:7860 -p 7861:7861 gridmind-rl
|
| 115 |
```
|
| 116 |
|
| 117 |
+
### Local Development
|
| 118 |
|
| 119 |
+
**Terminal 1: Start Go server**
|
| 120 |
```bash
|
| 121 |
+
go run main.go
|
| 122 |
```
|
| 123 |
|
| 124 |
+
**Terminal 2: Run agent**
|
| 125 |
```bash
|
| 126 |
+
export HF_TOKEN="your_api_key"
|
| 127 |
+
export API_BASE_URL="https://openrouter.ai/api/v1"
|
| 128 |
+
export MODEL_NAME="meta-llama/llama-3.3-70b-instruct:free"
|
| 129 |
|
| 130 |
+
# Heuristic policy (no LLM)
|
| 131 |
python inference.py --fast-mode --episodes 1
|
| 132 |
|
| 133 |
+
# LLM agent
|
| 134 |
python inference.py --episodes 1
|
| 135 |
```
|
| 136 |
|
| 137 |
+
### Environment Variables
|
| 138 |
|
| 139 |
+
| Variable | Required | Default | Description |
|
| 140 |
+
|----------|----------|---------|-------------|
|
| 141 |
+
| `HF_TOKEN` | Yes | (none) | LLM API key |
|
| 142 |
+
| `API_BASE_URL` | No | `https://openrouter.ai/api/v1` | LLM endpoint |
|
| 143 |
+
| `MODEL_NAME` | No | `meta-llama/llama-3.3-70b-instruct:free` | Model ID |
|
| 144 |
+
| `ENV_URL` | No | `http://localhost:7860` | Environment server URL |
|
| 145 |
+
| `OPENAI_API_KEY` | No | (none) | Alternative to `HF_TOKEN` |
|
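One way an agent script might resolve these variables is sketched below. The variable names and defaults are taken from the table; `load_settings`, the `API_KEY` result key, and the choice to prefer `HF_TOKEN` over `OPENAI_API_KEY` when both are set are assumptions for illustration, not a description of `inference.py`.

```python
import os
from typing import Mapping, Optional

# Defaults copied from the table above.
DEFAULTS = {
    "API_BASE_URL": "https://openrouter.ai/api/v1",
    "MODEL_NAME": "meta-llama/llama-3.3-70b-instruct:free",
    "ENV_URL": "http://localhost:7860",
}


def load_settings(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Resolve settings from a mapping (defaults to os.environ)."""
    env = os.environ if environ is None else environ
    # An unset or empty variable falls back to the documented default.
    settings = {name: env.get(name) or default for name, default in DEFAULTS.items()}
    # Assumed precedence: HF_TOKEN first, then OPENAI_API_KEY.
    settings["API_KEY"] = env.get("HF_TOKEN") or env.get("OPENAI_API_KEY")
    return settings


settings = load_settings({"OPENAI_API_KEY": "example-key"})
```

Passing the mapping explicitly keeps the function testable without touching the real process environment.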
| 146 |
|
| 147 |
---
|
| 148 |
|
| 149 |
+
## API Reference
|
| 150 |
|
| 151 |
+
All endpoints are served by the Go server on port 7860 and follow the OpenEnv REST conventions.
|
| 152 |
|
| 153 |
+
| Method | Endpoint | Description |
|
| 154 |
+
|--------|----------|-------------|
|
| 155 |
+
| `GET` | `/health` | Health check |
|
| 156 |
+
| `GET` | `/ping` | Liveness probe |
|
| 157 |
+
| `POST` | `/reset` | Start new episode |
|
| 158 |
+
| `POST` | `/step` | Take action step |
|
| 159 |
+
| `GET` | `/state` | Get current state |
|
| 160 |
+
| `GET` | `/grade` | Grade episode (0.0-1.0 score) |
|
| 161 |
+
| `GET` | `/tasks` | Available tasks |
|
| 162 |
+
| `GET` | `/metrics` | System metrics |
|
| 163 |
+
| `GET` | `/replay` | Episode history |
|
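The endpoints above can be exercised from Python with the standard library alone. The sketch below only builds the requests. The `/reset` body (`task_id`, `seed`) and the `/step` fields follow the example in the previous README; `build_request` is a hypothetical helper, and the shape of the JSON responses is not assumed.

```python
import json
import urllib.request
from typing import Optional

ENV_URL = "http://localhost:7860"  # default from the environment variable table


def build_request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request against the environment server."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(ENV_URL + path, data=data, headers=headers, method=method)


reset_req = build_request("POST", "/reset", {"task_id": 1, "seed": 42})
step_req = build_request("POST", "/step", {
    "hvac_power_level": 0.5,
    "thermal_charge_rate": 0.1,
    "batch_job_slot": 1,
    "load_shed_fraction": 0.0,
})
grade_req = build_request("GET", "/grade")

# With the server running, each request is sent with:
#   with urllib.request.urlopen(reset_req) as resp:
#       payload = json.load(resp)
```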
| 164 |
|
| 165 |
---
|
| 166 |
|
| 167 |
+
## Baseline Performance
|
| 168 |
|
| 169 |
+
Reference heuristic policy scores (rule-based, deterministic):
|
| 170 |
|
| 171 |
+
| Task | Score | Policy |
|
| 172 |
+
|------|-------|--------|
|
| 173 |
+
| Task 1 | 0.708 | Simple load-shifting heuristic |
|
| 174 |
+
| Task 2 | 0.633 | Temperature-aware heuristic |
|
| 175 |
+
| Task 3 | 0.598 | Full demand response heuristic |
|
| 176 |
|
| 177 |
+
LLM and RL agents are expected to exceed these scores.
|
| 178 |
|
| 179 |
---
|
| 180 |
|
| 181 |
+
## Project Structure
|
| 182 |
|
| 183 |
```
|
| 184 |
+
gridmind-rl/
|
| 185 |
+
+-- main.go # HTTP server & OpenEnv API
|
| 186 |
+
+-- inference.py # Agent entry point
|
| 187 |
+
+-- openenv.yaml # OpenEnv spec
|
| 188 |
+
+-- Dockerfile # Container build
|
| 189 |
+
+-- env/
|
| 190 |
+
|   +-- environment.go # Physics simulation
|
| 191 |
+
|   +-- models.go # Data models
|
| 192 |
+
|   +-- rewards.go # Reward computation
|
| 193 |
+
|   +-- tasks.go # Task grading
|
| 194 |
+
+-- python/
|
| 195 |
+
|   +-- inference.py # LLM agent
|
| 196 |
+
|   +-- models.py # Pydantic models
|
| 197 |
+
|   +-- requirements.txt
|
| 198 |
+
+-- dashboard/
|
| 199 |
+
|   +-- server.py # Web server (port 7861)
|
| 200 |
+
|   +-- static/ # Frontend assets
|
| 201 |
+
+-- data/
|
| 202 |
+
|   +-- price_curves.json # Price data
|
| 203 |
+
|   +-- generate_prices.py # Price generator
|
| 204 |
+
+-- tests/
|
| 205 |
+
|   +-- test_graders.py # Python tests
|
| 206 |
+
|   +-- environment_test.go # Go tests
|
| 207 |
+
+-- baseline_scores.json # Reference scores
|
| 208 |
+
+-- .env.example # Environment template
|
| 209 |
+
+-- LICENSE # MIT License
|
| 210 |
```
|
| 211 |
|
| 212 |
---
|
| 213 |
|
| 214 |
+
## Development
|
| 215 |
|
| 216 |
+
### Running Tests
|
| 217 |
|
| 218 |
```bash
|
| 219 |
+
# Go tests
|
| 220 |
+
go test ./tests/... -v
|
| 221 |
|
| 222 |
+
# Python tests (requires server running on 7860)
|
| 223 |
pytest tests/test_graders.py -v
|
| 224 |
```
|
| 225 |
|
| 226 |
+
### Rebuilding Price Data
|
| 227 |
|
| 228 |
```bash
|
| 229 |
+
python data/generate_prices.py
|
| 230 |
```
|
| 231 |
|
| 232 |
---
|
| 233 |
|
| 234 |
+
## License
|
| 235 |
|
| 236 |
+
MIT License. See [LICENSE](LICENSE) file.
|
| 237 |
|
| 238 |
---
|
| 239 |
|
| 240 |
+
**Questions?** Open an issue on GitHub.
|
__pycache__/inference.cpython-311.pyc
DELETED
|
Binary file (691 Bytes)
|
|
|
dashboard/__pycache__/server.cpython-311.pyc
DELETED
|
Binary file (5.56 kB)
|
|
|
dashboard/__pycache__/server.cpython-314.pyc
DELETED
|
Binary file (5.09 kB)
|
|
|
data/generate_prices.py
CHANGED
|
@@ -1,4 +1,9 @@
|
|
| 1 |
-
"""
|
| 2 |
import json
|
| 3 |
import math
|
| 4 |
import random
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
One-time script used to generate data/price_curves.json
|
| 3 |
+
Generate 30 days of realistic ISO New England-style hourly price data.
|
| 4 |
+
Run: python data/generate_prices.py
|
| 5 |
+
Output: data/price_curves.json
|
| 6 |
+
"""
|
| 7 |
import json
|
| 8 |
import math
|
| 9 |
import random
|
gridmind_rl.egg-info/PKG-INFO
DELETED
|
@@ -1,667 +0,0 @@
|
|
| 1 |
-
Metadata-Version: 2.4
|
| 2 |
-
Name: gridmind-rl
|
| 3 |
-
Version: 1.0.0
|
| 4 |
-
Summary: GridMind-RL: Industrial Load-Shaping and Demand-Response RL Environment. Control HVAC, thermal storage, and batch job scheduling under stochastic electricity prices and grid stress events.
|
| 5 |
-
Author: LOKyu Team
|
| 6 |
-
License: MIT
|
| 7 |
-
Project-URL: Homepage, https://github.com/meta-pytorch/OpenEnv
|
| 8 |
-
Project-URL: Repository, https://github.com/meta-pytorch/OpenEnv
|
| 9 |
-
Project-URL: Documentation, https://github.com/meta-pytorch/OpenEnv
|
| 10 |
-
Keywords: reinforcement-learning,openenv,energy-management,demand-response
|
| 11 |
-
Classifier: Development Status :: 4 - Beta
|
| 12 |
-
Classifier: Environment :: GPU
|
| 13 |
-
Classifier: Intended Audience :: Science/Research
|
| 14 |
-
Classifier: License :: OSI Approved :: MIT License
|
| 15 |
-
Classifier: Natural Language :: English
|
| 16 |
-
Classifier: Operating System :: OS Independent
|
| 17 |
-
Classifier: Programming Language :: Python :: 3
|
| 18 |
-
Classifier: Programming Language :: Python :: 3.9
|
| 19 |
-
Classifier: Programming Language :: Python :: 3.10
|
| 20 |
-
Classifier: Programming Language :: Python :: 3.11
|
| 21 |
-
Classifier: Programming Language :: Python :: 3.12
|
| 22 |
-
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
|
| 23 |
-
Requires-Python: >=3.9
|
| 24 |
-
Description-Content-Type: text/markdown
|
| 25 |
-
License-File: LICENSE
|
| 26 |
-
Requires-Dist: openai>=1.0.0
|
| 27 |
-
Requires-Dist: openenv-core>=0.2.0
|
| 28 |
-
Requires-Dist: fastapi>=0.100.0
|
| 29 |
-
Requires-Dist: uvicorn>=0.23.0
|
| 30 |
-
Requires-Dist: pydantic>=2.0.0
|
| 31 |
-
Requires-Dist: requests>=2.31.0
|
| 32 |
-
Requires-Dist: httpx>=0.24.0
|
| 33 |
-
Requires-Dist: pytest>=7.0.0
|
| 34 |
-
Requires-Dist: python-dotenv>=1.0.0
|
| 35 |
-
Provides-Extra: dev
|
| 36 |
-
Requires-Dist: pytest>=7.0.0; extra == "dev"
|
| 37 |
-
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
|
| 38 |
-
Requires-Dist: black>=23.0.0; extra == "dev"
|
| 39 |
-
Requires-Dist: ruff>=0.1.0; extra == "dev"
|
| 40 |
-
Dynamic: license-file
|
| 41 |
-
|
| 42 |
-
# 🏢 GridMind-RL — Energy Management Reinforcement Learning Environment
|
| 43 |
-
|
| 44 |
-
**A real-world RL environment for intelligent building energy optimization.** Control HVAC systems, thermal storage, batch job scheduling, and demand-response under stochastic electricity prices and grid stress events.
|
| 45 |
-
|
| 46 |
-
Built on the [OpenEnv](https://github.com/meta-pytorch/OpenEnv) specification. Containerized. Ready for Hugging Face Spaces deployment.
|
| 47 |
-
|
| 48 |
-
---
|
| 49 |
-
|
| 50 |
-
## 📖 Overview & Motivation
|
| 51 |
-
|
| 52 |
-
Building energy management is a **real-world optimization problem** facing utilities, facility operators, and industrial sites globally. Traditional rule-based controls waste billions in energy costs and miss opportunities for grid participation.
|
| 53 |
-
|
| 54 |
-
**GridMind-RL** simulates decisions that facility operators must make daily:
|
| 55 |
-
|
| 56 |
-
- **Cost Optimization** — Buy electricity when prices are low, avoid peak surcharges
|
| 57 |
-
- **Comfort & Safety** — Maintain indoor temperature within acceptable ranges while managing thermal inertia
|
| 58 |
-
- **Grid Participation** — Respond to demand-response signals and grid stress events
|
| 59 |
-
- **Batch Scheduling** — Coordinate industrial process timings to meet deadlines and minimize energy cost
|
| 60 |
-
- **Carbon Minimization** — Shift consumption to periods when grid carbon intensity is low
|
| 61 |
-
|
| 62 |
-
**Why this matters:** An RL agent trained in this environment can learn strategies that would be difficult or impossible for humans to hand-craft. The combination of continuous control (HVAC power, thermal storage), discrete decisions (batch scheduling), and multiple simultaneous objectives (cost, comfort, grid, deadlines, carbon) creates a realistic, challenging benchmark.
|
| 63 |
-
|
| 64 |
-
**Episode Length:** 96 steps = 24 hours at 15-minute resolution. A complete episode requires strategic decision-making across a full day-night cycle.
|
| 65 |
-
|
| 66 |
-
---
|
| 67 |
-
|
| 68 |
-
## Observation Space
|
| 69 |
-
|
| 70 |
-
At each timestep, the environment provides the following observations. **Episode length: 96 steps** (15-minute intervals = 24 hours).
|
| 71 |
-
|
| 72 |
-
| Field | Data Type | Range / Values | Description |
|
| 73 |
-
|-------|-----------|-----------------|-------------|
|
| 74 |
-
| `indoor_temperature` | float | 10–40 °C | Current building interior temperature |
|
| 75 |
-
| `thermal_storage_level` | float | 0.0–1.0 | Thermal tank charge state (0 = empty, 1 = full) |
|
| 76 |
-
| `process_demand` | float | ≥ 0 kW | Current industrial batch process power draw |
|
| 77 |
-
| `current_price` | float | > 0 $/kWh | Real-time spot electricity price |
|
| 78 |
-
| `grid_stress_signal` | float | 0.0–1.0 | Utility demand-response urgency (0.7+ = critical) |
|
| 79 |
-
| `carbon_intensity` | float | ≥ 0 gCO₂/kWh | Current grid carbon intensity |
|
| 80 |
-
| `hour_of_day` | int | 0–23 | Time-of-day context |
|
| 81 |
-
| `batch_queue` | int array | — | Pending batch jobs with deadline slots |
|
| 82 |
-
| `cumulative_cost` | float | ≥ 0 $ | Energy cost accumulated in current episode so far |
|
| 83 |
-
| `step` | int | 0–95 | Current timestep (96 total = 24 hours) |
|
| 84 |
-
| `building_id` | int | 0+ | Building identifier (for multi-building scenarios) |
|
| 85 |
-
|
| 86 |
-
**Observation Properties:**
|
| 87 |
-
- Observations are **deterministic** given the seed — same seed produces identical sequences
|
| 88 |
-
- All fields are **normalized or bounded** for stable learning
|
| 89 |
-
- Prices follow realistic time-of-use patterns; carbon intensity varies with grid mix
|
| 90 |
-
- Batch queue starts empty; jobs appear stochastically based on the task/seed
|
| 91 |
-
|
| 92 |
-
---
|
| 93 |
-
|
| 94 |
-
## 🎮 Action Space
|
| 95 |
-
|
| 96 |
-
At each step, the agent sends an action controlling four independent subsystems:
|
| 97 |
-
|
| 98 |
-
| Field | Data Type | Range | Description |
|
| 99 |
-
|-------|-----------|-------|-------------|
|
| 100 |
-
| `hvac_power_level` | float | 0.0–1.0 | HVAC system power (0 = off, 1 = full) |
|
| 101 |
-
| `thermal_charge_rate` | float | -1.0–1.0 | Thermal storage control (+charge, -discharge) |
|
| 102 |
-
| `batch_job_slot` | int | 0–4 | Schedule next batch job: 0=immediate, 1–4=defer |
|
| 103 |
-
| `load_shed_fraction` | float | 0.0–0.5 | Non-critical load reduction (0–50%) for demand-response |
|
| 104 |
-
| `building_id` | int | 0+ | Building identifier (routing) |
|
| 105 |
-
|
| 106 |
-
**Action Space Properties:**
|
| 107 |
-
- **Continuous** (HVAC, thermal charging, load shedding) + **discrete** (batch scheduling) → hybrid control
|
| 108 |
-
- Actions are applied every 15-minute step
|
| 109 |
-
- Load shedding is capped at 50% to ensure safety/habitability
|
| 110 |
-
- Batch scheduling decisions affect energy cost and deadline compliance
|
| 111 |
-
|
| 112 |
-
---
|
| 113 |
-
|
| 114 |
-
## 💡 Reward Function
|
| 115 |
-
|
| 116 |
-
The environment provides **dense rewards every step** (not sparse, not binary). Each step returns:
|
| 117 |
-
- A scalar reward (sum of components)
|
| 118 |
-
- A dictionary of 7 weighted sub-components for transparency
|
| 119 |
-
|
| 120 |
-
| Component | Purpose | Possible Values |
|
| 121 |
-
|-----------|---------|-----------------|
|
| 122 |
-
| **cost_savings** | Minimize energy bill | Negative (cost increases) to positive (savings vs baseline) |
|
| 123 |
-
| **temp_constraint** | Maintain comfort | Gaussian bonus near 21°C, penalty outside 19–23°C bounds |
|
| 124 |
-
| **grid_response** | Shift load during stress | Bonus proportional to shed fraction when grid signal > 0.7 |
|
| 125 |
-
| **efficiency_bonus** | Exploit thermal storage | Reward charge/discharge timing and thermal arbitrage |
|
| 126 |
-
| **stability_penalty** | Smooth control | Small penalty for rapid oscillations in HVAC/storage |
|
| 127 |
-
| **deadline_penalty** | Meet job deadlines | Large penalty if batch job finishes after deadline |
|
| 128 |
-
| **carbon_reward** | Low-carbon consumption | Bonus for consuming during low-carbon grid periods |
|
| 129 |
-
|
| 130 |
-
**Example Reward Calculation:**
|
| 131 |
-
If an agent takes a well-timed action during high-price, high-stress period:
|
| 132 |
-
- Large positive `cost_savings` (avoided expensive hour)
|
| 133 |
-
- Positive `grid_response` (shed load successfully)
|
| 134 |
-
- Possible positive `carbon_reward` (if grid is clean)
|
| 135 |
-
- **Total step reward** = weighted sum of all components
|
| 136 |
-
|
| 137 |
-
This multi-objective reward structure encourages **learning tradeoffs** between cost, comfort, grid support, and carbon efficiency.
|
| 138 |
-
|
| 139 |
-
---
|
| 140 |
-
|
| 141 |
-
---
|
| 142 |
-
|
| 143 |
-
## 📋 Tasks & Difficulty Levels
|
| 144 |
-
|
| 145 |
-
Three independent tasks with **deterministic programmatic graders**. Scores range **0.0–1.0**; higher is better.
|
| 146 |
-
|
| 147 |
-
### Task 1 — Cost Minimization (🟢 Easy)
|
| 148 |
-
|
| 149 |
-
**Objective:** Minimize total energy cost in 24 hours with no other constraints.
|
| 150 |
-
|
| 151 |
-
**Difficulty Rationale:** Only one objective (cost) to optimize; temperature and grid constraints are relaxed.
|
| 152 |
-
|
| 153 |
-
**Grader Metrics:**
|
| 154 |
-
- **Cost score (100%)** — Compares total episode energy cost to a deterministic baseline. Higher savings → higher score.
|
| 155 |
-
|
| 156 |
-
**Baseline Score:** **0.7063**
|
| 157 |
-
|
| 158 |
-
---
|
| 159 |
-
|
| 160 |
-
### Task 2 — Constrained Temperature Control (🟡 Medium)
|
| 161 |
-
|
| 162 |
-
**Objective:** Minimize cost while maintaining indoor temperature between **19–23°C** throughout the episode.
|
| 163 |
-
|
| 164 |
-
**Difficulty Rationale:** Introduces a hard constraint (temperature bounds). Agent must use thermal storage strategically to meet both cost and comfort goals.
|
| 165 |
-
|
| 166 |
-
**Grader Metrics:**
|
| 167 |
-
- **Cost score (60%)** — Total energy cost vs baseline
|
| 168 |
-
- **Temperature score (40%)** — Fraction of steps within bounds (hard penalty for violations)
|
| 169 |
-
|
| 170 |
-
**Notes:** A naive agent might achieve low cost by disabling HVAC, but then temperatures drift out of bounds (0 score). Trade-off learning is required.
|
| 171 |
-
|
| 172 |
-
**Baseline Score:** **0.6333**
|
| 173 |
-
|
| 174 |
-
---
|
| 175 |
-
|
| 176 |
-
### Task 3 — Full Demand Response (🔴 Hard)
|
| 177 |
-
|
| 178 |
-
**Objective:** Minimize cost, maintain temperature, respond to grid events, complete batch jobs on time, and minimize carbon emissions. This is a **multi-objective constraint satisfaction** problem.
|
| 179 |
-
|
| 180 |
-
**Difficulty Rationale:** Most realistic. Agent must balance five competing objectives simultaneously; any single failure is costly.
|
| 181 |
-
|
| 182 |
-
**Grader Metrics:**
|
| 183 |
-
- **Cost score (28%)** — Energy cost
|
| 184 |
-
- **Temperature score (20%)** — Time within comfort bounds
|
| 185 |
-
- **Grid response score (20%)** — Load shed during demand-response events (signal > 0.7)
|
| 186 |
-
- **Batch deadline score (12%)** — Fraction of jobs completed before deadline
|
| 187 |
-
- **Carbon reward score (20%)** — Shift load to low-carbon periods
|
| 188 |
-
|
| 189 |
-
**Baseline Breakdown:**
|
| 190 |
-
- Cost: 0.670, Temperature: 0.573, Grid: 0.214, Batch: 1.000, Carbon: 0.657
|
| 191 |
-
- **Overall: 0.5966**
|
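The overall Task 3 score can be checked against the listed weights and sub-scores. The sketch below assumes the grader is a plain weighted sum, which the percentages suggest but the text does not state outright. Because the published sub-scores are rounded to three decimals, the recomputed total only matches 0.5966 approximately.

```python
# Weights and baseline sub-scores copied from the Task 3 grader description.
WEIGHTS = {"cost": 0.28, "temperature": 0.20, "grid": 0.20, "batch": 0.12, "carbon": 0.20}
BASELINE = {"cost": 0.670, "temperature": 0.573, "grid": 0.214, "batch": 1.000, "carbon": 0.657}


def weighted_score(sub_scores: dict, weights: dict) -> float:
    """Weighted sum of sub-scores (assumed form of the grader)."""
    return sum(weights[name] * sub_scores[name] for name in weights)


total = weighted_score(BASELINE, WEIGHTS)
print(f"recomputed Task 3 score: {total:.4f}")
```

The same check applied to Task 2 (60% cost at 0.701, 40% temperature at 0.531) reproduces its baseline of about 0.633.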
| 192 |
-
|
| 193 |
-
**Challenge:** Grid response score (~0.21) shows that the baseline heuristic rarely sheds load opportunistically. Learning agents should discover that quick load shedding during high-price, high-stress periods yields significant cost savings.
|
| 194 |
-
|
| 195 |
-
**Grader Determinism:** Same seed always produces identical evaluations. Episodes are seeded internally; reproducible batches of evaluations can be generated for benchmark comparisons.
|
| 196 |
-
|
| 197 |
-
---
|
| 198 |
-
|
| 199 |
-
## 🚀 Setup & Usage
|
| 200 |
-
|
| 201 |
-
### Prerequisites
|
| 202 |
-
|
| 203 |
-
- **Docker** — [Download Docker Desktop](https://www.docker.com/products/docker-desktop/)
|
| 204 |
-
- **Python 3.10+** — [Download Python](https://www.python.org/downloads/)
|
| 205 |
-
- **Git** — [Download Git](https://git-scm.com/downloads)
|
| 206 |
-
|
| 207 |
-
### Quick Start (5 minutes)
|
| 208 |
-
|
| 209 |
-
#### 1. Clone the Repository
|
| 210 |
-
|
| 211 |
-
```bash
|
| 212 |
-
git clone https://github.com/LO-Kyu/gridmind-rl.git
|
| 213 |
-
cd gridmind-rl
|
| 214 |
-
```
|
| 215 |
-
|
| 216 |
-
#### 2. Build and Start the Environment Server
|
| 217 |
-
|
| 218 |
-
```bash
|
| 219 |
-
docker build -t gridmind-rl .
|
| 220 |
-
docker run --rm -d -p 7860:7860 -p 7861:7861 --name gridmind gridmind-rl
|
| 221 |
-
```
|
| 222 |
-
|
| 223 |
-
Verify the server is running:
|
| 224 |
-
|
| 225 |
-
```bash
|
| 226 |
-
# Check health endpoint
|
| 227 |
-
curl http://localhost:7860/health
|
| 228 |
-
# Expected: {"status":"ok","version":"1.0.0"}
|
| 229 |
-
```
|
| 230 |
-
|
| 231 |
-
#### 3. Install Python Dependencies
|
| 232 |
-
|
| 233 |
-
Open a **new terminal** and install:
|
| 234 |
-
|
| 235 |
-
```bash
|
| 236 |
-
pip install -r python/requirements.txt
|
| 237 |
-
```
|
| 238 |
-
|
| 239 |
-
#### 4. Run Inference (No LLM — Fast)
|
| 240 |
-
|
| 241 |
-
Run a fast, deterministic baseline using heuristic policy:
|
| 242 |
-
|
| 243 |
-
```bash
|
| 244 |
-
python inference.py --fast-mode --episodes 1
|
| 245 |
-
```
|
| 246 |
-
|
| 247 |
-
Expected output (sample):
|
| 248 |
-
```
|
| 249 |
-
[START] task=Cost_Minimization env=gridmind model=heuristic
|
| 250 |
-
[STEP1] step=1 action={...} reward=10.5 done=false
|
| 251 |
-
[STEP2] step=2 action={...} reward=12.3 done=false
|
| 252 |
-
...
|
| 253 |
-
[STEP96] step=96 action={...} reward=8.9 done=true
|
| 254 |
-
[END] success=true steps=96 rewards=[10.5, 12.3, ..., 8.9]
|
| 255 |
-
```
|
| 256 |
-
|
| 257 |
-
Results saved to: `baseline_scores.json`
|
| 258 |
-
|
| 259 |
-
#### 5. (Optional) Run with LLM
|
| 260 |
-
|
| 261 |
-
To use an LLM agent for decision-making:
|
| 262 |
-
|
| 263 |
-
1. Get a **free API key** from [openrouter.ai/keys](https://openrouter.ai/keys) (no credit card needed)
|
| 264 |
-
2. Create `.env` file (copy from `.env.example`):
|
| 265 |
-
```bash
|
| 266 |
-
cp .env.example .env
|
| 267 |
-
```
|
| 268 |
-
3. Edit `.env` and add your API key:
|
| 269 |
-
```env
|
| 270 |
-
HF_TOKEN=sk-or-v1-your-key-here
|
| 271 |
-
# or
|
| 272 |
-
OPENAI_API_KEY=sk-or-v1-your-key-here
|
| 273 |
-
```
|
| 274 |
-
4. Run with LLM:
|
| 275 |
-
```bash
|
| 276 |
-
python inference.py --episodes 1
|
| 277 |
-
```
|
| 278 |
-
|
| 279 |
-
#### 6. Stop the Server (When Done)
|
| 280 |
-
|
| 281 |
-
```bash
|
| 282 |
-
docker stop gridmind
|
| 283 |
-
```
|
| 284 |
-
|
| 285 |
-
---
|
| 286 |
-
|
| 287 |
-
### Inference Script Reference
|
| 288 |
-
|
| 289 |
-
The `inference.py` script (project root) is the **hackathon submission entrypoint**.
|
| 290 |
-
|
| 291 |
-
**Environment Variables:**
|
| 292 |
-
|
| 293 |
-
| Variable | Default | Description |
|
| 294 |
-
|----------|---------|-------------|
|
| 295 |
-
| `HF_TOKEN` | (required for submission) | API key for LLM provider or HF Spaces |
|
| 296 |
-
| `OPENAI_API_KEY` | (optional fallback) | Alternative OpenAI-compatible key |
|
| 297 |
-
| `API_BASE_URL` | `https://openrouter.ai/api/v1` | LLM endpoint URL |
|
| 298 |
-
| `MODEL_NAME` | `meta-llama/llama-3.3-70b-instruct:free` | Model identifier |
|
| 299 |
-
| `ENV_URL` | `http://localhost:7860` | Environment server address |
|
| 300 |
-
|
| 301 |
-
**Command-Line Flags:**
|
| 302 |
-
|
| 303 |
-
| Flag | Default | Description |
|
| 304 |
-
|------|---------|-------------|
|
| 305 |
-
| `--episodes N` | 1 | Episodes per task (runs tasks 1, 2, 3 in sequence) |
|
| 306 |
-
| `--fast-mode` | off | Don't call LLM; use heuristic policy only (reproducible, no API calls) |
|
| 307 |
-
| `--llm-every N` | 4 | Reuse each LLM decision for N steps (reduces API calls) |
|
| 308 |
-
| `--max-steps N` | 96 | Stop episode early after N steps |
|
| 309 |
-
| `--env-url URL` | from env var | Override environment server URL |
|
| 310 |
-
| `--output FILE` | `baseline_scores.json` | Output results filename |
|
| 311 |
-
| `--verbose` | off | Print detailed logs for each step |
|
| 312 |
-
|
| 313 |
-
**Examples:**
|
| 314 |
-
|
| 315 |
-
```bash
|
| 316 |
-
# Run all 3 tasks with LLM (1 episode each)
|
| 317 |
-
python inference.py --episodes 1
|
| 318 |
-
|
| 319 |
-
# Reproduce baseline fast (no LLM)
|
| 320 |
-
python inference.py --fast-mode --episodes 1
|
| 321 |
-
|
| 322 |
-
# Heuristic policy with verbose per-step output
|
| 323 |
-
python inference.py --fast-mode --episodes 1 --verbose
|
| 324 |
-
|
| 325 |
-
# Run 5 episodes per task with custom environment
|
| 326 |
-
python inference.py --episodes 5 --env-url http://my-server:7860
|
| 327 |
-
```
|
| 328 |
-
|
| 329 |
-
---
|
| 330 |
-
|
| 331 |
-
### HTTP API Reference
|
| 332 |
-
|
| 333 |
-
**Base URL:** `http://localhost:7860`
|
| 334 |
-
|
| 335 |
-
| Endpoint | Method | Purpose | Example Body |
|
| 336 |
-
|----------|--------|---------|---------------|
|
| 337 |
-
| `/health` | GET | Liveness check | — |
|
| 338 |
-
| `/ping` | GET | Lightweight ping | — |
|
| 339 |
-
| `/reset` | POST | Reset episode for a task | `{"task_id": 1, "seed": 42}` |
|
| 340 |
-
| `/step` | POST | Apply action, get next observation | `{"hvac_power_level": 0.5, "thermal_charge_rate": 0.1, ...}` |
|
| 341 |
-
| `/state` | GET | Current full state snapshot | — |
|
| 342 |
-
| `/grade` | GET | Episode score (0.0–1.0) with sub-scores | — |
|
| 343 |
-
| `/replay` | GET | Full step-by-step trajectory | — |
|
| 344 |
-
| `/tasks` | GET | Task definitions and grader weights | — |
|
| 345 |
-
| `/metrics` | GET | Prometheus-format metrics | — |
|
| 346 |
-
|
| 347 |
-
**Example Workflow:**
|
| 348 |
-
|
| 349 |
-
```bash
|
| 350 |
-
# 1. Reset to Task 1 with seed 42
|
| 351 |
-
curl -X POST http://localhost:7860/reset \
|
| 352 |
-
-H "Content-Type: application/json" \
|
| 353 |
-
-d '{"task_id": 1, "seed": 42}'
|
| 354 |
-
|
| 355 |
-
# 2. Get initial observation
|
| 356 |
-
curl http://localhost:7860/state
|
| 357 |
-
|
| 358 |
-
# 3. Take an action
|
| 359 |
-
curl -X POST http://localhost:7860/step \
|
| 360 |
-
-H "Content-Type: application/json" \
|
| 361 |
-
-d '{
|
| 362 |
-
"hvac_power_level": 0.5,
|
| 363 |
-
"thermal_charge_rate": 0.1,
|
| 364 |
-
"batch_job_slot": 1,
|
| 365 |
-
"load_shed_fraction": 0.0
|
| 366 |
-
}'
|
| 367 |
-
|
| 368 |
-
# 4. Check final score after episode completes
|
| 369 |
-
curl http://localhost:7860/grade
|
| 370 |
-
```
|
| 371 |
-
|
| 372 |
-
---
|
| 373 |
-
|
| 374 |
-
## 📊 Baseline Performance Scores
|
| 375 |
-
|
| 376 |
-
The baseline is a **heuristic policy** (rule-based, no LLM) representing a reasonable but non-optimized control strategy. Your RL agent should aim to exceed these scores.
|
| 377 |
-
|
| 378 |
-
**Baseline Run:** `python inference.py --fast-mode --episodes 1`
|
| 379 |
-
|
| 380 |
-
### Summary Scores
|
| 381 |
-
|
| 382 |
-
| Task | Difficulty | Score | Model |
|
| 383 |
-
|------|:----------:|:-----:|-------|
|
| 384 |
-
| Task 1 — Cost Minimization | 🟢 Easy | **0.7063** | Heuristic |
|
| 385 |
-
| Task 2 — Temperature Control | 🟡 Medium | **0.6333** | Heuristic |
|
| 386 |
-
| Task 3 — Full Demand Response | 🔴 Hard | **0.5966** | Heuristic |
|
| 387 |
-
| **Overall Average** | — | **0.6454** | Heuristic |
|
| 388 |
-
|
| 389 |
-
### Detailed Breakdown
|
| 390 |
-
|
| 391 |
-
#### Task 1 Results
|
| 392 |
-
- **Task:** Cost minimization (96 steps × 15 min = 24 hours)
|
| 393 |
-
- **Score:** 0.7063
|
| 394 |
-
- **Sub-score:** Cost = 0.706
|
| 395 |
-
- **Interpretation:** Heuristic achieves ~70% of optimal cost reduction vs baseline
|
| 396 |
-
|
| 397 |
-
#### Task 2 Results
|
| 398 |
-
- **Task:** Minimize cost while maintaining temperature 19–23°C
|
| 399 |
-
- **Score:** 0.6333
|
| 400 |
-
- **Sub-scores:**
|
| 401 |
-
- Cost: 0.701
|
| 402 |
-
- Temperature constraint: 0.531 (agent violated comfort bounds ~47% of the time)
|
| 403 |
-
- **Interpretation:** Temperature management is challenging for the heuristic. Tighter thermal control could improve this score significantly.
|
| 404 |
-
|
| 405 |
-
#### Task 3 Results (Most Interesting)
|
| 406 |
-
- **Task:** Multi-objective: cost, temperature, grid response, batch deadlines, carbon
|
| 407 |
-
- **Score:** 0.5966
|
| 408 |
-
- **Sub-scores:**
|
| 409 |
-
- Cost: 0.670
|
| 410 |
-
- Temperature: 0.573 (similar temperature control challenge as Task 2)
|
| 411 |
-
- **Grid response: 0.214** ← Heuristic rarely participates in demand-response
|
| 412 |
-
- Batch deadline: 1.000 (heuristic always completes jobs on time)
|
| 413 |
-
- Carbon: 0.657
|
| 414 |
-
|
| 415 |
-
**Key Insight:** The heuristic's low grid response score (0.21) suggests that learned agents have significant room for improvement by:
|
| 416 |
-
1. Recognizing high-price + high-stress periods
|
| 417 |
-
2. Proactively shedding load to reduce cost
|
| 418 |
-
3. Using thermal storage to recover comfort afterward
|
| 419 |
-
|
| 420 |
-
This multi-objective setting is where RL agents typically exceed heuristic baselines.
|
| 421 |
-
|
| 422 |
-
### Reproducibility & Evaluation
|
| 423 |
-
|
| 424 |
-
- **Deterministic:** Baseline scores are **deterministic** — same seed always produces identical actions and rewards
|
| 425 |
-
- **Seeding:** Each task uses a fixed base seed (1100, 1200, 1300) for reproducible evaluation
|
| 426 |
-
- **Your Submissions:** Your agent will be evaluated on the same seed distribution; compare your scores directly to baseline
|
| 427 |
-
|
| 428 |
-
---
|
| 429 |
-
|
| 430 |
-
## 🏗️ Architecture
|
| 431 |
-
|
| 432 |
-
```
|
| 433 |
-
┌─────────────────────────────────────────────────────────────────┐
|
| 434 |
-
│ inference.py (LLM Agent or Heuristic) │
|
| 435 |
-
│ │ │
|
| 436 |
-
│ │ HTTP: POST /reset, /step · GET /grade, /state │
|
| 437 |
-
│ ▼ │
|
| 438 |
-
│ ┌───────────────────────────────────────────────────────────┐ │
|
| 439 |
-
│ │ Docker Container │ │
|
| 440 |
-
│ │ │ │
|
| 441 |
-
│ │ ┌─────────────────────┐ ┌───────────────────────────┐ │ │
|
| 442 |
-
│ │ │ Go Environment │ │ Python Dashboard │ │ │
|
| 443 |
-
│ │ │ Server (:7860) │ │ FastAPI + UI (:7861) │ │ │
|
| 444 |
-
│ │ │ │ │ │ │ │
|
| 445 |
-
│ │ │ • Physics engine │ │ • Proxies /api → :7860 │ │ │
|
| 446 |
-
│ │ │ • Reward function │◄──│ • Real-time charts │ │ │
|
| 447 |
-
│ │ │ • Task graders │ │ • State visualization │ │ │
|
| 448 |
-
│ │ └─────────────────────┘ └───────────────────────────┘ │ │
|
| 449 |
-
│ │ │ │
|
| 450 |
-
│ │ Isolated · Reproducible · Non-root user │ │
|
| 451 |
-
│ └───────────────────────────────────────────────────────────┘ │
|
| 452 |
-
└─────────────────────────────────────────────────────────────────┘
|
| 453 |
-
```
|
| 454 |
-
|
| 455 |
-
### Project Structure
|
| 456 |
-
|
| 457 |
-
```
|
| 458 |
-
gridmind/
|
| 459 |
-
├── inference.py ← Hackathon entrypoint (root)
|
| 460 |
-
├── openenv.yaml ← OpenEnv spec manifest
|
| 461 |
-
├── Dockerfile ← Multi-stage build (Go + Python)
|
| 462 |
-
├── .env ← API credentials (git-ignored)
|
| 463 |
-
├── baseline_scores.json ← Produced by inference.py
|
| 464 |
-
│
|
| 465 |
-
├── main.go ← HTTP server (routes, middleware, metrics)
|
| 466 |
-
├── env/ ← Core environment logic (Go)
|
| 467 |
-
│ ├── environment.go ← Simulation: physics, thermal dynamics
|
| 468 |
-
│ ├── models.go ← All data types (Observation, Action, etc.)
|
| 469 |
-
│ ├── rewards.go ← 7-component dense reward function
|
| 470 |
-
│ └── tasks.go ← 3 task definitions + deterministic graders
|
| 471 |
-
│
|
| 472 |
-
├── python/ ← Python support layer
|
| 473 |
-
│ ├── inference.py ← Full LLM agent + heuristic fallback
|
| 474 |
-
│ ├── models.py ← Typed Pydantic models (mirrors Go structs)
|
| 475 |
-
│ ├── validate.py ← OpenEnv spec validation suite
|
| 476 |
-
│ └── requirements.txt ← Python dependencies
|
| 477 |
-
│
|
| 478 |
-
├── tests/ ← Automated tests
|
| 479 |
-
│ ├── environment_test.go ← Go unit tests (determinism, bounds, etc.)
|
| 480 |
-
│ └── test_graders.py ← Python grader tests (pytest)
|
| 481 |
-
│
|
| 482 |
-
└── dashboard/ ← Optional web dashboard
|
| 483 |
-
├── server.py ← FastAPI server
|
| 484 |
-
└── static/ ← Frontend assets
|
| 485 |
-
```
|
| 486 |
-
|
| 487 |
-
---
|
| 488 |
-
|
| 489 |
-
## 🐳 Docker
|
| 490 |
-
|
| 491 |
-
| Action | Command |
|
| 492 |
-
|--------|---------|
|
| 493 |
-
| **Build** | `docker build -t gridmind-rl .` |
|
| 494 |
-
| **Run (foreground)** | `docker run --rm -p 7860:7860 -p 7861:7861 --name gridmind gridmind-rl` |
|
| 495 |
-
| **Run (background)** | `docker run --rm -d -p 7860:7860 -p 7861:7861 --name gridmind gridmind-rl` |
|
| 496 |
-
| **Stop** | `docker stop gridmind` |
|
| 497 |
-
| **Run inference inside container** | `docker exec -it gridmind python /app/inference.py --fast-mode` |
|
| 498 |
-
|
| 499 |
-
The Dockerfile uses a **multi-stage build**:
|
| 500 |
-
1. **Stage 1** — Go 1.21 Alpine: compiles the environment server binary
|
| 501 |
-
2. **Stage 2** — Python 3.11 slim: runs the Go binary + Python dashboard via Supervisor
|
| 502 |
-
|
| 503 |
-
---
|
| 504 |
-
|
| 505 |
-
## ☁️ Hugging Face Space Deployment
|
| 506 |
-
|
| 507 |
-
### 1. Create a New Space
|
| 508 |
-
|
| 509 |
-
Go to [huggingface.co/new-space](https://huggingface.co/new-space):
|
| 510 |
-
- **SDK:** Docker
|
| 511 |
-
- **Hardware:** CPU Basic (2 vCPU, 16 GB — free tier)
|
| 512 |
-
|
| 513 |
-
### 2. Push to HF
|
| 514 |
-
|
| 515 |
-
```bash
|
| 516 |
-
git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/gridmind-rl
|
| 517 |
-
git push hf main
|
| 518 |
-
```
|
| 519 |
-
|
| 520 |
-
### 3. Verify
|
| 521 |
-
|
| 522 |
-
```bash
|
| 523 |
-
curl https://YOUR_USERNAME-gridmind-rl.hf.space/health
|
| 524 |
-
# → {"status":"ok","version":"1.0.0"}
|
| 525 |
-
|
| 526 |
-
curl -X POST https://YOUR_USERNAME-gridmind-rl.hf.space/reset \
|
| 527 |
-
-H "Content-Type: application/json" \
|
| 528 |
-
-d '{"task_id":1,"seed":42}'
|
| 529 |
-
```
|
| 530 |
-
|
| 531 |
-
> **Note:** HF Spaces exposes port **7860** publicly. The dashboard (7861) is for local development only.
|
| 532 |
-
|
| 533 |
-
---
|
| 534 |
-
|
| 535 |
-
## 🧪 Testing
|
| 536 |
-
|
| 537 |
-
### Run Go Unit Tests
|
| 538 |
-
|
| 539 |
-
```bash
|
| 540 |
-
cd gridmind
|
| 541 |
-
go test ./tests/ -v
|
| 542 |
-
```
|
| 543 |
-
|
| 544 |
-
### Run Python Grader Tests (requires server running)
|
| 545 |
-
|
| 546 |
-
```bash
|
| 547 |
-
pytest tests/test_graders.py -v
|
| 548 |
-
```
|
| 549 |
-
|
| 550 |
-
### Run Full OpenEnv Validation
|
| 551 |
-
|
| 552 |
-
```bash
|
| 553 |
-
python python/validate.py --env-url http://localhost:7860
|
| 554 |
-
```
|
| 555 |
-
|
| 556 |
-
---
|
| 557 |
-
|
| 558 |
-
## 📝 Inference Script Reference
|
| 559 |
-
|
| 560 |
-
The `inference.py` script at the project root is the **hackathon entrypoint**.
|
| 561 |
-
|
| 562 |
-
### Environment Variables
|
| 563 |
-
|
| 564 |
-
| Variable | Default | Description |
|
| 565 |
-
|----------|---------|-------------|
|
| 566 |
-
| `API_BASE_URL` | `https://openrouter.ai/api/v1` | LLM API endpoint |
|
| 567 |
-
| `MODEL_NAME` | `meta-llama/llama-3.3-70b-instruct:free` | Model to use |
|
| 568 |
-
| `OPENAI_API_KEY` | — | API key (any OpenAI-compatible provider) |
|
| 569 |
-
| `ENV_URL` | `http://localhost:7860` | Environment server URL |
|
| 570 |
-
|
| 571 |
-
### Command-Line Flags
|
| 572 |
-
|
| 573 |
-
| Flag | Default | Description |
|
| 574 |
-
|------|---------|-------------|
|
| 575 |
-
| `--episodes N` | 1 | Episodes per task (tasks 1–3 run in sequence) |
|
| 576 |
-
| `--fast-mode` | off | Use heuristic policy only (no LLM, fully reproducible) |
|
| 577 |
-
| `--llm-every N` | 4 | Reuse each LLM action for N steps (reduces API calls) |
|
| 578 |
-
| `--max-steps N` | 96 | Stop early after N steps |
|
| 579 |
-
| `--env-url URL` | from env | Override environment URL |
|
| 580 |
-
| `--output FILE` | `baseline_scores.json` | Output results file |
|
| 581 |
-
| `--verbose` | off | Print detailed step logs |
|
| 582 |
-
|
| 583 |
-
### Stdout Log Format
|
| 584 |
-
|
| 585 |
-
Each episode emits structured markers for automated evaluation:
|
| 586 |
-
|
| 587 |
-
```
|
| 588 |
-
[START]
|
| 589 |
-
[STEP1]
|
| 590 |
-
[STEP2]
|
| 591 |
-
...
|
| 592 |
-
[STEP96]
|
| 593 |
-
[END]
|
| 594 |
-
```
|
| 595 |
-
|
| 596 |
-
---
|
| 597 |
-
|
| 598 |
-
## ✅ OpenEnv Specification Compliance
|
| 599 |
-
|
| 600 |
-
GridMind-RL fully implements the OpenEnv specification for standardized RL environments. All components are present and tested:
|
| 601 |
-
|
| 602 |
-
| Requirement | Status | Notes |
|
| 603 |
-
|-------------|:------:|-------|
|
| 604 |
-
| Manifest (`openenv.yaml`) | ✅ | All metadata, schema definitions, and version info |
|
| 605 |
-
| Observation Schema | ✅ | 11-field object: temperature, storage, price, grid signal, carbon, hour, batch queue, cost, step, building_id |
|
| 606 |
-
| Action Schema | ✅ | 5-field object: HVAC, thermal rate, batch slot, load shed, building_id |
|
| 607 |
-
| HTTP Endpoints | ✅ | `/reset`, `/step`, `/state`, `/grade`, `/replay`, `/tasks`, `/health`, `/metrics` |
|
| 608 |
-
| Determinism | ✅ | Seeded episode generation; identical seeds produce identical trajectories |
|
| 609 |
-
| Typed Models | ✅ | Pydantic models (Python) mirror Go structs exactly |
|
| 610 |
-
| Dense Rewards | ✅ | 7-component reward breakdown every step |
|
| 611 |
-
| Graders | ✅ | 3 tasks with programmatic, deterministic graders (0.0–1.0 range) |
|
| 612 |
-
| Exploit Detection | ✅ | Built into grading pipeline to flag unrealistic scores |
|
| 613 |
-
|
| 614 |
-
---
|
| 615 |
-
|
| 616 |
-
## ❓ FAQ
|
| 617 |
-
|
| 618 |
-
**Q: Can I use a different model?**
|
| 619 |
-
A: Yes. Set `MODEL_NAME` environment variable to any OpenAI-compatible model. The default (`meta-llama/llama-3.3-70b-instruct:free`) is free on OpenRouter with no credit card.
|
| 620 |
-
|
| 621 |
-
**Q: How do I avoid rate limiting?**
|
| 622 |
-
A: (1) Use `--fast-mode` for local testing (no API calls), (2) Set `--llm-every 4` to reuse decisions, (3) Use a paid API tier for submission, or (4) Train & submit an offline policy.
|
| 623 |
-
|
| 624 |
-
**Q: Will my API key be exposed in submissions?**
|
| 625 |
-
A: No. Store your API key in `.env` (git-ignored). On HF Spaces, set secrets via the Space settings UI; keys are never committed to the repo.
|
| 626 |
-
|
| 627 |
-
**Q: What's the difference between `HF_TOKEN` and `OPENAI_API_KEY`?**
|
| 628 |
-
A: `HF_TOKEN` is used in HF Space deployments and external evaluations. `OPENAI_API_KEY` is a fallback for local development. The code tries `HF_TOKEN` first, then `OPENAI_API_KEY`. At least one must be set.
|
| 629 |
-
|
| 630 |
-
**Q: Can I submit an offline/trained policy?**
|
| 631 |
-
A: Yes. Modify `python/inference.py` to use your trained agent instead of LLM calls. Ensure you still output the required `[START]`, `[STEP]`, `[END]` format.
|
| 632 |
-
|
| 633 |
-
**Q: What if my submission times out?**
|
| 634 |
-
A: Each episode is 96 steps. The environment runs 3 episodes (one per task). Optimize for latency: reduce LLM calls (use `--llm-every`), use a faster model, or submit a heuristic/trained offline policy.
|
| 635 |
-
|
| 636 |
-
---
|
| 637 |
-
|
| 638 |
-
## 🎯 Submission Checklist
|
| 639 |
-
|
| 640 |
-
Before submitting, verify:
|
| 641 |
-
|
| 642 |
-
- [ ] Clone repo, build Docker, run `docker run -p 7860:7860 -p 7861:7861 gridmind-rl`
|
| 643 |
-
- [ ] Run `python inference.py --fast-mode --episodes 1` locally — should produce `baseline_scores.json`
|
| 644 |
-
- [ ] Check `[START]`, `[STEP]`, `[END]` markers in stdout
|
| 645 |
-
- [ ] Set `HF_TOKEN` or `OPENAI_API_KEY` in `.env` for LLM runs
|
| 646 |
-
- [ ] Test with LLM: `python inference.py --episodes 1`
|
| 647 |
-
- [ ] Verify Dockerfile builds without errors: `docker build -t gridmind-rl .`
|
| 648 |
-
- [ ] Create HF Space (Docker SDK, CPU Basic)
|
| 649 |
-
- [ ] Push repo to HF Space: `git push hf main`
|
| 650 |
-
- [ ] Set secrets in HF Space UI: `HF_TOKEN`, `API_BASE_URL` (optional), `MODEL_NAME` (optional)
|
| 651 |
-
- [ ] Verify Space is running: `curl https://YOUR_USERNAME-gridmind-rl.hf.space/health`
|
| 652 |
-
- [ ] Submit Space URL to hackathon organizers
|
| 653 |
-
|
| 654 |
-
---
|
| 655 |
-
|
| 656 |
-
## 📚 Additional Resources
|
| 657 |
-
|
| 658 |
-
- **OpenEnv Spec:** https://github.com/meta-pytorch/OpenEnv
|
| 659 |
-
- **OpenRouter API Keys:** https://openrouter.ai/keys
|
| 660 |
-
- **HF Spaces Docs:** https://huggingface.co/docs/hub/spaces
|
| 661 |
-
- **GridMind Repository:** https://github.com/LO-Kyu/gridmind-rl
|
| 662 |
-
|
| 663 |
-
---
|
| 664 |
-
|
| 665 |
-
## 📄 License
|
| 666 |
-
|
| 667 |
-
See `LICENSE` in the repository.
|
gridmind_rl.egg-info/SOURCES.txt
DELETED
|
@@ -1,16 +0,0 @@
|
|
| 1 |
-
LICENSE
|
| 2 |
-
README.md
|
| 3 |
-
pyproject.toml
|
| 4 |
-
gridmind_rl.egg-info/PKG-INFO
|
| 5 |
-
gridmind_rl.egg-info/SOURCES.txt
|
| 6 |
-
gridmind_rl.egg-info/dependency_links.txt
|
| 7 |
-
gridmind_rl.egg-info/entry_points.txt
|
| 8 |
-
gridmind_rl.egg-info/requires.txt
|
| 9 |
-
gridmind_rl.egg-info/top_level.txt
|
| 10 |
-
python/__init__.py
|
| 11 |
-
python/inference.py
|
| 12 |
-
python/models.py
|
| 13 |
-
python/validate.py
|
| 14 |
-
server/__init__.py
|
| 15 |
-
server/app.py
|
| 16 |
-
tests/test_graders.py
|
gridmind_rl.egg-info/dependency_links.txt
DELETED
|
@@ -1 +0,0 @@
|
|
| 1 |
-
|
gridmind_rl.egg-info/entry_points.txt
DELETED
|
@@ -1,2 +0,0 @@
|
|
| 1 |
-
[console_scripts]
|
| 2 |
-
gridmind-server = server.app:main
|
gridmind_rl.egg-info/requires.txt
DELETED
|
@@ -1,15 +0,0 @@
|
|
| 1 |
-
openai>=1.0.0
|
| 2 |
-
openenv-core>=0.2.0
|
| 3 |
-
fastapi>=0.100.0
|
| 4 |
-
uvicorn>=0.23.0
|
| 5 |
-
pydantic>=2.0.0
|
| 6 |
-
requests>=2.31.0
|
| 7 |
-
httpx>=0.24.0
|
| 8 |
-
pytest>=7.0.0
|
| 9 |
-
python-dotenv>=1.0.0
|
| 10 |
-
|
| 11 |
-
[dev]
|
| 12 |
-
pytest>=7.0.0
|
| 13 |
-
pytest-cov>=4.0.0
|
| 14 |
-
black>=23.0.0
|
| 15 |
-
ruff>=0.1.0
|
gridmind_rl.egg-info/top_level.txt
DELETED
|
@@ -1,2 +0,0 @@
|
|
| 1 |
-
python
|
| 2 |
-
server
|
inference.py
CHANGED
|
@@ -1,13 +1,15 @@
|
|
| 1 |
"""
|
| 2 |
-
|
|
|
|
|
|
|
| 3 |
python inference.py
|
| 4 |
|
| 5 |
Reads environment variables:
|
| 6 |
- API_BASE_URL (default: https://openrouter.ai/api/v1)
|
| 7 |
- MODEL_NAME (default: meta-llama/llama-3.3-70b-instruct:free)
|
| 8 |
-
- HF_TOKEN (
|
| 9 |
|
| 10 |
-
Emits
|
| 11 |
[START] task=<name> env=gridmind model=<model>
|
| 12 |
[STEP] step=<n> action=<json> reward=<0.00> done=<true|false> error=<msg|null>
|
| 13 |
[END] success=<true|false> steps=<n> rewards=<r1,r2,...>
|
|
|
|
| 1 |
"""
|
| 2 |
+
GridMind-RL Agent Entry Point
|
| 3 |
+
|
| 4 |
+
Run from repo root with:
|
| 5 |
python inference.py
|
| 6 |
|
| 7 |
Reads environment variables:
|
| 8 |
- API_BASE_URL (default: https://openrouter.ai/api/v1)
|
| 9 |
- MODEL_NAME (default: meta-llama/llama-3.3-70b-instruct:free)
|
| 10 |
+
- HF_TOKEN (required, or OPENAI_API_KEY for testing)
|
| 11 |
|
| 12 |
+
Emits standard output format:
|
| 13 |
[START] task=<name> env=gridmind model=<model>
|
| 14 |
[STEP] step=<n> action=<json> reward=<0.00> done=<true|false> error=<msg|null>
|
| 15 |
[END] success=<true|false> steps=<n> rewards=<r1,r2,...>
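The three log lines listed in the docstring above can be produced with small formatting helpers. The following is a minimal sketch, not the project's actual implementation: the helper names (`fmt_bool`, `start_line`, `step_line`, `end_line`) are hypothetical, and the two-decimal formatting of the `rewards` list and the compact JSON encoding of `action` are assumptions.

```python
import json
from typing import Optional, Sequence


def fmt_bool(value: bool) -> str:
    """Render a Python bool as the lowercase true/false used in the log lines."""
    return "true" if value else "false"


def start_line(task: str, model: str, env: str = "gridmind") -> str:
    """Build the [START] marker for one episode."""
    return f"[START] task={task} env={env} model={model}"


def step_line(step: int, action: dict, reward: float, done: bool,
              error: Optional[str] = None) -> str:
    """Build one [STEP] marker; a missing error is printed as null."""
    # Compact JSON is an assumption; the docstring only says action=<json>.
    action_json = json.dumps(action, separators=(",", ":"))
    error_text = error if error is not None else "null"
    return (f"[STEP] step={step} action={action_json} "
            f"reward={reward:.2f} done={fmt_bool(done)} error={error_text}")


def end_line(success: bool, steps: int, rewards: Sequence[float]) -> str:
    """Build the [END] marker with a comma-separated reward list."""
    joined = ",".join(f"{r:.2f}" for r in rewards)
    return f"[END] success={fmt_bool(success)} steps={steps} rewards={joined}"
```

Keeping the formatting in one place makes it easier to check the output against the evaluator's expected markers before submitting.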
|
python/__pycache__/inference.cpython-311.pyc
DELETED
|
Binary file (23.5 kB)
|
|
|
python/inference.py
CHANGED
|
@@ -48,10 +48,10 @@ ENV_URL = os.getenv("ENV_URL", "http://localhost:7860")
|
|
| 48 |
MODEL_NAME = os.getenv("MODEL_NAME", "meta-llama/llama-3.3-70b-instruct:free")
|
| 49 |
API_BASE_URL = os.getenv("API_BASE_URL", "https://openrouter.ai/api/v1")
|
| 50 |
|
| 51 |
-
# ──
|
| 52 |
-
#
|
| 53 |
# and passed directly to the OpenAI client for initialization.
|
| 54 |
-
# Primary: HF_TOKEN
|
| 55 |
# Fallback: OPENAI_API_KEY (for local testing/development)
|
| 56 |
HF_TOKEN = os.getenv("HF_TOKEN")
|
| 57 |
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY") or HF_TOKEN
|
|
|
|
| 48 |
MODEL_NAME = os.getenv("MODEL_NAME", "meta-llama/llama-3.3-70b-instruct:free")
|
| 49 |
API_BASE_URL = os.getenv("API_BASE_URL", "https://openrouter.ai/api/v1")
|
| 50 |
|
| 51 |
+
# ── Environment Variable Handling ─────────────────────────────────────────
|
| 52 |
+
# The LLM API credential is read from HF_TOKEN or OPENAI_API_KEY environment variables
|
| 53 |
# and passed directly to the OpenAI client for initialization.
|
| 54 |
+
# Primary: HF_TOKEN
|
| 55 |
# Fallback: OPENAI_API_KEY (for local testing/development)
|
| 56 |
HF_TOKEN = os.getenv("HF_TOKEN")
|
| 57 |
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY") or HF_TOKEN
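Review note: the comments above describe `HF_TOKEN` as the primary credential and `OPENAI_API_KEY` as the fallback, but the expression `os.getenv("OPENAI_API_KEY") or HF_TOKEN` picks `OPENAI_API_KEY` first when both are set. A minimal sketch of the documented order follows; `resolve_api_key` is a hypothetical helper that does not exist in the repository, and it takes the environment as a parameter only so it can be exercised without touching real variables.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return HF_TOKEN if set and non-empty, else OPENAI_API_KEY, else None."""
    # `or` skips empty strings as well as missing keys, so an empty
    # HF_TOKEN falls through to OPENAI_API_KEY.
    return env.get("HF_TOKEN") or env.get("OPENAI_API_KEY") or None
```

Whether the code or the comment reflects the intended behavior is a question for the author; the sketch only shows what the comment describes.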
|
server/__init__.py
DELETED
|
File without changes
|
test_suite_results.json
DELETED
|
@@ -1,307 +0,0 @@
|
|
| 1 |
-
{
|
| 2 |
-
"passed": 50,
|
| 3 |
-
"failed": 0,
|
| 4 |
-
"warnings": 0,
|
| 5 |
-
"details": [
|
| 6 |
-
{
|
| 7 |
-
"id": "B1",
|
| 8 |
-
"description": "API_BASE_URL has default",
|
| 9 |
-
"status": "PASS",
|
| 10 |
-
"details": "https://openrouter.ai/api/v1"
|
| 11 |
-
},
|
| 12 |
-
{
|
| 13 |
-
"id": "B2",
|
| 14 |
-
"description": "MODEL_NAME has default",
|
| 15 |
-
"status": "PASS",
|
| 16 |
-
"details": "meta-llama/llama-3.3-70b-instruct:free"
|
| 17 |
-
},
|
| 18 |
-
{
|
| 19 |
-
"id": "B3",
|
| 20 |
-
"description": "HF_TOKEN is mandatory (raises ValueError)",
|
| 21 |
-
"status": "PASS",
|
| 22 |
-
"details": "checked in python/inference.py line 57-58"
|
| 23 |
-
},
|
| 24 |
-
{
|
| 25 |
-
"id": "B4",
|
| 26 |
-
"description": "OpenAI client initialized",
|
| 27 |
-
"status": "PASS",
|
| 28 |
-
"details": "base_url and api_key from env vars"
|
| 29 |
-
},
|
| 30 |
-
{
|
| 31 |
-
"id": "B5",
|
| 32 |
-
"description": "HF_TOKEN is PRIMARY key",
|
| 33 |
-
"status": "PASS",
|
| 34 |
-
"details": "OPENAI_API_KEY is fallback only"
|
| 35 |
-
},
|
| 36 |
-
{
|
| 37 |
-
"id": "C1",
|
| 38 |
-
"description": "[START] format exists",
|
| 39 |
-
"status": "PASS",
|
| 40 |
-
"details": ""
|
| 41 |
-
},
|
| 42 |
-
{
|
| 43 |
-
"id": "C2",
|
| 44 |
-
"description": "[STEP] format exists",
|
| 45 |
-
"status": "PASS",
|
| 46 |
-
"details": ""
|
| 47 |
-
},
|
| 48 |
-
{
|
| 49 |
-
"id": "C3",
|
| 50 |
-
"description": "[END] format exists",
|
| 51 |
-
"status": "PASS",
|
| 52 |
-
"details": ""
|
| 53 |
-
},
|
| 54 |
-
{
|
| 55 |
-
"id": "C4",
|
| 56 |
-
"description": "Reward formatted to 2 decimal places",
|
| 57 |
-
"status": "PASS",
|
| 58 |
-
"details": ""
|
| 59 |
-
},
|
| 60 |
-
{
|
| 61 |
-
"id": "C5",
|
| 62 |
-
"description": "Lowercase booleans (true/false)",
|
| 63 |
-
"status": "PASS",
|
| 64 |
-
"details": ""
|
| 65 |
-
},
|
| 66 |
-
{
|
| 67 |
-
"id": "C6",
|
| 68 |
-
"description": "error field uses null",
|
| 69 |
-
"status": "PASS",
|
| 70 |
-
"details": ""
|
| 71 |
-
},
|
| 72 |
-
{
|
| 73 |
-
"id": "D1",
|
| 74 |
-
"description": "OpenEnv spec structure valid, name=gridmind-rl",
|
| 75 |
-
"status": "PASS",
|
| 76 |
-
"details": ""
|
| 77 |
-
},
|
| 78 |
-
{
|
| 79 |
-
"id": "D1b",
|
| 80 |
-
"description": "Port is 7860: 7860",
|
| 81 |
-
"status": "PASS",
|
| 82 |
-
"details": ""
|
| 83 |
-
},
|
| 84 |
-
{
|
| 85 |
-
"id": "D1c",
|
| 86 |
-
"description": "Has 3 tasks: 3",
|
| 87 |
-
"status": "PASS",
|
| 88 |
-
"details": ""
|
| 89 |
-
},
|
| 90 |
-
{
|
| 91 |
-
"id": "D2",
|
| 92 |
-
"description": "All OpenEnv endpoints declared: 6/6",
|
| 93 |
-
"status": "PASS",
|
| 94 |
-
"details": ""
|
| 95 |
-
},
|
| 96 |
-
{
|
| 97 |
-
"id": "E1",
|
| 98 |
-
"description": "Exactly 3 tasks: 3",
|
| 99 |
-
"status": "PASS",
|
| 100 |
-
"details": ""
|
| 101 |
-
},
|
| 102 |
-
{
|
| 103 |
-
"id": "E1.1",
|
| 104 |
-
"description": "Task 1 difficulty is easy",
|
| 105 |
-
"status": "PASS",
|
| 106 |
-
"details": "expected easy"
|
| 107 |
-
},
|
| 108 |
-
{
|
| 109 |
-
"id": "E1.2",
|
| 110 |
-
"description": "Task 2 difficulty is medium",
|
| 111 |
-
"status": "PASS",
|
| 112 |
-
"details": "expected medium"
|
| 113 |
-
},
|
| 114 |
-
{
|
| 115 |
-
"id": "E1.3",
|
| 116 |
-
"description": "Task 3 difficulty is hard",
|
| 117 |
-
"status": "PASS",
|
| 118 |
-
"details": "expected hard"
|
| 119 |
-
},
|
| 120 |
-
{
|
| 121 |
-
"id": "E2",
|
| 122 |
-
"description": "Task 1 grader exists",
|
| 123 |
-
"status": "PASS",
|
| 124 |
-
"details": ""
|
| 125 |
-
},
|
| 126 |
-
{
|
| 127 |
-
"id": "E2",
|
| 128 |
-
"description": "Task 2 grader exists",
|
| 129 |
-
"status": "PASS",
|
| 130 |
-
"details": ""
|
| 131 |
-
},
|
| 132 |
-
{
|
| 133 |
-
"id": "E2",
|
| 134 |
-
"description": "Task 3 grader exists",
|
| 135 |
-
"status": "PASS",
|
| 136 |
-
"details": ""
|
| 137 |
-
},
|
| 138 |
-
{
|
| 139 |
-
"id": "E5",
|
| 140 |
-
"description": "Exploit detection exists",
|
| 141 |
-
"status": "PASS",
|
| 142 |
-
"details": ""
|
| 143 |
-
},
|
| 144 |
-
{
|
| 145 |
-
"id": "E6.1",
|
| 146 |
-
"description": "Task 1 weights sum: 1.00",
|
| 147 |
-
"status": "PASS",
|
| 148 |
-
"details": ""
|
| 149 |
-
},
|
| 150 |
-
{
|
| 151 |
-
"id": "E6.2",
|
| 152 |
-
"description": "Task 2 weights sum: 1.00",
|
| 153 |
-
"status": "PASS",
|
| 154 |
-
"details": ""
|
| 155 |
-
},
|
| 156 |
-
{
|
| 157 |
-
"id": "E6.3",
|
| 158 |
-
"description": "Task 3 weights sum: 1.00",
|
| 159 |
-
"status": "PASS",
|
| 160 |
-
"details": ""
|
| 161 |
-
},
|
| 162 |
-
{
|
| 163 |
-
"id": "F1",
|
| 164 |
-
"description": "All 7 reward components exist: 7/7",
|
| 165 |
-
"status": "PASS",
|
| 166 |
-
"details": ""
|
| 167 |
-
},
|
| 168 |
-
{
|
| 169 |
-
"id": "F2",
|
| 170 |
-
"description": "Reward computed every step",
|
| 171 |
-
"status": "PASS",
|
| 172 |
-
"details": ""
|
| 173 |
-
},
|
| 174 |
-
{
|
| 175 |
-
"id": "F3",
|
| 176 |
-
"description": "Penalties for bad behaviors",
|
| 177 |
-
"status": "PASS",
|
| 178 |
-
"details": ""
|
| 179 |
-
},
|
| 180 |
-
{
|
| 181 |
-
"id": "F4",
|
| 182 |
-
"description": "Reward aggregated properly",
|
| 183 |
-
"status": "PASS",
|
| 184 |
-
"details": ""
|
| 185 |
-
},
|
| 186 |
-
{
|
| 187 |
-
"id": "G1",
|
| 188 |
-
"description": "Multi-stage build (Go builder + Python runtime)",
|
| 189 |
-
"status": "PASS",
|
| 190 |
-
"details": ""
|
| 191 |
-
},
|
| 192 |
-
{
|
| 193 |
-
"id": "G2",
|
| 194 |
-
"description": "Go server compiled",
|
| 195 |
-
"status": "PASS",
|
| 196 |
-
"details": ""
|
| 197 |
-
},
|
| 198 |
-
{
|
| 199 |
-
"id": "G3",
|
| 200 |
-
"description": "supervisord manages processes",
|
| 201 |
-
"status": "PASS",
|
| 202 |
-
"details": ""
|
| 203 |
-
},
|
| 204 |
-
{
|
| 205 |
-
"id": "G4",
|
| 206 |
-
"description": "Go server on port 7860",
|
| 207 |
-
"status": "PASS",
|
| 208 |
-
"details": ""
|
| 209 |
-
},
|
| 210 |
-
{
|
| 211 |
-
"id": "G5",
|
| 212 |
-
"description": "Dashboard on port 7861",
|
| 213 |
-
"status": "PASS",
|
| 214 |
-
"details": ""
|
| 215 |
-
},
|
| 216 |
-
{
|
| 217 |
-
"id": "G7",
|
| 218 |
-
"description": "Both ports exposed",
|
| 219 |
-
"status": "PASS",
|
| 220 |
-
"details": ""
|
| 221 |
-
},
|
| 222 |
-
{
|
| 223 |
-
"id": "G8",
|
| 224 |
-
"description": "Dockerfile syntax valid",
|
| 225 |
-
"status": "PASS",
|
| 226 |
-
"details": ""
|
| 227 |
-
},
|
| 228 |
-
{
|
| 229 |
-
"id": "H1",
|
| 230 |
-
"description": "README has Overview/Motivation",
|
| 231 |
-
"status": "PASS",
|
| 232 |
-
"details": ""
|
| 233 |
-
},
|
| 234 |
-
{
|
| 235 |
-
"id": "H2",
|
| 236 |
-
"description": "README documents Observation Space",
|
| 237 |
-
"status": "PASS",
|
| 238 |
-
"details": ""
|
| 239 |
-
},
|
| 240 |
-
{
|
| 241 |
-
"id": "H3",
|
| 242 |
-
"description": "README documents Action Space",
|
| 243 |
-
"status": "PASS",
|
| 244 |
-
"details": ""
|
| 245 |
-
},
|
| 246 |
-
{
|
| 247 |
-
"id": "H4",
|
| 248 |
-
"description": "README has task descriptions",
|
| 249 |
-
"status": "PASS",
|
| 250 |
-
"details": ""
|
| 251 |
-
},
|
| 252 |
-
{
|
| 253 |
-
"id": "H5",
|
| 254 |
-
"description": "README has setup/usage",
|
| 255 |
-
"status": "PASS",
|
| 256 |
-
"details": ""
|
| 257 |
-
},
|
| 258 |
-
{
|
| 259 |
-
"id": "H6",
|
| 260 |
-
"description": "README mentions baseline",
|
| 261 |
-
"status": "PASS",
|
| 262 |
-
"details": ""
|
| 263 |
-
},
|
| 264 |
-
{
|
| 265 |
-
"id": "H7",
|
| 266 |
-
"description": "README mentions OpenEnv/HF",
|
| 267 |
-
"status": "PASS",
|
| 268 |
-
"details": ""
|
| 269 |
-
},
|
| 270 |
-
{
|
| 271 |
-
"id": "I1",
|
| 272 |
-
"description": "All 3 task scores present: 3",
|
| 273 |
-
"status": "PASS",
|
| 274 |
-
"details": ""
|
| 275 |
-
},
|
| 276 |
-
{
|
| 277 |
-
"id": "I2",
|
| 278 |
-
"description": "All scores in [0.0, 1.0]",
|
| 279 |
-
"status": "PASS",
|
| 280 |
-
"details": ""
|
| 281 |
-
},
|
| 282 |
-
{
|
| 283 |
-
"id": "I3",
|
| 284 |
-
"description": "Has model and api_base fields",
|
| 285 |
-
"status": "PASS",
|
| 286 |
-
"details": ""
|
| 287 |
-
},
|
| 288 |
-
{
|
| 289 |
-
"id": "J1",
|
| 290 |
-
"description": "requirements.txt exists and includes required packages",
|
| 291 |
-
"status": "PASS",
|
| 292 |
-
"details": ""
|
| 293 |
-
},
|
| 294 |
-
{
|
| 295 |
-
"id": "J2",
|
| 296 |
-
"description": "openai package is importable",
|
| 297 |
-
"status": "PASS",
|
| 298 |
-
"details": ""
|
| 299 |
-
},
|
| 300 |
-
{
|
| 301 |
-
"id": "J4",
|
| 302 |
-
"description": "requests package is importable",
|
| 303 |
-
"status": "PASS",
|
| 304 |
-
"details": ""
|
| 305 |
-
}
|
| 306 |
-
]
|
| 307 |
-
}
|