File size: 16,751 Bytes
2611bbc
 
 
 
 
 
 
 
 
 
 
 
 
 
bc37871
 
a439f7a
bc37871
a439f7a
bc37871
a439f7a
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
bc37871
d446b52
 
e7ae456
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
d446b52
 
a439f7a
 
 
 
 
 
 
 
 
d446b52
a439f7a
d446b52
a439f7a
 
 
 
 
d446b52
 
 
 
bc37871
d446b52
 
 
 
 
 
 
 
 
bc37871
d446b52
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
bc37871
 
 
 
d446b52
bc37871
 
d446b52
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
bc37871
 
 
edf2bb1
 
1873b55
 
edf2bb1
 
 
d446b52
1873b55
d446b52
edf2bb1
1873b55
 
d446b52
 
bc37871
 
 
d446b52
 
 
 
 
 
 
 
bc37871
d446b52
 
 
 
 
 
 
 
 
 
bc37871
 
 
 
 
 
 
 
d446b52
bc37871
 
d446b52
bc37871
 
e7ae456
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
d446b52
bc37871
 
 
 
 
 
 
 
 
 
 
 
 
d446b52
 
 
 
 
 
 
 
 
 
 
 
bc37871
 
 
 
d446b52
 
 
 
bc37871
d446b52
bc37871
d446b52
 
bc37871
d446b52
 
bc37871
d446b52
 
 
bc37871
d446b52
bc37871
d446b52
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
---
title: GridOps
emoji: 
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 8000
tags:
  - openenv
  - reinforcement-learning
  - microgrid
  - energy
---

# GridOps — Community Microgrid Bridge Operator

> A production-grade OpenEnv RL environment for Indian community microgrid operation. Plug-and-play. Deterministic. Benchmarkable.

**Live demo**: [77ethers-gridops.hf.space/dashboard/](https://77ethers-gridops.hf.space/dashboard/) | **HF Space**: [huggingface.co/spaces/77ethers/gridops](https://huggingface.co/spaces/77ethers/gridops)

---

## At a Glance

| | |
|---|---|
| **Domain** | Real-world Indian community microgrid operation (100 homes, summer) |
| **Interface** | Full OpenEnv spec: `reset()` -> `step(action)` -> `state()`, typed Pydantic models |
| **Actions** | 3D continuous: `battery_dispatch [-1,1]`, `diesel_dispatch [0,1]`, `demand_shedding [0,1]` |
| **Observations** | 30+ fields: demand, solar, price, SOC, forecasts, energy flows. Partial observability (noisy forecasts). |
| **Tasks** | 3 tasks (easy -> medium -> hard), each testing a different RL capability |
| **Grading** | Deterministic, programmatic, 0.0-1.0. Same seed = same score, every run. |
| **Reward** | Dense per-step signal, aligned with episode grader (50% cost + 25% reliability + 25% green) |
| **Anti-gaming** | 5 mechanisms: degradation, startup costs, rebound, smooth VoLL, grid cap |
| **Baseline** | Grok-4 LLM: 0.80/0.82/0.72 — beats hand-coded oracle on all tasks |
| **Deployment** | Docker + HF Space + `openenv validate` 6/6 pass |

---

## SFT Training Pipeline Upgrade

This branch adds a CarbonAlpha-style training harness around the original GridOps environment without changing the public OpenEnv API.

| Artifact | Link |
|---|---|
| Shared prompt/action contract | [`gridops/prompting.py`](gridops/prompting.py) |
| Reusable oracle + adversarial policies | [`gridops/policies.py`](gridops/policies.py) |
| 1,200-row curriculum dataset | [`sft_traces/gridops_curriculum_1200.jsonl`](sft_traces/gridops_curriculum_1200.jsonl) |
| Trace generator | [`scripts/generate_sft_traces.py`](scripts/generate_sft_traces.py) |
| OpenRouter/DeepSeek trace generator | [`scripts/generate_openrouter_deepseek_traces.py`](scripts/generate_openrouter_deepseek_traces.py) |
| Trace validator | [`scripts/validate_traces.py`](scripts/validate_traces.py) |
| Holdout/adversarial evaluator | [`scripts/evaluate_gridops_model.py`](scripts/evaluate_gridops_model.py) |
| Local adapter evaluator | [`scripts/evaluate_gridops_adapter.py`](scripts/evaluate_gridops_adapter.py) |
| Guarded SFT script | [`scripts/hf_sft_gridops.py`](scripts/hf_sft_gridops.py) |
| Eval plotter | [`scripts/plot_gridops_evals.py`](scripts/plot_gridops_evals.py) |
| Colab-ready notebook | [`notebooks/gridops_sft_pipeline.ipynb`](notebooks/gridops_sft_pipeline.ipynb) |
| Model card | [`GRIDOPS_MODEL_CARD.md`](GRIDOPS_MODEL_CARD.md) |

The first milestone is **SFT only**: teach a compact model to emit valid JSON actions for each hourly observation. The first adapter passed the SFT gate on held-out seeds `7001,7002,7003`.

| Model | Avg score | Valid JSON | Task 1 | Task 2 | Task 3 |
|---|---:|---:|---:|---:|---:|
| Do-nothing | 0.5133 | 100.00% | 0.5820 | 0.5057 | 0.4522 |
| GridOps SFT v1 | 0.6854 | 99.85% | 0.6615 | 0.7300 | 0.6648 |
| Oracle | 0.7688 | 100.00% | 0.7932 | 0.8087 | 0.7046 |

| Gate | Target |
|---|---:|
| Valid JSON action rate | >= 98% |
| Average holdout score | >= 0.65 |
| No task below do-nothing baseline | required |
| Task 3 crisis score | >= 0.55 |
| Fixed-seed determinism | stable |

Final SFT v1 artifact:

```text
Qwen/Qwen2.5-3B-Instruct -> QLoRA SFT adapter:
77ethers/gridops-models/sft_qwen25_3b_gridops_mixed1418_v1
```

Evidence:

- [SFT training curve](evals/plots/gridops_sft_training_curve.png)
- [Holdout scores](evals/plots/gridops_holdout_scores.png)
- [Battery throughput](evals/plots/gridops_battery_throughput.png)
- [Blackout reduction](evals/plots/gridops_blackout_kwh.png)
- [Holdout summary JSON](evals/plots/gridops_holdout_summary.json)

The existing leaderboard remains historical. The table above is reported separately as **GridOps SFT v1**.

---

## Why This Environment Exists

Community microgrid operation is a **real job** in India under the [RDSS](https://rdss.gov.in/) (Revamped Distribution Sector Scheme). IEX prosumer bidding is live. Over 50 million Indian homes will have rooftop solar by 2030, and someone — or some agent — needs to manage the battery-grid-diesel tradeoff in real time.

This is not a toy problem. This is what a microgrid operator at an Indian housing society actually decides every hour:

- **Should I charge the battery now** (grid is cheap at Rs 4/kWh) **or save capacity for tonight** (price will spike to Rs 15)?
- **Should I run diesel** (Rs 25/kWh + Rs 100 startup) **or risk a blackout** (Rs 150/kWh VoLL penalty)?
- **Should I ask residents to reduce AC usage** (Rs 40/kWh + 100% rebounds tomorrow)?

Simple heuristics provably fail. The environment requires multi-hour planning, price forecasting, and constraint management under partial observability.

### What makes this a strong benchmark

- **Any agent can plug in immediately** — typed JSON actions in, typed observations out, no custom hacks
- **Fully deterministic** — same seed, same actions = identical trajectory every time. Leaderboard-ready.
- **Tasks differentiate agents** — Do-Nothing scores 0.45-0.58, Oracle 0.70-0.81, Grok-4 LLM 0.72-0.82. Clear skill gradient.
- **Can't be gamed** — 5 anti-exploit mechanisms prevent reward hacking (detailed below)
- **Grader = ground truth** — programmatic, deterministic, partial credit, aligned with per-step reward

---

## The Problem at a Glance

**You have:**
- **Solar panels** — 250 kW peak, free, but only during daylight
- **Community battery** — 500 kWh storage, 100 kW max charge/discharge
- **Diesel generator** — 100 kW, but Rs 25/kWh + Rs 100 startup cost
- **National grid** — auto-imports/exports as slack (capped at 200 kW)

**You control (3 continuous actions):**

| Action | Range | What it does |
|--------|-------|-------------|
| `battery_dispatch` | -1 to +1 | Charge (-100 kW) or discharge (+100 kW). Rs 2.5/kWh degradation. |
| `diesel_dispatch` | 0 to 1 | Diesel output (0-100 kW). Rs 25/kWh + Rs 100 startup if was off. |
| `demand_shedding` | 0 to 1 | Ask residents to cut 0-20% usage. **100% rebounds next hour.** Rs 40/kWh penalty. |

**You do NOT control the grid.** It automatically absorbs whatever energy gap remains after your decisions. If the gap exceeds 200 kW, that's a **blackout** (Rs 150/kWh penalty).

---

## The Critical Bottleneck

At **8 PM every evening**, demand hits **250 kW** but the grid maxes out at **200 kW** and solar is zero.

The **50 kW gap** must come from your battery. If you discharged it for profit during the day, the neighborhood goes dark.

On a heatwave day (Task 2-3), demand spikes to **325-375 kW**. Now the gap is **125-175 kW** — you need battery + diesel + shedding just to survive. And in Task 3, the grid goes down entirely for 6 hours.

---

## What the Agent Sees (Observation)

| Field | Description |
|-------|-------------|
| `hour` | Current hour in episode (0-72, starting 6 AM) |
| `demand_kw` | What the 100 homes need right now |
| `solar_kw` | Free solar power available (0 at night, up to 250 kW midday) |
| `battery_soc` | Battery charge level (0-1, i.e. 0-500 kWh) |
| `grid_price` | Current IEX electricity price (Rs 3-20/kWh) |
| `diesel_fuel_remaining` | Diesel tank level (0-1) |
| `diesel_is_on` | Was diesel running last step? (startup cost if turning on) |
| `demand_forecast_4h` | Noisy 4-hour demand forecast (+-15%) |
| `solar_forecast_4h` | Noisy 4-hour solar forecast |
| `price_forecast_4h` | Noisy 4-hour price forecast |
| `cumulative_blackout_kwh` | Total blackout energy so far |
| `cumulative_cost` | Total money spent so far (Rs) |
| `flow_*` | Detailed energy flows (solar, grid import/export, battery in/out, diesel, demand) |

**Partial observability**: forecasts have +-15% Gaussian noise. The agent cannot perfectly predict heatwave intensity, cloud cover, or price spikes.

---

## 3 Tasks (Each Tests a Different RL Capability)

### Task 1: Normal Summer (Easy) — *Tests basic arbitrage*
- Clear skies, standard demand (~100 kW avg, 250 kW peak)
- Grid prices Rs 3-12 with clear cheap night / expensive evening pattern
- **What the agent must learn**: charge battery at night (cheap grid), discharge during evening peak (expensive grid), let solar cover midday

### Task 2: Heatwave + Price Spike (Medium) — *Tests temporal planning*
- Day 2-3 heatwave (+30% demand), intermittent clouds
- **Rs 20 price spike** on Day 2 evening — visible in 4-hour forecast
- **What the agent must learn**: read the forecast, hold battery charge for the spike instead of greedily discharging early. A greedy policy discharges mid-afternoon; an RL agent that reads the forecast holds until 6 PM.

### Task 3: Extreme Crisis + Grid Outage (Hard) — *Tests constraint management*
- Full 3-day heatwave, -30% solar from haze, +50% demand
- Limited diesel (33% tank = ~8 hours at full power)
- **6-hour grid outage** on Day 2 afternoon — grid cap drops to 0 kW
- **What the agent must learn**: aggressively pre-charge battery before the outage, ration diesel across the outage window, shed demand strategically to stretch resources. This is true microgrid islanding.

---

## Grading (0.0 - 1.0)

```
score = 0.50 x cost_efficiency + 0.25 x reliability + 0.25 x green_score
```

| Component | Formula | What it rewards |
|-----------|---------|----------------|
| **Cost efficiency** (50%) | `1 - (agent_cost / baseline_cost)` | Spending less than a dumb "max grid import" baseline |
| **Reliability** (25%) | `(demand_met - blackout) / demand_met` | Keeping the lights on |
| **Green score** (25%) | `1 - (diesel_used / total_demand)` | Minimizing diesel emissions |

**Baseline**: "import max grid every hour, no battery/diesel/shedding" — physically possible, but expensive and suffers blackouts during peak hours and grid outages.

**VoLL (Value of Lost Load)**: Rs 150/kWh blackout penalty. This is a smooth gradient — no hard reliability cliff. The agent always gets signal for reducing blackouts incrementally.

---

## Why Heuristics Fail

| Strategy | Why it fails |
|----------|-------------|
| "Always discharge battery" | Empty by evening peak. 50 kW gap = blackout. Score collapses. |
| "Always run diesel" | Rs 25/kWh vs Rs 5 grid at night. Hemorrhages money. Green score = 0. |
| "Shed demand whenever short" | Rs 40/kWh cost + 100% rebounds next hour. More expensive than diesel. |
| "Discharge when price > X" | Ignores battery state. Drains SOC before the real peak. |
| "Do nothing" | Grid alone can't cover evening peak. 3.6% blackout rate. |

The oracle (rule-based, time-of-day + price-aware) scores 0.70-0.81. There's a clear **0.20-0.35 gap** between heuristics and the oracle, proving the environment has real optimization headroom.

---

## Anti-Gaming Design

The environment has 5 mechanisms that prevent reward hacking:

1. **Shedding is expensive** — Rs 40/kWh + 100% rebound. Costlier than diesel. True emergency only.
2. **Battery degradation** — Rs 2.5/kWh throughput. Prevents infinite cycling for tiny arbitrage.
3. **Diesel startup cost** — Rs 100 per on-switch. Prevents on/off toggling.
4. **VoLL is smooth** — Rs 150/kWh with no cliff. Agent can't exploit a binary gate.
5. **Grid is capped** — 200 kW max (0 during outages). Can't just buy everything.

---

## Baseline Scores

| Strategy | Task 1 | Task 2 | Task 3 | What it does |
|----------|--------|--------|--------|-------------|
| **Grok-4 (LLM)** | **0.80** | **0.82** | **0.72** | Reads observations, reasons about tradeoffs |
| **Oracle (rule-based)** | 0.79 | 0.81 | 0.70 | Time-of-day + price + SOC heuristic |
| Do-Nothing (grid only) | 0.58 | 0.51 | 0.45 | Grid covers everything it can |
| Always-Discharge | 0.59 | 0.51 | 0.45 | Drains battery, empty by evening |
| Always-Diesel | 0.42 | 0.42 | 0.44 | Rs 25/kWh burns money |

- **LLM beats oracle**: Grok-4 matched or exceeded the hand-coded oracle on every task
- **Deterministic**: identical scores across 3 runs (seeded RNG)
- **Oracle ceiling < 1.0**: real physics constraints, not inflated scores
- **Clear separation**: LLM > oracle >> heuristics (0.20-0.38 gap from best to worst)
- **Task 3 hardest**: grid outage makes it genuinely challenging even for frontier LLMs

---

## Key Physics

| Component | Spec | Cost |
|-----------|------|------|
| **Solar** | 250 kW peak, bell curve 6 AM - 6 PM | Free |
| **Battery** | 500 kWh, 100 kW max, 90% round-trip (sqrt each way) | Rs 2.5/kWh degradation |
| **Diesel** | 100 kW max | Rs 25/kWh + Rs 100 startup |
| **Grid** | 200 kW max import/export (slack variable) | Market price Rs 3-20/kWh |
| **Blackout** | Unmet demand when all sources exhausted | Rs 150/kWh VoLL penalty |
| **Shedding** | Up to 20% demand reduction | Rs 40/kWh + 100% rebound next hour |

**Energy balance every step:**
```
supply  = solar + grid_import + battery_discharge + diesel
consume = effective_demand + grid_export + battery_charge
```
Supply always equals consumption. Any unmet demand beyond grid cap = blackout.

---

## Setup & Usage

```bash
# Install
pip install -e .

# Run server
uvicorn gridops.server.app:app --port 8000

# Interactive dashboard
open http://localhost:8000/dashboard/

# Validate oracle + determinism
python scripts/oracle_test.py

# Generate and validate the SFT curriculum
python scripts/generate_sft_traces.py
python scripts/validate_traces.py sft_traces/gridops_curriculum_1200.jsonl

# Optional: generate 10-at-a-time teacher traces with DeepSeek on OpenRouter
export API_BASE_URL="https://openrouter.ai/api/v1"
export OPENROUTER_API_KEY="your-token"
python scripts/generate_openrouter_deepseek_traces.py --model deepseek/deepseek-v4-pro

# Evaluate reusable policies on holdout seeds
python scripts/evaluate_gridops_model.py --policy oracle
python scripts/evaluate_gridops_model.py --policy do_nothing

# Evaluate an API-hosted or HF-router model with the SFT prompt contract
export HF_API_TOKEN="your-token"
export MODEL_NAME="your-gridops-sft-endpoint-or-model"
python scripts/evaluate_gridops_model.py --model-name "$MODEL_NAME"

# Run LLM baseline
export API_BASE_URL="https://router.huggingface.co/v1"
export HF_TOKEN="your-token"
export MODEL_NAME="meta-llama/Llama-3.3-70B-Instruct"
python inference.py
```

## Docker

```bash
docker build -t gridops .
docker run -p 8000:8000 gridops
```

## OpenEnv Validation

```bash
# Local structure check
openenv validate

# Runtime check (against live server)
openenv validate --url http://localhost:8000
```

---

## Project Structure

```
gridops/
├── inference.py                 # LLM baseline (API_BASE_URL, MODEL_NAME, HF_TOKEN)
├── openenv.yaml                 # OpenEnv manifest
├── Dockerfile                   # Docker deployment
├── server/app.py                # Root entry point (openenv validate)
├── gridops/
│   ├── models.py                # GridOpsAction, GridOpsObservation (Pydantic)
│   ├── simulation/
│   │   ├── physics.py           # Energy balance, battery, VoLL, degradation, outages
│   │   └── scenarios.py         # Demand/solar/price curve generators
│   ├── tasks/
│   │   ├── definitions.py       # 3 task configs (normal, heatwave, crisis+outage)
│   │   └── graders.py           # 0-1 scoring: cost + reliability + green
│   └── server/
│       ├── app.py               # FastAPI + OpenEnv create_app
│       ├── environment.py       # OpenEnv Environment class
│       └── static/index.html    # Interactive dashboard with energy flows
└── scripts/
    └── oracle_test.py           # Oracle + heuristic validation + determinism check
```

---

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/schema` | GET | Action/observation/state JSON schemas |
| `/metadata` | GET | Environment name and description |
| `/reset` | POST | Reset environment (OpenEnv standard) |
| `/step` | POST | Execute action (OpenEnv standard) |
| `/state` | GET | Current state (OpenEnv standard) |
| `/ws` | WebSocket | Persistent session (OpenEnv standard) |
| `/api/reset` | POST | Stateful reset (dashboard) |
| `/api/step` | POST | Stateful step (dashboard) |
| `/api/state` | GET | Stateful state (dashboard) |
| `/tasks` | GET | List available tasks |
| `/dashboard/` | GET | Interactive web UI |