---
title: GridOps
emoji:
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 8000
tags:
- openenv
- reinforcement-learning
- microgrid
- energy
---
# GridOps — Community Microgrid Bridge Operator
> A production-grade OpenEnv RL environment for Indian community microgrid operation. Plug-and-play. Deterministic. Benchmarkable.
**Live demo**: [77ethers-gridops.hf.space/dashboard/](https://77ethers-gridops.hf.space/dashboard/) | **HF Space**: [huggingface.co/spaces/77ethers/gridops](https://huggingface.co/spaces/77ethers/gridops)
---
## At a Glance
| | |
|---|---|
| **Domain** | Real-world Indian community microgrid operation (100 homes, summer) |
| **Interface** | Full OpenEnv spec: `reset()` -> `step(action)` -> `state()`, typed Pydantic models |
| **Actions** | 3D continuous: `battery_dispatch [-1,1]`, `diesel_dispatch [0,1]`, `demand_shedding [0,1]` |
| **Observations** | 30+ fields: demand, solar, price, SOC, forecasts, energy flows. Partial observability (noisy forecasts). |
| **Tasks** | 3 tasks (easy -> medium -> hard), each testing a different RL capability |
| **Grading** | Deterministic, programmatic, 0.0-1.0. Same seed = same score, every run. |
| **Reward** | Dense per-step signal, aligned with episode grader (50% cost + 25% reliability + 25% green) |
| **Anti-gaming** | 5 mechanisms: degradation, startup costs, rebound, smooth VoLL, grid cap |
| **Baseline** | Grok-4 LLM: 0.80/0.82/0.72 — beats hand-coded oracle on all tasks |
| **Deployment** | Docker + HF Space + `openenv validate` 6/6 pass |
---
## Why This Environment Exists
Community microgrid operation is a **real job** in India under the [RDSS](https://rdss.gov.in/) (Revamped Distribution Sector Scheme). IEX prosumer bidding is live. Over 50 million Indian homes will have rooftop solar by 2030, and someone — or some agent — needs to manage the battery-grid-diesel tradeoff in real time.
This is not a toy problem. This is what a microgrid operator at an Indian housing society actually decides every hour:
- **Should I charge the battery now** (grid is cheap at Rs 4/kWh) **or save capacity for tonight** (price will spike to Rs 15)?
- **Should I run diesel** (Rs 25/kWh + Rs 100 startup) **or risk a blackout** (Rs 150/kWh VoLL penalty)?
- **Should I ask residents to reduce AC usage** (Rs 40/kWh, and 100% of the shed load rebounds the next hour)?
Simple heuristics fail in characteristic ways (see Why Heuristics Fail). The environment requires multi-hour planning, price forecasting, and constraint management under partial observability.
### What makes this a strong benchmark
- **Any agent can plug in immediately** — typed JSON actions in, typed observations out, no custom hacks
- **Fully deterministic** — same seed, same actions = identical trajectory every time. Leaderboard-ready.
- **Tasks differentiate agents** — Do-Nothing scores 0.45-0.58, Oracle 0.70-0.81, Grok-4 LLM 0.72-0.82. Clear skill gradient.
- **Can't be gamed** — 5 anti-exploit mechanisms prevent reward hacking (detailed below)
- **Grader = ground truth** — programmatic, deterministic, partial credit, aligned with per-step reward
---
## The Problem at a Glance
**You have:**
- **Solar panels** — 250 kW peak, free, but only during daylight
- **Community battery** — 500 kWh storage, 100 kW max charge/discharge
- **Diesel generator** — 100 kW, but Rs 25/kWh + Rs 100 startup cost
- **National grid** — auto-imports/exports as slack (capped at 200 kW)
**You control (3 continuous actions):**
| Action | Range | What it does |
|--------|-------|-------------|
| `battery_dispatch` | -1 to +1 | Charge (-100 kW) or discharge (+100 kW). Rs 2.5/kWh degradation. |
| `diesel_dispatch` | 0 to 1 | Diesel output (0-100 kW). Rs 25/kWh + Rs 100 startup if was off. |
| `demand_shedding` | 0 to 1 | Ask residents to cut 0-20% usage. **100% rebounds next hour.** Rs 40/kWh penalty. |
**You do NOT control the grid.** It automatically absorbs whatever energy gap remains after your decisions. If the gap exceeds 200 kW, that's a **blackout** (Rs 150/kWh penalty).
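
To make the action table concrete, here is a minimal sketch of how the normalized actions could map to physical quantities. It is not the environment's own code: `PhysicalAction`, `to_physical`, and the clamping of out-of-range inputs are assumptions made for illustration. Only the limits (100 kW battery, 100 kW diesel, 20% shedding) come from this README.

```python
from dataclasses import dataclass

# Limits taken from the action table in this README.
BATTERY_MAX_KW = 100.0
DIESEL_MAX_KW = 100.0
MAX_SHED_FRACTION = 0.20


@dataclass(frozen=True)
class PhysicalAction:
    """Hypothetical container for an action expressed in kW."""
    battery_kw: float  # > 0 discharges, < 0 charges
    diesel_kw: float
    shed_kw: float


def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def to_physical(battery_dispatch: float, diesel_dispatch: float,
                demand_shedding: float, demand_kw: float) -> PhysicalAction:
    # Assumption: out-of-range inputs are clamped rather than rejected.
    battery = clamp(battery_dispatch, -1.0, 1.0) * BATTERY_MAX_KW
    diesel = clamp(diesel_dispatch, 0.0, 1.0) * DIESEL_MAX_KW
    shed = clamp(demand_shedding, 0.0, 1.0) * MAX_SHED_FRACTION * demand_kw
    return PhysicalAction(battery, diesel, shed)
```

For example, `to_physical(-1.0, 0.5, 1.0, demand_kw=250.0)` describes full-rate charging, diesel at half power, and the maximum 20% shed of a 250 kW load.
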
---
## The Critical Bottleneck
At **8 PM every evening**, demand hits **250 kW** but the grid maxes out at **200 kW** and solar is zero.
The **50 kW gap** must come from your battery. If you discharged it for profit during the day, the neighborhood goes dark.
On a heatwave day (Task 2-3), demand spikes to **325-375 kW**. Now the gap is **125-175 kW** — you need battery + diesel + shedding just to survive. And in Task 3, the grid goes down entirely for 6 hours.
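
The arithmetic behind these gaps can be checked with a short sketch. The function name `evening_gap_kw` and the idea of expressing a heatwave as a multiplier on the 250 kW peak are assumptions for illustration; the 1.3 and 1.5 multipliers correspond to the +30% and +50% demand figures given for Tasks 2 and 3.

```python
GRID_CAP_KW = 200.0      # grid import cap from this README
BASE_PEAK_KW = 250.0     # normal 8 PM demand from this README


def evening_gap_kw(demand_multiplier: float,
                   grid_cap_kw: float = GRID_CAP_KW,
                   solar_kw: float = 0.0) -> float:
    """Power that battery, diesel, or shedding must cover at the evening peak."""
    demand_kw = BASE_PEAK_KW * demand_multiplier
    return max(0.0, demand_kw - solar_kw - grid_cap_kw)


for label, multiplier, cap in [
    ("normal evening", 1.0, GRID_CAP_KW),
    ("heatwave +30%", 1.3, GRID_CAP_KW),
    ("heatwave +50%", 1.5, GRID_CAP_KW),
    ("+50% during a grid outage", 1.5, 0.0),
]:
    print(f"{label}: {evening_gap_kw(multiplier, cap):.0f} kW gap")
```

The last case shows why the outage dominates Task 3: with the grid cap at 0 kW, the full load has to come from local resources, which exceeds the combined 200 kW of battery and diesel unless demand is shed.
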
---
## What the Agent Sees (Observation)
| Field | Description |
|-------|-------------|
| `hour` | Current hour in episode (0-72, starting 6 AM) |
| `demand_kw` | What the 100 homes need right now |
| `solar_kw` | Free solar power available (0 at night, up to 250 kW midday) |
| `battery_soc` | Battery charge level (0-1, i.e. 0-500 kWh) |
| `grid_price` | Current IEX electricity price (Rs 3-20/kWh) |
| `diesel_fuel_remaining` | Diesel tank level (0-1) |
| `diesel_is_on` | Was diesel running last step? (startup cost if turning on) |
| `demand_forecast_4h` | Noisy 4-hour demand forecast (±15%) |
| `solar_forecast_4h` | Noisy 4-hour solar forecast |
| `price_forecast_4h` | Noisy 4-hour price forecast |
| `cumulative_blackout_kwh` | Total blackout energy so far |
| `cumulative_cost` | Total money spent so far (Rs) |
| `flow_*` | Detailed energy flows (solar, grid import/export, battery in/out, diesel, demand) |
**Partial observability**: forecasts carry ±15% Gaussian noise. The agent cannot perfectly predict heatwave intensity, cloud cover, or price spikes.
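
One way such a forecast could be produced is sketched below. This is an illustration only: the README does not say whether 15% is the standard deviation or a bound, so the sketch assumes it is the relative standard deviation, and `noisy_forecast` is a hypothetical name. Seeding the generator keeps the noise reproducible, in line with the determinism claims above.

```python
import random
from typing import List, Sequence


def noisy_forecast(true_values: Sequence[float],
                   rel_sigma: float = 0.15,
                   seed: int = 0) -> List[float]:
    """Multiply each true value by (1 + Gaussian noise), floored at zero."""
    rng = random.Random(seed)  # private RNG: same seed gives the same forecast
    return [max(0.0, v * (1.0 + rng.gauss(0.0, rel_sigma))) for v in true_values]


true_demand_next_4h = [180.0, 220.0, 250.0, 230.0]
forecast = noisy_forecast(true_demand_next_4h, seed=42)
# Calling again with the same seed reproduces the forecast exactly.
assert forecast == noisy_forecast(true_demand_next_4h, seed=42)
```
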
---
## 3 Tasks (Each Tests a Different RL Capability)
### Task 1: Normal Summer (Easy) — *Tests basic arbitrage*
- Clear skies, standard demand (~100 kW avg, 250 kW peak)
- Grid prices Rs 3-12 with clear cheap night / expensive evening pattern
- **What the agent must learn**: charge battery at night (cheap grid), discharge during evening peak (expensive grid), let solar cover midday
### Task 2: Heatwave + Price Spike (Medium) — *Tests temporal planning*
- Day 2-3 heatwave (+30% demand), intermittent clouds
- **Rs 20 price spike** on Day 2 evening — visible in 4-hour forecast
- **What the agent must learn**: read the forecast, hold battery charge for the spike instead of greedily discharging early. A greedy policy discharges mid-afternoon; an RL agent that reads the forecast holds until 6 PM.
### Task 3: Extreme Crisis + Grid Outage (Hard) — *Tests constraint management*
- Full 3-day heatwave, -30% solar from haze, +50% demand
- Limited diesel (33% tank = ~8 hours at full power)
- **6-hour grid outage** on Day 2 afternoon — grid cap drops to 0 kW
- **What the agent must learn**: aggressively pre-charge battery before the outage, ration diesel across the outage window, shed demand strategically to stretch resources. This is true microgrid islanding.
---
## Grading (0.0 - 1.0)
```
score = 0.50 x cost_efficiency + 0.25 x reliability + 0.25 x green_score
```
| Component | Formula | What it rewards |
|-----------|---------|----------------|
| **Cost efficiency** (50%) | `1 - (agent_cost / baseline_cost)` | Spending less than a dumb "max grid import" baseline |
| **Reliability** (25%) | `(demand_met - blackout) / demand_met` | Keeping the lights on |
| **Green score** (25%) | `1 - (diesel_used / total_demand)` | Minimizing diesel emissions |
**Baseline**: "import max grid every hour, no battery/diesel/shedding" — physically possible, but expensive and suffers blackouts during peak hours and grid outages.
**VoLL (Value of Lost Load)**: Rs 150/kWh blackout penalty. This is a smooth gradient — no hard reliability cliff. The agent always gets signal for reducing blackouts incrementally.
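
The scoring formula can be sketched as a small function. This is a reading of the table above, not the code in `graders.py`: the function name `grade`, the clipping of each component to [0, 1], and the use of total demand as the reliability denominator are assumptions.

```python
def _clip01(x: float) -> float:
    return max(0.0, min(1.0, x))


def grade(agent_cost: float, baseline_cost: float,
          total_demand_kwh: float, blackout_kwh: float,
          diesel_kwh: float) -> float:
    """Episode score: 0.50 * cost + 0.25 * reliability + 0.25 * green."""
    if baseline_cost <= 0 or total_demand_kwh <= 0:
        raise ValueError("baseline_cost and total_demand_kwh must be positive")
    # Assumption: each component is clipped so the total stays in [0, 1].
    cost_efficiency = _clip01(1.0 - agent_cost / baseline_cost)
    reliability = _clip01((total_demand_kwh - blackout_kwh) / total_demand_kwh)
    green_score = _clip01(1.0 - diesel_kwh / total_demand_kwh)
    return 0.50 * cost_efficiency + 0.25 * reliability + 0.25 * green_score
```

Under this reading, an agent that spends 60% of the baseline cost with no blackouts and no diesel scores 0.50 x 0.4 + 0.25 + 0.25 = 0.70, while an agent that only matches the baseline cost scores 0.50 even with perfect reliability and zero diesel.
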
---
## Why Heuristics Fail
| Strategy | Why it fails |
|----------|-------------|
| "Always discharge battery" | Empty by evening peak. 50 kW gap = blackout. Score collapses. |
| "Always run diesel" | Rs 25/kWh vs Rs 5 grid at night. Hemorrhages money. Green score = 0. |
| "Shed demand whenever short" | Rs 40/kWh cost + 100% rebounds next hour. More expensive than diesel. |
| "Discharge when price > X" | Ignores battery state. Drains SOC before the real peak. |
| "Do nothing" | Grid alone can't cover evening peak. 3.6% blackout rate. |
The oracle (rule-based, time-of-day + price-aware) scores 0.70-0.81. In the Baseline Scores table below it sits 0.20-0.39 above the simple heuristics, depending on task and strategy, which shows the environment has real optimization headroom.
---
## Anti-Gaming Design
The environment has 5 mechanisms that prevent reward hacking:
1. **Shedding is expensive** — Rs 40/kWh + 100% rebound. Costlier than diesel. True emergency only.
2. **Battery degradation** — Rs 2.5/kWh throughput. Prevents infinite cycling for tiny arbitrage.
3. **Diesel startup cost** — Rs 100 per on-switch. Prevents on/off toggling.
4. **VoLL is smooth** — Rs 150/kWh with no cliff. Agent can't exploit a binary gate.
5. **Grid is capped** — 200 kW max (0 during outages). Can't just buy everything.
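
The cost side of these mechanisms can be summarized in a per-step cost sketch. The rates are the ones quoted in this README; the function `step_cost`, its parameter names, and the omission of grid export revenue are assumptions, and the real accounting in `physics.py` may differ.

```python
# Rates quoted in this README (Rs).
DIESEL_RS_PER_KWH = 25.0
DIESEL_STARTUP_RS = 100.0
DEGRADATION_RS_PER_KWH = 2.5
SHEDDING_RS_PER_KWH = 40.0
VOLL_RS_PER_KWH = 150.0


def step_cost(grid_import_kwh: float, grid_price: float,
              diesel_kwh: float, diesel_was_on: bool,
              battery_throughput_kwh: float,
              shed_kwh: float, blackout_kwh: float) -> float:
    """Cost of one step in Rs. Export revenue is left out of this sketch."""
    # Startup is charged only when diesel switches from off to on.
    startup = DIESEL_STARTUP_RS if (diesel_kwh > 0 and not diesel_was_on) else 0.0
    return (grid_import_kwh * grid_price
            + diesel_kwh * DIESEL_RS_PER_KWH + startup
            + battery_throughput_kwh * DEGRADATION_RS_PER_KWH
            + shed_kwh * SHEDDING_RS_PER_KWH
            + blackout_kwh * VOLL_RS_PER_KWH)
```

Because every term is linear in energy, each kWh of avoided blackout is worth the same Rs 150, which is what "smooth VoLL" means here, and toggling diesel on and off pays the Rs 100 startup on every on-switch.
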
---
## Baseline Scores
| Strategy | Task 1 | Task 2 | Task 3 | What it does |
|----------|--------|--------|--------|-------------|
| **Grok-4 (LLM)** | **0.80** | **0.82** | **0.72** | Reads observations, reasons about tradeoffs |
| **Oracle (rule-based)** | 0.79 | 0.81 | 0.70 | Time-of-day + price + SOC heuristic |
| Do-Nothing (grid only) | 0.58 | 0.51 | 0.45 | Grid covers everything it can |
| Always-Discharge | 0.59 | 0.51 | 0.45 | Drains battery, empty by evening |
| Always-Diesel | 0.42 | 0.42 | 0.44 | Rs 25/kWh burns money |
- **LLM edges out oracle**: Grok-4 scored 0.01-0.02 above the hand-coded oracle on every task
- **Deterministic**: identical scores across 3 runs (seeded RNG)
- **Oracle ceiling < 1.0**: real physics constraints, not inflated scores
- **Clear separation**: LLM > oracle >> heuristics (0.28-0.40 gap from best to worst within each task)
- **Task 3 hardest**: grid outage makes it genuinely challenging even for frontier LLMs
---
## Key Physics
| Component | Spec | Cost |
|-----------|------|------|
| **Solar** | 250 kW peak, bell curve 6 AM - 6 PM | Free |
| **Battery** | 500 kWh, 100 kW max, 90% round-trip efficiency (√0.90 ≈ 94.9% each way) | Rs 2.5/kWh degradation |
| **Diesel** | 100 kW max | Rs 25/kWh + Rs 100 startup |
| **Grid** | 200 kW max import/export (slack variable) | Market price Rs 3-20/kWh |
| **Blackout** | Unmet demand when all sources exhausted | Rs 150/kWh VoLL penalty |
| **Shedding** | Up to 20% demand reduction | Rs 40/kWh + 100% rebound next hour |
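
The battery row's efficiency split can be illustrated as follows. This is a sketch under stated assumptions: the 90% round-trip loss is applied as √0.90 on charge and again on discharge, and the helper names `charge` and `discharge` are hypothetical. The 100 kW power limit is ignored to keep the example short.

```python
import math

ROUND_TRIP_EFFICIENCY = 0.90
ONE_WAY_EFFICIENCY = math.sqrt(ROUND_TRIP_EFFICIENCY)  # about 0.9487
CAPACITY_KWH = 500.0


def charge(soc_kwh: float, input_kwh: float) -> float:
    """Return the new stored energy after pushing input_kwh into the battery."""
    stored = input_kwh * ONE_WAY_EFFICIENCY
    return min(CAPACITY_KWH, soc_kwh + stored)


def discharge(soc_kwh: float, requested_kwh: float) -> tuple:
    """Return (new stored energy, energy actually delivered to the bus)."""
    drawn = min(requested_kwh / ONE_WAY_EFFICIENCY, soc_kwh)
    return soc_kwh - drawn, drawn * ONE_WAY_EFFICIENCY


soc = charge(0.0, 100.0)                   # 100 kWh in, about 94.9 kWh stored
soc, delivered = discharge(soc, 1000.0)    # drain everything that is stored
print(f"delivered {delivered:.1f} kWh of the 100 kWh put in")
```

Charging with 100 kWh and then draining the battery returns 90 kWh, so the two one-way losses multiply back to the stated 90% round trip.
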
**Energy balance every step:**
```
supply = solar + grid_import + battery_discharge + diesel
consume = effective_demand + grid_export + battery_charge
```
Supply always equals consumption. Demand that remains unserved once grid import reaches its cap is recorded as blackout.
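
A minimal sketch of the grid acting as slack is shown below. It assumes one-hour steps (so kW and kWh are interchangeable), ignores battery efficiency and shedding rebound, and treats surplus beyond the export cap as curtailed. The function `balance_step` and its return keys are hypothetical, not taken from `physics.py`.

```python
def balance_step(demand_kw: float, solar_kw: float, battery_kw: float,
                 diesel_kw: float, shed_kw: float,
                 grid_cap_kw: float = 200.0) -> dict:
    """Resolve one step; battery_kw > 0 discharges, < 0 charges."""
    effective_demand = demand_kw - shed_kw
    # What is still missing after local sources (negative means surplus).
    residual = effective_demand - solar_kw - battery_kw - diesel_kw
    return {
        "grid_import": min(max(residual, 0.0), grid_cap_kw),
        "grid_export": min(max(-residual, 0.0), grid_cap_kw),
        "blackout": max(residual - grid_cap_kw, 0.0),
        "curtailed": max(-residual - grid_cap_kw, 0.0),
    }


# 8 PM on a normal day with an idle battery: the grid cap leaves 50 kW unserved.
print(balance_step(demand_kw=250, solar_kw=0, battery_kw=0, diesel_kw=0, shed_kw=0))
# Same hour with 50 kW of battery discharge: no blackout.
print(balance_step(demand_kw=250, solar_kw=0, battery_kw=50, diesel_kw=0, shed_kw=0))
```

Passing `grid_cap_kw=0` reproduces the Task 3 outage: every kW not covered by solar, battery, diesel, or shedding becomes blackout.
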
---
## Setup & Usage
```bash
# Install
pip install -e .
# Run server
uvicorn gridops.server.app:app --port 8000
# Interactive dashboard (`open` is macOS-only; on Linux use `xdg-open`, or paste the URL into a browser)
open http://localhost:8000/dashboard/
# Validate oracle + determinism
python scripts/oracle_test.py
# Run LLM baseline
export API_BASE_URL="https://router.huggingface.co/v1"
export HF_TOKEN="your-token"
export MODEL_NAME="meta-llama/Llama-3.3-70B-Instruct"
python inference.py
```
## Docker
```bash
docker build -t gridops .
docker run -p 8000:8000 gridops
```
## OpenEnv Validation
```bash
# Local structure check
openenv validate
# Runtime check (against live server)
openenv validate --url http://localhost:8000
```
---
## Project Structure
```
gridops/
├── inference.py              # LLM baseline (API_BASE_URL, MODEL_NAME, HF_TOKEN)
├── openenv.yaml              # OpenEnv manifest
├── Dockerfile                # Docker deployment
├── server/app.py             # Root entry point (openenv validate)
├── gridops/
│   ├── models.py             # GridOpsAction, GridOpsObservation (Pydantic)
│   ├── simulation/
│   │   ├── physics.py        # Energy balance, battery, VoLL, degradation, outages
│   │   └── scenarios.py      # Demand/solar/price curve generators
│   ├── tasks/
│   │   ├── definitions.py    # 3 task configs (normal, heatwave, crisis+outage)
│   │   └── graders.py        # 0-1 scoring: cost + reliability + green
│   └── server/
│       ├── app.py            # FastAPI + OpenEnv create_app
│       ├── environment.py    # OpenEnv Environment class
│       └── static/index.html # Interactive dashboard with energy flows
└── scripts/
    └── oracle_test.py        # Oracle + heuristic validation + determinism check
```
---
## API Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/schema` | GET | Action/observation/state JSON schemas |
| `/metadata` | GET | Environment name and description |
| `/reset` | POST | Reset environment (OpenEnv standard) |
| `/step` | POST | Execute action (OpenEnv standard) |
| `/state` | GET | Current state (OpenEnv standard) |
| `/ws` | WebSocket | Persistent session (OpenEnv standard) |
| `/api/reset` | POST | Stateful reset (dashboard) |
| `/api/step` | POST | Stateful step (dashboard) |
| `/api/state` | GET | Stateful state (dashboard) |
| `/tasks` | GET | List available tasks |
| `/dashboard/` | GET | Interactive web UI |
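
A client for the stateful endpoints might look like the sketch below, using only the Python standard library. The request and response shapes are assumptions: this README does not document the JSON bodies, so check `/schema` on a running server before relying on the field layout. `build_action`, `post_json`, and `run_one_step` are hypothetical helpers; only the three action field names and the endpoint paths come from this README.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # matches the uvicorn command in Setup & Usage


def _clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def build_action(battery_dispatch: float, diesel_dispatch: float,
                 demand_shedding: float) -> dict:
    """Build an action dict with values clamped to the documented ranges."""
    return {
        "battery_dispatch": _clamp(battery_dispatch, -1.0, 1.0),
        "diesel_dispatch": _clamp(diesel_dispatch, 0.0, 1.0),
        "demand_shedding": _clamp(demand_shedding, 0.0, 1.0),
    }


def post_json(path: str, payload: dict) -> dict:
    """POST a JSON body and decode the JSON response."""
    request = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def run_one_step() -> dict:
    """Reset, then take one idle step. Requires a running server."""
    # Assumption: an empty body resets with defaults and the action is sent flat.
    post_json("/api/reset", {})
    return post_json("/api/step", build_action(0.0, 0.0, 0.0))
```

With the server running locally, calling `run_one_step()` should return the decoded JSON response of the first step, provided the assumed body shapes match what `/schema` reports.
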