| --- |
| title: GridOps |
| emoji: ⚡ |
| colorFrom: blue |
| colorTo: indigo |
| sdk: docker |
| app_port: 8000 |
| tags: |
| - openenv |
| - reinforcement-learning |
| - microgrid |
| - energy |
| --- |
| |
| # GridOps — Community Microgrid Bridge Operator |
|
|
| > A production-grade OpenEnv RL environment for Indian community microgrid operation. Plug-and-play. Deterministic. Benchmarkable. |
|
|
| **Live demo**: [77ethers-gridops.hf.space/dashboard/](https://77ethers-gridops.hf.space/dashboard/) | **HF Space**: [huggingface.co/spaces/77ethers/gridops](https://huggingface.co/spaces/77ethers/gridops) |
|
|
| --- |
|
|
| ## At a Glance |
|
|
| | | | |
| |---|---| |
| | **Domain** | Real-world Indian community microgrid operation (100 homes, summer) | |
| | **Interface** | Full OpenEnv spec: `reset()` -> `step(action)` -> `state()`, typed Pydantic models | |
| | **Actions** | 3D continuous: `battery_dispatch [-1,1]`, `diesel_dispatch [0,1]`, `demand_shedding [0,1]` | |
| | **Observations** | 30+ fields: demand, solar, price, SOC, forecasts, energy flows. Partial observability (noisy forecasts). | |
| | **Tasks** | 3 tasks (easy -> medium -> hard), each testing a different RL capability | |
| | **Grading** | Deterministic, programmatic, 0.0-1.0. Same seed = same score, every run. | |
| | **Reward** | Dense per-step signal, aligned with episode grader (50% cost + 25% reliability + 25% green) | |
| | **Anti-gaming** | 5 mechanisms: degradation, startup costs, rebound, smooth VoLL, grid cap | |
| | **Baseline** | Grok-4 LLM: 0.80/0.82/0.72 — beats hand-coded oracle on all tasks | |
| | **Deployment** | Docker + HF Space + `openenv validate` 6/6 pass | |
|
|
| --- |
|
|
| ## SFT Training Pipeline Upgrade |
|
|
| This branch adds a CarbonAlpha-style training harness around the original GridOps environment without changing the public OpenEnv API. |
|
|
| | Artifact | Link | |
| |---|---| |
| | Shared prompt/action contract | [`gridops/prompting.py`](gridops/prompting.py) | |
| | Reusable oracle + adversarial policies | [`gridops/policies.py`](gridops/policies.py) | |
| | 1,200-row curriculum dataset | [`sft_traces/gridops_curriculum_1200.jsonl`](sft_traces/gridops_curriculum_1200.jsonl) | |
| | Trace generator | [`scripts/generate_sft_traces.py`](scripts/generate_sft_traces.py) | |
| | OpenRouter/DeepSeek trace generator | [`scripts/generate_openrouter_deepseek_traces.py`](scripts/generate_openrouter_deepseek_traces.py) | |
| | Trace validator | [`scripts/validate_traces.py`](scripts/validate_traces.py) | |
| | Holdout/adversarial evaluator | [`scripts/evaluate_gridops_model.py`](scripts/evaluate_gridops_model.py) | |
| | Local adapter evaluator | [`scripts/evaluate_gridops_adapter.py`](scripts/evaluate_gridops_adapter.py) | |
| | Guarded SFT script | [`scripts/hf_sft_gridops.py`](scripts/hf_sft_gridops.py) | |
| | Eval plotter | [`scripts/plot_gridops_evals.py`](scripts/plot_gridops_evals.py) | |
| | Colab-ready notebook | [`notebooks/gridops_sft_pipeline.ipynb`](notebooks/gridops_sft_pipeline.ipynb) | |
| | Model card | [`GRIDOPS_MODEL_CARD.md`](GRIDOPS_MODEL_CARD.md) | |
|
|
| The first milestone is **SFT only**: teach a compact model to emit valid JSON actions for each hourly observation. The first adapter passed the SFT gate on held-out seeds `7001,7002,7003`. |
|
|
| | Model | Avg score | Valid JSON | Task 1 | Task 2 | Task 3 | |
| |---|---:|---:|---:|---:|---:| |
| | Do-nothing | 0.5133 | 100.00% | 0.5820 | 0.5057 | 0.4522 | |
| | GridOps SFT v1 | 0.6854 | 99.85% | 0.6615 | 0.7300 | 0.6648 | |
| | Oracle | 0.7688 | 100.00% | 0.7932 | 0.8087 | 0.7046 | |
|
|
| | Gate | Target | |
| |---|---:| |
| | Valid JSON action rate | >= 98% | |
| | Average holdout score | >= 0.65 | |
| | No task below do-nothing baseline | required | |
| | Task 3 crisis score | >= 0.55 | |
| | Fixed-seed determinism | stable | |
|
|
| Final SFT v1 artifact: |
|
|
| ```text |
| Qwen/Qwen2.5-3B-Instruct -> QLoRA SFT adapter: |
| 77ethers/gridops-models/sft_qwen25_3b_gridops_mixed1418_v1 |
| ``` |
|
|
| Evidence: |
|
|
| - [SFT training curve](evals/plots/gridops_sft_training_curve.png) |
| - [Holdout scores](evals/plots/gridops_holdout_scores.png) |
| - [Battery throughput](evals/plots/gridops_battery_throughput.png) |
| - [Blackout reduction](evals/plots/gridops_blackout_kwh.png) |
| - [Holdout summary JSON](evals/plots/gridops_holdout_summary.json) |
|
|
| The existing leaderboard remains historical. The table above is reported separately as **GridOps SFT v1**. |
|
|
| --- |
|
|
| ## Why This Environment Exists |
|
|
| Community microgrid operation is a **real job** in India under the [RDSS](https://rdss.gov.in/) (Revamped Distribution Sector Scheme). IEX prosumer bidding is live. Over 50 million Indian homes will have rooftop solar by 2030, and someone — or some agent — needs to manage the battery-grid-diesel tradeoff in real time. |
|
|
| This is not a toy problem. This is what a microgrid operator at an Indian housing society actually decides every hour: |
|
|
| - **Should I charge the battery now** (grid is cheap at Rs 4/kWh) **or save capacity for tonight** (price will spike to Rs 15)? |
| - **Should I run diesel** (Rs 25/kWh + Rs 100 startup) **or risk a blackout** (Rs 150/kWh VoLL penalty)? |
| - **Should I ask residents to reduce AC usage** (Rs 40/kWh + 100% rebounds tomorrow)? |
|
|
| Simple heuristics provably fail. The environment requires multi-hour planning, price forecasting, and constraint management under partial observability. |
|
|
| ### What makes this a strong benchmark |
|
|
| - **Any agent can plug in immediately** — typed JSON actions in, typed observations out, no custom hacks |
| - **Fully deterministic** — same seed, same actions = identical trajectory every time. Leaderboard-ready. |
| - **Tasks differentiate agents** — Do-Nothing scores 0.45-0.58, Oracle 0.70-0.81, Grok-4 LLM 0.72-0.82. Clear skill gradient. |
| - **Can't be gamed** — 5 anti-exploit mechanisms prevent reward hacking (detailed below) |
| - **Grader = ground truth** — programmatic, deterministic, partial credit, aligned with per-step reward |
|
|
| --- |
|
|
| ## The Problem at a Glance |
|
|
| **You have:** |
| - **Solar panels** — 250 kW peak, free, but only during daylight |
| - **Community battery** — 500 kWh storage, 100 kW max charge/discharge |
| - **Diesel generator** — 100 kW, but Rs 25/kWh + Rs 100 startup cost |
| - **National grid** — auto-imports/exports as slack (capped at 200 kW) |
|
|
| **You control (3 continuous actions):** |
|
|
| | Action | Range | What it does | |
| |--------|-------|-------------| |
| | `battery_dispatch` | -1 to +1 | Charge (-100 kW) or discharge (+100 kW). Rs 2.5/kWh degradation. | |
| | `diesel_dispatch` | 0 to 1 | Diesel output (0-100 kW). Rs 25/kWh + Rs 100 startup if was off. | |
| | `demand_shedding` | 0 to 1 | Ask residents to cut 0-20% usage. **100% rebounds next hour.** Rs 40/kWh penalty. | |
|
|
| **You do NOT control the grid.** It automatically absorbs whatever energy gap remains after your decisions. If the gap exceeds 200 kW, that's a **blackout** (Rs 150/kWh penalty). |
|
|
| --- |
|
|
| ## The Critical Bottleneck |
|
|
| At **8 PM every evening**, demand hits **250 kW** but the grid maxes out at **200 kW** and solar is zero. |
|
|
| The **50 kW gap** must come from your battery. If you discharged it for profit during the day, the neighborhood goes dark. |
|
|
| On a heatwave day (Task 2-3), demand spikes to **325-375 kW**. Now the gap is **125-175 kW** — you need battery + diesel + shedding just to survive. And in Task 3, the grid goes down entirely for 6 hours. |
|
|
| --- |
|
|
| ## What the Agent Sees (Observation) |
|
|
| | Field | Description | |
| |-------|-------------| |
| | `hour` | Current hour in episode (0-72, starting 6 AM) | |
| | `demand_kw` | What the 100 homes need right now | |
| | `solar_kw` | Free solar power available (0 at night, up to 250 kW midday) | |
| | `battery_soc` | Battery charge level (0-1, i.e. 0-500 kWh) | |
| | `grid_price` | Current IEX electricity price (Rs 3-20/kWh) | |
| | `diesel_fuel_remaining` | Diesel tank level (0-1) | |
| | `diesel_is_on` | Was diesel running last step? (startup cost if turning on) | |
| | `demand_forecast_4h` | Noisy 4-hour demand forecast (+-15%) | |
| | `solar_forecast_4h` | Noisy 4-hour solar forecast | |
| | `price_forecast_4h` | Noisy 4-hour price forecast | |
| | `cumulative_blackout_kwh` | Total blackout energy so far | |
| | `cumulative_cost` | Total money spent so far (Rs) | |
| | `flow_*` | Detailed energy flows (solar, grid import/export, battery in/out, diesel, demand) | |
|
|
| **Partial observability**: forecasts have +-15% Gaussian noise. The agent cannot perfectly predict heatwave intensity, cloud cover, or price spikes. |
|
|
| --- |
|
|
| ## 3 Tasks (Each Tests a Different RL Capability) |
|
|
| ### Task 1: Normal Summer (Easy) — *Tests basic arbitrage* |
| - Clear skies, standard demand (~100 kW avg, 250 kW peak) |
| - Grid prices Rs 3-12 with clear cheap night / expensive evening pattern |
| - **What the agent must learn**: charge battery at night (cheap grid), discharge during evening peak (expensive grid), let solar cover midday |
|
|
| ### Task 2: Heatwave + Price Spike (Medium) — *Tests temporal planning* |
| - Day 2-3 heatwave (+30% demand), intermittent clouds |
| - **Rs 20 price spike** on Day 2 evening — visible in 4-hour forecast |
| - **What the agent must learn**: read the forecast, hold battery charge for the spike instead of greedily discharging early. A greedy policy discharges mid-afternoon; an RL agent that reads the forecast holds until 6 PM. |
|
|
| ### Task 3: Extreme Crisis + Grid Outage (Hard) — *Tests constraint management* |
| - Full 3-day heatwave, -30% solar from haze, +50% demand |
| - Limited diesel (33% tank = ~8 hours at full power) |
| - **6-hour grid outage** on Day 2 afternoon — grid cap drops to 0 kW |
| - **What the agent must learn**: aggressively pre-charge battery before the outage, ration diesel across the outage window, shed demand strategically to stretch resources. This is true microgrid islanding. |
|
|
| --- |
|
|
| ## Grading (0.0 - 1.0) |
|
|
| ``` |
| score = 0.50 x cost_efficiency + 0.25 x reliability + 0.25 x green_score |
| ``` |
|
|
| | Component | Formula | What it rewards | |
| |-----------|---------|----------------| |
| | **Cost efficiency** (50%) | `1 - (agent_cost / baseline_cost)` | Spending less than a dumb "max grid import" baseline | |
| | **Reliability** (25%) | `(demand_met - blackout) / demand_met` | Keeping the lights on | |
| | **Green score** (25%) | `1 - (diesel_used / total_demand)` | Minimizing diesel emissions | |
|
|
| **Baseline**: "import max grid every hour, no battery/diesel/shedding" — physically possible, but expensive and suffers blackouts during peak hours and grid outages. |
|
|
| **VoLL (Value of Lost Load)**: Rs 150/kWh blackout penalty. This is a smooth gradient — no hard reliability cliff. The agent always gets signal for reducing blackouts incrementally. |
|
|
| --- |
|
|
| ## Why Heuristics Fail |
|
|
| | Strategy | Why it fails | |
| |----------|-------------| |
| | "Always discharge battery" | Empty by evening peak. 50 kW gap = blackout. Score collapses. | |
| | "Always run diesel" | Rs 25/kWh vs Rs 5 grid at night. Hemorrhages money. Green score = 0. | |
| | "Shed demand whenever short" | Rs 40/kWh cost + 100% rebounds next hour. More expensive than diesel. | |
| | "Discharge when price > X" | Ignores battery state. Drains SOC before the real peak. | |
| | "Do nothing" | Grid alone can't cover evening peak. 3.6% blackout rate. | |
|
|
| The oracle (rule-based, time-of-day + price-aware) scores 0.70-0.81. There's a clear **0.20-0.35 gap** between heuristics and the oracle, proving the environment has real optimization headroom. |
|
|
| --- |
|
|
| ## Anti-Gaming Design |
|
|
| The environment has 5 mechanisms that prevent reward hacking: |
|
|
| 1. **Shedding is expensive** — Rs 40/kWh + 100% rebound. Costlier than diesel. True emergency only. |
| 2. **Battery degradation** — Rs 2.5/kWh throughput. Prevents infinite cycling for tiny arbitrage. |
| 3. **Diesel startup cost** — Rs 100 per on-switch. Prevents on/off toggling. |
| 4. **VoLL is smooth** — Rs 150/kWh with no cliff. Agent can't exploit a binary gate. |
| 5. **Grid is capped** — 200 kW max (0 during outages). Can't just buy everything. |
|
|
| --- |
|
|
| ## Baseline Scores |
|
|
| | Strategy | Task 1 | Task 2 | Task 3 | What it does | |
| |----------|--------|--------|--------|-------------| |
| | **Grok-4 (LLM)** | **0.80** | **0.82** | **0.72** | Reads observations, reasons about tradeoffs | |
| | **Oracle (rule-based)** | 0.79 | 0.81 | 0.70 | Time-of-day + price + SOC heuristic | |
| | Do-Nothing (grid only) | 0.58 | 0.51 | 0.45 | Grid covers everything it can | |
| | Always-Discharge | 0.59 | 0.51 | 0.45 | Drains battery, empty by evening | |
| | Always-Diesel | 0.42 | 0.42 | 0.44 | Rs 25/kWh burns money | |
|
|
| - **LLM beats oracle**: Grok-4 matched or exceeded the hand-coded oracle on every task |
| - **Deterministic**: identical scores across 3 runs (seeded RNG) |
| - **Oracle ceiling < 1.0**: real physics constraints, not inflated scores |
| - **Clear separation**: LLM > oracle >> heuristics (0.20-0.38 gap from best to worst) |
| - **Task 3 hardest**: grid outage makes it genuinely challenging even for frontier LLMs |
|
|
| --- |
|
|
| ## Key Physics |
|
|
| | Component | Spec | Cost | |
| |-----------|------|------| |
| | **Solar** | 250 kW peak, bell curve 6 AM - 6 PM | Free | |
| | **Battery** | 500 kWh, 100 kW max, 90% round-trip (sqrt each way) | Rs 2.5/kWh degradation | |
| | **Diesel** | 100 kW max | Rs 25/kWh + Rs 100 startup | |
| | **Grid** | 200 kW max import/export (slack variable) | Market price Rs 3-20/kWh | |
| | **Blackout** | Unmet demand when all sources exhausted | Rs 150/kWh VoLL penalty | |
| | **Shedding** | Up to 20% demand reduction | Rs 40/kWh + 100% rebound next hour | |
|
|
| **Energy balance every step:** |
| ``` |
| supply = solar + grid_import + battery_discharge + diesel |
| consume = effective_demand + grid_export + battery_charge |
| ``` |
| Supply always equals consumption. Any unmet demand beyond grid cap = blackout. |
|
|
| --- |
|
|
| ## Setup & Usage |
|
|
| ```bash |
| # Install |
| pip install -e . |
| |
| # Run server |
| uvicorn gridops.server.app:app --port 8000 |
| |
| # Interactive dashboard |
| open http://localhost:8000/dashboard/ |
| |
| # Validate oracle + determinism |
| python scripts/oracle_test.py |
| |
| # Generate and validate the SFT curriculum |
| python scripts/generate_sft_traces.py |
| python scripts/validate_traces.py sft_traces/gridops_curriculum_1200.jsonl |
| |
| # Optional: generate 10-at-a-time teacher traces with DeepSeek on OpenRouter |
| export API_BASE_URL="https://openrouter.ai/api/v1" |
| export OPENROUTER_API_KEY="your-token" |
| python scripts/generate_openrouter_deepseek_traces.py --model deepseek/deepseek-v4-pro |
| |
| # Evaluate reusable policies on holdout seeds |
| python scripts/evaluate_gridops_model.py --policy oracle |
| python scripts/evaluate_gridops_model.py --policy do_nothing |
| |
| # Evaluate an API-hosted or HF-router model with the SFT prompt contract |
| export HF_API_TOKEN="your-token" |
| export MODEL_NAME="your-gridops-sft-endpoint-or-model" |
| python scripts/evaluate_gridops_model.py --model-name "$MODEL_NAME" |
| |
| # Run LLM baseline |
| export API_BASE_URL="https://router.huggingface.co/v1" |
| export HF_TOKEN="your-token" |
| export MODEL_NAME="meta-llama/Llama-3.3-70B-Instruct" |
| python inference.py |
| ``` |
|
|
| ## Docker |
|
|
| ```bash |
| docker build -t gridops . |
| docker run -p 8000:8000 gridops |
| ``` |
|
|
| ## OpenEnv Validation |
|
|
| ```bash |
| # Local structure check |
| openenv validate |
| |
| # Runtime check (against live server) |
| openenv validate --url http://localhost:8000 |
| ``` |
|
|
| --- |
|
|
| ## Project Structure |
|
|
| ``` |
| gridops/ |
| ├── inference.py # LLM baseline (API_BASE_URL, MODEL_NAME, HF_TOKEN) |
| ├── openenv.yaml # OpenEnv manifest |
| ├── Dockerfile # Docker deployment |
| ├── server/app.py # Root entry point (openenv validate) |
| ├── gridops/ |
| │ ├── models.py # GridOpsAction, GridOpsObservation (Pydantic) |
| │ ├── simulation/ |
| │ │ ├── physics.py # Energy balance, battery, VoLL, degradation, outages |
| │ │ └── scenarios.py # Demand/solar/price curve generators |
| │ ├── tasks/ |
| │ │ ├── definitions.py # 3 task configs (normal, heatwave, crisis+outage) |
| │ │ └── graders.py # 0-1 scoring: cost + reliability + green |
| │ └── server/ |
| │ ├── app.py # FastAPI + OpenEnv create_app |
| │ ├── environment.py # OpenEnv Environment class |
| │ └── static/index.html # Interactive dashboard with energy flows |
| └── scripts/ |
| └── oracle_test.py # Oracle + heuristic validation + determinism check |
| ``` |
|
|
| --- |
|
|
| ## API Endpoints |
|
|
| | Endpoint | Method | Description | |
| |----------|--------|-------------| |
| | `/health` | GET | Health check | |
| | `/schema` | GET | Action/observation/state JSON schemas | |
| | `/metadata` | GET | Environment name and description | |
| | `/reset` | POST | Reset environment (OpenEnv standard) | |
| | `/step` | POST | Execute action (OpenEnv standard) | |
| | `/state` | GET | Current state (OpenEnv standard) | |
| | `/ws` | WebSocket | Persistent session (OpenEnv standard) | |
| | `/api/reset` | POST | Stateful reset (dashboard) | |
| | `/api/step` | POST | Stateful step (dashboard) | |
| | `/api/state` | GET | Stateful state (dashboard) | |
| | `/tasks` | GET | List available tasks | |
| | `/dashboard/` | GET | Interactive web UI | |
|
|