---
title: OpenEnv SmartGrid MarketSim
emoji: ⚡
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
---
⚡ OpenEnv SmartGrid MarketSim
In this market, physics has veto power.
Power markets optimize economics. Grid operators preserve stability. Physics enforces hard limits. In OpenEnv SmartGrid MarketSim, all three collide, and agents must learn to keep the grid running.
OpenEnv SmartGrid MarketSim is a multi-agent reinforcement learning environment for training reliability-aware agents in strategic electricity markets under uncertainty, contingencies, and physical constraints.
Links
- HF Space: https://huggingface.co/spaces/Not-OmKar/grid
- HF Blog: https://huggingface.co/spaces/Not-OmKar/grid/blob/main/Blog.md
- Google Colaboratory Notebook: https://colab.research.google.com/drive/1vqpgNIqGJxZeG2zzoZQKLpiWl172rbB0?usp=sharing
- Media and Resources: https://drive.google.com/drive/folders/1fOHawUaYBbp9A4yEDU7jVADd8QbGaOtj?usp=sharing
This is not a toy market simulator.
It is a training ground for resilient infrastructure intelligence.
Agents do not merely learn how to maximize reward. They learn how to:
- Coordinate under conflicting incentives
- Respond to shocks and outages
- Preserve grid reliability under stress
- Optimize within physical feasibility limits
- Balance economic strategy with system resilience
Why This Environment Exists
Modern power systems face a structural intelligence problem.
Markets optimize price. Control systems optimize stability. Operators manage emergencies.
Real grids require all three simultaneously.
Most existing learning environments isolate these challenges. This environment combines them.
Core Hypothesis
If agents train in a world where economic strategy is filtered through dispatch intelligence and hard physical constraints, they can learn reliability-aware strategic behavior instead of brittle reward maximization.
That is the hypothesis this benchmark is designed to test.
What Makes This Environment Novel
This environment combines three interacting intelligence layers.
1. Strategic Multi-Agent Electricity Market
Agents participate as:
- Renewable prosumers
- Industrial load participants
- Peaker generators
- Flexible EV storage resources
Agents submit bids and interact through strategic market clearing influenced by leader price signals.
This creates a partially cooperative, partially competitive game.
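To make "market clearing" concrete, the sketch below implements a textbook uniform-price double auction: supply offers are stacked cheapest first, demand bids highest first, and trades occur while buyers are willing to pay at least the seller's price. This is an illustrative assumption only. The `Bid` type and `clear_market` function are hypothetical, and the actual clearing in `smartgrid_mas/engine/market.py` (including how leader price signals enter) may work differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bid:
    """One price-quantity bid (hypothetical type, for illustration only)."""

    agent: str
    price: float  # $/MWh
    quantity: float  # MW


def clear_market(supply: list[Bid], demand: list[Bid]) -> tuple[float, float]:
    """Uniform-price double auction returning (clearing_price, cleared_mw).

    Supply offers are stacked cheapest first and demand bids highest first;
    blocks trade while the buyer's price is at least the seller's price.
    The last accepted supply offer sets the single clearing price.
    """
    asks = sorted(supply, key=lambda b: b.price)
    bids = sorted(demand, key=lambda b: b.price, reverse=True)
    i = j = 0
    ask_left = asks[0].quantity if asks else 0.0
    bid_left = bids[0].quantity if bids else 0.0
    price, cleared = 0.0, 0.0
    while i < len(asks) and j < len(bids) and bids[j].price >= asks[i].price:
        traded = min(ask_left, bid_left)
        cleared += traded
        price = asks[i].price
        ask_left -= traded
        bid_left -= traded
        if ask_left == 0:  # this supply block is exhausted; move up the stack
            i += 1
            ask_left = asks[i].quantity if i < len(asks) else 0.0
        if bid_left == 0:  # this demand block is filled; move down the curve
            j += 1
            bid_left = bids[j].quantity if j < len(bids) else 0.0
    return price, cleared
```

For example, with a 20 MW renewable offer at 10 $/MWh, a 30 MW peaker offer at 80 $/MWh, and demand bids of 40 MW at 100 $/MWh and 10 MW at 50 $/MWh, `clear_market` returns `(80.0, 40.0)`: the peaker is the marginal unit, and the 50 $/MWh bid is priced out.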
2. Reliability Dispatch Control Agent
A dedicated dispatch intelligence layer observes:
- Scarcity conditions
- Forecast gaps
- Reserve risks
- Contingencies
- Renewable uncertainty
It intervenes through:
- Reserve activation
- Corrective redispatch
- Storage balancing
- Peaker adjustments
- Emergency support actions
This turns the environment into more than a market. It becomes a reliability coordination game.
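As one illustration of a dispatch intervention, here is a minimal sketch of a reserve-activation rule. It assumes the controller covers any supply shortfall and adds a small buffer when demand has been under-forecast. The `GridSnapshot` fields, the `reserve_activation` function, and the 10% buffer are hypothetical, not taken from `smartgrid_mas/engine/control.py`.

```python
from dataclasses import dataclass


@dataclass
class GridSnapshot:
    """Hypothetical subset of dispatch-agent observations (all values in MW)."""

    forecast_demand_mw: float
    actual_demand_mw: float
    scheduled_supply_mw: float
    reserve_available_mw: float


def reserve_activation(s: GridSnapshot, buffer_frac: float = 0.10) -> float:
    """Return MW of reserve to activate under a simple hypothetical rule.

    No shortfall means no activation. Otherwise, cover the shortfall plus a
    buffer sized to the demand under-forecast (in case the error keeps
    growing), capped by the reserve that is physically available.
    """
    shortfall = max(0.0, s.actual_demand_mw - s.scheduled_supply_mw)
    if shortfall == 0.0:
        return 0.0
    under_forecast = max(0.0, s.actual_demand_mw - s.forecast_demand_mw)
    target = shortfall + buffer_frac * under_forecast
    return min(target, s.reserve_available_mw)
```

Capping by available reserve is the kind of physical limit the safety layer would otherwise have to enforce after the fact.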
3. Physics-Constrained Safety Shield
All proposed actions pass through a safety layer enforcing:
- EV state-of-charge (SOC) bounds
- Ramp-rate constraints
- Reserve adequacy
- Frequency and line-loading proxies
- Emergency support logic
- Constraint correction and feasibility enforcement
Policies may propose. Physics decides.
Unsafe strategies cannot exploit the environment.
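To make "policies may propose, physics decides" concrete, here is a minimal sketch of a shield that projects a proposed EV power setpoint onto feasible values using a power rating, a ramp-rate limit, and SOC bounds. The single aggregated battery, lossless energy model, parameter names, and default values are all simplifying assumptions, not the shield in `smartgrid_mas`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EVLimits:
    """Hypothetical limits for one aggregated EV storage resource."""

    capacity_mwh: float
    soc_min: float = 0.1  # fraction of capacity
    soc_max: float = 0.9
    max_power_mw: float = 10.0
    max_ramp_mw: float = 4.0  # max setpoint change between steps


def clip(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def shield_ev_action(proposed_mw: float, prev_mw: float, soc: float,
                     lim: EVLimits, dt_h: float = 1.0) -> float:
    """Project a proposed setpoint (+charge / -discharge, MW) onto feasible values.

    Power rating and ramp limits are applied first; SOC bounds are applied
    last and take precedence because they are hard energy limits, so in rare
    cases the returned value can exceed the ramp limit. Storage is lossless
    and `soc` is assumed to lie within [soc_min, soc_max].
    """
    p = clip(proposed_mw, -lim.max_power_mw, lim.max_power_mw)
    p = clip(p, prev_mw - lim.max_ramp_mw, prev_mw + lim.max_ramp_mw)
    charge_room_mw = (lim.soc_max - soc) * lim.capacity_mwh / dt_h
    discharge_room_mw = (soc - lim.soc_min) * lim.capacity_mwh / dt_h
    return clip(p, -discharge_room_mw, charge_room_mw)
```

With a 20 MWh battery at 50% SOC and a previous setpoint of 0 MW, a proposal to charge at 10 MW is cut to 4 MW by the ramp limit. The policy is free to ask for anything, but only the clipped value is executed.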
Environment Architecture
Each environment step executes:
- Policy Action Selection
- Market Clearing
- Dispatch Control Decision
- Physics Safety Enforcement
- Reward Computation
- State Evolution Under Uncertainty
This creates a closed-loop strategic learning system.
```text
Policy Actions
      ↓
Market Clearing
      ↓
Dispatch Control Agent
      ↓
Physics Safety Shield
      ↓
Reward Computation
      ↓
State Evolution
```
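The loop above can be written as a single step function. This is a structural sketch only: policy action selection happens outside it (the policy supplies `proposed`), and the stage callables `clear`, `dispatch`, `shield`, `score`, and `evolve` are hypothetical placeholders, not functions from `smartgrid_mas/env.py`. The point it illustrates is that reward is computed on the shielded action, so a policy is never credited for an action that physics rejected.

```python
from typing import Any, Callable

State = dict[str, Any]
Action = dict[str, float]


def env_step(
    state: State,
    proposed: Action,
    clear: Callable[[State, Action], dict],
    dispatch: Callable[[State, Action, dict], Action],
    shield: Callable[[State, Action], Action],
    score: Callable[[State, Action, dict], float],
    evolve: Callable[[State, Action], State],
) -> tuple[State, float, Action]:
    """Run one closed-loop step in the order of the diagram above.

    Each stage is injected as a callable so the control flow is visible on
    its own; none of these callables are the repository's real functions.
    """
    market = clear(state, proposed)  # market clearing on submitted bids
    adjusted = dispatch(state, proposed, market)  # reliability corrections
    safe = shield(state, adjusted)  # physics has the final say
    reward = score(state, safe, market)  # reward sees only the executed action
    next_state = evolve(state, safe)  # state evolution, where uncertainty enters
    return next_state, reward, safe
```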
What Agents Learn
Agents are not trained to optimize price alone.
They learn tradeoffs among:
- Reliability
- Cost efficiency
- Stability
- Reserve adequacy
- Renewable utilization
- Constraint compliance
Sometimes the profitable move loses. The resilient move wins.
That is deliberate.
Reward Design
Reward is structured as staged rubrics.
Reliability Stage
Can the grid remain operational?
Service Stage
Is demand satisfied?
Optimization Stage
Is dispatch economically efficient?
Stability Stage
Are system risks controlled?
Final rewards incorporate anti-hacking penalties for:
- Blackouts
- Constraint violations
- Reserve shortfalls
- Unsafe exploitation
- Stability failures
High-level reward structure:

$$\text{Reward} = \text{Reliability} + \text{Service} + \text{Optimization} + \text{Stability} - \text{Safety Penalties}$$
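This discourages single-metric reward hacking: an agent that maximizes one term while neglecting the others forgoes the remaining stage rewards and still pays the penalties. Agents must learn robust behavior.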
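A minimal sketch of the high-level formula, assuming each stage score is already normalized to [0, 1] and each penalty event carries a fixed weight. The weights, event names, and normalization are illustrative assumptions; the real logic lives in `smartgrid_mas/engine/reward.py` and may weight or gate the stages differently.

```python
# Hypothetical per-event penalty weights (not the repository's values).
PENALTY_WEIGHTS = {
    "blackout": 5.0,
    "constraint_violation": 1.0,
    "reserve_shortfall": 1.0,
    "unsafe_exploitation": 2.0,
    "stability_failure": 2.0,
}

STAGES = ("reliability", "service", "optimization", "stability")


def total_reward(stages: dict[str, float], events: dict[str, int]) -> float:
    """Reward = Reliability + Service + Optimization + Stability - Safety Penalties.

    `stages` maps stage name -> score in [0, 1]; `events` maps penalty
    event name -> number of occurrences this step.
    """
    missing = [k for k in STAGES if k not in stages]
    if missing:
        raise KeyError(f"missing stage scores: {missing}")
    positive = sum(stages[k] for k in STAGES)
    penalty = sum(PENALTY_WEIGHTS[e] * n for e, n in events.items())
    return positive - penalty
```

With these weights, perfect scores on all four stages give 4.0, but a single blackout drops the total to -1.0. That is one way to make "the profitable move loses" hold by construction.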
RL Training in the Environment
This environment is built not only for simulation, but for training.
Current training stack includes:
- OpenEnv interaction loop
- Hugging Face TRL
- GRPO-style reinforcement optimization
- Curriculum learning across stress scenarios
- Multi-agent policy benchmarking
Policy comparisons include:
- Random baseline
- Heuristic policies
- Adaptive policies
- Trained RL agents
Training evaluates improvement through:
- Cumulative reward growth
- Blackout reduction
- Reserve shortfall reduction
- Stability-event reduction
- Candidate vs baseline win rates
Success is defined by improved behavior inside the environment.
Not better text outputs. Better policies.
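One common way to compute a candidate-vs-baseline win rate is to run both policies on the same seeds and count the paired episodes where the candidate scores higher, with ties counted as half a win. That definition is an assumption here, and `paired_win_rate` is a hypothetical helper. The repository's benchmark may define wins differently.

```python
def paired_win_rate(candidate: list[float], baseline: list[float]) -> float:
    """Fraction of seed-matched episodes where the candidate beats the baseline.

    candidate[i] and baseline[i] must be returns from the same seed/scenario.
    Ties count as half a win.
    """
    if len(candidate) != len(baseline) or not candidate:
        raise ValueError("need equal-length, non-empty paired results")
    score = 0.0
    for c, b in zip(candidate, baseline):
        if c > b:
            score += 1.0
        elif c == b:
            score += 0.5
    return score / len(candidate)
```

Pairing by seed removes scenario difficulty as a confounder. For example, `paired_win_rate([3.0, 1.0, 2.0, 5.0], [2.0, 1.0, 4.0, 0.0])` returns `0.625` (two wins, one tie, one loss).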
OpenEnv Themes Alignment
This environment spans multiple OpenEnv hackathon themes.
Theme #1 — Multi-Agent Interactions
Strategic bidding, negotiation, competition and coordination.
Theme #2 — Long-Horizon Planning
Delayed consequences, contingency response and long-horizon resilience.
Theme #3 — World Modeling
Partial observability, tool-like control interaction, dynamic infrastructure simulation.
Environment Tasks
- `default`: Standard strategic bidding with reliability-aware dispatch.
- `long_horizon`: Longer planning horizons with delayed system effects.
- `stress_shock`: Shock-heavy reliability stress testing.
- `outage`: N-1 style outage and contingency scenarios.
- `renewable_collapse`: Severe renewable drop and forecast error regimes.
These scenarios test both optimization and resilience.
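These task IDs fit naturally with the curriculum learning listed in the training stack. Below is a minimal sketch of a threshold-based schedule that promotes the agent to the next scenario once its rolling mean episode reward clears a bar. The easiest-to-hardest ordering, the threshold, the window size, and the `Curriculum` class are all assumptions for illustration, not the repository's curriculum.

```python
from collections import deque

# Hypothetical easiest-to-hardest ordering of the task IDs listed above.
CURRICULUM = ["default", "long_horizon", "stress_shock", "outage", "renewable_collapse"]


class Curriculum:
    """Advance to the next task when the rolling mean reward exceeds a threshold."""

    def __init__(self, threshold: float = 0.0, window: int = 20):
        self.level = 0
        self.threshold = threshold
        self.window = window
        self.recent: deque = deque(maxlen=window)

    @property
    def task_id(self) -> str:
        return CURRICULUM[self.level]

    def record(self, episode_reward: float) -> str:
        """Log one episode's reward and return the task ID to train on next."""
        self.recent.append(episode_reward)
        full = len(self.recent) == self.window
        if full and sum(self.recent) / self.window > self.threshold:
            if self.level < len(CURRICULUM) - 1:
                self.level += 1
                self.recent.clear()  # fresh statistics on the harder task
        return self.task_id
```

The returned ID can be passed as `task_id` to `/reset` for the next episode.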
Example Observation Signals
Agents observe:
- Demand levels
- Renewable availability
- Scarcity index
- Clearing prices
- Reserve conditions
- Forecast errors
- Contingency flags
- Stability risk indicators
This supports strategic reasoning under uncertainty.
Example Action Space
Joint actions include:
- Strategic supply and demand bids
- EV charge / discharge decisions
- Reserve activation
- Storage dispatch
- Corrective redispatch
Multi-agent strategy and operational control coexist.
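A hedged sketch of how a joint action could be represented as typed data before being serialized for the environment. Every field name here is hypothetical; the actual schema presumably lives in `smartgrid_mas/models.py` and can be inspected at the `/docs` endpoint. Using one signed EV setpoint (positive = charge, negative = discharge) is a representation choice that rules out simultaneous charge/discharge by construction.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AgentBid:
    """Hypothetical bid: positive quantity = supply offer, negative = demand bid."""

    price: float  # $/MWh
    quantity: float  # MW


@dataclass
class JointAction:
    """Hypothetical joint action covering the categories listed above."""

    bids: dict[str, AgentBid] = field(default_factory=dict)
    ev_setpoint_mw: float = 0.0  # one signed value: +charge / -discharge
    reserve_activation_mw: float = 0.0
    storage_dispatch_mw: float = 0.0
    redispatch_mw: dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.reserve_activation_mw < 0:
            raise ValueError("reserve activation must be non-negative")

    def to_json(self) -> str:
        """Serialize, including nested bids, to a JSON string."""
        return json.dumps(asdict(self))
```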
Safety Constraints Enforced
Physics shield enforces:
- No infeasible dispatch
- No simultaneous charge/discharge exploits
- Ramp limits respected
- Reserve commitments maintained
- Emergency support triggered when necessary
Learned policies cannot bypass safety.
Benchmark Evidence
Evaluation artifacts include:
- Reward curves
- Policy comparisons
- Ablation results
- Resilience stress benchmarks
- Pairwise policy win-rate analysis
- Trajectory and metrics artifacts
Example Metrics
| Metric | Baseline | Trained Agent | Goal |
|---|---|---|---|
| Average Reward | -- | -- | Increase |
| Blackout Rate | -- | -- | Reduce |
| Reserve Shortfalls | -- | -- | Reduce |
| Stability Events | -- | -- | Reduce |
| Constraint Violations | -- | -- | Reduce |
Demo Walkthrough
Recommended judge walkthrough:
1. Normal Market Scenario
Show strategic equilibrium and reliability telemetry.
2. Inject Shock
Trigger renewable collapse or contingency.
Show how stress builds up in the system.
3. Safety Shield Intervention
Demonstrate physics-corrected behavior.
4. Baseline vs Trained Agent
Show measurable improvement.
The objective is not only to show that the environment runs.
It is to show that agents learn in it.
Why This Matters
This environment studies a broader question:
Can intelligent agents learn strategic behavior under economic incentives while respecting hard safety constraints?
Power systems are the domain. Reliability-aware intelligence is the larger problem.
Potential applications:
- Smart-grid autonomy
- Infrastructure agents
- Safe multi-agent RL
- Cyber-physical agent training
- Reliability-constrained autonomous systems
Repository Layout
```text
main.py
openenv.yaml
smartgrid_mas/
├── env.py
├── tasks.py
├── models.py
└── engine/
    ├── market.py
    ├── control.py
    ├── ldu.py
    ├── reward.py
    └── dynamics.py
tests/
artifacts/
```
Quick Start
Install
```bash
python -m venv .venv
source .venv/bin/activate
pip install -e .
```
Start Server
```bash
python main.py
```
Default endpoints:
- API: http://localhost:7860
- Docs: http://localhost:7860/docs
- Demo: http://localhost:7860/demo
Run Example
Reset environment:
```bash
curl -X POST http://localhost:7860/reset \
  -H "Content-Type: application/json" \
  -d '{"task_id":"stress_shock","seed":42}'
```
Run deterministic demo:
```bash
curl -X POST http://localhost:7860/run-demo-mode
```
Run resilience comparison:
```bash
curl -X POST http://localhost:7860/run-resilience-demo
```
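The same reset call can be made from Python with only the standard library. The URL, header, and payload are copied from the curl example above; the helper names `build_reset_request` and `reset` are hypothetical, and because the response schema is not documented here, the sketch simply returns the decoded JSON as-is.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"


def build_reset_request(task_id: str, seed: int) -> urllib.request.Request:
    """Build the POST /reset request shown in the curl example."""
    body = json.dumps({"task_id": task_id, "seed": seed}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/reset",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def reset(task_id: str = "stress_shock", seed: int = 42) -> dict:
    """Send the request and decode the JSON response (server must be running)."""
    req = build_reset_request(task_id, seed)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server from `python main.py` running, `reset("outage", seed=7)` resets into the N-1 outage scenario and returns the server's response.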
Training & Benchmarking
Generate deterministic artifacts:
```bash
generate-demo-artifacts
```
Run resilience benchmark:
```bash
train-baseline
```
Run tests:
```bash
pytest -q
```
OpenEnv Compliance
Includes:
- OpenEnv-compatible environment interface
- reset / step / state API
- openenv.yaml metadata
- Hosted Space deployment
- RL training integration support
- Reproducible benchmark artifacts
Research Framing
This project can be viewed as a benchmark for:
- Safety-shielded RL
- Reliability-aware multi-agent intelligence
- Strategic infrastructure world modeling
- Reward-hacking resistant environment design
It is not merely a simulator.
It is a trainable environment for studying resilient intelligence.
Citation
If this environment contributes to research or experimentation, please cite the repository and benchmark artifacts.
Closing Thought
Most reinforcement learning environments teach agents how to optimize.
This environment asks whether agents can learn how to preserve critical systems under uncertainty.
That is the benchmark. That is the experiment. That is OpenEnv SmartGrid MarketSim.