Spaces:

Not-OmKar
/

grid

Sleeping

App Files Files Community

grid / README.md

Not-OmKar

Link addition

98fad8b about 1 month ago

preview code

raw

history blame contribute delete

11.1 kB

metadata

title: OpenEnv SmartGrid MarketSim
emoji: ⚡
colorFrom: green
colorTo: blue
sdk: docker
pinned: false

⚡ OpenEnv SmartGrid MarketSim

In this market, physics has veto power.

Power markets optimize economics. Grid operators preserve stability. Physics enforces hard limits. OpenEnv SmartGrid MarketSim is a reinforcement learning environment where all three collide — and agents learn whether the grid survives.

OpenEnv SmartGrid MarketSim is a multi-agent reinforcement learning environment for training reliability-aware agents in strategic electricity markets under uncertainty, contingencies, and physical constraints.

Why This Environment Exists

Modern power systems face a structural intelligence problem.

Markets optimize price. Control systems optimize stability. Operators manage emergencies.

Real grids require all three simultaneously.

Most existing learning environments isolate these challenges. This environment combines them.

Core Hypothesis

If agents train in a world where economic strategy is filtered through dispatch intelligence and hard physical constraints, they can learn reliability-aware strategic behavior instead of brittle reward maximization.

That is the problem this benchmark targets.

What Makes This Environment Novel

This environment combines three interacting intelligence layers.

1. Strategic Multi-Agent Electricity Market

Agents participate as:

Renewable prosumers
Industrial load participants
Peaker generators
Flexible EV storage resources

Agents submit bids and interact through strategic market clearing influenced by leader price signals.

This creates a partially cooperative, partially competitive game.

2. Reliability Dispatch Control Agent

A dedicated dispatch intelligence layer observes:

Scarcity conditions
Forecast gaps
Reserve risks
Contingencies
Renewable uncertainty

It intervenes through:

Reserve activation
Corrective redispatch
Storage balancing
Peaker adjustments
Emergency support actions

This turns the environment into more than a market. It becomes a reliability coordination game.

3. Physics-Constrained Safety Shield

All proposed actions pass through a safety layer enforcing:

EV SOC bounds
Ramp-rate constraints
Reserve adequacy
Frequency and line-loading proxies
Emergency support logic
Constraint correction and feasibility enforcement

Policies may propose. Physics decides.

Unsafe strategies cannot exploit the environment.

Environment Architecture

Each environment step executes:

Policy Action Selection
Market Clearing
Dispatch Control Decision
Physics Safety Enforcement
Reward Computation
State Evolution Under Uncertainty

This creates a closed-loop strategic learning system.

Policy Actions
   ↓
Market Clearing
   ↓
Dispatch Control Agent
   ↓
Physics Safety Shield
   ↓
Reward Computation
   ↓
State Evolution

What Agents Learn

Agents are not trained to optimize price alone.

They learn tradeoffs among:

Reliability
Cost efficiency
Stability
Reserve adequacy
Renewable utilization
Constraint compliance

Sometimes the profitable move loses. The resilient move wins.

That is deliberate.

Reward Design

Reward is structured as staged rubrics.

Reliability Stage

Can the grid remain operational?

Service Stage

Is demand satisfied?

Optimization Stage

Is dispatch economically efficient?

Stability Stage

Are system risks controlled?

Final rewards incorporate anti-hacking penalties for:

Blackouts
Constraint violations
Reserve shortfalls
Unsafe exploitation
Stability failures

High-level reward structure:

Reward = Reliability
       + Service
       + Optimization
       + Stability
       - Safety Penalties

This prevents single-metric reward hacking. Agents must learn robust behavior.

RL Training in the Environment

This environment is built not only for simulation, but for training.

Current training stack includes:

OpenEnv interaction loop
Hugging Face TRL
GRPO-style reinforcement optimization
Curriculum learning across stress scenarios
Multi-agent policy benchmarking

Policy comparisons include:

Random baseline
Heuristic policies
Adaptive policies
Trained RL agents

Training evaluates improvement through:

Cumulative reward growth
Blackout reduction
Reserve shortfall reduction
Stability-event reduction
Candidate vs baseline win rates

Success is defined by improved behavior inside the environment.

Not better text outputs. Better policies.

OpenEnv Themes Alignment

This environment spans multiple OpenEnv hackathon themes.

Theme #1 — Multi-Agent Interactions

Strategic bidding, negotiation, competition and coordination.

Theme #2 — Long-Horizon Planning

Delayed consequences, contingency response and long-horizon resilience.

Theme #3 — World Modeling

Partial observability, tool-like control interaction, dynamic infrastructure simulation.

Environment Tasks

default

Standard strategic bidding with reliability-aware dispatch.

long_horizon

Longer planning horizons with delayed system effects.

stress_shock

Shock-heavy reliability stress testing.

outage

N-1 style outage and contingency scenarios.

renewable_collapse

Severe renewable drop and forecast error regimes.

These scenarios test both optimization and resilience.

Example Observation Signals

Agents observe:

Demand levels
Renewable availability
Scarcity index
Clearing prices
Reserve conditions
Forecast errors
Contingency flags
Stability risk indicators

This supports strategic reasoning under uncertainty.

Example Action Space

Joint actions include:

Strategic supply and demand bids
EV charge / discharge decisions
Reserve activation
Storage dispatch
Corrective redispatch

Multi-agent strategy and operational control coexist.

Safety Constraints Enforced

Physics shield enforces:

No infeasible dispatch
No simultaneous charge/discharge exploits
Ramp limits respected
Reserve commitments maintained
Emergency support triggered when necessary

Learned policies cannot bypass safety.

Benchmark Evidence

Evaluation artifacts include:

Reward curves
Policy comparisons
Ablation results
Resilience stress benchmarks
Pairwise policy win-rate analysis
Trajectory and metrics artifacts

Example Metrics

Metric	Baseline	Trained Agent	Goal
Average Reward	--	--	Increase
Blackout Rate	--	--	Reduce
Reserve Shortfalls	--	--	Reduce
Stability Events	--	--	Reduce
Constraint Violations	--	--	Reduce

Demo Walkthrough

Recommended judge walkthrough:

1. Normal Market Scenario

Show strategic equilibrium and reliability telemetry.

2. Inject Shock

Trigger renewable collapse or contingency.

Show stress emergence.

3. Safety Shield Intervention

Demonstrate physics-corrected behavior.

4. Baseline vs Trained Agent

Show measurable improvement.

The objective is not only to show the environment runs.

It is to show learning.

Why This Matters

This environment studies a broader question:

Can intelligent agents learn strategic behavior under economic incentives while respecting hard safety constraints?

Power systems are the domain. Reliability-aware intelligence is the larger problem.

Potential applications:

Smart-grid autonomy
Infrastructure agents
Safe multi-agent RL
Cyber-physical agent training
Reliability-constrained autonomous systems

Repository Layout

main.py
openenv.yaml
smartgrid_mas/
 ├── env.py
 ├── tasks.py
 ├── models.py
 ├── engine/
 │    ├── market.py
 │    ├── control.py
 │    ├── ldu.py
 │    ├── reward.py
 │    └── dynamics.py
tests/
artifacts/

Quick Start

Install

python -m venv .venv
source .venv/bin/activate
pip install -e .

Start Server

python main.py

Default endpoints:

Run Example

Reset environment:

curl -X POST http://localhost:7860/reset \
-H "Content-Type: application/json" \
-d '{"task_id":"stress_shock","seed":42}'

Run deterministic demo:

curl -X POST http://localhost:7860/run-demo-mode

Run resilience comparison:

curl -X POST http://localhost:7860/run-resilience-demo

Training & Benchmarking

Generate deterministic artifacts:

generate-demo-artifacts

Run resilience benchmark:

train-baseline

Run tests:

pytest -q

OpenEnv Compliance

Includes:

OpenEnv-compatible environment interface
reset / step / state API
openenv.yaml metadata
Hosted Space deployment
RL training integration support
Reproducible benchmark artifacts

Research Framing

This project can be viewed as a benchmark for:

Safety-shielded RL
Reliability-aware multi-agent intelligence
Strategic infrastructure world modeling
Reward-hacking resistant environment design

It is not merely a simulator.

It is a trainable environment for studying resilient intelligence.

Citation / If Using This Environment

If this environment contributes to research or experimentation, please cite the repository and benchmark artifacts.

Closing Thought

Most reinforcement learning environments teach agents how to optimize.

This environment asks whether agents can learn how to preserve critical systems under uncertainty.

That is the benchmark. That is the experiment. That is OpenEnv SmartGrid MarketSim.