	---
	title: OpenEnv SmartGrid MarketSim
	emoji: ⚡
	colorFrom: green
	colorTo: blue
	sdk: docker
	pinned: false
	---
	# ⚡ OpenEnv SmartGrid MarketSim

	## In this market, physics has veto power.

	Power markets optimize economics. Grid operators preserve stability. Physics enforces hard limits.
	It is a reinforcement learning environment where all three collide, and where the agents' decisions determine whether the grid survives.

	OpenEnv SmartGrid MarketSim is a multi-agent reinforcement learning environment for training reliability-aware agents in strategic electricity markets under uncertainty, contingencies, and physical constraints.

	## Links

	* HF Space: [https://huggingface.co/spaces/Not-OmKar/grid](https://huggingface.co/spaces/Not-OmKar/grid)
	* HF Blog: [https://huggingface.co/spaces/Not-OmKar/grid/blob/main/Blog.md](https://huggingface.co/spaces/Not-OmKar/grid/blob/main/Blog.md)
	* Google Colaboratory Notebook: [https://colab.research.google.com/drive/1vqpgNIqGJxZeG2zzoZQKLpiWl172rbB0?usp=sharing](https://colab.research.google.com/drive/1vqpgNIqGJxZeG2zzoZQKLpiWl172rbB0?usp=sharing)
	* Media and Resources: [https://drive.google.com/drive/folders/1fOHawUaYBbp9A4yEDU7jVADd8QbGaOtj?usp=sharing](https://drive.google.com/drive/folders/1fOHawUaYBbp9A4yEDU7jVADd8QbGaOtj?usp=sharing)

	This is not a toy market simulator.

	It is a training ground for resilient infrastructure intelligence.

	Agents do not merely learn how to maximize reward.
	They learn how to:

	* Coordinate under conflicting incentives
	* Respond to shocks and outages
	* Preserve grid reliability under stress
	* Optimize within physical feasibility limits
	* Balance economic strategy with system resilience

	---

	# Why This Environment Exists

	Modern power systems face a structural intelligence problem.

	Markets optimize price.
	Control systems optimize stability.
	Operators manage emergencies.

	Real grids require all three simultaneously.

	Most existing learning environments isolate these challenges.
	This environment combines them.

	## Core Hypothesis

	If agents train in a world where economic strategy is filtered through dispatch intelligence and hard physical constraints, they can learn reliability-aware strategic behavior instead of brittle reward maximization.

	That is the hypothesis this benchmark is designed to test.

	---

	# What Makes This Environment Novel

	This environment combines three interacting intelligence layers.

	## 1. Strategic Multi-Agent Electricity Market

	Agents participate as:

	* Renewable prosumers
	* Industrial load participants
	* Peaker generators
	* Flexible EV storage resources

	Agents submit bids and interact through strategic market clearing influenced by leader price signals.

	This creates a partially cooperative, partially competitive game.
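A minimal sketch of how supply and demand bids might be matched, assuming a simple uniform-price double auction in which the last accepted supply offer sets the clearing price. The `Bid` type, the `clear_market` function, and this pricing rule are illustrative assumptions rather than the logic in `engine/market.py`, and the leader price signal is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    agent: str
    quantity_mw: float  # MW offered (supply) or requested (demand)
    price: float        # limit price in $/MWh


def clear_market(supply, demand):
    """Uniform-price clearing: match the cheapest offers against the
    highest-paying bids until the supply and demand curves cross."""
    asks = sorted(supply, key=lambda b: b.price)
    bids = sorted(demand, key=lambda b: b.price, reverse=True)
    i = j = 0
    ask_left = asks[0].quantity_mw if asks else 0.0
    bid_left = bids[0].quantity_mw if bids else 0.0
    cleared, price = 0.0, None
    while i < len(asks) and j < len(bids) and asks[i].price <= bids[j].price:
        q = min(ask_left, bid_left)
        cleared += q
        price = asks[i].price  # the marginal accepted offer sets the price
        ask_left -= q
        bid_left -= q
        if ask_left == 0:  # this offer is exhausted; move to the next one
            i += 1
            ask_left = asks[i].quantity_mw if i < len(asks) else 0.0
        if bid_left == 0:  # this bid is filled; move to the next one
            j += 1
            bid_left = bids[j].quantity_mw if j < len(bids) else 0.0
    return cleared, price


supply = [Bid("renewable", 30.0, 10.0), Bid("peaker", 50.0, 80.0)]
demand = [Bid("industrial", 40.0, 100.0), Bid("ev", 20.0, 50.0)]
print(clear_market(supply, demand))  # (40.0, 80.0): the peaker is marginal
```

Here the EV bid at $50/MWh is priced out once the peaker becomes marginal, which is the kind of strategic tension the bidding agents have to learn around.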

	---

	## 2. Reliability Dispatch Control Agent

	A dedicated dispatch intelligence layer observes:

	* Scarcity conditions
	* Forecast gaps
	* Reserve risks
	* Contingencies
	* Renewable uncertainty

	It intervenes through:

	* Reserve activation
	* Corrective redispatch
	* Storage balancing
	* Peaker adjustments
	* Emergency support actions

	This turns the environment into more than a market.
	It becomes a reliability coordination game.
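The intervention logic can be illustrated with a hypothetical priority rule. Any gap between demand and cleared supply is covered first from storage, then from peaker headroom, then from reserves, and a reserve shortfall is flagged if the remaining reserve falls below the requirement. The function name, parameters, and ordering are assumptions for illustration only.

```python
def dispatch_interventions(demand_mw, cleared_mw, reserve_mw,
                           reserve_requirement_mw, storage_available_mw,
                           peaker_headroom_mw):
    """Hypothetical corrective dispatch: fill the supply gap in a fixed
    priority order and report what could not be covered."""
    gap = max(0.0, demand_mw - cleared_mw)
    from_storage = min(gap, storage_available_mw)  # storage balancing
    gap -= from_storage
    from_peaker = min(gap, peaker_headroom_mw)     # peaker adjustment
    gap -= from_peaker
    from_reserve = min(gap, reserve_mw)            # reserve activation
    gap -= from_reserve
    remaining_reserve = reserve_mw - from_reserve
    return {
        "storage_mw": from_storage,
        "peaker_mw": from_peaker,
        "reserve_mw": from_reserve,
        "unserved_mw": gap,
        "reserve_shortfall": remaining_reserve < reserve_requirement_mw,
    }
```

A learned dispatch agent would replace the fixed ordering with a policy, but a rule like this makes a useful heuristic baseline.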

	---

	## 3. Physics-Constrained Safety Shield

	All proposed actions pass through a safety layer enforcing:

	* EV SOC bounds
	* Ramp-rate constraints
	* Reserve adequacy
	* Frequency and line-loading proxies
	* Emergency support logic
	* Constraint correction and feasibility enforcement

	Policies may propose.
	Physics decides.

	Unsafe actions are corrected before they take effect, so policies cannot exploit the environment through infeasible strategies.
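As an illustration of what "physics decides" can mean in code, the sketch below projects a proposed action back into its feasible range for two of the listed constraints: EV SOC bounds and ramp rates. It assumes a lossless battery, a signed power convention (positive = charge), and a starting SOC that already lies within bounds; the function names are hypothetical.

```python
def shield_ev_power(soc_mwh, power_mw, soc_min_mwh, soc_max_mwh, dt_h=1.0):
    """Clip a signed EV power command so SOC stays within [soc_min, soc_max]
    after one step of length dt_h (lossless battery assumed)."""
    max_charge_mw = (soc_max_mwh - soc_mwh) / dt_h
    max_discharge_mw = (soc_mwh - soc_min_mwh) / dt_h
    return min(max(power_mw, -max_discharge_mw), max_charge_mw)


def shield_ramp(prev_mw, target_mw, ramp_limit_mw):
    """Limit the change in a unit's output to +/- ramp_limit_mw per step."""
    return min(max(target_mw, prev_mw - ramp_limit_mw), prev_mw + ramp_limit_mw)


print(shield_ev_power(85.0, 20.0, 10.0, 90.0))  # 5.0: only 5 MWh of headroom
print(shield_ramp(50.0, 80.0, 10.0))            # 60.0: ramp-limited
```

Projecting onto the feasible set, rather than rejecting the action outright, keeps the policy's intent where possible while guaranteeing the executed action is physically valid.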

	---

	# Environment Architecture

	Each environment step executes:

	1. Policy Action Selection
	2. Market Clearing
	3. Dispatch Control Decision
	4. Physics Safety Enforcement
	5. Reward Computation
	6. State Evolution Under Uncertainty

	This creates a closed-loop strategic learning system.

	```text
	Policy Actions
	↓
	Market Clearing
	↓
	Dispatch Control Agent
	↓
	Physics Safety Shield
	↓
	Reward Computation
	↓
	State Evolution
	```
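The six stages can be wired together as a single `step` function. This is a self-contained sketch with deliberately simplified stub stages (the `GridState` fields, the stub formulas, and the 20 MW reserve cap are all assumptions). It only shows the order of calls and how a shield correction flows into the reward.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GridState:
    t: int
    demand_mw: float
    renewable_mw: float


def clear_market(state, bid_mw):
    # Stub: bids plus renewables are accepted up to total demand.
    return min(bid_mw + state.renewable_mw, state.demand_mw)


def dispatch_control(state, cleared_mw):
    # Stub: the control agent proposes covering the remaining gap with reserves.
    return {"cleared_mw": cleared_mw, "reserve_mw": state.demand_mw - cleared_mw}


def safety_shield(state, proposal, reserve_cap_mw=20.0):
    # Stub: reserve activation cannot exceed the available reserve capacity.
    return {**proposal, "reserve_mw": min(proposal["reserve_mw"], reserve_cap_mw)}


def compute_reward(state, dispatch):
    # Stub: penalize unserved energy only.
    unserved = state.demand_mw - dispatch["cleared_mw"] - dispatch["reserve_mw"]
    return -10.0 * max(0.0, unserved)


def evolve(state, dispatch, rng):
    # Stub: demand and renewable output drift randomly between steps.
    return replace(
        state,
        t=state.t + 1,
        demand_mw=state.demand_mw * (1.0 + rng.uniform(-0.05, 0.05)),
        renewable_mw=max(0.0, state.renewable_mw + rng.gauss(0.0, 5.0)),
    )


def step(state, policy, rng):
    bid_mw = policy(state)                          # 1. policy action selection
    cleared_mw = clear_market(state, bid_mw)        # 2. market clearing
    proposal = dispatch_control(state, cleared_mw)  # 3. dispatch control decision
    safe = safety_shield(state, proposal)           # 4. physics safety enforcement
    reward = compute_reward(state, safe)            # 5. reward computation
    next_state = evolve(state, safe, rng)           # 6. state evolution under uncertainty
    return next_state, reward


next_state, reward = step(GridState(0, 100.0, 30.0), lambda s: 40.0, random.Random(0))
# reward == -100.0: a 30 MW gap, the shield caps reserves at 20 MW, 10 MW unserved
```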

	---

	# What Agents Learn

	Agents are not trained to optimize price alone.

	They learn tradeoffs among:

	* Reliability
	* Cost efficiency
	* Stability
	* Reserve adequacy
	* Renewable utilization
	* Constraint compliance

	Sometimes the profitable move loses.
	The resilient move wins.

	That is deliberate.

	---

	# Reward Design

	Reward is structured as staged rubrics.

	## Reliability Stage

	Can the grid remain operational?

	## Service Stage

	Is demand satisfied?

	## Optimization Stage

	Is dispatch economically efficient?

	## Stability Stage

	Are system risks controlled?

	Final rewards incorporate anti-hacking penalties for:

	* Blackouts
	* Constraint violations
	* Reserve shortfalls
	* Unsafe exploitation
	* Stability failures

	High-level reward structure:

	```text
	Reward = Reliability
	+ Service
	+ Optimization
	+ Stability
	- Safety Penalties
	```

	This prevents single-metric reward hacking.
	Agents must learn robust behavior.
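A direct translation of the high-level formula, assuming each stage score is normalized to [0, 1] and each safety penalty is a weighted count of events in the step. The `StepMetrics` fields and the `PENALTY_WEIGHTS` values are hypothetical, and the "unsafe exploitation" penalty is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class StepMetrics:
    grid_operational: bool   # reliability stage
    served_fraction: float   # service stage, 0..1
    cost_efficiency: float   # optimization stage, 0..1 (1 = least-cost dispatch)
    stability_margin: float  # stability stage, 0..1
    blackouts: int = 0
    constraint_violations: int = 0
    reserve_shortfalls: int = 0
    stability_failures: int = 0


# Hypothetical weights: blackouts cost far more than any single stage can earn.
PENALTY_WEIGHTS = {
    "blackouts": 5.0,
    "constraint_violations": 1.0,
    "reserve_shortfalls": 1.0,
    "stability_failures": 2.0,
}


def staged_reward(m: StepMetrics) -> float:
    reliability = 1.0 if m.grid_operational else 0.0
    penalties = sum(w * getattr(m, name) for name, w in PENALTY_WEIGHTS.items())
    return (reliability + m.served_fraction + m.cost_efficiency
            + m.stability_margin - penalties)
```

Because one blackout outweighs the maximum combined stage score of 4.0, a policy cannot buy its way out of a reliability failure with good economics elsewhere.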

	---

	# RL Training in the Environment

	This environment is built not only for simulation, but for training.

	The current training stack includes:

	* OpenEnv interaction loop
	* Hugging Face TRL
	* GRPO-style reinforcement optimization
	* Curriculum learning across stress scenarios
	* Multi-agent policy benchmarking
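GRPO-style optimization replaces a learned value baseline with a group-relative one. Several rollouts are sampled from the same starting point, and each rollout's advantage is its reward normalized by the group's mean and standard deviation. A minimal sketch, assuming a group consists of episodes reset with the same `task_id` and `seed`. It uses the population standard deviation; library implementations such as TRL's may differ in such details.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each rollout's reward against its own group's statistics."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four rollouts of the same scenario: above-average episodes get positive advantage.
advantages = group_relative_advantages([1.0, 2.0, 3.0, 2.0])
```

If every rollout in a group earns the same reward, all advantages are zero and the group contributes no gradient, which is why varied stress scenarios matter for this style of training.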

	Policy comparisons include:

	* Random baseline
	* Heuristic policies
	* Adaptive policies
	* Trained RL agents

	Improvement is measured through:

	* Cumulative reward growth
	* Blackout reduction
	* Reserve shortfall reduction
	* Stability-event reduction
	* Candidate vs baseline win rates
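Candidate vs baseline win rates are most informative when both policies face identical conditions. A minimal sketch, assuming results are paired by task and seed and that a tie counts as half a win (a common but not universal convention); `win_rate` is a hypothetical helper.

```python
def win_rate(candidate_returns, baseline_returns):
    """Fraction of paired episodes (same task and seed) in which the
    candidate scores strictly higher; ties count as half a win."""
    if not candidate_returns or len(candidate_returns) != len(baseline_returns):
        raise ValueError("need equal-length, non-empty paired results")
    score = 0.0
    for c, b in zip(candidate_returns, baseline_returns):
        if c > b:
            score += 1.0
        elif c == b:
            score += 0.5
    return score / len(candidate_returns)


print(win_rate([3.0, 1.0, 2.0, 2.0], [1.0, 2.0, 2.0, 0.0]))  # 0.625
```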

	Success is defined by improved behavior inside the environment.

	Not better text outputs.
	Better policies.

	---

	# OpenEnv Themes Alignment

	This environment spans multiple OpenEnv hackathon themes.

	## Theme #1 — Multi-Agent Interactions

	Strategic bidding, negotiation, competition and coordination.

	## Theme #2 — Long-Horizon Planning

	Delayed consequences, contingency response and long-horizon resilience.

	## Theme #3 — World Modeling

	Partial observability, tool-like control interaction, dynamic infrastructure simulation.

	---

	# Environment Tasks

	## default

	Standard strategic bidding with reliability-aware dispatch.

	## long_horizon

	Longer planning horizons with delayed system effects.

	## stress_shock

	Shock-heavy reliability stress testing.

	## outage

	N-1 style outage and contingency scenarios.

	## renewable_collapse

	Severe renewable drop and forecast error regimes.

	These scenarios test both optimization and resilience.
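Curriculum learning across these scenarios can be sketched as a scheduler that promotes the agent to the next task once its recent average return clears a threshold. The task order (assumed easiest to hardest), the threshold, and the window size are assumptions; only the task IDs come from the list above.

```python
from collections import deque
from statistics import fmean

TASKS = ("default", "long_horizon", "stress_shock", "outage", "renewable_collapse")


class TaskCurriculum:
    """Hypothetical promotion-based curriculum over the environment's tasks."""

    def __init__(self, tasks=TASKS, promote_at=0.0, window=20):
        self.tasks = list(tasks)
        self.stage = 0
        self.promote_at = promote_at
        self.recent = deque(maxlen=window)  # rolling window of episode returns

    @property
    def current_task(self):
        return self.tasks[self.stage]

    def record(self, episode_return):
        """Log one episode's return and return the task for the next episode."""
        self.recent.append(episode_return)
        window_full = len(self.recent) == self.recent.maxlen
        if (window_full and fmean(self.recent) >= self.promote_at
                and self.stage < len(self.tasks) - 1):
            self.stage += 1
            self.recent.clear()  # judge the new task on fresh episodes only
        return self.current_task
```

In a training loop, the returned task ID would be passed as `task_id` on the next reset.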

	---

	# Example Observation Signals

	Agents observe:

	* Demand levels
	* Renewable availability
	* Scarcity index
	* Clearing prices
	* Reserve conditions
	* Forecast errors
	* Contingency flags
	* Stability risk indicators

	This supports strategic reasoning under uncertainty.
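One way to expose these signals to a policy is a typed observation that flattens into a numeric vector. The field names and units below are hypothetical; the environment's actual schema presumably lives in `smartgrid_mas/models.py`.

```python
from dataclasses import astuple, dataclass


@dataclass
class GridObservation:
    demand_mw: float
    renewable_mw: float
    scarcity_index: float      # e.g. 0 = ample supply, 1 = severe scarcity
    clearing_price: float      # $/MWh from the last market clearing
    reserve_margin_mw: float
    forecast_error_mw: float
    contingency_active: bool
    stability_risk: float      # e.g. 0..1 risk indicator

    def to_vector(self):
        """Flatten to floats (booleans become 0.0 / 1.0) for a policy network."""
        return [float(x) for x in astuple(self)]


obs = GridObservation(100.0, 30.0, 0.4, 55.0, 12.0, -3.0, True, 0.2)
vector = obs.to_vector()
```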

	---

	# Example Action Space

	Joint actions include:

	* Strategic supply and demand bids
	* EV charge / discharge decisions
	* Reserve activation
	* Storage dispatch
	* Corrective redispatch

	Multi-agent strategy and operational control coexist.

	---

	# Safety Constraints Enforced

	The physics shield enforces:

	* No infeasible dispatch
	* No simultaneous charge/discharge exploits
	* Ramp limits respected
	* Reserve commitments maintained
	* Emergency support triggered when necessary

	Learned policies cannot bypass safety.
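Two of these rules reduce to very small checks. The sketch below, with hypothetical function names, nets simultaneous charge and discharge requests into one signed setpoint, so energy cannot be cycled both ways in the same step. It also flags emergency support when available reserve drops below the committed amount; that trigger condition is an assumption.

```python
def net_ev_command(charge_mw, discharge_mw):
    """Collapse simultaneous charge and discharge requests into one signed
    setpoint (positive = charge), removing the round-trip exploit."""
    if charge_mw < 0 or discharge_mw < 0:
        raise ValueError("charge and discharge requests must be non-negative")
    return charge_mw - discharge_mw


def needs_emergency_support(available_reserve_mw, committed_reserve_mw):
    """Trigger emergency support when reserve commitments cannot be met."""
    return available_reserve_mw < committed_reserve_mw
```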

	---

	# Benchmark Evidence

	Evaluation artifacts include:

	* Reward curves
	* Policy comparisons
	* Ablation results
	* Resilience stress benchmarks
	* Pairwise policy win-rate analysis
	* Trajectory and metrics artifacts

	## Example Metrics

	| Metric | Baseline | Trained Agent | Goal |
	| --- | --- | --- | --- |
	| Average Reward | -- | -- | Increase |
	| Blackout Rate | -- | -- | Reduce |
	| Reserve Shortfalls | -- | -- | Reduce |
	| Stability Events | -- | -- | Reduce |
	| Constraint Violations | -- | -- | Reduce |
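Once evaluation episodes are logged, the rows of this table can be filled by aggregating per-episode records. A minimal sketch, assuming each episode is logged as a dict with the hypothetical keys shown. Run it separately on baseline and trained-agent episodes to populate the two columns.

```python
from statistics import fmean


def summarize(episodes):
    """Aggregate per-episode logs into the metrics reported in the table."""
    if not episodes:
        raise ValueError("no episodes to summarize")
    return {
        "average_reward": fmean(e["return"] for e in episodes),
        "blackout_rate": sum(e["blackout"] for e in episodes) / len(episodes),
        "reserve_shortfalls": fmean(e["reserve_shortfalls"] for e in episodes),
        "stability_events": fmean(e["stability_events"] for e in episodes),
        "constraint_violations": fmean(e["constraint_violations"] for e in episodes),
    }
```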

	---

	# Demo Walkthrough

	Recommended walkthrough for judges:

	## 1. Normal Market Scenario

	Show strategic equilibrium and reliability telemetry.

	## 2. Inject Shock

	Trigger renewable collapse or contingency.

	Show stress emergence.

	## 3. Safety Shield Intervention

	Demonstrate physics-corrected behavior.

	## 4. Baseline vs Trained Agent

	Show measurable improvement.

	The objective is not only to show that the environment runs.

	It is to show learning.

	---

	# Why This Matters

	This environment studies a broader question:

	Can intelligent agents learn strategic behavior under economic incentives while respecting hard safety constraints?

	Power systems are the domain.
	Reliability-aware intelligence is the larger problem.

	Potential applications:

	* Smart-grid autonomy
	* Infrastructure agents
	* Safe multi-agent RL
	* Cyber-physical agent training
	* Reliability-constrained autonomous systems

	---

	# Repository Layout

	```text
	main.py
	openenv.yaml
	smartgrid_mas/
	├── env.py
	├── tasks.py
	├── models.py
	├── engine/
	│ ├── market.py
	│ ├── control.py
	│ ├── ldu.py
	│ ├── reward.py
	│ └── dynamics.py
	tests/
	artifacts/
	```

	---

	# Quick Start

	## Install

	```bash
	python -m venv .venv
	source .venv/bin/activate
	pip install -e .
	```

	---

	## Start Server

	```bash
	python main.py
	```

	Default endpoints:

	* API: [http://localhost:7860](http://localhost:7860)
	* Docs: [http://localhost:7860/docs](http://localhost:7860/docs)
	* Demo: [http://localhost:7860/demo](http://localhost:7860/demo)




	---

	# Run Example

	Reset environment:

	```bash
	curl -X POST http://localhost:7860/reset \
	-H "Content-Type: application/json" \
	-d '{"task_id":"stress_shock","seed":42}'
	```
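The same reset call can be made from Python using only the standard library. The endpoint path and JSON body mirror the curl command for `/reset`; the helper names are hypothetical, and the sketch assumes the server responds with JSON.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"


def build_reset_request(base_url=BASE_URL, task_id="default", seed=42):
    """Build a POST /reset request with the same JSON body as the curl example."""
    payload = json.dumps({"task_id": task_id, "seed": seed}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/reset",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def reset(base_url=BASE_URL, task_id="default", seed=42, timeout=10.0):
    """Send the request to a running server and decode the (assumed) JSON reply."""
    request = build_reset_request(base_url, task_id, seed)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# With the server from "Start Server" running:
# observation = reset(task_id="stress_shock", seed=42)
```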

	Run deterministic demo:

	```bash
	curl -X POST http://localhost:7860/run-demo-mode
	```

	Run resilience comparison:

	```bash
	curl -X POST http://localhost:7860/run-resilience-demo
	```

	---

	# Training & Benchmarking

	Generate deterministic artifacts:

	```bash
	generate-demo-artifacts
	```

	Run resilience benchmark:

	```bash
	train-baseline
	```

	Run tests:

	```bash
	pytest -q
	```

	---

	# OpenEnv Compliance

	The project includes:

	* OpenEnv-compatible environment interface
	* reset / step / state API
	* openenv.yaml metadata
	* Hosted Space deployment
	* RL training integration support
	* Reproducible benchmark artifacts

	---

	# Research Framing

	This project can be viewed as a benchmark for:

	* Safety-shielded RL
	* Reliability-aware multi-agent intelligence
	* Strategic infrastructure world modeling
	* Reward-hacking resistant environment design

	It is not merely a simulator.

	It is a trainable environment for studying resilient intelligence.

	---

	# Citation / If Using This Environment

	If this environment contributes to research or experimentation, please cite the repository and benchmark artifacts.

	---

	# Closing Thought

	Most reinforcement learning environments teach agents how to optimize.

	This environment asks whether agents can learn how to preserve critical systems under uncertainty.

	That is the benchmark.
	That is the experiment.
	That is OpenEnv SmartGrid MarketSim.