Spaces:

Mayank-22
/

GridCharge-RL

Sleeping

App Files Files Community

GridCharge-RL / implementation_plan.md

Mayank-22

Upload GridCharge-RL project files

3c56bf5 verified about 2 months ago

preview code

raw

history blame contribute delete

6.65 kB

A newer version of the Streamlit SDK is available: 1.57.0

Upgrade

Grid-Aware EV Charging Orchestrator — RL Implementation Plan

A beginner-friendly, hackathon-winning Reinforcement Learning project from scratch.

🧠 What Exactly Are We Building?

A Reinforcement Learning Agent that plays the role of a smart charging station manager.

50 EVs are plugged in at any time
The agent looks at the grid load (high demand = risk of blackout) and each car's departure time
Every minute (simulated), it assigns each car one of 3 actions:
- ⚡ Fast Charge — draws high power, charges quickly
- 🔋 Slow Charge — draws low power, charges slowly
- ⏸️ Wait — draws zero power, car waits

The agent learns over thousands of simulated episodes that it should:

Charge urgent cars faster (leaving soon = high priority)
Slow down charging when the grid is overloaded
Never let a car leave with < 80% battery

📚 Tech Stack Explained (For Beginners)

1. Python 🐍

Why: The de facto language for ML/AI. All the best libraries exist here.

2. `gymnasium` (formerly OpenAI Gym)

What it is: A standard toolkit for building RL environments. Why: It gives us a clean step(), reset(), render() API that any RL algorithm can plug into. Think of it as the "game engine" for our simulation.

3. `stable-baselines3` (SB3)

What it is: Pre-built, production-grade RL algorithms. Why: Instead of coding PPO/DQN from scratch (hard!), SB3 gives us battle-tested implementations in 3 lines of code. Algorithm we use: PPO (Proximal Policy Optimization) — the gold standard for discrete action spaces. Used by OpenAI for GPT training and robot control.

4. `numpy`

What it is: Fast array/math operations. Why: Our "state" (50 cars, each with battery %, time left) is a numerical array. Numpy handles this efficiently.

5. `matplotlib` + `rich`

What it is: Plotting (matplotlib) and beautiful terminal output (rich). Why: A hackathon needs stunning visuals. We'll plot training curves and show a live dashboard.

6. `streamlit` (Optional but Impressive)

What it is: Turns a Python script into a web dashboard with zero HTML/CSS. Why: Judges LOVE interactive demos. One command launches a browser UI showing the agent making real-time decisions.

🏗️ Project Architecture

meta/
├── ev_env/
│   ├── __init__.py
│   └── charging_env.py       ← The Gymnasium environment (our simulation)
├── train.py                  ← Train the RL agent
├── evaluate.py               ← Test the trained agent + plot results
├── dashboard.py              ← Streamlit live demo
├── models/                   ← Saved trained models
├── logs/                     ← Training metrics (TensorBoard)
└── requirements.txt

🔬 The RL Components (Explained Simply)

State Space (What the Agent "Sees")

Think of this as the agent's eyes. At each timestep it sees:

Feature	Per Car	Total
Battery % (0–100)	✅	50 values
Minutes until departure	✅	50 values
Grid load % (0–100)	Global	1 value
Current hour of day	Global	1 value

Total state vector size: 102 numbers

Action Space (What the Agent Can Do)

3 actions per car × 50 cars = Too many combinations!
Smart simplification: We treat it as a Multi-Binary or use a priority-based heuristic wrapper to select the top N cars for fast charging.
Beginner-friendly approach: Flattened Discrete — agent picks a "charging policy profile" (e.g., Profile 3 = charge top 15 urgent cars fast, rest slow).

Reward Function (How the Agent Learns)

This is the heart of RL. We reward good behavior and penalize bad:

Reward = 
  + 10  × (cars that reach 80% before departure)    ← success!
  - 5   × (cars that leave with < 80%)               ← failure!
  - 0.1 × (grid_load > 85%)                         ← grid stress penalty per step
  - 0.01 × (total power consumed per step)           ← efficiency bonus
  + 50  (episode bonus if 0 cars fail departure)     ← grand prize

📋 Implementation Phases

Phase 1 — Environment (`charging_env.py`)

Build the Gymnasium-compatible simulation:

Initialize 50 cars with random battery (20–60%) and departure time (30–180 min)
Simulate grid load curve (peaks at 6–8 PM, low at 2 AM)
Implement step() — apply actions, update battery, compute reward
Implement reset() — spawn new episode

Phase 2 — Training (`train.py`)

Wrap env with SB3's make_vec_env for parallel training
Initialize PPO with tuned hyperparameters
Train for 500K–1M timesteps (~5 minutes on CPU)
Save model + TensorBoard logs

Phase 3 — Evaluation (`evaluate.py`)

Load trained model, run 100 test episodes
Plot: reward curve, car success rate, grid load vs. charge rate
Compare against Baseline: naive "charge everyone at full power"

Phase 4 — Dashboard (`dashboard.py`)

Streamlit app showing real-time agent decisions
Animated grid showing 50 cars color-coded by status
Live charts of grid load and charging power

🎯 Hyperparameters (PPO)

Parameter	Value	Why
`learning_rate`	3e-4	Standard starting point
`n_steps`	2048	Steps before each policy update
`batch_size`	64	Mini-batch for gradient updates
`n_epochs`	10	Policy update iterations
`gamma`	0.99	Discount factor (care about future)
`ent_coef`	0.01	Encourages exploration
`total_timesteps`	500_000	~5 min training on CPU

🏆 Hackathon Winning Elements

Clear Problem Statement — Grid overload is a real, urgent problem
Working Demo — Streamlit dashboard with live agent decisions
Baseline Comparison — Show 40% improvement over naive charging
Beautiful Plots — Training curve, car heatmap, grid load chart
Explainability — Simple reward function judges can understand

📦 Installation

pip install gymnasium stable-baselines3 numpy matplotlib streamlit rich tensorboard

✅ Verification Plan

Train agent → reward should increase from ~-200 to ~+300 over training
Evaluate: ≥ 85% cars successfully charged before departure
Grid overload events: reduce by ≥ 50% vs. baseline
Streamlit dashboard loads and shows live decision-making