Grid-Aware EV Charging Orchestrator: RL Implementation Plan
A beginner-friendly, hackathon-winning Reinforcement Learning project from scratch.
What Exactly Are We Building?
A Reinforcement Learning Agent that plays the role of a smart charging station manager.
- 50 EVs are plugged in at any time
- The agent looks at the grid load (high demand = risk of blackout) and each car's departure time
- Every minute (simulated), it assigns each car one of 3 actions:
  - Fast Charge: draws high power, charges quickly
  - Slow Charge: draws low power, charges slowly
  - Wait: draws zero power, car waits
The agent learns over thousands of simulated episodes that it should:
- Charge urgent cars faster (leaving soon = high priority)
- Slow down charging when the grid is overloaded
- Never let a car leave with < 80% battery
Tech Stack Explained (For Beginners)
1. Python
Why: The de facto language for ML/AI. All the best libraries exist here.
2. gymnasium (formerly OpenAI Gym)
What it is: A standard toolkit for building RL environments.
Why: It gives us a clean step(), reset(), render() API that any RL algorithm can plug into. Think of it as the "game engine" for our simulation.
3. stable-baselines3 (SB3)
What it is: Pre-built, well-tested implementations of RL algorithms.
Why: Instead of coding PPO/DQN from scratch (hard!), SB3 lets us train an agent in about 3 lines of code.
Algorithm we use: PPO (Proximal Policy Optimization), a robust default that supports discrete action spaces. OpenAI has used it for RLHF fine-tuning of GPT models and for robot control.
4. numpy
What it is: Fast array/math operations.
Why: Our "state" (50 cars, each with battery % and time left) is a numerical array. Numpy handles this efficiently.
5. matplotlib + rich
What it is: Plotting (matplotlib) and beautiful terminal output (rich).
Why: A hackathon needs stunning visuals. We'll plot training curves and show a live dashboard.
6. streamlit (Optional but Impressive)
What it is: Turns a Python script into a web dashboard with zero HTML/CSS.
Why: Judges LOVE interactive demos. One command launches a browser UI showing the agent making real-time decisions.
Project Architecture
meta/
├── ev_env/
│   ├── __init__.py
│   └── charging_env.py   ← The Gymnasium environment (our simulation)
├── train.py              ← Train the RL agent
├── evaluate.py           ← Test the trained agent + plot results
├── dashboard.py          ← Streamlit live demo
├── models/               ← Saved trained models
├── logs/                 ← Training metrics (TensorBoard)
└── requirements.txt
The RL Components (Explained Simply)
State Space (What the Agent "Sees")
Think of this as the agent's eyes. At each timestep it sees:
| Feature | Scope | Count |
|---|---|---|
| Battery % (0–100) | Per car | 50 values |
| Minutes until departure | Per car | 50 values |
| Grid load % (0–100) | Global | 1 value |
| Current hour of day | Global | 1 value |
Total state vector size: 102 numbers
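As a minimal sketch of how the table above could be packed into a single observation, the snippet below concatenates the two per-car arrays with the two global values. The function name `build_observation`, the feature ordering, and the float32 dtype are assumptions made for illustration, not something the plan fixes.

```python
import numpy as np

N_CARS = 50  # fleet size from the plan


def build_observation(battery_pct, minutes_left, grid_load_pct, hour):
    """Pack per-car and global features into one flat float32 vector.

    Assumed layout: [50 battery values, 50 minutes-left values, grid load, hour].
    """
    battery_pct = np.asarray(battery_pct, dtype=np.float32)
    minutes_left = np.asarray(minutes_left, dtype=np.float32)
    if battery_pct.shape != (N_CARS,) or minutes_left.shape != (N_CARS,):
        raise ValueError("expected one battery and one departure value per car")
    global_features = np.array([grid_load_pct, hour], dtype=np.float32)
    return np.concatenate([battery_pct, minutes_left, global_features])


rng = np.random.default_rng(0)
obs = build_observation(
    battery_pct=rng.uniform(20, 60, N_CARS),
    minutes_left=rng.uniform(30, 180, N_CARS),
    grid_load_pct=72.0,
    hour=18,
)
print(obs.shape)  # (102,)
```

In practice it may help to scale every feature to a similar range (for example 0 to 1) before training, since the raw values mix percentages, minutes, and hours.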
Action Space (What the Agent Can Do)
- 3 actions per car × 50 cars = 3^50 joint combinations, far too many to enumerate as one discrete action.
- Smart simplification: We treat it as a Multi-Binary action space or use a priority-based heuristic wrapper to select the top N cars for fast charging.
- Beginner-friendly approach: Flattened Discrete. The agent picks a "charging policy profile" (e.g., Profile 3 = charge the top 15 urgent cars fast, the rest slow).
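One way the "charging policy profile" idea could be decoded into per-car actions is sketched below. The profile table `PROFILE_FAST_COUNTS` is hypothetical (only "Profile 3 = top 15 fast" comes from the text), and urgency is assumed to mean "fewest minutes until departure".

```python
import numpy as np

FAST, SLOW, WAIT = 0, 1, 2  # assumed integer codes for the three actions

# Hypothetical profile table: number of most-urgent cars that get fast charging.
# Index 3 matches the example in the text (top 15 cars fast, rest slow).
PROFILE_FAST_COUNTS = [0, 5, 10, 15, 25, 50]


def decode_profile(profile_id, minutes_left):
    """Turn one discrete profile choice into an action for every car."""
    minutes_left = np.asarray(minutes_left, dtype=float)
    n_fast = PROFILE_FAST_COUNTS[profile_id]
    # Most urgent first; a stable sort keeps ties in car-index order.
    urgency_order = np.argsort(minutes_left, kind="stable")
    actions = np.full(minutes_left.shape, SLOW, dtype=np.int64)
    actions[urgency_order[:n_fast]] = FAST
    return actions


minutes = np.arange(50)[::-1]  # car 49 leaves soonest, car 0 last
actions = decode_profile(3, minutes)
print(int((actions == FAST).sum()))  # 15
```

With this wrapper the agent chooses among 6 discrete actions instead of 3^50 joint ones. This sketch never emits WAIT; a fuller profile table could add it, for instance for cars that are already above the 80% target.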
Reward Function (How the Agent Learns)
This is the heart of RL. We reward good behavior and penalize bad:
Reward =
+ 10 × (cars that reach 80% before departure) ← success!
- 5 × (cars that leave with < 80%) ← failure!
- 0.1 × (1 if grid_load > 85%, else 0) ← grid stress penalty per step
- 0.01 × (total power consumed per step) ← energy-use penalty (rewards efficiency)
+ 50 (episode bonus if 0 cars fail departure) ← grand prize
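The reward terms above translate almost directly into code. The sketch below is one possible reading: it assumes successes and failures are counted once, at the step a car departs, and that power is measured in kW (the plan does not state a unit). The function names are illustrative.

```python
def step_reward(n_success, n_failed, grid_load_pct, power_kw):
    """Per-step reward built from the four per-step terms of the plan.

    n_success / n_failed: cars departing this step with >= 80% / < 80% battery
    (assumed counting rule). power_kw: total power drawn this step (assumed unit).
    """
    reward = 10.0 * n_success
    reward -= 5.0 * n_failed
    reward -= 0.1 * (1.0 if grid_load_pct > 85.0 else 0.0)  # grid stress indicator
    reward -= 0.01 * power_kw                               # energy-use penalty
    return reward


def episode_bonus(total_failed):
    """End-of-episode bonus, paid only if no car left undercharged."""
    return 50.0 if total_failed == 0 else 0.0


# Two successful departures, one failure, stressed grid, 100 kW drawn:
# 20 - 5 - 0.1 - 1.0 = 13.9
print(round(step_reward(2, 1, 90.0, 100.0), 2))  # 13.9
```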
Implementation Phases
Phase 1: Environment (charging_env.py)
Build the Gymnasium-compatible simulation:
- Initialize 50 cars with random battery (20–60%) and departure time (30–180 min)
- Simulate grid load curve (peaks at 6–8 PM, low at 2 AM)
- Implement step(): apply actions, update battery, compute reward
- Implement reset(): spawn new episode
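The core of the Phase 1 simulation can be sketched with plain numpy before wrapping it in a Gymnasium class. Everything numeric here beyond the ranges in the list above is an assumption for illustration: the charger ratings in `POWER_KW`, the 60 kWh battery size, and the piecewise-linear shape of the load curve. The helper names are hypothetical too.

```python
import numpy as np

N_CARS = 50
FAST, SLOW, WAIT = 0, 1, 2
# Assumed charger power per action (kW) and battery capacity (kWh).
POWER_KW = np.array([50.0, 7.0, 0.0])
BATTERY_KWH = 60.0


def grid_load_pct(hour):
    """Assumed daily load curve: 30% trough at 2 AM, 90% plateau from 6 to 8 PM."""
    return float(np.interp(hour, [2.0, 18.0, 20.0], [30.0, 90.0, 90.0], period=24.0))


def reset_fleet(rng):
    """Spawn a new episode: random battery (20-60%) and departure (30-180 min)."""
    battery = rng.uniform(20.0, 60.0, N_CARS)
    minutes_left = rng.integers(30, 181, N_CARS).astype(float)
    return battery, minutes_left


def apply_actions(battery, minutes_left, actions, dt_min=1.0):
    """Advance one simulated minute; returns new state and total power drawn."""
    # Full cars draw nothing, whatever action they were assigned.
    power = np.where(battery < 100.0, POWER_KW[actions], 0.0)
    gained_pct = power * (dt_min / 60.0) / BATTERY_KWH * 100.0
    new_battery = np.minimum(battery + gained_pct, 100.0)
    new_minutes = np.maximum(minutes_left - dt_min, 0.0)
    return new_battery, new_minutes, float(power.sum())


rng = np.random.default_rng(0)
battery, minutes_left = reset_fleet(rng)
actions = np.full(N_CARS, FAST)
new_battery, new_minutes, total_kw = apply_actions(battery, minutes_left, actions)
print(total_kw)  # 2500.0
```

A Gymnasium environment's reset() would return `(observation, info)` and step() would return `(observation, reward, terminated, truncated, info)`; these helpers would supply the state update inside those methods.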
Phase 2: Training (train.py)
- Wrap env with SB3's make_vec_env for parallel training
- Initialize PPO with tuned hyperparameters
- Train for 500K–1M timesteps (roughly 5 minutes on CPU, depending on hardware)
- Save model + TensorBoard logs
Phase 3: Evaluation (evaluate.py)
- Load trained model, run 100 test episodes
- Plot: reward curve, car success rate, grid load vs. charge rate
- Compare against a baseline: naive "charge everyone at full power"
Phase 4: Dashboard (dashboard.py)
- Streamlit app showing real-time agent decisions
- Animated grid showing 50 cars color-coded by status
- Live charts of grid load and charging power
Hyperparameters (PPO)
| Parameter | Value | Why |
|---|---|---|
| learning_rate | 3e-4 | Standard starting point |
| n_steps | 2048 | Steps before each policy update |
| batch_size | 64 | Mini-batch for gradient updates |
| n_epochs | 10 | Policy update iterations |
| gamma | 0.99 | Discount factor (care about future) |
| ent_coef | 0.01 | Encourages exploration |
| total_timesteps | 500_000 | ~5 min training on CPU |
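The table above can be kept in one place as a plain Python config, sketched below. The first six entries are keyword arguments of SB3's PPO constructor, while total_timesteps is passed to learn() rather than to the constructor. The value of `N_ENVS` is an assumption; the plan does not say how many parallel environments to use.

```python
# Keyword arguments for the PPO constructor, copied from the table.
PPO_KWARGS = dict(
    learning_rate=3e-4,
    n_steps=2048,      # steps collected per environment before each update
    batch_size=64,
    n_epochs=10,
    gamma=0.99,
    ent_coef=0.01,
)
TOTAL_TIMESTEPS = 500_000  # argument to learn(), not to the constructor

N_ENVS = 4  # assumed number of parallel environments

# Each rollout holds n_steps * N_ENVS transitions; it is easiest to reason
# about mini-batches when batch_size divides that number evenly.
rollout_size = PPO_KWARGS["n_steps"] * N_ENVS
assert rollout_size % PPO_KWARGS["batch_size"] == 0
approx_rollouts = TOTAL_TIMESTEPS // rollout_size

# With stable-baselines3 installed, the config would typically be used as:
#   model = PPO("MlpPolicy", env, **PPO_KWARGS)
#   model.learn(total_timesteps=TOTAL_TIMESTEPS)
print(rollout_size, approx_rollouts)
```

Note that with several parallel environments, "steps before each policy update" is n_steps per environment, so the number of policy updates over training shrinks as `N_ENVS` grows.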
Hackathon Winning Elements
- Clear Problem Statement: Grid overload is a real, urgent problem
- Working Demo: Streamlit dashboard with live agent decisions
- Baseline Comparison: Show 40% improvement over naive charging
- Beautiful Plots: Training curve, car heatmap, grid load chart
- Explainability: Simple reward function judges can understand
Installation
pip install gymnasium stable-baselines3 numpy matplotlib streamlit rich tensorboard
Verification Plan
- Train agent: reward should increase from about -200 to about +300 over training
- Evaluate: ≥ 85% of cars successfully charged before departure
- Grid overload events: reduce by ≥ 50% vs. baseline
- Streamlit dashboard loads and shows live decision-making
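The two numeric checks in this list could be computed from evaluation logs roughly as follows. This is a sketch: the function names and the input format (final battery levels at departure, plus overload-event counts for baseline and agent) are assumptions, and the sample numbers are made up purely to exercise the code.

```python
def success_rate(final_battery_pct, target_pct=80.0):
    """Fraction of cars that departed at or above the target charge."""
    if not final_battery_pct:
        raise ValueError("no departures recorded")
    charged = sum(1 for b in final_battery_pct if b >= target_pct)
    return charged / len(final_battery_pct)


def overload_reduction(baseline_events, agent_events):
    """Relative drop in grid overload events versus the baseline policy."""
    if baseline_events <= 0:
        raise ValueError("baseline must have at least one overload event")
    return 1.0 - agent_events / baseline_events


def meets_targets(final_battery_pct, baseline_events, agent_events):
    """True if both plan thresholds hold: >= 85% success, >= 50% reduction."""
    return (
        success_rate(final_battery_pct) >= 0.85
        and overload_reduction(baseline_events, agent_events) >= 0.50
    )


# Illustrative inputs only, not measured results.
print(success_rate([85.0, 79.9, 80.0, 100.0]))  # 0.75
print(overload_reduction(40, 15))               # 0.625
```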