Grid-Aware EV Charging Orchestrator: RL Implementation Plan

A beginner-friendly, hackathon-winning Reinforcement Learning project from scratch.


🧠 What Exactly Are We Building?

A Reinforcement Learning Agent that plays the role of a smart charging station manager.

  • 50 EVs are plugged in at any time
  • The agent looks at the grid load (high demand = risk of blackout) and each car's departure time
  • Every minute (simulated), it assigns each car one of 3 actions:
    • ⚡ Fast Charge: draws high power, charges quickly
    • 🔋 Slow Charge: draws low power, charges slowly
    • ⏸️ Wait: draws zero power, car waits

The agent learns over thousands of simulated episodes that it should:

  • Charge urgent cars faster (leaving soon = high priority)
  • Slow down charging when the grid is overloaded
  • Never let a car leave with < 80% battery

📚 Tech Stack Explained (For Beginners)

1. Python 🐍

Why: The de facto language for ML/AI. All the best libraries exist here.

2. gymnasium (formerly OpenAI Gym)

What it is: A standard toolkit for building RL environments.
Why: It gives us a clean step(), reset(), render() API that any RL algorithm can plug into. Think of it as the "game engine" for our simulation.
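To make the step()/reset() contract concrete, here is a minimal sketch of a toy one-car charger written as a plain Python class that mirrors Gymnasium's method signatures. The class name `ToyChargerEnv`, the charge rates, the reward, and the 60-step horizon are all assumptions made for illustration. The real environment would subclass `gymnasium.Env` and declare `observation_space` and `action_space`.

```python
import random


class ToyChargerEnv:
    """Toy one-car charger mirroring the Gymnasium reset()/step() contract.

    Hypothetical illustration only: it does not subclass gymnasium.Env.
    """

    # Battery percentage points gained per step (assumed values).
    CHARGE_PER_STEP = {0: 0.0, 1: 0.5, 2: 2.0}  # 0 = wait, 1 = slow, 2 = fast

    def __init__(self, horizon=60):
        self.horizon = horizon
        self.rng = random.Random()
        self.battery = 0.0
        self.t = 0

    def reset(self, seed=None):
        """Start a new episode and return (observation, info)."""
        if seed is not None:
            self.rng.seed(seed)
        self.battery = self.rng.uniform(20.0, 60.0)
        self.t = 0
        return (self.battery, self.horizon - self.t), {}

    def step(self, action):
        """Apply one action; return (obs, reward, terminated, truncated, info)."""
        self.battery = min(100.0, self.battery + self.CHARGE_PER_STEP[action])
        self.t += 1
        terminated = self.battery >= 80.0   # goal reached
        truncated = self.t >= self.horizon  # time limit hit
        reward = 10.0 if terminated else 0.0
        obs = (self.battery, self.horizon - self.t)
        return obs, reward, terminated, truncated, {}


def run_random_episode(env, seed=0):
    """Roll out one episode with a random policy and return the total reward."""
    policy_rng = random.Random(seed)
    obs, info = env.reset(seed=seed)
    total, done = 0.0, False
    while not done:
        action = policy_rng.choice([0, 1, 2])
        obs, reward, terminated, truncated, info = env.step(action)
        total += reward
        done = terminated or truncated
    return total
```

Any agent that only calls `reset()` and `step()` in this loop can be swapped in for the random policy, which is the point of the shared API.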

3. stable-baselines3 (SB3)

What it is: Pre-built, reliable implementations of RL algorithms.
Why: Instead of coding PPO/DQN from scratch (hard!), SB3 gives us battle-tested implementations that train in about 3 lines of code.
Algorithm we use: PPO (Proximal Policy Optimization), a robust default that handles both discrete and continuous action spaces. OpenAI has used it for RLHF fine-tuning of GPT models and for robot control.

4. numpy

What it is: Fast array/math operations.
Why: Our "state" (50 cars, each with battery % and time left) is a numerical array. Numpy handles this efficiently.

5. matplotlib + rich

What it is: Plotting (matplotlib) and beautiful terminal output (rich).
Why: A hackathon needs stunning visuals. We'll plot training curves and show a live dashboard.

6. streamlit (Optional but Impressive)

What it is: Turns a Python script into a web dashboard with zero HTML/CSS.
Why: Judges LOVE interactive demos. One command launches a browser UI showing the agent making real-time decisions.


🏗️ Project Architecture

meta/
├── ev_env/
│   ├── __init__.py
│   └── charging_env.py       ← The Gymnasium environment (our simulation)
├── train.py                  ← Train the RL agent
├── evaluate.py               ← Test the trained agent + plot results
├── dashboard.py              ← Streamlit live demo
├── models/                   ← Saved trained models
├── logs/                     ← Training metrics (TensorBoard)
└── requirements.txt

🔬 The RL Components (Explained Simply)

State Space (What the Agent "Sees")

Think of this as the agent's eyes. At each timestep it sees:

| Feature | Per Car | Total |
|---|---|---|
| Battery % (0–100) | ✅ | 50 values |
| Minutes until departure | ✅ | 50 values |
| Grid load % (0–100) | Global | 1 value |
| Current hour of day | Global | 1 value |

Total state vector size: 102 numbers
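As a rough sketch of how those 102 numbers could be assembled, the function below flattens the per-car and global features into one list. The function name `build_observation` and the scaling of every feature to the range 0 to 1 are assumptions (the plan does not say whether the state is normalized). The 180-minute cap follows the departure-time range used when cars are spawned in Phase 1.

```python
N_CARS = 50


def build_observation(battery_pct, minutes_left, grid_load_pct, hour,
                      max_minutes=180.0):
    """Return a flat list of 2 * N_CARS + 2 floats, each scaled to [0, 1].

    Layout: [battery x 50, minutes-until-departure x 50, grid load, hour].
    Normalization is an assumption, not something the plan specifies.
    """
    if len(battery_pct) != N_CARS or len(minutes_left) != N_CARS:
        raise ValueError(f"expected {N_CARS} cars")
    obs = [b / 100.0 for b in battery_pct]
    obs += [min(m, max_minutes) / max_minutes for m in minutes_left]
    obs.append(grid_load_pct / 100.0)
    obs.append(hour / 24.0)
    return obs
```

In the real environment this list would typically become a `numpy` array of dtype `float32` matching the declared observation space.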

Action Space (What the Agent Can Do)

  • 3 actions per car × 50 cars gives 3^50 (about 7 × 10^23) joint actions: too many combinations to enumerate!
  • Smart simplification: treat the action as a MultiBinary vector (e.g., one fast-charge flag per car), or use a priority-based heuristic wrapper that selects the top N cars for fast charging.
  • Beginner-friendly approach: a flattened Discrete space where the agent picks a "charging policy profile" (e.g., Profile 3 = charge the 15 most urgent cars fast, the rest slow).
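The profile idea can be sketched as a small decoder that turns one discrete profile index into a per-car action list. The profile table `PROFILE_FAST_COUNTS` is hypothetical apart from index 3 mapping to 15 cars, which matches the example above. Using minutes until departure as the urgency measure and letting cars already at 80% wait are also assumptions.

```python
WAIT, SLOW, FAST = 0, 1, 2

# Hypothetical profiles: how many of the most urgent cars get a fast charge.
# Index 3 -> 15 cars mirrors the "Profile 3" example in the plan.
PROFILE_FAST_COUNTS = [0, 5, 10, 15, 25, 50]


def decode_profile(profile, minutes_left, battery_pct, target_pct=80.0):
    """Expand one profile index into a list with one action per car."""
    n_fast = PROFILE_FAST_COUNTS[profile]
    # Assumption: cars already at the target do not need power and wait.
    needy = [i for i, b in enumerate(battery_pct) if b < target_pct]
    # Most urgent first: fewest minutes until departure.
    by_urgency = sorted(needy, key=lambda i: minutes_left[i])
    actions = [WAIT] * len(battery_pct)
    for rank, i in enumerate(by_urgency):
        actions[i] = FAST if rank < n_fast else SLOW
    return actions
```

With this wrapper the agent's action space shrinks from 3^50 joint actions to `len(PROFILE_FAST_COUNTS)` choices, at the cost of only being able to express policies the profiles cover.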

Reward Function (How the Agent Learns)

This is the heart of RL. We reward good behavior and penalize bad:

Reward =
  + 10   × (cars that reach 80% before departure)    ← success!
  - 5    × (cars that leave with < 80%)              ← failure!
  - 0.1  × (1 if grid_load > 85%, else 0)            ← grid stress penalty per step
  - 0.01 × (total power consumed per step)           ← energy-use penalty (rewards efficiency)
  + 50   (episode bonus if 0 cars fail departure)    ← grand prize
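A minimal sketch of that reward in code, using the coefficients listed above. The function names, the choice to count successes and failures among the cars departing in the current step, and the use of kW as the power unit are assumptions, since the plan does not fix them.

```python
def step_reward(n_success, n_failed, grid_load_pct, total_power_kw):
    """Per-step reward using the plan's coefficients.

    n_success / n_failed: cars departing this step with >= 80% / < 80% battery
    (assumed timing). total_power_kw: power drawn this step (unit assumed).
    """
    reward = 10.0 * n_success
    reward -= 5.0 * n_failed
    if grid_load_pct > 85.0:
        reward -= 0.1  # grid stress penalty
    reward -= 0.01 * total_power_kw  # energy-use penalty
    return reward


def episode_bonus(total_failed):
    """End-of-episode bonus, paid only if no car left under-charged."""
    return 50.0 if total_failed == 0 else 0.0
```

For example, a step with two successful departures, one failure, 90% grid load, and 100 units of power gives 20 - 5 - 0.1 - 1 = 13.9.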

πŸ“‹ Implementation Phases

Phase 1: Environment (charging_env.py)

Build the Gymnasium-compatible simulation:

  • Initialize 50 cars with random battery (20–60%) and departure time (30–180 min)
  • Simulate grid load curve (peaks at 6–8 PM, low at 2 AM)
  • Implement step(): apply actions, update battery, compute reward
  • Implement reset(): spawn new episode
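The building blocks listed above could look roughly like the sketch below: a daily grid-load curve, random car initialization, and a one-minute battery update. The anchor points in `LOAD_ANCHORS`, the charge rates in `CHARGE_RATE`, and all function names are assumptions chosen only to match the stated shape (peak from 6 to 8 PM, low at 2 AM) and the stated ranges.

```python
import random

# Hypothetical (hour, load %) anchors: trough at 2 AM, peak plateau 6-8 PM.
LOAD_ANCHORS = [(0, 40.0), (2, 25.0), (7, 50.0), (12, 60.0),
                (18, 95.0), (20, 95.0), (24, 40.0)]

# Battery percentage points gained per simulated minute (assumed values).
CHARGE_RATE = {0: 0.0, 1: 0.25, 2: 1.0}  # 0 = wait, 1 = slow, 2 = fast


def grid_load(hour):
    """Piecewise-linear base grid load (%) for a (fractional) hour of day."""
    hour = hour % 24
    for (h0, l0), (h1, l1) in zip(LOAD_ANCHORS, LOAD_ANCHORS[1:]):
        if h0 <= hour <= h1:
            return l0 + (l1 - l0) * (hour - h0) / (h1 - h0)
    return LOAD_ANCHORS[-1][1]  # not reached for hour in [0, 24)


def spawn_cars(n_cars=50, rng=None):
    """Create cars with 20-60% battery and 30-180 minutes until departure."""
    rng = rng or random.Random()
    return [{"battery": rng.uniform(20.0, 60.0),
             "minutes_left": rng.randint(30, 180)}
            for _ in range(n_cars)]


def advance_one_minute(cars, actions):
    """Apply one action per car in place and tick the departure clocks."""
    for car, action in zip(cars, actions):
        car["battery"] = min(100.0, car["battery"] + CHARGE_RATE[action])
        car["minutes_left"] -= 1
```

A `reset()` would call `spawn_cars`, and `step()` would call `advance_one_minute`, then add the cars' own power draw on top of `grid_load(hour)` before computing the reward.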

Phase 2: Training (train.py)

  • Wrap env with SB3's make_vec_env for parallel training
  • Initialize PPO with tuned hyperparameters
  • Train for 500K–1M timesteps (estimated ~5 minutes on CPU at the low end; actual time depends on hardware and environment speed)
  • Save model + TensorBoard logs

Phase 3: Evaluation (evaluate.py)

  • Load trained model, run 100 test episodes
  • Plot: reward curve, car success rate, grid load vs. charge rate
  • Compare against Baseline: naive "charge everyone at full power"
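One possible shape for the comparison step is sketched below: aggregate per-episode records for the agent and for the naive baseline, then compute the relative change. The record keys (`cars_total`, `cars_charged`, `overload_steps`) and the function names are hypothetical. How the records are collected from the environment is left out.

```python
def summarize(episodes):
    """Aggregate per-episode records into a success rate and overload count.

    Each record is a dict with hypothetical keys:
    cars_total, cars_charged (left with >= 80%), overload_steps.
    """
    cars = sum(e["cars_total"] for e in episodes)
    charged = sum(e["cars_charged"] for e in episodes)
    overloads = sum(e["overload_steps"] for e in episodes)
    success_rate = charged / cars if cars else 0.0
    return {"success_rate": success_rate, "overload_steps": overloads}


def relative_reduction(baseline, agent):
    """Fractional reduction of a cost-like metric versus the baseline."""
    if baseline == 0:
        return 0.0
    return (baseline - agent) / baseline
```

For instance, if the baseline logs 20 overload steps and the agent logs 10, `relative_reduction(20, 10)` is 0.5, that is, a 50% reduction.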

Phase 4: Dashboard (dashboard.py)

  • Streamlit app showing real-time agent decisions
  • Animated grid showing 50 cars color-coded by status
  • Live charts of grid load and charging power

🎯 Hyperparameters (PPO)

| Parameter | Value | Why |
|---|---|---|
| learning_rate | 3e-4 | Standard starting point |
| n_steps | 2048 | Steps before each policy update |
| batch_size | 64 | Mini-batch for gradient updates |
| n_epochs | 10 | Policy update iterations |
| gamma | 0.99 | Discount factor (care about future) |
| ent_coef | 0.01 | Encourages exploration |
| total_timesteps | 500_000 | ~5 min training on CPU |
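A sketch of how these values might be wired into Stable-Baselines3 is shown below. The hyperparameters come from the table. The `"MlpPolicy"` choice, `n_envs=4`, the output paths, and the function name `train` are assumptions. Note that `total_timesteps` is passed to `learn()` rather than to the `PPO` constructor.

```python
# Constructor arguments taken from the hyperparameter table.
PPO_KWARGS = dict(
    learning_rate=3e-4,
    n_steps=2048,
    batch_size=64,
    n_epochs=10,
    gamma=0.99,
    ent_coef=0.01,
)
TOTAL_TIMESTEPS = 500_000


def train(make_env, n_envs=4, log_dir="logs/",
          model_path="models/ppo_gridcharge"):
    """Train PPO on a Gymnasium env factory and save the resulting model.

    make_env: zero-argument callable returning a gymnasium.Env instance.
    n_envs, log_dir, and model_path are assumed defaults.
    """
    # Imported lazily so the config can be inspected without SB3 installed.
    from stable_baselines3 import PPO
    from stable_baselines3.common.env_util import make_vec_env

    vec_env = make_vec_env(make_env, n_envs=n_envs)
    model = PPO("MlpPolicy", vec_env, tensorboard_log=log_dir,
                verbose=1, **PPO_KWARGS)
    model.learn(total_timesteps=TOTAL_TIMESTEPS)
    model.save(model_path)
    return model
```

Usage would be something like `train(lambda: ChargingEnv())`, where `ChargingEnv` is a hypothetical class name for the environment in `ev_env/charging_env.py`. With 4 parallel environments each update collects 2048 × 4 = 8192 steps, which divides evenly into mini-batches of 64.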

🏆 Hackathon Winning Elements

  1. Clear Problem Statement: Grid overload is a real, urgent problem
  2. Working Demo: Streamlit dashboard with live agent decisions
  3. Baseline Comparison: Aim to show a ~40% improvement over naive charging
  4. Beautiful Plots: Training curve, car heatmap, grid load chart
  5. Explainability: Simple reward function judges can understand

📦 Installation

pip install gymnasium stable-baselines3 numpy matplotlib streamlit rich tensorboard

✅ Verification Plan

  • Train agent → mean episode reward should rise from about -200 to about +300 over training
  • Evaluate: ≥ 85% of cars successfully charged before departure
  • Grid overload events: reduce by ≥ 50% vs. baseline
  • Streamlit dashboard loads and shows live decision-making