NBA Sage - Technical Explanation
An AI-powered NBA game prediction system with real-time data, machine learning, and a modern web interface.
What Does This Project Do?
NBA Sage is a full-stack application that:
- Predicts NBA game outcomes before they happen
- Shows live scores with real-time updates
- Tracks prediction accuracy over time
- Calculates MVP race standings based on current stats
- Estimates championship odds for all 30 teams
Key Features
| Feature | Description |
| --- | --- |
| Live Game Dashboard | Real-time scores, game status, win probabilities |
| Win Predictions | Probability % for each team to win |
| Starting 5 Lineups | Projected starters with PPG stats from NBA API |
| MVP Race | Top 10 MVP candidates with scores |
| Championship Odds | All 30 teams ranked by title probability |
| Model Accuracy | Track how well predictions perform over time |
Technology Stack
Backend (Python)
| Technology | Purpose |
| --- | --- |
| Flask | REST API framework |
| nba_api | Official NBA data (stats.nba.com) |
| XGBoost + LightGBM | Machine learning ensemble model |
| APScheduler | Background job scheduling |
| ChromaDB Cloud | Persistent prediction storage |
| Pandas/NumPy | Data processing |
Frontend (React)
| Technology | Purpose |
| --- | --- |
| React 18 | UI framework |
| Vite | Build tool & dev server |
| Custom CSS | Modern design system |
Infrastructure
| Technology | Purpose |
| --- | --- |
| Docker | Container deployment |
| Hugging Face Spaces | Cloud hosting |
| Git LFS | Large file versioning |
How Predictions Work
The Prediction Algorithm
Predictions are made using a multi-factor formula:
Win Probability = Log5 Formula of:
├── 40% - Current Season Record (Win %)
├── 30% - Recent Form (Last 10 games performance)
├── 20% - ELO Rating (Historical team strength)
└── 10% - Baseline
Adjustments Applied:
├── +3.5% for Home Court Advantage
└── -2% per Injury Impact Point
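The formula above can be sketched in a few lines of Python. This is an illustrative sketch, not the project's actual code: the function names, the 1200-1800 range used to scale ELO into 0-1, the 0.5 baseline value, and the way injury points are netted between the two teams are all assumptions made for the example.

```python
def log5(p_a: float, p_b: float) -> float:
    """Log5 estimate of the probability that team A beats team B."""
    num = p_a * (1.0 - p_b)
    den = num + p_b * (1.0 - p_a)
    return 0.5 if den == 0 else num / den


def team_strength(win_pct: float, recent_form: float, elo: float) -> float:
    """Blend the inputs with the 40/30/20/10 weights listed above."""
    # Assumption: ELO is mapped linearly from [1200, 1800] onto [0, 1].
    elo_scaled = min(max((elo - 1200.0) / 600.0, 0.0), 1.0)
    baseline = 0.5  # Assumption: the 10% baseline term is a neutral 0.5.
    return 0.40 * win_pct + 0.30 * recent_form + 0.20 * elo_scaled + 0.10 * baseline


def home_win_probability(home, away, home_injury=0.0, away_injury=0.0) -> float:
    """home/away are (win_pct, recent_form, elo) tuples."""
    p = log5(team_strength(*home), team_strength(*away))
    p += 0.035                               # home court advantage
    p -= 0.02 * (home_injury - away_injury)  # assumed: net injury impact
    return min(max(p, 0.01), 0.99)           # keep the result a valid probability


if __name__ == "__main__":
    print(home_win_probability((0.65, 0.70, 1620), (0.45, 0.40, 1480)))
```

With two identical teams and no injuries, the Log5 term is 0.5 and only the home court adjustment remains, giving 0.535.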
ELO Rating System
ELO is a chess-inspired rating system adapted for NBA:
- Starting rating: 1500 (average team)
- K-factor: 20 (how much ratings change per game)
- Home advantage: +100 ELO points equivalent
- Season regression: Ratings regress 25% to mean each season
How it works:
- Win against better team → Big ELO gain
- Win against weaker team → Small ELO gain
- Lose against better team → Small ELO loss
- Lose against weaker team → Big ELO loss
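A minimal sketch of these rules, using the constants listed above (1500 start, K = 20, +100 home advantage, 25% season regression). The 400-point logistic scale is the standard Elo convention and is assumed here, since the text does not state it; the function names are illustrative.

```python
K_FACTOR = 20
HOME_ADVANTAGE = 100
MEAN_RATING = 1500
SEASON_REGRESSION = 0.25


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected win probability of A against B (standard 400-point scale, assumed)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(home_rating: float, away_rating: float, home_won: bool):
    """Return the (home, away) ratings after one game."""
    expected_home = expected_score(home_rating + HOME_ADVANTAGE, away_rating)
    actual_home = 1.0 if home_won else 0.0
    delta = K_FACTOR * (actual_home - expected_home)
    return home_rating + delta, away_rating - delta


def regress_to_mean(rating: float) -> float:
    """Pull a rating 25% of the way back toward 1500 between seasons."""
    return rating + SEASON_REGRESSION * (MEAN_RATING - rating)


if __name__ == "__main__":
    print(update_elo(1400, 1500, home_won=True))
    print(regress_to_mean(1600))
```

Because the gain is proportional to the gap between the actual and expected result, an upset win moves ratings more than a win by the favorite, which matches the four cases listed above.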
Data Sources
Real-Time Data
- NBA Live API (nba_api.live)
  - Live scores updated every 30 seconds
  - Game status (scheduled, in progress, final)
  - Box scores and player stats
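As a rough illustration of how a live game record might be reduced to the three statuses above, the sketch below works on a plain dictionary rather than calling the API. The field names (gameStatus, homeTeam, awayTeam, teamTricode, score) and the 1/2/3 status codes reflect my understanding of the live scoreboard payload and should be checked against the actual response; summarize_game is a hypothetical helper.

```python
# Assumed mapping of the live feed's numeric status codes.
STATUS_LABELS = {1: "scheduled", 2: "in progress", 3: "final"}
POLL_SECONDS = 30  # refresh interval mentioned above


def summarize_game(game: dict) -> dict:
    """Reduce one live-scoreboard game record to the fields a dashboard needs."""
    home, away = game["homeTeam"], game["awayTeam"]
    return {
        "matchup": f'{away["teamTricode"]} @ {home["teamTricode"]}',
        "score": f'{away["score"]}-{home["score"]}',
        "status": STATUS_LABELS.get(game["gameStatus"], "unknown"),
    }


if __name__ == "__main__":
    sample = {
        "gameStatus": 2,
        "homeTeam": {"teamTricode": "BOS", "score": 54},
        "awayTeam": {"teamTricode": "LAL", "score": 49},
    }
    print(summarize_game(sample))
```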
Historical Data
- NBA Stats API (nba_api.stats)
  - 23 years of game data (2003-2026)
  - Team statistics (basic, advanced, clutch, hustle)
  - Player statistics
  - Current season stats for predictions
Data Storage
- Parquet files: Cached API responses (~140 files)
- ChromaDB Cloud: Prediction history and accuracy tracking
- Joblib files: Trained ML model and processed datasets
Machine Learning Components
Trained Model: XGBoost + LightGBM Ensemble
Two gradient boosting models trained on 41,000+ historical games:
Game Features ──┬──► XGBoost (50%) ───┐
                │                     ├──► Ensemble Prediction
                └──► LightGBM (50%) ──┘
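The 50/50 split in the diagram amounts to averaging the two models' win probabilities. A minimal sketch, assuming each model has already produced a home-win probability (the function names are illustrative, and no XGBoost or LightGBM calls are shown):

```python
def ensemble_probability(p_xgb: float, p_lgbm: float, w_xgb: float = 0.5) -> float:
    """Weighted average of two model probabilities; 0.5 gives the 50/50 split."""
    if not 0.0 <= w_xgb <= 1.0:
        raise ValueError("w_xgb must be between 0 and 1")
    return w_xgb * p_xgb + (1.0 - w_xgb) * p_lgbm


def predicted_winner(p_home_win: float) -> str:
    """Pick the side with the higher probability."""
    return "home" if p_home_win >= 0.5 else "away"


if __name__ == "__main__":
    p = ensemble_probability(0.70, 0.60)
    print(p, predicted_winner(p))
```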
Features Used:
- ELO ratings and differentials
- Rolling averages (5, 10, 20 game windows)
- Rest days and back-to-back games
- Home/away status
- Season record statistics
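To make the rolling-window features concrete, here is a small standard-library sketch. It assumes each feature is computed only from games played before the one being predicted (so the current result cannot leak into its own features) and that shorter windows are allowed early in a season; neither detail is stated above, and rolling_means is a hypothetical helper.

```python
from collections import deque


def rolling_means(values, window):
    """Mean of up to `window` previous games for each game (None for the first)."""
    recent = deque(maxlen=window)  # holds only games before the current one
    out = []
    for value in values:
        out.append(sum(recent) / len(recent) if recent else None)
        recent.append(value)
    return out


if __name__ == "__main__":
    points = [100, 110, 120, 130]
    print(rolling_means(points, window=2))
```

The same helper can be called with windows of 5, 10, and 20 to produce the three variants of each stat.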
Training Pipeline
Data Collection ──► Feature Engineering ──► Model Training ──► Evaluation
       │                     │                     │
       ▼                     ▼                     ▼
 NBA API Data         ELO Calculation      XGBoost+LightGBM
                      Era Normalization
                      Rolling Windows
Auto-Training System
The system automatically retrains itself:
- Ingests completed games every hour
- Waits for all daily games to complete
- Compares new model accuracy to existing
- Only updates if improved (prevents regression)
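The gating logic in this list can be sketched as follows. It is an illustration under assumptions: game statuses are plain strings, train_and_evaluate is a caller-supplied function returning the new model's accuracy, and the names are hypothetical rather than taken from the project.

```python
def all_games_final(statuses) -> bool:
    """True only when there is at least one game and every game is final."""
    return bool(statuses) and all(status == "final" for status in statuses)


def should_promote(new_accuracy: float, current_accuracy: float) -> bool:
    """Replace the live model only if the new one is strictly better."""
    return new_accuracy > current_accuracy


def retrain_if_ready(statuses, train_and_evaluate, current_accuracy):
    """Return (action, accuracy of the model that stays in service)."""
    if not all_games_final(statuses):
        return "waiting", current_accuracy
    new_accuracy = train_and_evaluate()
    if should_promote(new_accuracy, current_accuracy):
        return "promoted", new_accuracy
    return "kept", current_accuracy


if __name__ == "__main__":
    print(retrain_if_ready(["final", "final"], lambda: 0.68, 0.66))
```

A scheduler job running hourly could call retrain_if_ready and swap the model file only when the action is "promoted".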
System Architecture
React Frontend
├── LiveGames
├── Predictions
├── MVP Race
└── Accuracy
      │ REST API
      ▼
Flask Server
├── Endpoints: /api/live, /api/roster
│     │
│     ▼
│   Prediction Pipeline: Live Collector, Feature Gen, ELO System
├── Caching: In-Memory, 1-hour rosters
└── Scheduler: APScheduler, Auto-retrain
      │
      ▼
External Services
├── NBA API (stats.nba)
├── ChromaDB Cloud
└── Hugging Face Spaces
Project Structure
NBA ML/
├── server.py                       # Production server (Hugging Face)
├── api/api.py                      # Development server
│
├── src/                            # Core logic
│   ├── prediction_pipeline.py      # Main orchestrator
│   ├── feature_engineering.py      # ELO + features
│   ├── data_collector.py           # Historical data
│   ├── live_data_collector.py      # Real-time data
│   ├── prediction_tracker.py       # Accuracy tracking
│   └── models/
│       └── game_predictor.py       # ML model
│
├── web/                            # React frontend
│   └── src/
│       ├── App.jsx
│       ├── pages/                  # UI pages
│       └── index.css               # Design system
│
├── data/
│   └── api_data/                   # 140+ parquet files
│
└── models/
    └── game_predictor.joblib       # Trained model (9.6KB)
Deployment
Local Development
python api/api.py
cd web && npm run dev
Production (Hugging Face Spaces)
python server.py
Performance & Accuracy
Prediction Accuracy
- Overall: Tracked via ChromaDB Cloud
- By Confidence: High/Medium/Low confidence splits
- By Team: Per-team prediction accuracy
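One way the confidence split could be computed is shown below. The 0.70 and 0.60 thresholds, the record format, and the function names are assumptions for illustration; the project's actual bucket boundaries are not given here.

```python
from collections import defaultdict


def confidence_bucket(p_home_win: float) -> str:
    """Bucket a prediction by how far it is from a coin flip (assumed thresholds)."""
    confidence = max(p_home_win, 1.0 - p_home_win)
    if confidence >= 0.70:
        return "high"
    if confidence >= 0.60:
        return "medium"
    return "low"


def accuracy_by_confidence(records):
    """records: iterable of (predicted home-win probability, home_won bool)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for p_home_win, home_won in records:
        bucket = confidence_bucket(p_home_win)
        totals[bucket] += 1
        hits[bucket] += int((p_home_win >= 0.5) == home_won)
    return {bucket: hits[bucket] / totals[bucket] for bucket in totals}


if __name__ == "__main__":
    history = [(0.80, True), (0.75, False), (0.35, False), (0.55, False)]
    print(accuracy_by_confidence(history))
```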
Speed Optimizations
- In-memory caching: Roster data cached for 1 hour
- Startup warming: All 30 teams pre-loaded on server start
- Background refresh: Cache updated every 2 hours
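The caching behaviour described above can be approximated with a small time-to-live cache. This is a sketch under assumptions: TTLCache, its methods, and the loader callback are hypothetical names, and the injected clock exists only to make the expiry logic easy to test.

```python
import time


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (stored_at, value)

    def get(self, key, loader):
        """Return the cached value, calling loader(key) on a miss or after expiry."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = loader(key)
        self._store[key] = (now, value)
        return value

    def warm(self, keys, loader):
        """Pre-load a set of keys, e.g. all teams at server start."""
        for key in keys:
            self.get(key, loader)


if __name__ == "__main__":
    rosters = TTLCache(ttl_seconds=3600)
    rosters.warm(["BOS", "LAL"], loader=lambda team: f"roster for {team}")
    print(rosters.get("BOS", loader=lambda team: "not called while fresh"))
```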
Future Improvements
- Integrate ML model into live predictions (currently formula-based)
- Add player-level features (injuries, rest days per player)
- Implement spread predictions (margin of victory)
- Add playoff predictions with series outcomes
Stats at a Glance
| Metric | Value |
| --- | --- |
| Historical games | 41,000+ |
| Seasons covered | 23 (2003-2026) |
| Teams tracked | 30 |
| ML model type | XGBoost + LightGBM |
| API endpoints | 10+ |
| Frontend pages | 6 |
Complete ML Feature List (90+ Features)
The model uses approximately 90 features organized into these categories:
1. ELO Rating Features (5 features)
| Feature | Description |
| --- | --- |
| `team_elo` | Team's current ELO rating |
| `opponent_elo` | Opponent's current ELO rating |
| `elo_diff` | Difference between team and opponent ELO |
| `elo_win_prob` | Expected win probability from ELO |
| `home_elo_boost` | ELO boost for home court (100 points) |
2. Basic Stats - Rolling Averages (21 features)
For each of 7 stats × 3 windows (5, 10, 20 games):
| Base Stat | Windows |
| --- | --- |
| `PTS` (Points) | `PTS_last5`, `PTS_last10`, `PTS_last20` |
| `AST` (Assists) | `AST_last5`, `AST_last10`, `AST_last20` |
| `REB` (Rebounds) | `REB_last5`, `REB_last10`, `REB_last20` |
| `FG_PCT` (Field Goal %) | `FG_PCT_last5`, `FG_PCT_last10`, `FG_PCT_last20` |
| `FG3_PCT` (3-Point %) | `FG3_PCT_last5`, `FG3_PCT_last10`, `FG3_PCT_last20` |
| `FT_PCT` (Free Throw %) | `FT_PCT_last5`, `FT_PCT_last10`, `FT_PCT_last20` |
| `PLUS_MINUS` (Point Diff) | `PLUS_MINUS_last5`, `PLUS_MINUS_last10`, `PLUS_MINUS_last20` |
3. Season Statistics (9 features)
| Feature | Description |
| --- | --- |
| `PTS_season_avg` | Season average points |
| `AST_season_avg` | Season average assists |
| `REB_season_avg` | Season average rebounds |
| `FG_PCT_season_avg` | Season field goal % |
| `FG3_PCT_season_avg` | Season 3-point % |
| `FT_PCT_season_avg` | Season free throw % |
| `PLUS_MINUS_season_avg` | Season point differential |
| `win_pct_season` | Season win percentage |
| `games_played` | Games played in season |
4. Defensive Features (4 features)
| Feature | Description |
| --- | --- |
| `STL_last10` | Steals per game (last 10) |
| `BLK_last10` | Blocks per game (last 10) |
| `DREB_last10` | Defensive rebounds (last 10) |
| `pts_allowed_last10` | Points allowed (last 10) |
5. Momentum Features (6 features)
| Feature | Description |
| --- | --- |
| `wins_last5` | Wins in last 5 games (0-5) |
| `wins_last10` | Wins in last 10 games (0-10) |
| `hot_streak` | 1 if 4+ wins in last 5 |
| `cold_streak` | 1 if 1 or fewer wins in last 5 |
| `plus_minus_last5` | Point differential trend |
| `form_trend` | Comparison of last 3 vs previous 3 |
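The win-based momentum features can be derived from a team's list of recent results. In this sketch, form_trend is taken to be the win rate of the last 3 games minus that of the 3 before them, and the streak flags require a full 5 games of history; both are assumptions, since the table only describes the features in words.

```python
def momentum_features(results):
    """results: 1 for a win, 0 for a loss, oldest first, games before the prediction."""
    last5, last10 = results[-5:], results[-10:]
    last3, prev3 = results[-3:], results[-6:-3]
    wins_last5 = sum(last5)
    has_five = len(last5) == 5  # assumed: no streak flag without 5 games of history

    def win_rate(games):
        return sum(games) / len(games) if games else 0.0

    return {
        "wins_last5": wins_last5,
        "wins_last10": sum(last10),
        "hot_streak": int(has_five and wins_last5 >= 4),
        "cold_streak": int(has_five and wins_last5 <= 1),
        "form_trend": win_rate(last3) - win_rate(prev3),  # assumed definition
    }


if __name__ == "__main__":
    print(momentum_features([0, 0, 0, 1, 1, 1, 0, 1, 1, 1]))
```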
6. Rest & Fatigue Features (4 features)
| Feature | Description |
| --- | --- |
| `days_rest` | Days since last game |
| `back_to_back` | 1 if playing consecutive days |
| `well_rested` | 1 if 3+ days rest |
| `games_last_week` | Games played in last 7 days |
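These four features follow directly from a team's schedule. The sketch below assumes days_rest is the calendar-day gap to the previous game (so a game the day before gives 1) and that the 7-day window excludes the game being predicted; rest_features is a hypothetical helper.

```python
from datetime import date, timedelta


def rest_features(game_date: date, previous_game_dates) -> dict:
    """Compute rest and fatigue features from dates of earlier games."""
    prior = sorted(d for d in previous_game_dates if d < game_date)
    days_rest = (game_date - prior[-1]).days if prior else None
    week_start = game_date - timedelta(days=7)
    return {
        "days_rest": days_rest,
        "back_to_back": int(days_rest == 1),
        "well_rested": int(days_rest is not None and days_rest >= 3),
        "games_last_week": sum(1 for d in prior if d >= week_start),
    }


if __name__ == "__main__":
    schedule = [date(2025, 1, 2), date(2025, 1, 6), date(2025, 1, 9)]
    print(rest_features(date(2025, 1, 10), schedule))
```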
7. Form Index Features (3 features)
| Feature | Description |
| --- | --- |
| `form_index` | Exponentially-weighted recent performance (0-1) |
| `form_trend` | Trend direction (improving/declining) |
| `form_plus_minus` | Weighted point differential |
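An exponentially weighted average is one plausible way to compute form_index and form_plus_minus. The decay factor of 0.8, the neutral default of 0.5 for an empty history, and the helper name are assumptions for this sketch.

```python
def ew_average(values, decay: float = 0.8, default: float = 0.5) -> float:
    """Exponentially weighted mean; the most recent value (last) has weight 1."""
    if not values:
        return default
    # The oldest value gets decay**(n-1), the newest gets decay**0 == 1.
    weights = [decay ** age for age in range(len(values) - 1, -1, -1)]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


if __name__ == "__main__":
    results = [0, 1, 1, 0, 1]      # 1 = win, 0 = loss, oldest first
    margins = [-4, 7, 12, -3, 9]   # point differential per game
    print(ew_average(results))               # form_index in the 0-1 range
    print(ew_average(margins, default=0.0))  # form_plus_minus
```

Applied to win/loss results the value stays within 0-1, and recent games count more than older ones.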
8. Basic Stat Columns (17 raw features)
BASIC_STATS = [
"PTS", "AST", "REB", "STL", "BLK", "TOV",
"FGM", "FGA", "FG_PCT",
"FG3M", "FG3A", "FG3_PCT",
"FTM", "FTA", "FT_PCT",
"OREB", "DREB"
]
9. Advanced Team Stats (11 features)
ADVANCED_STATS = [
"E_OFF_RATING",
"E_DEF_RATING",
"E_NET_RATING",
"E_PACE",
"E_AST_RATIO",
"E_OREB_PCT",
"E_DREB_PCT",
"E_REB_PCT",
"E_TM_TOV_PCT",
"E_EFG_PCT",
"E_TS_PCT"
]
10. Clutch Stats (4 features)
CLUTCH_STATS = [
"CLUTCH_PTS",
"CLUTCH_FG_PCT",
"CLUTCH_FG3_PCT",
"CLUTCH_PLUS_MINUS"
]
11. Hustle Stats (5 features)
HUSTLE_STATS = [
"DEFLECTIONS",
"LOOSE_BALLS_RECOVERED",
"CHARGES_DRAWN",
"CONTESTED_SHOTS",
"SCREEN_ASSISTS"
]
12. Top Player Stats (6 features)
| Feature | Description |
| --- | --- |
| `top_players_avg_pts` | Avg points of top 5 players |
| `top_players_avg_ast` | Avg assists of top 5 players |
| `top_players_avg_reb` | Avg rebounds of top 5 players |
| `top_players_avg_stl` | Avg steals of top 5 players |
| `top_players_avg_blk` | Avg blocks of top 5 players |
| `star_concentration` | % of scoring from top player |
13. Game Context (1 feature)
| Feature | Description |
| --- | --- |
| `is_home` | 1 if home team, 0 if away |
Feature Summary
| Category | Feature Count |
| --- | --- |
| ELO Ratings | 5 |
| Rolling Averages (5/10/20) | 21 |
| Season Statistics | 9 |
| Defensive Stats | 4 |
| Momentum Features | 6 |
| Rest/Fatigue | 4 |
| Form Index | 3 |
| Advanced Team Stats | 11 |
| Clutch Stats | 4 |
| Hustle Stats | 5 |
| Top Player Stats | 6 |
| Game Context | 1 |
| TOTAL | ~79 core features |
Adding Z-score normalized versions of the stats (for era adjustment) brings the total to 90+ features.
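Era adjustment by Z-score can be illustrated as follows, assuming each stat is standardized against the mean and standard deviation of its own season. The grouping by season, the row format, and the function name are assumptions for this sketch.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_by_season(rows, stat):
    """Return the stat's Z-score within each row's season, in row order."""
    by_season = defaultdict(list)
    for row in rows:
        by_season[row["season"]].append(row[stat])
    params = {season: (mean(vals), pstdev(vals)) for season, vals in by_season.items()}

    scores = []
    for row in rows:
        mu, sigma = params[row["season"]]
        # A season with no spread carries no signal, so use 0.0 instead of dividing by zero.
        scores.append(0.0 if sigma == 0 else (row[stat] - mu) / sigma)
    return scores


if __name__ == "__main__":
    games = [
        {"season": "2003-04", "PTS": 90},
        {"season": "2003-04", "PTS": 100},
        {"season": "2023-24", "PTS": 110},
        {"season": "2023-24", "PTS": 120},
    ]
    print(zscore_by_season(games, "PTS"))
```

In this toy data, 100 points in the lower-scoring season and 120 points in the higher-scoring one both sit one standard deviation above their season average, so they receive the same normalized value.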
Built with Python, React, and a passion for basketball analytics