# NBA Sage - Technical Explanation > **An AI-powered NBA game prediction system with real-time data, machine learning, and a modern web interface.** --- ## ๐ŸŽฏ What Does This Project Do? NBA Sage is a full-stack application that: 1. **Predicts NBA game outcomes** before they happen 2. **Shows live scores** with real-time updates 3. **Tracks prediction accuracy** over time 4. **Calculates MVP race standings** based on current stats 5. **Estimates championship odds** for all 30 teams --- ## ๐Ÿ† Key Features | Feature | Description | |---------|-------------| | **Live Game Dashboard** | Real-time scores, game status, win probabilities | | **Win Predictions** | Probability % for each team to win | | **Starting 5 Lineups** | Projected starters with PPG stats from NBA API | | **MVP Race** | Top 10 MVP candidates with scores | | **Championship Odds** | All 30 teams ranked by title probability | | **Model Accuracy** | Track how well predictions perform over time | --- ## ๐Ÿ› ๏ธ Technology Stack ### Backend (Python) | Technology | Purpose | |------------|---------| | **Flask** | REST API framework | | **nba_api** | Official NBA data (stats.nba.com) | | **XGBoost + LightGBM** | Machine learning ensemble model | | **APScheduler** | Background job scheduling | | **ChromaDB Cloud** | Persistent prediction storage | | **Pandas/NumPy** | Data processing | ### Frontend (React) | Technology | Purpose | |------------|---------| | **React 18** | UI framework | | **Vite** | Build tool & dev server | | **Custom CSS** | Modern design system | ### Infrastructure | Technology | Purpose | |------------|---------| | **Docker** | Container deployment | | **Hugging Face Spaces** | Cloud hosting | | **Git LFS** | Large file versioning | --- ## ๐Ÿ”ฌ How Predictions Work ### The Prediction Algorithm Predictions are made using a **multi-factor formula**: ``` Win Probability = Log5 Formula of: โ”œโ”€โ”€ 40% - Current Season Record (Win %) โ”œโ”€โ”€ 30% - Recent Form (Last 10 games performance) โ”œโ”€โ”€ 20% - ELO Rating (Historical team strength) โ””โ”€โ”€ 10% - Baseline Adjustments Applied: โ”œโ”€โ”€ +3.5% for Home Court Advantage โ””โ”€โ”€ -2% per Injury Impact Point ``` ### ELO Rating System ELO is a chess-inspired rating system adapted for NBA: - **Starting rating**: 1500 (average team) - **K-factor**: 20 (how much ratings change per game) - **Home advantage**: +100 ELO points equivalent - **Season regression**: Ratings regress 25% to mean each season **How it works:** - Win against better team โ†’ Big ELO gain - Win against weaker team โ†’ Small ELO gain - Lose against better team โ†’ Small ELO loss - Lose against weaker team โ†’ Big ELO loss --- ## ๐Ÿ“Š Data Sources ### Real-Time Data - **NBA Live API** (`nba_api.live`) - Live scores updated every 30 seconds - Game status (scheduled, in progress, final) - Box scores and player stats ### Historical Data - **NBA Stats API** (`nba_api.stats`) - 23 years of game data (2003-2026) - Team statistics (basic, advanced, clutch, hustle) - Player statistics - Current season stats for predictions ### Data Storage - **Parquet files**: Cached API responses (~140 files) - **ChromaDB Cloud**: Prediction history and accuracy tracking - **Joblib files**: Trained ML model and processed datasets --- ## ๐Ÿง  Machine Learning Components ### Trained Model: XGBoost + LightGBM Ensemble Two gradient boosting models trained on 41,000+ historical games: ``` Game Features โ”€โ”€โ”ฌโ”€โ”€โ–บ XGBoost (50%) โ”€โ”€โ” โ”‚ โ”‚โ”€โ”€โ–บ Ensemble Prediction โ””โ”€โ”€โ–บ LightGBM (50%) โ”€โ”˜ ``` **Features Used:** - ELO ratings and differentials - Rolling averages (5, 10, 20 game windows) - Rest days and back-to-back games - Home/away status - Season record statistics ### Training Pipeline ``` Data Collection โ”€โ”€โ–บ Feature Engineering โ”€โ”€โ–บ Model Training โ”€โ”€โ–บ Evaluation โ”‚ โ”‚ โ”‚ โ–ผ โ–ผ โ–ผ NBA API Data ELO Calculation XGBoost+LightGBM Era Normalization Rolling Windows ``` ### Auto-Training System The system automatically retrains itself: 1. **Ingests completed games** every hour 2. **Waits for all daily games** to complete 3. **Compares new model accuracy** to existing 4. **Only updates if improved** (prevents regression) --- ## ๐ŸŒ System Architecture ``` โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ React Frontend โ”‚ โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ โ”‚ โ”‚LiveGames โ”‚ โ”‚Predictionsโ”‚ โ”‚MVP Race โ”‚ โ”‚ Accuracy โ”‚ โ”‚ โ”‚ โ””โ”€โ”€โ”€โ”€โ–ฒโ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ–ฒโ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ–ฒโ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ–ฒโ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ โ”‚ โ”‚ โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ REST API โ–ผ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ Flask Server โ”‚ โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ โ”‚ โ”‚ Endpoints โ”‚ โ”‚ Caching โ”‚ โ”‚ Scheduler โ”‚ โ”‚ โ”‚ โ”‚ /api/live โ”‚ โ”‚ In-Memory โ”‚ โ”‚ APScheduler โ”‚ โ”‚ โ”‚ โ”‚ /api/roster โ”‚ โ”‚ 1-hour rostersโ”‚ โ”‚ Auto-retrain โ”‚ โ”‚ โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ โ”‚ โ”‚ โ”‚ โ”‚ โ–ผ โ”‚ โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ โ”‚ โ”‚ Prediction Pipeline โ”‚ โ”‚ โ”‚ โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ โ”‚ โ”‚ โ”‚ โ”‚Live Collectorโ”‚ โ”‚Feature Gen โ”‚ โ”‚ ELO System โ”‚ โ”‚ โ”‚ โ”‚ โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ โ”‚ โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ โ–ผ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ External Services โ”‚ โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ โ”‚ โ”‚ NBA API โ”‚ โ”‚ ChromaDB โ”‚ โ”‚ Hugging Faceโ”‚ โ”‚ โ”‚ โ”‚ stats.nba โ”‚ โ”‚ Cloud โ”‚ โ”‚ Spaces โ”‚ โ”‚ โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ ``` --- ## ๐Ÿ“ Project Structure ``` NBA ML/ โ”œโ”€โ”€ server.py # Production server (Hugging Face) โ”œโ”€โ”€ api/api.py # Development server โ”‚ โ”œโ”€โ”€ src/ # Core logic โ”‚ โ”œโ”€โ”€ prediction_pipeline.py # Main orchestrator โ”‚ โ”œโ”€โ”€ feature_engineering.py # ELO + features โ”‚ โ”œโ”€โ”€ data_collector.py # Historical data โ”‚ โ”œโ”€โ”€ live_data_collector.py # Real-time data โ”‚ โ”œโ”€โ”€ prediction_tracker.py # Accuracy tracking โ”‚ โ””โ”€โ”€ models/ โ”‚ โ””โ”€โ”€ game_predictor.py # ML model โ”‚ โ”œโ”€โ”€ web/ # React frontend โ”‚ โ””โ”€โ”€ src/ โ”‚ โ”œโ”€โ”€ App.jsx โ”‚ โ”œโ”€โ”€ pages/ # UI pages โ”‚ โ””โ”€โ”€ index.css # Design system โ”‚ โ”œโ”€โ”€ data/ โ”‚ โ””โ”€โ”€ api_data/ # 140+ parquet files โ”‚ โ””โ”€โ”€ models/ โ””โ”€โ”€ game_predictor.joblib # Trained model (9.6KB) ``` --- ## ๐Ÿš€ Deployment ### Local Development ```bash # Backend python api/api.py # Runs on localhost:8000 # Frontend cd web && npm run dev # Runs on localhost:5173 ``` ### Production (Hugging Face Spaces) ```bash # Docker container python server.py # Serves both API + React on port 7860 ``` --- ## ๐Ÿ“ˆ Performance & Accuracy ### Prediction Accuracy - **Overall**: Tracked via ChromaDB Cloud - **By Confidence**: High/Medium/Low confidence splits - **By Team**: Per-team prediction accuracy ### Speed Optimizations - **In-memory caching**: Roster data cached for 1 hour - **Startup warming**: All 30 teams pre-loaded on server start - **Background refresh**: Cache updated every 2 hours --- ## ๐Ÿ”ฎ Future Improvements 1. **Integrate ML model** into live predictions (currently formula-based) 2. **Add player-level features** (injuries, rest days per player) 3. **Implement spread predictions** (margin of victory) 4. **Add playoff predictions** with series outcomes --- ## ๐Ÿ“Š Stats at a Glance | Metric | Value | |--------|-------| | Historical games | 41,000+ | | Seasons covered | 23 (2003-2026) | | Teams tracked | 30 | | ML model type | XGBoost + LightGBM | | API endpoints | 10+ | | Frontend pages | 6 | --- ## ๐Ÿ“‹ Complete ML Feature List (90+ Features) The model uses approximately **90 features** organized into these categories: ### 1๏ธโƒฃ ELO Rating Features (5 features) | Feature | Description | |---------|-------------| | `team_elo` | Team's current ELO rating | | `opponent_elo` | Opponent's current ELO rating | | `elo_diff` | Difference between team and opponent ELO | | `elo_win_prob` | Expected win probability from ELO | | `home_elo_boost` | ELO boost for home court (100 points) | ### 2๏ธโƒฃ Basic Stats - Rolling Averages (21 features) For each of 7 stats ร— 3 windows (5, 10, 20 games): | Base Stat | Windows | |-----------|---------| | `PTS` (Points) | `PTS_last5`, `PTS_last10`, `PTS_last20` | | `AST` (Assists) | `AST_last5`, `AST_last10`, `AST_last20` | | `REB` (Rebounds) | `REB_last5`, `REB_last10`, `REB_last20` | | `FG_PCT` (Field Goal %) | `FG_PCT_last5`, `FG_PCT_last10`, `FG_PCT_last20` | | `FG3_PCT` (3-Point %) | `FG3_PCT_last5`, `FG3_PCT_last10`, `FG3_PCT_last20` | | `FT_PCT` (Free Throw %) | `FT_PCT_last5`, `FT_PCT_last10`, `FT_PCT_last20` | | `PLUS_MINUS` (Point Diff) | `PLUS_MINUS_last5`, `PLUS_MINUS_last10`, `PLUS_MINUS_last20` | ### 3๏ธโƒฃ Season Statistics (9 features) | Feature | Description | |---------|-------------| | `PTS_season_avg` | Season average points | | `AST_season_avg` | Season average assists | | `REB_season_avg` | Season average rebounds | | `FG_PCT_season_avg` | Season field goal % | | `FG3_PCT_season_avg` | Season 3-point % | | `FT_PCT_season_avg` | Season free throw % | | `PLUS_MINUS_season_avg` | Season point differential | | `win_pct_season` | Season win percentage | | `games_played` | Games played in season | ### 4๏ธโƒฃ Defensive Features (4 features) | Feature | Description | |---------|-------------| | `STL_last10` | Steals per game (last 10) | | `BLK_last10` | Blocks per game (last 10) | | `DREB_last10` | Defensive rebounds (last 10) | | `pts_allowed_last10` | Points allowed (last 10) | ### 5๏ธโƒฃ Momentum Features (6 features) | Feature | Description | |---------|-------------| | `wins_last5` | Wins in last 5 games (0-5) | | `wins_last10` | Wins in last 10 games (0-10) | | `hot_streak` | 1 if 4+ wins in last 5 | | `cold_streak` | 1 if 1 or fewer wins in last 5 | | `plus_minus_last5` | Point differential trend | | `form_trend` | Comparison of last 3 vs previous 3 | ### 6๏ธโƒฃ Rest & Fatigue Features (4 features) | Feature | Description | |---------|-------------| | `days_rest` | Days since last game | | `back_to_back` | 1 if playing consecutive days | | `well_rested` | 1 if 3+ days rest | | `games_last_week` | Games played in last 7 days | ### 7๏ธโƒฃ Form Index Features (3 features) | Feature | Description | |---------|-------------| | `form_index` | Exponentially-weighted recent performance (0-1) | | `form_trend` | Trend direction (improving/declining) | | `form_plus_minus` | Weighted point differential | ### 8๏ธโƒฃ Basic Stat Columns (17 raw features) ```python BASIC_STATS = [ "PTS", "AST", "REB", "STL", "BLK", "TOV", "FGM", "FGA", "FG_PCT", "FG3M", "FG3A", "FG3_PCT", "FTM", "FTA", "FT_PCT", "OREB", "DREB" ] ``` ### 9๏ธโƒฃ Advanced Team Stats (11 features) ```python ADVANCED_STATS = [ "E_OFF_RATING", # Offensive Rating "E_DEF_RATING", # Defensive Rating "E_NET_RATING", # Net Rating "E_PACE", # Pace (possessions per game) "E_AST_RATIO", # Assist Ratio "E_OREB_PCT", # Offensive Rebound % "E_DREB_PCT", # Defensive Rebound % "E_REB_PCT", # Total Rebound % "E_TM_TOV_PCT", # Team Turnover % "E_EFG_PCT", # Effective FG% "E_TS_PCT" # True Shooting % ] ``` ### ๐Ÿ”Ÿ Clutch Stats (4 features) ```python CLUTCH_STATS = [ "CLUTCH_PTS", # Points in clutch time "CLUTCH_FG_PCT", # FG% in clutch "CLUTCH_FG3_PCT", # 3PT% in clutch "CLUTCH_PLUS_MINUS" # +/- in clutch ] ``` ### 1๏ธโƒฃ1๏ธโƒฃ Hustle Stats (5 features) ```python HUSTLE_STATS = [ "DEFLECTIONS", # Passes deflected "LOOSE_BALLS_RECOVERED", # Loose balls recovered "CHARGES_DRAWN", # Offensive fouls drawn "CONTESTED_SHOTS", # Shots contested "SCREEN_ASSISTS" # Screen assists ] ``` ### 1๏ธโƒฃ2๏ธโƒฃ Top Player Stats (6 features) | Feature | Description | |---------|-------------| | `top_players_avg_pts` | Avg points of top 5 players | | `top_players_avg_ast` | Avg assists of top 5 players | | `top_players_avg_reb` | Avg rebounds of top 5 players | | `top_players_avg_stl` | Avg steals of top 5 players | | `top_players_avg_blk` | Avg blocks of top 5 players | | `star_concentration` | % of scoring from top player | ### 1๏ธโƒฃ3๏ธโƒฃ Game Context (1 feature) | Feature | Description | |---------|-------------| | `is_home` | 1 if home team, 0 if away | --- ## ๐Ÿ“Š Feature Summary | Category | Feature Count | |----------|---------------| | ELO Ratings | 5 | | Rolling Averages (5/10/20) | 21 | | Season Statistics | 9 | | Defensive Stats | 4 | | Momentum Features | 6 | | Rest/Fatigue | 4 | | Form Index | 3 | | Advanced Team Stats | 11 | | Clutch Stats | 4 | | Hustle Stats | 5 | | Top Player Stats | 6 | | Game Context | 1 | | **TOTAL** | **~79 core features** | *Plus Z-score normalized versions of stats for era adjustment = **90+ total features*** --- *Built with Python, React, and a passion for basketball analytics* ๐Ÿ€