IPL Match Winner Prediction Model
This repository contains machine learning models for predicting IPL (Indian Premier League) cricket match outcomes using historical ball-by-ball data from 2008–2025.
Dataset
- Source: prasad-gade05/ipl-enriched-2008-2025 on Hugging Face
- Coverage: 278,205 ball-by-ball records across 1,169 matches (2008–2025)
- Splits: Time-based train/test (seasons 2008–2023 train, 2024–2025 test)
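A time-based split keeps every test match strictly later than every training match, so the model is never evaluated on seasons that precede its training data. The following is a minimal sketch of that idea, not the repository's actual preprocessing; the `season` and `match_id` field names are assumptions made for illustration.

```python
def time_based_split(matches, last_train_season=2023):
    """Split match records by season so the test set is strictly later than training."""
    train = [m for m in matches if m["season"] <= last_train_season]
    test = [m for m in matches if m["season"] > last_train_season]
    return train, test


# Toy records; "season" and "match_id" are hypothetical field names.
matches = [
    {"match_id": 1, "season": 2008},
    {"match_id": 2, "season": 2023},
    {"match_id": 3, "season": 2024},
    {"match_id": 4, "season": 2025},
]

train, test = time_based_split(matches)
print(len(train), "training matches,", len(test), "test matches")
```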
Models
1. Pre-Match Winner Prediction (ipl_model_v2/)
Predicts the winner before the match starts using engineered features:
- Elo ratings for each team
- Multi-window rolling form (last 3, 5, 10 matches)
- Season-level performance stats
- Venue-specific chasing win rates and average scores
- Toss impact features
- Head-to-head historical records
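As an illustration of the Elo feature, the sketch below applies the standard Elo update after a single match. The starting rating of 1500, the scale of 400, and the K-factor of 20 are conventional defaults assumed here; the values used in this repository are not stated. To avoid leakage, the rating fed to the model for a match should be the one held before that match is played.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expectation that team A beats team B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(winner_rating, loser_rating, k=20.0):
    """Return updated (winner, loser) ratings after one decided match.

    k is the K-factor; 20 is an assumed value, not taken from the repository.
    """
    expected_win = expected_score(winner_rating, loser_rating)
    delta = k * (1.0 - expected_win)
    return winner_rating + delta, loser_rating - delta


# Hypothetical team names, both starting from an assumed base rating of 1500.
ratings = {"Team A": 1500.0, "Team B": 1500.0}
ratings["Team A"], ratings["Team B"] = update_elo(ratings["Team A"], ratings["Team B"])
print(ratings)
```

Because the winner gains exactly what the loser gives up, the total rating across teams stays constant, and an upset win moves ratings more than an expected one.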
Models trained:
- XGBoost
- Random Forest
- Gradient Boosting
- Logistic Regression
- Weighted Ensemble of all four
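One common way to build a weighted ensemble is to average the win probabilities of the individual models, with weights that sum to one after normalization. The sketch below assumes that approach; the probabilities and weights shown are invented for illustration, and the repository's actual weighting scheme is not documented here.

```python
def weighted_ensemble(probabilities, weights):
    """Weighted average of per-model win probabilities."""
    if len(probabilities) != len(weights):
        raise ValueError("probabilities and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(p * w for p, w in zip(probabilities, weights)) / total


# Hypothetical per-model probabilities and weights for a single match.
model_probs = {
    "xgboost": 0.62,
    "random_forest": 0.58,
    "gradient_boosting": 0.60,
    "logistic_regression": 0.55,
}
model_weights = {
    "xgboost": 0.4,
    "random_forest": 0.2,
    "gradient_boosting": 0.2,
    "logistic_regression": 0.2,
}

names = list(model_probs)
ensemble_prob = weighted_ensemble(
    [model_probs[n] for n in names],
    [model_weights[n] for n in names],
)
print(f"Ensemble win probability: {ensemble_prob:.3f}")
```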
2. In-Match Win Probability (ipl_inmatch_model/)
Predicts the win probability during the match at different stages:
- After 5 overs: ~67% accuracy, AUC 0.70
- After 10 overs: ~69% accuracy, AUC 0.75
- After 15 overs: ~82% accuracy, AUC 0.89
- After 20 overs: ~92% accuracy, AUC 0.98
Each stage uses the current match state:
- Runs scored, wickets lost, run rates
- Required run rate vs current run rate
- Balls and wickets remaining
- Target score
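The chase-related quantities above can be derived from the running score. The sketch below assumes a full 20-over innings (120 legal balls) and 10 wickets, and expresses run rates in runs per over; the output keys mirror feature names from the Usage section, but the formulas are an assumption about how those features were computed.

```python
BALLS_PER_INNINGS = 120  # 20 overs of 6 legal deliveries (no rain reduction assumed)
WICKETS_PER_INNINGS = 10


def chase_state(target, runs, balls_bowled, wickets_lost):
    """Derive second-innings state features from the current score."""
    runs_remaining = target - runs
    balls_remaining = BALLS_PER_INNINGS - balls_bowled
    # Run rates are per over, hence the factor of 6; guard against division by zero.
    run_rate = runs / balls_bowled * 6 if balls_bowled else 0.0
    required_run_rate = runs_remaining / balls_remaining * 6 if balls_remaining else 0.0
    return {
        "runs_remaining": runs_remaining,
        "balls_remaining": balls_remaining,
        "t2_run_rate": run_rate,
        "required_run_rate": required_run_rate,
        "t2_wickets_remaining": WICKETS_PER_INNINGS - wickets_lost,
    }


# Chasing 180, the batting side is 80/2 after 10 overs (60 balls).
state = chase_state(target=180, runs=80, balls_bowled=60, wickets_lost=2)
print(state)
```

With these inputs the chasing side needs 100 runs from 60 balls, a required rate of about 10 runs per over against a current rate of about 8.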
Performance Summary
| Model | Stage | Accuracy | AUC-ROC | Log Loss |
|---|---|---|---|---|
| XGBoost | 5 overs | 0.671 | 0.700 | 0.656 |
| XGBoost | 10 overs | 0.685 | 0.754 | 0.648 |
| XGBoost | 15 overs | 0.818 | 0.892 | 0.427 |
| XGBoost | 20 overs | 0.916 | 0.978 | 0.204 |
Baseline (always predict the majority class): ~52.4% accuracy
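For reference, the accuracy, log loss, and majority-class baseline can be computed as in the sketch below. It uses only the standard library and toy labels; it is not the evaluation script behind the table. A useful yardstick: a constant prediction of 0.5 has a log loss of ln 2 ≈ 0.693, so the 0.656 reported at 5 overs is a modest improvement over an uninformed guess, while 0.204 at 20 overs is a large one.

```python
import math


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    positives = sum(labels)
    return max(positives, len(labels) - positives) / len(labels)


def accuracy(labels, probs, threshold=0.5):
    """Fraction of correct predictions after thresholding the probabilities."""
    hits = sum(int(p >= threshold) == y for y, p in zip(labels, probs))
    return hits / len(labels)


def log_loss(labels, probs, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


# Toy data: 1 means the batting-first team won (an assumed label convention).
y_true = [1, 1, 1, 0]
y_prob = [0.9, 0.6, 0.4, 0.2]

print("baseline accuracy:", majority_baseline_accuracy(y_true))
print("model accuracy:", accuracy(y_true, y_prob))
print("log loss:", round(log_loss(y_true, y_prob), 3))
```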
Files
- xgboost_stage_5.pkl: In-match model at 5 overs
- xgboost_stage_10.pkl: In-match model at 10 overs
- xgboost_stage_15.pkl: In-match model at 15 overs
- xgboost_stage_20.pkl: In-match model at 20 overs
- team_encoder.pkl: Label encoder for team names
- stage_results.json: Evaluation metrics per stage
Usage
import joblib
import pandas as pd

# Load model for a specific stage
model = joblib.load("xgboost_stage_15.pkl")
encoder = joblib.load("team_encoder.pkl")

# Predict win probability for batting first team
features = pd.DataFrame({
    'batting_first_enc': [encoder.transform(["Royal Challengers Bengaluru"])[0]],
    'batting_second_enc': [encoder.transform(["Mumbai Indians"])[0]],
    'toss_decision_bat': [1],
    't1_runs': [120],
    't1_balls': [90],
    't1_wickets': [3],
    't1_boundaries': [12],
    't1_dots': [25],
    't1_run_rate': [8.0],
    't2_runs': [80],
    't2_balls': [60],
    't2_wickets': [2],
    't2_boundaries': [8],
    't2_dots': [20],
    't2_run_rate': [8.0],
    'target': [180],
    'second_innings_active': [1],
    'runs_remaining': [100],
    'balls_remaining': [60],
    'required_run_rate': [10.0],
    'rr_diff': [0.0],
    't2_wickets_remaining': [8],
})

prob = model.predict_proba(features)[0][1]
print(f"Batting first team win probability: {prob:.2%}")
License
Dataset used: CC0-1.0 (public domain)
Generated by ML Intern
This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern