IPL Match Winner Prediction Model

This repository contains machine learning models for predicting IPL (Indian Premier League) cricket match outcomes using historical ball-by-ball data from 2008–2025.

Dataset

  • Source: prasad-gade05/ipl-enriched-2008-2025 on Hugging Face
  • Coverage: 278,205 ball-by-ball records across 1,169 matches (2008–2025)
  • Splits: Time-based train/test (seasons 2008–2023 train, 2024–2025 test)
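
A time-based split like the one above keeps every test match strictly later than every training match. The following is a minimal sketch, not the repository's actual code: it assumes each record carries an integer `season` field, which is a hypothetical column name and format.

```python
TRAIN_LAST_SEASON = 2023  # seasons 2008-2023 train, 2024-2025 test


def time_based_split(rows, season_key="season"):
    """Split records by season so no test-period match leaks into training.

    `season_key` is an assumed field name; adjust it to the dataset's schema.
    """
    train, test = [], []
    for row in rows:
        if int(row[season_key]) <= TRAIN_LAST_SEASON:
            train.append(row)
        else:
            test.append(row)
    return train, test


# Toy records standing in for ball-by-ball rows
rows = [
    {"match_id": 1, "season": 2008},
    {"match_id": 2, "season": 2023},
    {"match_id": 3, "season": 2024},
    {"match_id": 4, "season": 2025},
]
train_rows, test_rows = time_based_split(rows)
```

Splitting on season rather than at random avoids evaluating on matches that are older than some of the training data.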

Models

1. Pre-Match Winner Prediction (ipl_model_v2/)

Predicts the winner before the match starts using engineered features:

  • Elo ratings for each team
  • Multi-window rolling form (last 3, 5, 10 matches)
  • Season-level performance stats
  • Venue-specific chasing win rates and average scores
  • Toss impact features
  • Head-to-head historical records
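
As an illustration of the Elo feature, here is a minimal sketch of the standard Elo update. The K-factor of 20 and the starting rating of 1500 are assumptions, not values taken from this repository, and `record_match` is a hypothetical helper.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score for team A against team B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, a_won, k=20.0):
    """Return both teams' ratings after one match (k is an assumed K-factor)."""
    delta = k * ((1.0 if a_won else 0.0) - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


ratings = {}  # team name -> current rating


def record_match(team_a, team_b, a_won, base=1500.0):
    """Return the pre-match ratings as features, then apply the result.

    Reading the ratings before updating them keeps the match outcome
    out of that match's own features.
    """
    ra = ratings.get(team_a, base)
    rb = ratings.get(team_b, base)
    ratings[team_a], ratings[team_b] = update_elo(ra, rb, a_won)
    return ra, rb


features = record_match("Mumbai Indians", "Chennai Super Kings", a_won=True)
```

Matches should be processed in chronological order so that each rating reflects only earlier results.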

Models trained:

  • XGBoost
  • Random Forest
  • Gradient Boosting
  • Logistic Regression
  • Weighted Ensemble of all four
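
One common way to build a weighted ensemble is to average each model's predicted win probability using per-model weights. The sketch below assumes that approach; the weights and probabilities are made-up illustrative values, and the repository may combine the models differently.

```python
def weighted_ensemble(probas, weights):
    """Weighted average of per-model win probabilities.

    probas:  model name -> list of probabilities, one per match
    weights: model name -> non-negative weight (normalised here)
    """
    total = sum(weights[name] for name in probas)
    n_matches = len(next(iter(probas.values())))
    return [
        sum(weights[name] * probas[name][i] for name in probas) / total
        for i in range(n_matches)
    ]


# Hypothetical outputs of the four base models for two matches
probas = {
    "xgboost": [0.6, 0.3],
    "random_forest": [0.5, 0.4],
    "gradient_boosting": [0.7, 0.2],
    "logistic_regression": [0.4, 0.5],
}
# Hypothetical weights, e.g. chosen from validation performance
weights = {
    "xgboost": 0.4,
    "random_forest": 0.2,
    "gradient_boosting": 0.2,
    "logistic_regression": 0.2,
}
ensemble = weighted_ensemble(probas, weights)
```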

2. In-Match Win Probability (ipl_inmatch_model/)

Predicts the win probability during the match at different stages:

  • After 5 overs: ~67% accuracy, AUC 0.70
  • After 10 overs: ~69% accuracy, AUC 0.75
  • After 15 overs: ~82% accuracy, AUC 0.89
  • After 20 overs: ~92% accuracy, AUC 0.98

Each stage uses the current match state:

  • Runs scored, wickets lost, run rates
  • Required run rate vs current run rate
  • Balls and wickets remaining
  • Target score
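
The chase-related state can be derived from the target and the second-innings score. This is a minimal sketch that reuses the feature names from the Usage section. It assumes a 20-over innings with 10 wickets, and the definition of `rr_diff` as required minus current run rate is an assumption; the trained models may define it differently.

```python
def chase_state(target, t2_runs, t2_balls, t2_wickets, total_balls=120):
    """Derive second-innings state features for the chasing team."""
    runs_remaining = target - t2_runs
    balls_remaining = total_balls - t2_balls
    # Run rates are expressed in runs per over (6 balls)
    current_rr = t2_runs * 6 / t2_balls if t2_balls else 0.0
    required_rr = runs_remaining * 6 / balls_remaining if balls_remaining else 0.0
    return {
        "runs_remaining": runs_remaining,
        "balls_remaining": balls_remaining,
        "t2_run_rate": current_rr,
        "required_run_rate": required_rr,
        "rr_diff": required_rr - current_rr,  # assumed sign convention
        "t2_wickets_remaining": 10 - t2_wickets,
    }


# Chasing 180, the batting side is 80/2 after 10 overs
state = chase_state(target=180, t2_runs=80, t2_balls=60, t2_wickets=2)
```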

Performance Summary

Model     Stage      Accuracy   AUC-ROC   Log Loss
XGBoost   5 overs    0.671      0.700     0.656
XGBoost   10 overs   0.685      0.754     0.648
XGBoost   15 overs   0.818      0.892     0.427
XGBoost   20 overs   0.916      0.978     0.204

Baseline accuracy (always predicting the majority class): ~52.4%
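
For reference, the metrics in this summary can be computed from true labels and predicted probabilities as in the dependency-free sketch below. It is illustrative only: the toy labels and probabilities are made up, and the repository's own evaluation code may differ (for example by using scikit-learn).

```python
import math


def accuracy(y_true, y_prob, threshold=0.5):
    """Share of matches where the thresholded prediction equals the label."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total -= math.log(p) if y else math.log(1 - p)
    return total / len(y_true)


def auc_roc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative."""
    pos = [p for y, p in zip(y_true, y_prob) if y]
    neg = [p for y, p in zip(y_true, y_prob) if not y]
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))


def majority_baseline(y_true):
    """Accuracy obtained by always predicting the more frequent class."""
    share = sum(y_true) / len(y_true)
    return max(share, 1 - share)


# Toy data: 1 means the team batting first won
y_true = [1, 0, 1, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.7]
```

Comparing `accuracy` against `majority_baseline` shows how much a model adds beyond always picking the more common outcome.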

Files

  • xgboost_stage_5.pkl: In-match model at 5 overs
  • xgboost_stage_10.pkl: In-match model at 10 overs
  • xgboost_stage_15.pkl: In-match model at 15 overs
  • xgboost_stage_20.pkl: In-match model at 20 overs
  • team_encoder.pkl: Label encoder for team names
  • stage_results.json: Evaluation metrics per stage

Usage

import joblib
import pandas as pd

# Load model for a specific stage
model = joblib.load("xgboost_stage_15.pkl")
encoder = joblib.load("team_encoder.pkl")

# Predict the win probability for the team batting first (illustrative feature values)
features = pd.DataFrame({
    'batting_first_enc': [encoder.transform(["Royal Challengers Bengaluru"])[0]],
    'batting_second_enc': [encoder.transform(["Mumbai Indians"])[0]],
    'toss_decision_bat': [1],
    't1_runs': [120],
    't1_balls': [90],
    't1_wickets': [3],
    't1_boundaries': [12],
    't1_dots': [25],
    't1_run_rate': [8.0],
    't2_runs': [80],
    't2_balls': [60],
    't2_wickets': [2],
    't2_boundaries': [8],
    't2_dots': [20],
    't2_run_rate': [8.0],
    'target': [180],
    'second_innings_active': [1],
    'runs_remaining': [100],
    'balls_remaining': [60],
    'required_run_rate': [10.0],
    'rr_diff': [0.0],
    't2_wickets_remaining': [8],
})

prob = model.predict_proba(features)[0][1]
print(f"Batting first team win probability: {prob:.2%}")
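
Because there is one model file per stage, a small helper can pick the file to load. The sketch below assumes that the right model is the latest stage not exceeding the number of completed overs; that selection rule and the `stage_model_file` name are hypothetical, while the file names follow the Files section.

```python
STAGES = (5, 10, 15, 20)  # stages with a saved model


def stage_model_file(overs_completed):
    """Return the model file for the latest stage reached so far."""
    eligible = [stage for stage in STAGES if stage <= overs_completed]
    if not eligible:
        raise ValueError("no in-match model is available before 5 completed overs")
    return f"xgboost_stage_{max(eligible)}.pkl"


model_file = stage_model_file(12)
```

The returned name can then be passed to `joblib.load` as in the example above.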

License

Dataset used: CC0-1.0 (public domain)

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
