# Gladiator Winning Model

Video: https://youtu.be/wvdXkOgies4

This repository contains a trained **Gradient Boosting classifier** used to predict gladiator fight outcomes.
## Dataset

The full dataset is available on Kaggle:
https://www.kaggle.com/datasets/anthonytherrien/gladiator-combat-records-and-profiles-dataset

A representative sample may be included in this repository for demonstration purposes.
## Model Performance

- **F1-score:** 0.910
- **Accuracy:** ~90%
- **ROC-AUC:** 0.970

The Gradient Boosting model performed the best out of all tested classifiers.
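For reference, the reported scores can be related back to confusion-matrix counts. This is an illustrative calculation only, using hypothetical counts rather than the actual test set:

```python
# Hypothetical confusion-matrix counts (not the actual test results)
tp, fp, fn, tn = 90, 8, 10, 92

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy: {accuracy:.3f}, F1: {f1:.3f}")  # Accuracy: 0.910, F1: 0.909
```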
## Model Comparison

Several models were tested, including Logistic Regression, Random Forest, and Gradient Boosting.
In both the regression-style and classification tasks, **Gradient Boosting consistently performed the best**, showing higher accuracy and a better balance between precision and recall.
This makes it the most reliable model for predicting gladiator outcomes.
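A minimal sketch of such a comparison using scikit-learn is shown below. It uses a synthetic stand-in dataset, not the actual Kaggle data, so the scores are illustrative only:

```python
# Sketch: compare the three classifiers via cross-validated F1
# on a synthetic dataset standing in for the real gladiator data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}

results = {}
for name, clf in models.items():
    # 5-fold cross-validated F1 balances precision and recall
    scores = cross_val_score(clf, X, y, cv=5, scoring="f1")
    results[name] = scores.mean()
    print(f"{name}: mean F1 = {results[name]:.3f}")
```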
## Exploratory Data Analysis (EDA)

Before modeling, the dataset was explored to understand the distribution of gladiator attributes and fighting outcomes.
Outliers were handled carefully to ensure they did not distort model training.
### Key Findings

- The dataset contains **detailed gladiator profiles**, including age, height, weight, fighting style, armor type, victory count, and more.
- Several features show **clear relationships with the final battle outcome** (Win/Loss), most notably the numerical features Battle Experience and Public Favor.
- Categorical features such as **Gladiator Type, Weapon Type, Fighting Style, Crowd Appeal Techniques, and Previous Occupation** showed meaningful differences between winners and non-winners.
- Correlation analysis indicated that **experience-based features** (e.g., previous wins) have stronger predictive power than purely physical attributes.
Strong correlation between Wins and Public Favor & Battle Experience:

![Correlation between numeric features and wins](images/wins_numeric_correlation.png)

Categorical features with strong correlation with the target variable Wins:

![Correlation between categorical features and wins](images/wins_categorical_correlation.png)
### Data Cleaning Steps

- Missing values were imputed or removed depending on relevance.
- Boolean and categorical features were encoded into numeric form.
- New engineered features such as **BMI**, **interaction terms**, and **readiness scores** were added to improve predictive performance.
- The target variable was converted into a binary class using a median split (`Win` vs. `Not Win`).
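The steps above can be sketched as follows. Column names and units (Height in cm, Weight in kg) are assumptions for illustration, not the repo's exact code:

```python
# Sketch of the cleaning / feature-engineering pipeline
# (hypothetical column names and toy data).
import pandas as pd

df = pd.DataFrame({
    "Height": [180, 170, 175, 190],   # assumed cm
    "Weight": [85, 70, 78, 95],       # assumed kg
    "Wins": [20, 3, 9, 14],
    "Armor Type": ["Heavy", "Light", "Medium", "Heavy"],
})

# Engineered feature: BMI = weight (kg) / height (m)^2
df["BMI"] = df["Weight"] / (df["Height"] / 100) ** 2

# One-hot encode the categorical feature into numeric columns
df = pd.get_dummies(df, columns=["Armor Type"])

# Binary target via a median split on Wins (Win vs. Not Win)
df["Win"] = (df["Wins"] > df["Wins"].median()).astype(int)
```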
### Feature Insights

During EDA, battle experience emerged as the strongest predictor, showing a very high correlation (~0.95) with the target outcome.
However, because this feature was essentially a direct indicator of the final result, it created data leakage.
To ensure a fair and realistic model, this feature was removed from training, which made the remaining predictors more meaningful and prevented overly optimistic performance.

This EDA process provided important insights that shaped feature engineering and model selection.
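Dropping the leaky feature before training amounts to something like the sketch below (column names assumed from the description above):

```python
# Sketch: remove the near-deterministic predictor to prevent leakage
# (hypothetical column names and toy data).
import pandas as pd

df = pd.DataFrame({
    "Battle Experience": [10, 2, 22],
    "Public Favor": [0.8, 0.2, 0.95],
    "Win": [1, 0, 1],
})

# Exclude the leaky feature and the target from the training matrix
X = df.drop(columns=["Battle Experience", "Win"])
y = df["Win"]
```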
## Usage

```python
import pickle

# Load the trained Gradient Boosting classifier from disk
with open("gladiator_gradient_boosting_classifier.pkl", "rb") as f:
    model = pickle.load(f)

# The loaded model can then be used for prediction, e.g. model.predict(X)
```