Create README.md
README.md
ADDED
@@ -0,0 +1,62 @@
# Gladiator Winning Model

This repository contains a trained **Gradient Boosting classifier** used to predict gladiator fight outcomes.

## Dataset
The full dataset is available on Kaggle:
https://www.kaggle.com/datasets/anthonytherrien/gladiator-combat-records-and-profiles-dataset

A representative sample may be included in this repository for demonstration purposes.

## Model Performance
- **F1-score:** 0.910
- **Accuracy:** ~90%
- **ROC-AUC:** 0.970

The Gradient Boosting model performed best among all tested classifiers.

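As a reminder of what the three metrics measure, here is a minimal sketch that computes them from scratch with the standard library. The labels and scores are toy values made up for illustration; they are not outputs of the model in this repository, and the 0.5 decision threshold is an assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores, positive=1):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and scores, made up for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 threshold

print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), roc_auc(y_true, scores))
```

Accuracy and F1 depend on the chosen threshold, while ROC-AUC is computed from the raw scores and does not.
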
## Model Comparison

Several models were tested, including Logistic Regression, Random Forest, and Gradient Boosting.
In both the regression-style and classification tasks, **Gradient Boosting consistently performed best**,
showing higher accuracy and a better balance between precision and recall.
This makes it the most reliable of the tested models for predicting gladiator outcomes.

## Usage
```python
import pickle

# Load the trained Gradient Boosting classifier from disk.
# Only unpickle files that come from a source you trust.
with open("gladiator_gradient_boosting_classifier.pkl", "rb") as f:
    model = pickle.load(f)
```

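Once the model is loaded, predictions can be requested from it. The helper below is a hypothetical sketch, not part of this repository: it assumes the loaded object exposes `predict_proba` (as scikit-learn classifiers do), that the second class is the winning outcome, and that each input row lists the features in the same order used during training, which this README does not document.

```python
def predict_win_probability(model, rows):
    """Return the predicted probability of the second class for each row.

    `rows` is a list of numeric feature vectors in the column order the
    model was trained on (assumed to be known to the caller).
    """
    probabilities = model.predict_proba(rows)
    # Column 1 holds the probability of the second entry in model.classes_;
    # treating that entry as "Win" is an assumption.
    return [float(p[1]) for p in probabilities]
```

Checking `model.classes_` before relying on column 1 is a cheap way to confirm which label it corresponds to.
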
## Exploratory Data Analysis (EDA)
Before modeling, the dataset was explored to understand the distribution of gladiator attributes and fight outcomes.
Outliers were handled carefully so that they did not distort model training.

### Key Findings
- The dataset contains **detailed gladiator profiles**, including age, height, weight, fighting style, armor type, victory count, and more.
- Several features show a **clear relationship with the final battle outcome** (Win/Loss), notably the numerical features Battle Experience and Public Favor.
- Categorical features such as **Gladiator Type, Weapon Type, Fighting Style, Crowd Appeal Techniques, and Previous Occupation** showed meaningful differences between winners and non-winners.
- Correlation analysis indicated that **experience-based features** (e.g., previous wins) have stronger predictive power than purely physical attributes.

Strong correlation of Wins with Public Favor and Battle Experience:
![Strong correlation between Wins and Public Favor](corr_heatmap.png)

Categorical features most strongly associated with the target variable Wins:
![Categorical features correlate to Wins](categorical_features_correlate_to_wins.png)

### Data Cleaning Steps
- Missing values were imputed or removed depending on relevance.
- Boolean and categorical features were encoded into numeric form.
- New engineered features such as **BMI**, **interaction terms**, and **readiness scores** were added to improve predictive performance.
- The target variable was converted into a binary class using a median split (`Win` vs. `Not Win`).

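Two of the steps above, the BMI feature and the median split, can be sketched with the standard library. The field names (`height_cm`, `weight_kg`, `wins`) and the sample records are placeholders rather than the dataset's actual columns, and labelling values strictly above the median as `Win` is an assumption about how ties at the median were handled.

```python
from statistics import median

# Hypothetical records; field names are placeholders, not the real columns.
gladiators = [
    {"height_cm": 180, "weight_kg": 85, "wins": 12},
    {"height_cm": 170, "weight_kg": 70, "wins": 3},
    {"height_cm": 175, "weight_kg": 90, "wins": 7},
    {"height_cm": 190, "weight_kg": 95, "wins": 20},
]


def add_bmi(record):
    """BMI = weight in kilograms divided by height in metres squared."""
    height_m = record["height_cm"] / 100
    return {**record, "bmi": record["weight_kg"] / height_m ** 2}


def add_binary_target(records, column="wins"):
    """Label a record 1 (Win) if its value is above the median, else 0 (Not Win)."""
    cutoff = median(r[column] for r in records)
    return [{**r, "target": int(r[column] > cutoff)} for r in records]


prepared = add_binary_target([add_bmi(g) for g in gladiators])
for row in prepared:
    print(row)
```

A median split yields roughly balanced classes by construction, which keeps accuracy easier to interpret.
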
### Feature Insights
During EDA, battle experience emerged as the strongest predictor, showing a very high correlation (~0.95) with the target outcome.
However, because this feature was essentially a direct indicator of the final result, it introduced data leakage.
To keep the model fair and realistic, this feature was removed from training, which made the remaining predictors more meaningful and prevented overly optimistic performance estimates.

This EDA process provided important insights that shaped feature engineering and model selection.
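
The kind of check described under Feature Insights can be sketched as a simple correlation screen: compute each feature's Pearson correlation with the target and flag anything suspiciously high. The feature names, toy values, and the 0.9 threshold below are assumptions for illustration, not values taken from the actual analysis.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def flag_leaky_features(features, target, threshold=0.9):
    """Return names of features whose |correlation| with the target exceeds threshold."""
    return [
        name
        for name, values in features.items()
        if abs(pearson(values, target)) > threshold
    ]


# Toy values, made up for illustration only.
target = [1, 3, 5, 7, 9]
features = {
    "battle_experience": [2, 6, 11, 14, 18],
    "height": [180, 172, 185, 170, 178],
}

leaky = flag_leaky_features(features, target)
print(leaky)
```

A high correlation alone does not prove leakage; it only marks features worth inspecting to see whether they encode the outcome itself.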