
Video: https://youtu.be/wvdXkOgies4

Gladiator Winning Model

This repository contains a trained Gradient Boosting classifier used to predict gladiator fight outcomes.

Dataset

The full dataset is available on Kaggle:
https://www.kaggle.com/datasets/anthonytherrien/gladiator-combat-records-and-profiles-dataset

A representative sample may be included in this repository for demonstration purposes.

Model Performance

  • F1-score: 0.910
  • Accuracy: ~90%
  • ROC-AUC: 0.970

The Gradient Boosting model performed the best out of all tested classifiers.
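The metrics above can be computed with scikit-learn's standard scoring functions. A minimal sketch with hypothetical labels and predictions (the real held-out test set is not included here):

```python
# Sketch: computing F1, accuracy, and ROC-AUC with scikit-learn.
# y_test, y_pred, and y_prob are hypothetical placeholders standing in
# for the real held-out labels, predictions, and predicted probabilities.
from sklearn.metrics import f1_score, accuracy_score, roc_auc_score

y_test = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]  # hypothetical true labels
y_pred = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]  # hypothetical predictions
y_prob = [0.9, 0.2, 0.8, 0.7, 0.3, 0.95, 0.6, 0.85, 0.1, 0.75]  # P(Win)

print("F1:      ", f1_score(y_test, y_pred))
print("Accuracy:", accuracy_score(y_test, y_pred))
print("ROC-AUC: ", roc_auc_score(y_test, y_prob))
```

ROC-AUC uses the predicted probabilities (`predict_proba`), not the hard class labels, which is why it is reported separately.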

Model Comparison

Several models were tested, including Logistic Regression, Random Forest, and Gradient Boosting. In both the regression-style and classification tasks, Gradient Boosting consistently performed the best, showing higher accuracy and a better balance between precision and recall. This makes it the most reliable model for predicting gladiator outcomes.
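The comparison described above can be sketched with cross-validation on all three model families. This uses a synthetic dataset, since the exact preprocessing pipeline and hyperparameters used for the real Kaggle data are not specified here:

```python
# Sketch: comparing Logistic Regression, Random Forest, and Gradient
# Boosting by cross-validated F1, on synthetic stand-in data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.3f}")
```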

Exploratory Data Analysis (EDA)

Before modeling, the dataset was explored to understand the distribution of gladiator attributes and fighting outcomes. Outliers were handled carefully to ensure they did not distort model training.

Key Findings

  • The dataset contains detailed gladiator profiles, including age, height, weight, fighting style, armor type, victory count, and more.
  • Many features show clear relationships with the final battle outcome (Win/Loss), most notably the numerical features Battle Experience and Public Favor.
  • Categorical features such as Gladiator Type, Weapon Type, Fighting Style, Crowd Appeal Techniques and Previous Occupation showed meaningful differences between winners and non-winners.
  • Correlation analysis indicated that experience-based features (e.g., previous wins) have stronger predictive power than purely physical attributes.
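The correlation analysis in the last bullet can be reproduced with a simple `DataFrame.corr()` call. This sketch uses synthetic data and hypothetical column names chosen to mirror the findings above (experience-based features correlating with Wins more strongly than physical attributes):

```python
# Sketch: correlation of features with the Wins column, on synthetic
# data constructed so experience-based features dominate.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
battle_experience = rng.normal(50, 10, n)
df = pd.DataFrame({
    "Battle Experience": battle_experience,
    "Public Favor": battle_experience * 0.5 + rng.normal(0, 5, n),
    "Height": rng.normal(180, 7, n),  # purely physical attribute
    "Wins": battle_experience * 0.8 + rng.normal(0, 3, n),
})
corr = df.corr()["Wins"].sort_values(ascending=False)
print(corr)
```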

Heatmap: strong correlation between Wins and both Public Favor and Battle Experience.

Figure: categorical features strongly correlated with the target variable Wins.
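The kind of categorical comparison described above can be checked with a simple win-rate `groupby`. A minimal sketch with hypothetical rows (the real dataset's values and encoding may differ):

```python
# Sketch: comparing win rates across a categorical feature.
import pandas as pd

df = pd.DataFrame({
    "Gladiator Type": ["Murmillo", "Retiarius", "Murmillo",
                       "Thraex", "Retiarius", "Thraex"],
    "Win": [1, 0, 1, 0, 1, 0],
})
# Mean of a 0/1 column per category = win rate per category
win_rates = df.groupby("Gladiator Type")["Win"].mean()
print(win_rates)
```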

Data Cleaning Steps

  • Missing values were imputed or removed depending on relevance.
  • Boolean and categorical features were encoded into numeric form.
  • New engineered features such as BMI, interaction terms, and readiness scores were added to improve predictive performance.
  • The target variable was converted into a binary class using a median split (Win vs. Not Win).
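The steps above can be sketched in pandas. Column names, the BMI formula inputs, and the small example frame are assumptions for illustration; the real pipeline and engineered features are not included in this repository:

```python
# Sketch of the cleaning steps: imputation, encoding, feature
# engineering, and the median split on the target.
import pandas as pd

df = pd.DataFrame({
    "Height": [180, None, 175, 190],   # cm
    "Weight": [80, 85, None, 95],      # kg
    "Armor Type": ["Heavy", "Light", "Heavy", "None"],
    "Wins": [12, 3, 7, 20],
})

# Impute missing numeric values with the column median
for col in ["Height", "Weight"]:
    df[col] = df[col].fillna(df[col].median())

# Encode categorical features into numeric form
df = pd.get_dummies(df, columns=["Armor Type"])

# Engineered feature: BMI from height (cm) and weight (kg)
df["BMI"] = df["Weight"] / (df["Height"] / 100) ** 2

# Binary target via a median split (Win vs. Not Win)
df["Win"] = (df["Wins"] > df["Wins"].median()).astype(int)
print(df[["BMI", "Win"]])
```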

Feature Insights

During EDA, battle experience emerged as the strongest predictor, showing a very high correlation (~0.95) with the target outcome. However, because this feature was essentially a direct indicator of the final result, it created data leakage. To ensure a fair and realistic model, this feature was removed from training, which made the remaining predictors more meaningful and prevented overly optimistic performance.
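Removing the leaky feature is a one-line drop before training. A minimal sketch with a hypothetical feature frame:

```python
# Sketch: dropping the leaky feature from the training matrix.
import pandas as pd

X = pd.DataFrame({
    "Battle Experience": [40, 10, 30],  # ~0.95 correlated with the outcome -> leakage
    "Public Favor": [0.8, 0.2, 0.6],
    "Height": [180, 175, 185],
})
X_clean = X.drop(columns=["Battle Experience"])
print(X_clean.columns.tolist())
```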

This EDA process provided important insights that shaped feature engineering and model selection.

Usage

import pickle

# Load the trained Gradient Boosting classifier
with open("gladiator_gradient_boosting_classifier.pkl", "rb") as f:
    model = pickle.load(f)
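A runnable end-to-end sketch of the pickle round-trip plus a prediction. A small model is trained on synthetic data here purely so the example is self-contained; with the real `.pkl` file from this repository, only the load-and-predict part is needed:

```python
# Sketch: save, reload, and query a Gradient Boosting classifier.
# The synthetic data stands in for the real feature matrix, whose exact
# columns depend on the training pipeline.
import pickle
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=100, n_features=5, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X, y)

with open("gladiator_gradient_boosting_classifier.pkl", "wb") as f:
    pickle.dump(clf, f)

with open("gladiator_gradient_boosting_classifier.pkl", "rb") as f:
    model = pickle.load(f)

pred = model.predict(X[:1])  # 1 = Win, 0 = Not Win
print(pred)
```

Note that `pickle.load` should only be used on files from a trusted source.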