Tomertg/Gradient_Boosting

Tomertg committed on Nov 25, 2025

Commit c7dc25c · verified · 1 Parent(s): 3b5f5a0

Update README.md

Files changed (1)

README.md +129 -18

@@ -1,34 +1,145 @@
-# 🏋️‍♂️ Gradient Boosting Deadlift Predictor
-This repository contains the winning model from Assignment #2: Classification, Regression, Clustering & Evaluation.
-## 📌 Model Purpose
-The model predicts an athlete's **deadlift performance (lbs)** based on physical and strength-related features.
-## 🧠 Algorithm
-✅ Gradient Boosting Regressor
-Selected as the final model after comparing:
-- Linear Regression
 - Random Forest
 - Gradient Boosting
-## 🏆 Performance (Test Set)
-- R²: 0.85
-- MAE: ~28.6 lbs
-- RMSE: ~37.2 lbs
-Gradient Boosting achieved the **highest accuracy and lowest error**, so it was chosen as the final model.
-## 📁 Files
-- `winning_model.pkl` – serialized model ready for loading and inference
-## 🔧 Usage
 ```python
 import pickle
-with open("winning_model.pkl", "rb") as f:
     model = pickle.load(f)
-prediction = model.predict([[weight, height, backsquat, snatch]])

+# Strength Performance Analysis and Modeling
+## Overview
+This project analyzes a large dataset of athlete strength metrics to understand patterns in deadlift performance and build predictive and classification models.
+The work includes:
+- Exploratory Data Analysis (EDA)
+- Feature engineering
+- Regression modeling
+- Classification modeling
+- Clustering
+- Model selection and export
+The final goal was to classify athletes into performance categories and evaluate which model performs best.
+---
+## Dataset
+The dataset includes:
+- Body weight
+- Height
+- Age
+- Strength metrics: deadlift, back squat, snatch
+During cleaning, outliers were removed and missing values were handled.
+---
+## Exploratory Data Analysis (EDA)
+### Average Deadlift by Body Weight
+![img11](img11.png)
+Heavier weight categories generally show higher deadlift performance.
+### Average Deadlift by Height
+![img12](img12.png)
+Taller athletes tend to lift more, with increasing variance at higher height ranges.
+### Average Deadlift by Age
+![img13](img13.png)
+Performance peaks around ages 25–34 and gradually decreases afterward.
+### Body Ratio and Deadlift
+![img14](img14.png)
+Higher strength-to-body-weight ratios correlate with higher deadlift results.
+### Strength Metric Correlations
+![img15](img15.png)
+Deadlift and back squat show a strong positive correlation, while snatch is weakly correlated.
+---
+## Regression Modeling
+A baseline linear regression model was trained to predict deadlift performance.
+### Actual vs Predicted Deadlift
+![img16](img16.png)
+The model follows the general trend, but its predictions are noisy because of variability between athletes.
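The baseline regression step can be illustrated with a small, dependency-free sketch. It assumes a single predictor (back squat) and made-up values; the README does not list the features or library actually used, so `fit_line`, `regression_metrics`, and the sample numbers are hypothetical. The metrics computed are the ones the earlier README version reported (MAE, RMSE, R²).

```python
import math


def fit_line(x, y):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R^2 for paired actual/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {"mae": mae, "rmse": rmse, "r2": 1 - ss_res / ss_tot}


# Made-up values (lbs) for illustration only; not taken from the project's dataset.
backsquat = [185, 225, 275, 315, 365, 405]
deadlift = [245, 300, 335, 400, 440, 500]

slope, intercept = fit_line(backsquat, deadlift)
predicted = [slope * x + intercept for x in backsquat]
metrics = regression_metrics(deadlift, predicted)
print(metrics)
```

Plotting `predicted` against `deadlift` would give an actual-vs-predicted chart of the same kind as `img16.png`; points far from the diagonal are the noisy predictions the text mentions.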
+---
+## Clustering
+K-Means clustering was applied to identify athlete groups based on performance metrics.
+### Cluster Visualization (PCA)
+![img17](img17.png)
+Three clear performance clusters were identified, separating athletes by overall strength level.
+---
+## Classification Modeling
+Athletes were categorized into three balanced deadlift performance classes:
+- Low
+- Medium
+- High
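The README does not say how the three balanced classes were built. One plausible approach, shown here purely as an assumption, is to cut the deadlift values at their 1/3 and 2/3 quantiles so each class receives roughly the same number of athletes. The function name `tertile_labels` and the sample values are hypothetical.

```python
import statistics
from collections import Counter


def tertile_labels(values):
    """Label each value Low/Medium/High using the 1/3 and 2/3 quantiles as cut points."""
    low_cut, high_cut = statistics.quantiles(values, n=3)
    labels = []
    for v in values:
        if v <= low_cut:
            labels.append("Low")
        elif v <= high_cut:
            labels.append("Medium")
        else:
            labels.append("High")
    return labels


# Made-up deadlift values (lbs); not taken from the project's dataset.
deadlifts = [225, 245, 275, 300, 315, 335, 365, 405, 455]
labels = tertile_labels(deadlifts)
class_counts = Counter(labels)
print(class_counts)
```

Quantile-based cuts keep the classes balanced by construction, which makes plain accuracy a more meaningful metric than it would be with skewed classes.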
+Models trained:
+- Logistic Regression
 - Random Forest
 - Gradient Boosting
+### Confusion Matrices
+Logistic Regression:
+![img18](img18.png)
+Random Forest:
+![img19](img19.png)
+Gradient Boosting:
+![img20](img20.png)
+---
+## Model Evaluation
+All models achieved high accuracy, precision, recall, and F1-score.
+However:
+- Random Forest made fewer critical misclassifications
+- It showed better separation between High and Low classes
+- It achieved the highest F1-score
+Therefore, the Random Forest model was selected as the final classification model.
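The comparison above can be made concrete with a sketch that derives per-class precision, recall, and F1 from a confusion matrix and counts Low/High confusions. Two assumptions are made here: "critical misclassifications" is read as Low predicted as High or the reverse, and the F1-score is macro-averaged. The README confirms neither, and the matrix values are illustrative, not the project's results.

```python
def per_class_scores(matrix, labels):
    """Compute precision, recall and F1 per class from a square confusion matrix.

    matrix[i][j] counts samples whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    scores = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        predicted_k = sum(row[k] for row in matrix)  # column sum
        actual_k = sum(matrix[k])                    # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


def macro_f1(scores):
    """Unweighted mean of the per-class F1 values."""
    return sum(s["f1"] for s in scores.values()) / len(scores)


labels = ["Low", "Medium", "High"]
# Illustrative counts only; not the project's results.
example_matrix = [
    [90, 10, 0],
    [8, 84, 8],
    [0, 12, 88],
]
scores = per_class_scores(example_matrix, labels)
overall_f1 = macro_f1(scores)
# Assumed definition of a "critical" error: Low predicted as High, or High as Low.
critical_errors = example_matrix[0][2] + example_matrix[2][0]
print(overall_f1, critical_errors)
```

Running the same two calculations on each model's confusion matrix would give a like-for-like basis for the selection described above.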
+---
+## Final Model
+The winning model was:
+Random Forest Classifier
+It was trained fully and exported as:
+`classification_winner.pkl`
+---
+## How to Load the Model
 ```python
 import pickle
+with open("classification_winner.pkl", "rb") as f:
     model = pickle.load(f)
+prediction = model.predict(X_sample)