Tomertg committed on
Commit
c7dc25c
·
verified ·
1 Parent(s): 3b5f5a0

Update README.md

Files changed (1)
  1. README.md +129 -18
README.md CHANGED
@@ -1,34 +1,145 @@
1
- # 🏋️‍♂️ Gradient Boosting Deadlift Predictor
2
 
3
- This repository contains the winning model from Assignment #2: Classification, Regression, Clustering & Evaluation.
4
 
5
- ## 📌 Model Purpose
6
- The model predicts an athlete's **deadlift performance (lbs)** based on physical and strength-related features.
7
 
8
- ## 🧠 Algorithm
9
- ✅ Gradient Boosting Regressor
10
- Selected as the final model after comparing:
11
 
12
- - Linear Regression
13
  - Random Forest
14
  - Gradient Boosting
15
 
16
- ## 🏆 Performance (Test Set)
17
 
18
- - R²: 0.85
19
- - MAE: ~28.6 lbs
20
- - RMSE: ~37.2 lbs
21
 
22
- Gradient Boosting achieved the **highest accuracy and lowest error**, so it was chosen as the final model.
23
 
24
- ## 📁 Files
25
- - `winning_model.pkl` – serialized model ready for loading and inference
26
 
27
- ## 🔧 Usage
28
  ```python
29
  import pickle
30
 
31
- with open("winning_model.pkl", "rb") as f:
32
      model = pickle.load(f)
33
 
34
- prediction = model.predict([[weight, height, backsquat, snatch]])
 
1
+ # Strength Performance Analysis and Modeling
2
 
3
+ ## Overview
4
 
5
+ This project analyzes a large dataset of athlete strength metrics to understand patterns in deadlift performance and build predictive and classification models.
 
6
 
7
+ The work includes:
 
 
8
 
9
+ - Exploratory Data Analysis (EDA)
10
+ - Feature engineering
11
+ - Regression modeling
12
+ - Classification modeling
13
+ - Clustering
14
+ - Model selection and export
15
+
16
+ The final goal is to classify athletes into deadlift performance categories and determine which model performs best.
17
+
18
+ ---
19
+
20
+ ## Dataset
21
+
22
+ The dataset includes:
23
+
24
+ - Body weight
25
+ - Height
26
+ - Age
27
+ - Strength metrics: deadlift, back squat, snatch
28
+
29
+ During cleaning, outliers were removed and missing values were handled.
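The exact cleaning rules are not stated here. As a rough, dependency-free sketch, assuming median imputation for missing values and a 1.5 × IQR rule for outliers (both assumptions, as are the helper name `clean_column` and the sample values), the step could look like this:

```python
from statistics import median, quantiles

def clean_column(values, k=1.5):
    """Fill missing values with the median, then drop IQR outliers.

    Hypothetical helper: median imputation and the k * IQR rule are
    assumptions, not necessarily the rules used in this project.
    """
    present = [v for v in values if v is not None]
    fill = median(present)
    filled = [fill if v is None else v for v in values]
    # Quartiles of the filled column define the accepted range.
    q1, _, q3 = quantiles(filled, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in filled if low <= v <= high]

# Made-up deadlift values in lbs; None marks a missing entry.
deadlifts = [315, 405, None, 365, 2000, 335, 385, 295]
cleaned = clean_column(deadlifts)
```

With these sample values the implausible 2000 lbs entry is dropped and the missing entry is replaced by the median.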
30
+
31
+ ---
32
+
33
+ ## Exploratory Data Analysis (EDA)
34
+
35
+ ### Average Deadlift by Body Weight
36
+ ![img11](img11.png)
37
+
38
+ Heavier weight categories generally show higher deadlift performance.
39
+
40
+ ### Average Deadlift by Height
41
+ ![img12](img12.png)
42
+
43
+ Taller athletes tend to lift more, with increasing variance at higher height ranges.
44
+
45
+ ### Average Deadlift by Age
46
+ ![img13](img13.png)
47
+
48
+ Performance peaks around ages 25–34 and gradually decreases afterward.
49
+
50
+ ### Body Ratio and Deadlift
51
+ ![img14](img14.png)
52
+
53
+ Higher strength-to-body-weight ratios correlate with higher deadlift results.
54
+
55
+ ### Strength Metric Correlations
56
+ ![img15](img15.png)
57
+
58
+ Deadlift and back squat show a strong positive correlation, while snatch is weakly correlated.
59
+
60
+ ---
61
+
62
+ ## Regression Modeling
63
+
64
+ A baseline linear regression model was trained to predict deadlift performance.
65
+
66
+ ### Actual vs Predicted Deadlift
67
+ ![img16](img16.png)
68
+
69
+ The model follows the general trend but shows noise due to variability between athletes.
70
+
71
+ ---
72
+
73
+ ## Clustering
74
+
75
+ K-Means clustering was applied to identify athlete groups based on performance metrics.
76
+
77
+ ### Cluster Visualization (PCA)
78
+ ![img17](img17.png)
79
+
80
+ Three clear performance clusters were identified, separating athletes by overall strength level.
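For readers unfamiliar with K-Means, here is a minimal pure-Python sketch of the underlying procedure (Lloyd's algorithm). The function name `k_means`, the hand-picked starting centroids, and the sample points are all assumptions for illustration. Feature scaling and the PCA projection used for the plot are not shown.

```python
from math import dist
from statistics import mean

def k_means(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assign each point to its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
                  for p in points]
        # Move each centroid to the mean of its assigned points.
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == i]
            updated.append(tuple(mean(axis) for axis in zip(*members)) if members else old)
        if updated == centroids:  # converged: no centroid moved
            break
        centroids = updated
    return labels, centroids

# Made-up (deadlift, back squat) pairs in lbs.
athletes = [(225, 185), (245, 205), (235, 195),
            (365, 315), (385, 325), (375, 305),
            (525, 455), (545, 475), (535, 465)]
# Initial centroids picked by hand to keep the sketch deterministic.
labels, centers = k_means(athletes, centroids=[athletes[0], athletes[3], athletes[6]])
```

On this toy data the three groups correspond to lower, middle, and higher overall strength levels.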
81
+
82
+ ---
83
+
84
+ ## Classification Modeling
85
+
86
+ Athletes were categorized into three balanced deadlift performance classes:
87
+
88
+ - Low
89
+ - Medium
90
+ - High
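How the class boundaries were chosen is not stated. One common way to obtain balanced classes is to split at the terciles of the deadlift distribution; the sketch below assumes that approach, and the helper name `to_classes` and the sample values are hypothetical.

```python
from collections import Counter
from statistics import quantiles

def to_classes(deadlifts):
    """Label each value Low/Medium/High using the sample terciles."""
    t1, t2 = quantiles(deadlifts, n=3)  # two cut points -> three bins
    return ["Low" if d <= t1 else "Medium" if d <= t2 else "High"
            for d in deadlifts]

# Made-up deadlift values in lbs.
deadlifts = [225, 255, 285, 315, 345, 375, 405, 435, 465]
classes = to_classes(deadlifts)
counts = Counter(classes)
```

Because the cut points come from the data itself, each class receives roughly one third of the athletes.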
91
+
92
+ Models trained:
93
+
94
+ - Logistic Regression
95
  - Random Forest
96
  - Gradient Boosting
97
 
98
+ ### Confusion Matrices
99
+
100
+ Logistic Regression:
101
+ ![img18](img18.png)
102
+
103
+ Random Forest:
104
+ ![img19](img19.png)
105
+
106
+ Gradient Boosting:
107
+ ![img20](img20.png)
108
+
109
+ ---
110
+
111
+ ## Model Evaluation
112
+
113
+ All models achieved high accuracy, precision, recall, and F1-score.
114
+
115
+ However:
116
+
117
+ - Random Forest made fewer critical misclassifications
118
+ - It showed better separation between High and Low classes
119
+ - It achieved the highest F1-score
120
+
121
+ Therefore, the Random Forest model was selected as the final classification model.
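The comparison criteria above can be computed directly from a confusion matrix. The sketch below assumes that "critical misclassifications" means Low predicted as High or High predicted as Low, and it uses hypothetical matrices rather than this project's results; the function names are illustrative.

```python
def macro_f1(confusion):
    """Macro-averaged F1 from a square confusion matrix.

    Rows are true classes, columns are predicted classes.
    """
    scores = []
    for k in range(len(confusion)):
        tp = confusion[k][k]
        fp = sum(row[k] for row in confusion) - tp  # predicted k, actually other
        fn = sum(confusion[k]) - tp                 # actually k, predicted other
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def critical_errors(confusion):
    """Count Low->High plus High->Low errors (class order: Low, Medium, High)."""
    return confusion[0][-1] + confusion[-1][0]

# Hypothetical confusion matrices, not the project's results.
model_a = [[90, 10, 0], [8, 85, 7], [0, 9, 91]]
model_b = [[88, 9, 3], [10, 82, 8], [4, 8, 88]]
```

Comparing `macro_f1(model_a)` with `macro_f1(model_b)`, together with the `critical_errors` counts, gives a compact basis for choosing between candidate models.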
122
+
123
+ ---
124
+
125
+ ## Final Model
126
+
127
+ The winning model was:
128
+
129
+ Random Forest Classifier
130
+
131
+ It was trained fully and exported as:
132
 
133
+ `classification_winner.pkl`
 
 
134
 
135
+ ---
136
 
137
+ ## How to Load the Model
 
138
 
 
139
  ```python
140
  import pickle
141
 
142
+ with open("classification_winner.pkl", "rb") as f:
143
      model = pickle.load(f)
144
 
145
+ # X_sample must be a 2-D array-like with the same feature columns used in training.
+ prediction = model.predict(X_sample)
+ ```