Commit 6b3239e by uleeberber · verified · 1 parent: c853412

Update README.md

Files changed (1)
  1. README.md +7 -3
README.md CHANGED
@@ -34,7 +34,7 @@ Classification: Categorizing a workout as Low, Medium, or High intensity.
  **Data Cleaning:** There were no missing values or duplicate rows, so modeling could proceed without data imputation.

- **Decriptive statistics:**

  Sessions last ~1.26 hours on average (SD ≈ 0.34), so a typical session (mean ± 1 SD) lasts between ~0.9 and 1.6 hours.
@@ -135,7 +135,7 @@ Top Negative Driver: Workout_Type_Yoga (lower intensity).
  Secondary Drivers: Experience_Level, Weight, and BMI had meaningful but smaller impacts.

- In conclustion Duration and Activity Type are the dominant predictors, validating the initial EDA findings.

  **Part 4: Feature Engineering**
@@ -213,7 +213,7 @@ Distance from each user to their cluster centroid, indicating how typical or aty
  After building the baseline regression model, the next step was to retrain and compare multiple models using the fully engineered feature set created in Part 4. This improved dataset included:

- The six engineered numeric featuresm, Physique_Cluster (one-hot encoded), Physique_Dist, All original encoded and scaled variables

  This richer feature representation allows more advanced models to detect complex, nonlinear relationships that Linear Regression cannot capture.
@@ -381,3 +381,7 @@ The trained Random Forest pipeline, including the scaler, clustering model, thre
  [Download Winning Classification Model (.pkl)](https://huggingface.co/uleeberber/models_assignment_2/resolve/main/the_winning_classification_pipeline.pkl)
 
 
 
 
 
 
  **Data Cleaning:** There were no missing values or duplicate rows, so modeling could proceed without data imputation.

+ **Descriptive statistics:**

  Sessions last ~1.26 hours on average (SD ≈ 0.34), so a typical session (mean ± 1 SD) lasts between ~0.9 and 1.6 hours.
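Both checks in this hunk, the absence of missing values and duplicates and the mean ± 1 SD range, can be reproduced in a few lines of pandas. This is a minimal sketch, not the assignment's code: the duration column name and the toy rows are hypothetical stand-ins, and the range uses pandas' default sample standard deviation (`ddof=1`).

```python
import pandas as pd


def cleaning_report(df: pd.DataFrame) -> dict:
    """Count missing cells and fully duplicated rows."""
    return {
        "missing_values": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
    }


def one_sd_range(series: pd.Series) -> tuple:
    """Return (mean - SD, mean + SD), using pandas' sample standard deviation."""
    mean, sd = series.mean(), series.std()
    return float(mean - sd), float(mean + sd)


if __name__ == "__main__":
    # Hypothetical column names and toy rows, for illustration only.
    toy = pd.DataFrame({
        "Session_Duration (hours)": [0.9, 1.2, 1.3, 1.6],
        "Calories_Burned": [600, 800, 850, 1100],
    })
    print(cleaning_report(toy))  # {'missing_values': 0, 'duplicate_rows': 0}
    print(one_sd_range(toy["Session_Duration (hours)"]))
```

With the README's own figures, 1.26 - 0.34 ≈ 0.92 and 1.26 + 0.34 = 1.60, which is where the ~0.9 to 1.6 hour range comes from.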
 
 
  Secondary Drivers: Experience_Level, Weight, and BMI had meaningful but smaller impacts.

+ In conclusion, Duration and Activity Type are the dominant predictors, validating the initial EDA findings.

  **Part 4: Feature Engineering**
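The driver summary in this hunk (Workout_Type_Yoga as the top negative driver; Experience_Level, Weight and BMI as secondary drivers) does not state how the ranking was produced. Assuming it was read off the signed coefficients of the baseline Linear Regression fitted on standardized features, which is a common approach but an assumption here, the ranking can be sketched with scikit-learn as below. The helper `rank_drivers` and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler


def rank_drivers(X: pd.DataFrame, y: np.ndarray) -> pd.Series:
    """Fit a linear model on standardized features and return its coefficients
    sorted from most positive to most negative (hypothetical helper)."""
    X_scaled = StandardScaler().fit_transform(X)
    model = LinearRegression().fit(X_scaled, y)
    return pd.Series(model.coef_, index=X.columns).sort_values(ascending=False)


# Synthetic demo data: the target rises with duration and drops for yoga sessions.
rng = np.random.default_rng(0)
n = 500
demo = pd.DataFrame({
    "Duration": rng.uniform(0.5, 2.0, n),
    "Workout_Type_Yoga": rng.integers(0, 2, n),
    "Weight": rng.normal(70, 10, n),
})
target = (600 * demo["Duration"] - 150 * demo["Workout_Type_Yoga"]
          + 0.5 * demo["Weight"] + rng.normal(0, 5, n))

drivers = rank_drivers(demo, target.to_numpy())
print(drivers)
print("top positive:", drivers.index[0], "| top negative:", drivers.index[-1])
```

Standardizing first puts every coefficient on a per-standard-deviation scale, so a binary one-hot flag can be compared with continuous inputs such as Duration; raw coefficients would depend on each feature's units.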
 
 
  After building the baseline regression model, the next step was to retrain and compare multiple models using the fully engineered feature set created in Part 4. This improved dataset included:

+ The six engineered numeric features, Physique_Cluster (one-hot encoded), Physique_Dist, and all original encoded and scaled variables.

  This richer feature representation allows more advanced models to detect complex, nonlinear relationships that Linear Regression cannot capture.
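The hunk header describes Physique_Dist as the distance from each user to their cluster centroid, and the added line lists Physique_Cluster as one-hot encoded. Below is a minimal scikit-learn sketch of deriving both from a K-Means model, assuming the clustering runs on already-scaled physique measurements; the helper name, `n_clusters=3` and the synthetic blobs are hypothetical, not the settings used in Part 4.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def add_physique_features(X_phys: np.ndarray, n_clusters: int = 3, seed: int = 42):
    """Cluster scaled physique data; return (one-hot cluster DataFrame,
    distance of each row to its own centroid, fitted KMeans)."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X_phys)
    labels = km.labels_
    # transform() gives the distance from every row to every centroid;
    # keep only the column belonging to each row's assigned cluster.
    all_dists = km.transform(X_phys)
    own_dist = all_dists[np.arange(len(X_phys)), labels]
    one_hot = pd.get_dummies(pd.Series(labels), prefix="Physique_Cluster")
    return one_hot, own_dist, km


# Three well-separated synthetic blobs standing in for scaled physique columns.
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in (-2.0, 0.0, 2.0)])
one_hot, dist, km = add_physique_features(X_demo)
print(one_hot.sum().to_dict())  # rows per cluster
print(dist[:5].round(3))        # small = typical of its cluster, large = atypical
```

Indexing the full distance matrix with the fitted labels is equivalent to taking each row's minimum centroid distance, since K-Means assigns every row to its nearest centroid.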
 
 
  [Download Winning Classification Model (.pkl)](https://huggingface.co/uleeberber/models_assignment_2/resolve/main/the_winning_classification_pipeline.pkl)

+ **Conclusion**
+
+ The analysis identified Random Forest as the best-performing algorithm, achieving near-perfect performance for both regression (R² = 0.9999) and classification (99.88% accuracy). While these metrics demonstrate exceptional predictive power, the remarkably high accuracy, combined with the unexpectedly low feature importance of Heart Rate and Weight, suggests that the underlying dataset is likely synthetic. In real-world physiology, heart rate and body mass are critical drivers of energy expenditure; their weak influence here points to data likely generated by a deterministic formula heavily weighted toward Duration and Activity Type.
+
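The conclusion's argument, that a deterministic generator driven by Duration and Activity Type would yield near-perfect scores while leaving Heart Rate with little importance, can be illustrated on toy data. This is not the assignment's dataset or pipeline: the generating formula, coefficients and column names are invented for illustration, and the model is a plain scikit-learn `RandomForestRegressor`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "Duration": rng.uniform(0.5, 2.0, n),
    "Activity_Yoga": rng.integers(0, 2, n),
    "Heart_Rate": rng.normal(140, 15, n),  # deliberately ignored by the generator
})
# Invented deterministic generator: calories depend only on duration and activity.
df["Calories"] = 700 * df["Duration"] * np.where(df["Activity_Yoga"] == 1, 0.5, 1.0)

X, y = df.drop(columns="Calories"), df["Calories"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

r2 = r2_score(y_test, rf.predict(X_test))
importances = pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(f"test R^2 = {r2:.4f}")
print(importances)
```

On data like this the forest reaches a near-perfect held-out R² and assigns Heart_Rate almost no importance, which is the pattern the conclusion reads as a sign of synthetic data.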