uleeberber/models_assignment_2

uleeberber committed on Dec 9, 2025

Commit 6b3239e (verified) · 1 Parent(s): c853412

Update README.md

README.md +7 -3

@@ -34,7 +34,7 @@ Classification: Categorizing a workout as Low, Medium, or High intensity.
 **Data Cleaning:** there were no missing value or duplicates allowing for robust modeling without data imputation.
-**Decriptive statistics:**
+**Descriptive statistics:**
 Sessions last on average ~1.26 hours (SD ≈ 0.34), so most sessions are between ~0.9 and 1.6 hours.
@@ -135,7 +135,7 @@ Top Negative Driver: Workout_Type_Yoga (lower intensity).
 Secondary Drivers: Experience_Level, Weight, and BMI had meaningful but smaller impacts.
-In conclustion Duration and Activity Type are the dominant predictors, validating the initial EDA findings.
+In conclusion Duration and Activity Type are the dominant predictors, validating the initial EDA findings.
 **Part 4: Feature Engineering**
@@ -213,7 +213,7 @@ Distance from each user to their cluster centroid, indicating how typical or aty
 After building the baseline regression model, the next step was to retrain and compare multiple models using the fully engineered feature set created in Part 4. This improved dataset included:
-The six engineered numeric featuresm, Physique_Cluster (one-hot encoded), Physique_Dist, All original encoded and scaled variables
+The six engineered numeric features, Physique_Cluster (one-hot encoded), Physique_Dist, All original encoded and scaled variables
 This richer feature representation allows more advanced models to detect complex, nonlinear relationships that Linear Regression cannot capture.
@@ -381,3 +381,7 @@ The trained Random Forest pipeline, including the scaler, clustering model, thre
 [Download Winning Classification Model (.pkl)](https://huggingface.co/uleeberber/models_assignment_2/resolve/main/the_winning_classification_pipeline.pkl)
+**Conclusion**
+The analysis identified the Random Forest algorithm as the superior model, achieving near-perfect performance for both regression (R^2 = 0.9999) and classification (99.88% Accuracy).While these metrics demonstrate exceptional predictive power, the remarkably high accuracy, combined with the unexpectedly low feature importance of Heart Rate and Weight, suggests the underlying dataset is likely synthetic. In real-world physiology, heart rate and body mass are critical drivers of energy expenditure; their lower correlation here indicates the data was likely generated using a deterministic formula heavily weighted toward Duration and Activity Type.