Amol Kaushik committed on
Commit
0c4ffb9
·
1 Parent(s): d742823

final report

Files changed (1)
  1. A3/A3_Report.ipynb +41 -41
A3/A3_Report.ipynb CHANGED
@@ -19,7 +19,7 @@
19
  "source": [
20
  "## 1. Problem statement\n",
21
  "\n",
22
- "The goal is to classify the weakest link in a squat movement. Given 38 movement features, the model predicts which body region is limiting the person's squat. The input is 38 movement features and the output is a body region classification which is upper and lower body. Due to massive class imbalance in the original 14 class problem with some classes with only 1-2 datapoints, we changed to a 2-class approach that gives predictions that actually helps."
23
  ]
24
  },
25
  {
@@ -47,7 +47,7 @@
47
  },
48
  {
49
  "cell_type": "code",
50
- "execution_count": 121,
51
  "id": "edbe3fbd",
52
  "metadata": {},
53
  "outputs": [],
@@ -77,7 +77,7 @@
77
  },
78
  {
79
  "cell_type": "code",
80
- "execution_count": 122,
81
  "id": "23f1b38b",
82
  "metadata": {},
83
  "outputs": [
@@ -128,7 +128,7 @@
128
  },
129
  {
130
  "cell_type": "code",
131
- "execution_count": 123,
132
  "id": "080ab472",
133
  "metadata": {},
134
  "outputs": [
@@ -200,7 +200,7 @@
200
  },
201
  {
202
  "cell_type": "code",
203
- "execution_count": 124,
204
  "id": "438e27ae",
205
  "metadata": {},
206
  "outputs": [
@@ -298,7 +298,7 @@
298
  },
299
  {
300
  "cell_type": "code",
301
- "execution_count": 125,
302
  "id": "7560ae66",
303
  "metadata": {},
304
  "outputs": [
@@ -335,7 +335,7 @@
335
  },
336
  {
337
  "cell_type": "code",
338
- "execution_count": 126,
339
  "id": "9f17a88e",
340
  "metadata": {},
341
  "outputs": [
@@ -381,7 +381,7 @@
381
  },
382
  {
383
  "cell_type": "code",
384
- "execution_count": 127,
385
  "id": "d4c02996",
386
  "metadata": {},
387
  "outputs": [],
@@ -404,7 +404,7 @@
404
  },
405
  {
406
  "cell_type": "code",
407
- "execution_count": 128,
408
  "id": "c8292b2b",
409
  "metadata": {},
410
  "outputs": [],
@@ -442,7 +442,7 @@
442
  },
443
  {
444
  "cell_type": "code",
445
- "execution_count": 129,
446
  "id": "b598aef7",
447
  "metadata": {},
448
  "outputs": [
@@ -475,7 +475,7 @@
475
  },
476
  {
477
  "cell_type": "code",
478
- "execution_count": 130,
479
  "id": "962743cc",
480
  "metadata": {},
481
  "outputs": [
@@ -603,7 +603,7 @@
603
  },
604
  {
605
  "cell_type": "code",
606
- "execution_count": 131,
607
  "id": "5c9efd5b",
608
  "metadata": {},
609
  "outputs": [
@@ -636,7 +636,7 @@
636
  },
637
  {
638
  "cell_type": "code",
639
- "execution_count": 132,
640
  "id": "ce01a75f",
641
  "metadata": {},
642
  "outputs": [
@@ -812,9 +812,19 @@
812
  "display(tuning_df_14class)"
813
  ]
814
  },
815
  {
816
  "cell_type": "code",
817
- "execution_count": 133,
818
  "id": "3e5e5e9b",
819
  "metadata": {},
820
  "outputs": [
@@ -849,7 +859,7 @@
849
  },
850
  {
851
  "cell_type": "code",
852
- "execution_count": 134,
853
  "id": "4de69063",
854
  "metadata": {},
855
  "outputs": [
@@ -885,7 +895,7 @@
885
  },
886
  {
887
  "cell_type": "code",
888
- "execution_count": 135,
889
  "id": "a994b1af",
890
  "metadata": {},
891
  "outputs": [
@@ -1035,7 +1045,7 @@
1035
  },
1036
  {
1037
  "cell_type": "code",
1038
- "execution_count": 136,
1039
  "id": "00f3eda4",
1040
  "metadata": {},
1041
  "outputs": [
@@ -1067,7 +1077,7 @@
1067
  },
1068
  {
1069
  "cell_type": "code",
1070
- "execution_count": 137,
1071
  "id": "6b03902f",
1072
  "metadata": {},
1073
  "outputs": [
@@ -1249,7 +1259,7 @@
1249
  "source": [
1250
  "## 8. Why we did not use polynomial features\n",
1251
  "\n",
1252
- "We tested polynomial interaction features which created 820 new features from the original 40. However, this approach was not used in the final model because the F1-score improvement was negligible, 820 features vs 40 original features makes it hard to interpret the model. Many more parameters to learn from the same amount of data, so the tuned body region model without polynomial features provides a good balance of accuracy and simplicity."
1253
  ]
1254
  },
1255
  {
@@ -1262,7 +1272,7 @@
1262
  },
1263
  {
1264
  "cell_type": "code",
1265
- "execution_count": 138,
1266
  "id": "0b3e066a",
1267
  "metadata": {},
1268
  "outputs": [
@@ -1398,7 +1408,7 @@
1398
  },
1399
  {
1400
  "cell_type": "code",
1401
- "execution_count": 139,
1402
  "id": "d21c037d",
1403
  "metadata": {},
1404
  "outputs": [
@@ -1476,7 +1486,7 @@
1476
  },
1477
  {
1478
  "cell_type": "code",
1479
- "execution_count": 142,
1480
  "id": "4f01e27a",
1481
  "metadata": {},
1482
  "outputs": [
@@ -1530,7 +1540,9 @@
1530
  "source": [
1531
  "## 10. Deployment\n",
1532
  "\n",
1533
- "The classification endpoint is added to the existing Gradio app as a second tab. Tab 1 has Movement Scoring from A2. Tab 2 has Body Region Classification which takes 38 deviation features as input and outputs the predicted body region (Upper Body or Lower Body). The deployed model is KNN (k=7) with StandardScaler preprocessing, achieving 82.8% F1-score.\n",
 
 
1534
  "\n",
1535
  "Deployment URL: https://huggingface.co/spaces/Bachstelze/github_sync"
1536
  ]
@@ -1540,23 +1552,15 @@
1540
  "id": "67013cc1",
1541
  "metadata": {},
1542
  "source": [
1543
- "## 11. Virtual environment\n",
1544
  "\n",
 
1545
  "```bash\n",
1546
  "python -m venv venv\n",
1547
  "source venv/bin/activate\n",
1548
  "pip install -r requirements.txt\n",
1549
- "```"
1550
- ]
1551
- },
1552
- {
1553
- "cell_type": "markdown",
1554
- "id": "445419d9",
1555
- "metadata": {},
1556
- "source": [
1557
- "## 12. DevOps/MLOps process\n",
1558
  "\n",
1559
- "GitHub Actions automatically syncs the repository to HuggingFace Spaces when pushed to main. The workflow file is found at .github/workflows/push_to_hf_space.yml."
1560
  ]
1561
  },
1562
  {
@@ -1564,7 +1568,7 @@
1564
  "id": "7a142abd",
1565
  "metadata": {},
1566
  "source": [
1567
- "## 13. Contributions \n",
1568
  "\n",
1569
  "| Member | GitHub Issue | Tasks |\n",
1570
  "|--------|--------------|-------|\n",
@@ -1579,7 +1583,7 @@
1579
  "id": "f62680e2",
1580
  "metadata": {},
1581
  "source": [
1582
- "## 14. Iterations\n",
1583
  "\n",
1584
  "| # | Iteration | Approach | Key change |\n",
1585
  "|---|-----------|----------|------------|\n",
@@ -1588,11 +1592,7 @@
1588
  "| 3 | Baseline | Body Regions | Grouped classes (Upper/Lower) |\n",
1589
  "| 4 | Tuned | Body Regions | GridSearchCV (5-fold CV) |\n",
1590
  "\n",
1591
- "Note: Polynomial interaction features were tested but not included in final iterations due to minimal improvement and increased complexity (820 features vs 38).\n",
1592
- "\n",
1593
- "### Deployed Model\n",
1594
- "\n",
1595
- "The deployed model uses body region classification with KNN (k=7) and StandardScaler preprocessing. It takes 38 input features and achieves 82.8% F1-weighted and 84% accuracy on the test set."
1596
  ]
1597
  }
1598
  ],
 
19
  "source": [
20
  "## 1. Problem statement\n",
21
  "\n",
22
+ "The goal is to classify the weakest link in a squat movement. Given 41 movement features (AimoScore + 13 Angle deviations + 25 NASM deviations + 2 Time deviations), the model predicts which body region, Upper Body or Lower Body, is limiting the person's squat. The original 14-class problem is severely imbalanced, with some classes having only 1-2 data points, so we switched to a 2-class approach that gives more useful predictions."
23
  ]
24
  },
25
  {
 
47
  },
48
  {
49
  "cell_type": "code",
50
+ "execution_count": 143,
51
  "id": "edbe3fbd",
52
  "metadata": {},
53
  "outputs": [],
 
77
  },
78
  {
79
  "cell_type": "code",
80
+ "execution_count": 144,
81
  "id": "23f1b38b",
82
  "metadata": {},
83
  "outputs": [
 
128
  },
129
  {
130
  "cell_type": "code",
131
+ "execution_count": 145,
132
  "id": "080ab472",
133
  "metadata": {},
134
  "outputs": [
 
200
  },
201
  {
202
  "cell_type": "code",
203
+ "execution_count": 146,
204
  "id": "438e27ae",
205
  "metadata": {},
206
  "outputs": [
 
298
  },
299
  {
300
  "cell_type": "code",
301
+ "execution_count": 147,
302
  "id": "7560ae66",
303
  "metadata": {},
304
  "outputs": [
 
335
  },
336
  {
337
  "cell_type": "code",
338
+ "execution_count": 148,
339
  "id": "9f17a88e",
340
  "metadata": {},
341
  "outputs": [
 
381
  },
382
  {
383
  "cell_type": "code",
384
+ "execution_count": 149,
385
  "id": "d4c02996",
386
  "metadata": {},
387
  "outputs": [],
 
404
  },
405
  {
406
  "cell_type": "code",
407
+ "execution_count": 150,
408
  "id": "c8292b2b",
409
  "metadata": {},
410
  "outputs": [],
 
442
  },
443
  {
444
  "cell_type": "code",
445
+ "execution_count": 151,
446
  "id": "b598aef7",
447
  "metadata": {},
448
  "outputs": [
 
475
  },
476
  {
477
  "cell_type": "code",
478
+ "execution_count": 152,
479
  "id": "962743cc",
480
  "metadata": {},
481
  "outputs": [
 
603
  },
604
  {
605
  "cell_type": "code",
606
+ "execution_count": 153,
607
  "id": "5c9efd5b",
608
  "metadata": {},
609
  "outputs": [
 
636
  },
637
  {
638
  "cell_type": "code",
639
+ "execution_count": 154,
640
  "id": "ce01a75f",
641
  "metadata": {},
642
  "outputs": [
 
812
  "display(tuning_df_14class)"
813
  ]
814
  },
815
+ {
816
+ "cell_type": "markdown",
817
+ "id": "ba67bf51",
818
+ "metadata": {},
819
+ "source": [
820
+ "## Approach 2: Body Region Classification\n",
821
+ "\n",
822
+ "Due to the severe class imbalance in the 14-class problem, we explored grouping classes into 2 body regions (Upper Body / Lower Body) for more meaningful predictions."
823
+ ]
824
+ },
825
  {
826
  "cell_type": "code",
827
+ "execution_count": 155,
828
  "id": "3e5e5e9b",
829
  "metadata": {},
830
  "outputs": [
 
859
  },
860
  {
861
  "cell_type": "code",
862
+ "execution_count": 156,
863
  "id": "4de69063",
864
  "metadata": {},
865
  "outputs": [
 
895
  },
896
  {
897
  "cell_type": "code",
898
+ "execution_count": 157,
899
  "id": "a994b1af",
900
  "metadata": {},
901
  "outputs": [
 
1045
  },
1046
  {
1047
  "cell_type": "code",
1048
+ "execution_count": 158,
1049
  "id": "00f3eda4",
1050
  "metadata": {},
1051
  "outputs": [
 
1077
  },
1078
  {
1079
  "cell_type": "code",
1080
+ "execution_count": 159,
1081
  "id": "6b03902f",
1082
  "metadata": {},
1083
  "outputs": [
 
1259
  "source": [
1260
  "## 8. Why we did not use polynomial features\n",
1261
  "\n",
1262
+ "We tested polynomial interaction features, which created 820 new features from the original 41. This approach was not used in the final model for three reasons: the F1-score improvement was negligible, 820 features are much harder to interpret than the 41 originals, and there are many more parameters to learn from the same amount of data. The tuned body region model without polynomial features provides a good balance of accuracy and simplicity."
1263
  ]
1264
  },
1265
  {
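The count in section 8 can be checked by hand: 41 features have C(41, 2) = 820 distinct pairs. The sketch below assumes the notebook used degree-2 pairwise interaction terms without squared terms, which the diff does not confirm. The helper name `pairwise_interactions` and the sample values are invented for illustration.

```python
from itertools import combinations
from math import comb


def pairwise_interactions(row):
    """Return the products x_i * x_j for every pair i < j of one sample."""
    return [a * b for a, b in combinations(row, 2)]


n_features = 41
# Made-up feature values; only the number of features matters here.
sample = [float(i + 1) for i in range(n_features)]

new_features = pairwise_interactions(sample)
n_new = len(new_features)       # number of interaction terms added
n_total = n_features + n_new    # originals plus interactions

print(n_new, n_total)
```

Under this assumption the model input grows from 41 columns to 861, which is the extra complexity the section weighs against a negligible F1 gain.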
 
1272
  },
1273
  {
1274
  "cell_type": "code",
1275
+ "execution_count": 160,
1276
  "id": "0b3e066a",
1277
  "metadata": {},
1278
  "outputs": [
 
1408
  },
1409
  {
1410
  "cell_type": "code",
1411
+ "execution_count": 161,
1412
  "id": "d21c037d",
1413
  "metadata": {},
1414
  "outputs": [
 
1486
  },
1487
  {
1488
  "cell_type": "code",
1489
+ "execution_count": 162,
1490
  "id": "4f01e27a",
1491
  "metadata": {},
1492
  "outputs": [
 
1540
  "source": [
1541
  "## 10. Deployment\n",
1542
  "\n",
1543
+ "The classification endpoint is added to the existing Gradio app as a second tab. Tab 1 provides Movement Scoring from A2. Tab 2 provides Body Region Classification, which takes 41 features as input and outputs the predicted body region (Upper Body or Lower Body). The deployed model is KNN (k=7) with StandardScaler preprocessing, achieving a weighted F1-score of 82.8% and 84% accuracy on the test set.\n",
1544
+ "\n",
1545
+ "Note: due to last-minute issues, the app currently downloads the pickled model from Google Drive; we are working to automate this. The issue stems from policy restrictions on Hugging Face, not from the app itself, so overall functionality is unaffected.\n",
1546
  "\n",
1547
  "Deployment URL: https://huggingface.co/spaces/Bachstelze/github_sync"
1548
  ]
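To illustrate what "KNN (k=7) with StandardScaler preprocessing" does at prediction time, here is a minimal pure-Python sketch. It is not the notebook's code, which presumably relies on scikit-learn. The function names (`fit_scaler`, `transform`, `knn_predict`) and the two-feature toy data are assumptions made for illustration; the real model takes 41 features.

```python
import math
from collections import Counter


def fit_scaler(rows):
    """Compute per-feature mean and population standard deviation."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stds = []
    for col, m in zip(columns, means):
        var = sum((v - m) ** 2 for v in col) / n
        stds.append(math.sqrt(var) or 1.0)  # constant feature: avoid dividing by zero
    return means, stds


def transform(row, means, stds):
    """Standardize one sample: subtract the mean, divide by the std."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def knn_predict(train_X, train_y, query, k=7):
    """Majority vote among the k training points closest to the query."""
    ranked = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], query))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


# Toy data (invented): two features on very different scales, eight samples per class.
train_X = [[0.1 * i, 1000.0 + i] for i in range(8)]
train_X += [[5.0 + 0.1 * i, 1000.0 + i] for i in range(8)]
train_y = ["Lower Body"] * 8 + ["Upper Body"] * 8

means, stds = fit_scaler(train_X)
scaled_X = [transform(r, means, stds) for r in train_X]

query = transform([5.2, 1003.0], means, stds)
prediction = knn_predict(scaled_X, train_y, query, k=7)
print(prediction)
```

The scaler is fitted on the training rows only and then reused for the query, so a new sample is measured on the same scale as the stored neighbours. With two classes, an odd k such as 7 also rules out tied votes.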
 
1552
  "id": "67013cc1",
1553
  "metadata": {},
1554
  "source": [
1555
+ "## 11. Environment & DevOps\n",
1556
  "\n",
1557
+ "**Virtual environment setup:**\n",
1558
  "```bash\n",
1559
  "python -m venv venv\n",
1560
  "source venv/bin/activate\n",
1561
  "pip install -r requirements.txt\n",
1562
  "\n",
1563
+ "```\n\n**CI/CD Pipeline:** GitHub Actions automatically syncs the repository to Hugging Face Spaces on every push to main. The workflow file is located at `.github/workflows/push_to_hf_space.yml`.\n"
1564
  ]
1565
  },
1566
  {
 
1568
  "id": "7a142abd",
1569
  "metadata": {},
1570
  "source": [
1571
+ "## 12. Contributions \n",
1572
  "\n",
1573
  "| Member | GitHub Issue | Tasks |\n",
1574
  "|--------|--------------|-------|\n",
 
1583
  "id": "f62680e2",
1584
  "metadata": {},
1585
  "source": [
1586
+ "## 13. Iterations\n",
1587
  "\n",
1588
  "| # | Iteration | Approach | Key change |\n",
1589
  "|---|-----------|----------|------------|\n",
 
1592
  "| 3 | Baseline | Body Regions | Grouped classes (Upper/Lower) |\n",
1593
  "| 4 | Tuned | Body Regions | GridSearchCV (5-fold CV) |\n",
1594
  "\n",
1595
+ "Note: Polynomial interaction features were tested but not included in final iterations due to minimal improvement and increased complexity (820 features vs 41)."
1596
  ]
1597
  }
1598
  ],