Amol Kaushik committed on
Commit · 0c4ffb9
1 Parent(s): d742823
final report
- A3/A3_Report.ipynb +41 -41
A3/A3_Report.ipynb
CHANGED
|
@@ -19,7 +19,7 @@
|
|
| 19 |
"source": [
|
| 20 |
"## 1. Problem statement\n",
|
| 21 |
"\n",
|
| 22 |
-
"The goal is to classify the weakest link in a squat movement. Given
|
| 23 |
]
|
| 24 |
},
|
| 25 |
{
|
|
@@ -47,7 +47,7 @@
|
|
| 47 |
},
|
| 48 |
{
|
| 49 |
"cell_type": "code",
|
| 50 |
-
"execution_count":
|
| 51 |
"id": "edbe3fbd",
|
| 52 |
"metadata": {},
|
| 53 |
"outputs": [],
|
|
@@ -77,7 +77,7 @@
|
|
| 77 |
},
|
| 78 |
{
|
| 79 |
"cell_type": "code",
|
| 80 |
-
"execution_count":
|
| 81 |
"id": "23f1b38b",
|
| 82 |
"metadata": {},
|
| 83 |
"outputs": [
|
|
@@ -128,7 +128,7 @@
|
|
| 128 |
},
|
| 129 |
{
|
| 130 |
"cell_type": "code",
|
| 131 |
-
"execution_count":
|
| 132 |
"id": "080ab472",
|
| 133 |
"metadata": {},
|
| 134 |
"outputs": [
|
|
@@ -200,7 +200,7 @@
|
|
| 200 |
},
|
| 201 |
{
|
| 202 |
"cell_type": "code",
|
| 203 |
-
"execution_count":
|
| 204 |
"id": "438e27ae",
|
| 205 |
"metadata": {},
|
| 206 |
"outputs": [
|
|
@@ -298,7 +298,7 @@
|
|
| 298 |
},
|
| 299 |
{
|
| 300 |
"cell_type": "code",
|
| 301 |
-
"execution_count":
|
| 302 |
"id": "7560ae66",
|
| 303 |
"metadata": {},
|
| 304 |
"outputs": [
|
|
@@ -335,7 +335,7 @@
|
|
| 335 |
},
|
| 336 |
{
|
| 337 |
"cell_type": "code",
|
| 338 |
-
"execution_count":
|
| 339 |
"id": "9f17a88e",
|
| 340 |
"metadata": {},
|
| 341 |
"outputs": [
|
|
@@ -381,7 +381,7 @@
|
|
| 381 |
},
|
| 382 |
{
|
| 383 |
"cell_type": "code",
|
| 384 |
-
"execution_count":
|
| 385 |
"id": "d4c02996",
|
| 386 |
"metadata": {},
|
| 387 |
"outputs": [],
|
|
@@ -404,7 +404,7 @@
|
|
| 404 |
},
|
| 405 |
{
|
| 406 |
"cell_type": "code",
|
| 407 |
-
"execution_count":
|
| 408 |
"id": "c8292b2b",
|
| 409 |
"metadata": {},
|
| 410 |
"outputs": [],
|
|
@@ -442,7 +442,7 @@
|
|
| 442 |
},
|
| 443 |
{
|
| 444 |
"cell_type": "code",
|
| 445 |
-
"execution_count":
|
| 446 |
"id": "b598aef7",
|
| 447 |
"metadata": {},
|
| 448 |
"outputs": [
|
|
@@ -475,7 +475,7 @@
|
|
| 475 |
},
|
| 476 |
{
|
| 477 |
"cell_type": "code",
|
| 478 |
-
"execution_count":
|
| 479 |
"id": "962743cc",
|
| 480 |
"metadata": {},
|
| 481 |
"outputs": [
|
|
@@ -603,7 +603,7 @@
|
|
| 603 |
},
|
| 604 |
{
|
| 605 |
"cell_type": "code",
|
| 606 |
-
"execution_count":
|
| 607 |
"id": "5c9efd5b",
|
| 608 |
"metadata": {},
|
| 609 |
"outputs": [
|
|
@@ -636,7 +636,7 @@
|
|
| 636 |
},
|
| 637 |
{
|
| 638 |
"cell_type": "code",
|
| 639 |
-
"execution_count":
|
| 640 |
"id": "ce01a75f",
|
| 641 |
"metadata": {},
|
| 642 |
"outputs": [
|
|
@@ -812,9 +812,19 @@
|
|
| 812 |
"display(tuning_df_14class)"
|
| 813 |
]
|
| 814 |
},
|
| 815 |
{
|
| 816 |
"cell_type": "code",
|
| 817 |
-
"execution_count":
|
| 818 |
"id": "3e5e5e9b",
|
| 819 |
"metadata": {},
|
| 820 |
"outputs": [
|
|
@@ -849,7 +859,7 @@
|
|
| 849 |
},
|
| 850 |
{
|
| 851 |
"cell_type": "code",
|
| 852 |
-
"execution_count":
|
| 853 |
"id": "4de69063",
|
| 854 |
"metadata": {},
|
| 855 |
"outputs": [
|
|
@@ -885,7 +895,7 @@
|
|
| 885 |
},
|
| 886 |
{
|
| 887 |
"cell_type": "code",
|
| 888 |
-
"execution_count":
|
| 889 |
"id": "a994b1af",
|
| 890 |
"metadata": {},
|
| 891 |
"outputs": [
|
|
@@ -1035,7 +1045,7 @@
|
|
| 1035 |
},
|
| 1036 |
{
|
| 1037 |
"cell_type": "code",
|
| 1038 |
-
"execution_count":
|
| 1039 |
"id": "00f3eda4",
|
| 1040 |
"metadata": {},
|
| 1041 |
"outputs": [
|
|
@@ -1067,7 +1077,7 @@
|
|
| 1067 |
},
|
| 1068 |
{
|
| 1069 |
"cell_type": "code",
|
| 1070 |
-
"execution_count":
|
| 1071 |
"id": "6b03902f",
|
| 1072 |
"metadata": {},
|
| 1073 |
"outputs": [
|
|
@@ -1249,7 +1259,7 @@
|
|
| 1249 |
"source": [
|
| 1250 |
"## 8. Why we did not use polynomial features\n",
|
| 1251 |
"\n",
|
| 1252 |
-
"We tested polynomial interaction features which created 820 new features from the original
|
| 1253 |
]
|
| 1254 |
},
|
| 1255 |
{
|
|
@@ -1262,7 +1272,7 @@
|
|
| 1262 |
},
|
| 1263 |
{
|
| 1264 |
"cell_type": "code",
|
| 1265 |
-
"execution_count":
|
| 1266 |
"id": "0b3e066a",
|
| 1267 |
"metadata": {},
|
| 1268 |
"outputs": [
|
|
@@ -1398,7 +1408,7 @@
|
|
| 1398 |
},
|
| 1399 |
{
|
| 1400 |
"cell_type": "code",
|
| 1401 |
-
"execution_count":
|
| 1402 |
"id": "d21c037d",
|
| 1403 |
"metadata": {},
|
| 1404 |
"outputs": [
|
|
@@ -1476,7 +1486,7 @@
|
|
| 1476 |
},
|
| 1477 |
{
|
| 1478 |
"cell_type": "code",
|
| 1479 |
-
"execution_count":
|
| 1480 |
"id": "4f01e27a",
|
| 1481 |
"metadata": {},
|
| 1482 |
"outputs": [
|
|
@@ -1530,7 +1540,9 @@
|
|
| 1530 |
"source": [
|
| 1531 |
"## 10. Deployment\n",
|
| 1532 |
"\n",
|
| 1533 |
-
"The classification endpoint is added to the existing Gradio app as a second tab. Tab 1 has Movement Scoring from A2. Tab 2 has Body Region Classification which takes
|
| 1534 |
"\n",
|
| 1535 |
"Deployment URL: https://huggingface.co/spaces/Bachstelze/github_sync"
|
| 1536 |
]
|
|
@@ -1540,23 +1552,15 @@
|
|
| 1540 |
"id": "67013cc1",
|
| 1541 |
"metadata": {},
|
| 1542 |
"source": [
|
| 1543 |
-
"## 11.
|
| 1544 |
"\n",
|
| 1545 |
"```bash\n",
|
| 1546 |
"python -m venv venv\n",
|
| 1547 |
"source venv/bin/activate\n",
|
| 1548 |
"pip install -r requirements.txt\n",
|
| 1549 |
-
"```"
|
| 1550 |
-
]
|
| 1551 |
-
},
|
| 1552 |
-
{
|
| 1553 |
-
"cell_type": "markdown",
|
| 1554 |
-
"id": "445419d9",
|
| 1555 |
-
"metadata": {},
|
| 1556 |
-
"source": [
|
| 1557 |
-
"## 12. DevOps/MLOps process\n",
|
| 1558 |
"\n",
|
| 1559 |
-
"GitHub Actions automatically syncs the repository to HuggingFace Spaces when pushed to main. The workflow file is
|
| 1560 |
]
|
| 1561 |
},
|
| 1562 |
{
|
|
@@ -1564,7 +1568,7 @@
|
|
| 1564 |
"id": "7a142abd",
|
| 1565 |
"metadata": {},
|
| 1566 |
"source": [
|
| 1567 |
-
"##
|
| 1568 |
"\n",
|
| 1569 |
"| Member | GitHub Issue | Tasks |\n",
|
| 1570 |
"|--------|--------------|-------|\n",
|
|
@@ -1579,7 +1583,7 @@
|
|
| 1579 |
"id": "f62680e2",
|
| 1580 |
"metadata": {},
|
| 1581 |
"source": [
|
| 1582 |
-
"##
|
| 1583 |
"\n",
|
| 1584 |
"| # | Iteration | Approach | Key change |\n",
|
| 1585 |
"|---|-----------|----------|------------|\n",
|
|
@@ -1588,11 +1592,7 @@
|
|
| 1588 |
"| 3 | Baseline | Body Regions | Grouped classes (Upper/Lower) |\n",
|
| 1589 |
"| 4 | Tuned | Body Regions | GridSearchCV (5-fold CV) |\n",
|
| 1590 |
"\n",
|
| 1591 |
-
"Note: Polynomial interaction features were tested but not included in final iterations due to minimal improvement and increased complexity (820 features vs
|
| 1592 |
-
"\n",
|
| 1593 |
-
"### Deployed Model\n",
|
| 1594 |
-
"\n",
|
| 1595 |
-
"The deployed model uses body region classification with KNN (k=7) and StandardScaler preprocessing. It takes 38 input features and achieves 82.8% F1-weighted and 84% accuracy on the test set."
|
| 1596 |
]
|
| 1597 |
}
|
| 1598 |
],
|
|
|
|
| 19 |
"source": [
|
| 20 |
"## 1. Problem statement\n",
|
| 21 |
"\n",
|
| 22 |
+
"The goal is to classify the weakest link in a squat movement. Given 41 movement features, the model predicts which body region is limiting the person's squat. The input is 41 movement features (AimoScore + 13 Angle deviations + 25 NASM deviations + 2 Time deviations) and the output is a body region classification which is upper and lower body. Due to massive class imbalance in the original 14 class problem with some classes with only 1-2 datapoints, we changed to a 2-class approach that gives predictions that actually helps."
|
| 23 |
]
|
| 24 |
},
|
| 25 |
{
|
|
|
|
| 47 |
},
|
| 48 |
{
|
| 49 |
"cell_type": "code",
|
| 50 |
+
"execution_count": 143,
|
| 51 |
"id": "edbe3fbd",
|
| 52 |
"metadata": {},
|
| 53 |
"outputs": [],
|
|
|
|
| 77 |
},
|
| 78 |
{
|
| 79 |
"cell_type": "code",
|
| 80 |
+
"execution_count": 144,
|
| 81 |
"id": "23f1b38b",
|
| 82 |
"metadata": {},
|
| 83 |
"outputs": [
|
|
|
|
| 128 |
},
|
| 129 |
{
|
| 130 |
"cell_type": "code",
|
| 131 |
+
"execution_count": 145,
|
| 132 |
"id": "080ab472",
|
| 133 |
"metadata": {},
|
| 134 |
"outputs": [
|
|
|
|
| 200 |
},
|
| 201 |
{
|
| 202 |
"cell_type": "code",
|
| 203 |
+
"execution_count": 146,
|
| 204 |
"id": "438e27ae",
|
| 205 |
"metadata": {},
|
| 206 |
"outputs": [
|
|
|
|
| 298 |
},
|
| 299 |
{
|
| 300 |
"cell_type": "code",
|
| 301 |
+
"execution_count": 147,
|
| 302 |
"id": "7560ae66",
|
| 303 |
"metadata": {},
|
| 304 |
"outputs": [
|
|
|
|
| 335 |
},
|
| 336 |
{
|
| 337 |
"cell_type": "code",
|
| 338 |
+
"execution_count": 148,
|
| 339 |
"id": "9f17a88e",
|
| 340 |
"metadata": {},
|
| 341 |
"outputs": [
|
|
|
|
| 381 |
},
|
| 382 |
{
|
| 383 |
"cell_type": "code",
|
| 384 |
+
"execution_count": 149,
|
| 385 |
"id": "d4c02996",
|
| 386 |
"metadata": {},
|
| 387 |
"outputs": [],
|
|
|
|
| 404 |
},
|
| 405 |
{
|
| 406 |
"cell_type": "code",
|
| 407 |
+
"execution_count": 150,
|
| 408 |
"id": "c8292b2b",
|
| 409 |
"metadata": {},
|
| 410 |
"outputs": [],
|
|
|
|
| 442 |
},
|
| 443 |
{
|
| 444 |
"cell_type": "code",
|
| 445 |
+
"execution_count": 151,
|
| 446 |
"id": "b598aef7",
|
| 447 |
"metadata": {},
|
| 448 |
"outputs": [
|
|
|
|
| 475 |
},
|
| 476 |
{
|
| 477 |
"cell_type": "code",
|
| 478 |
+
"execution_count": 152,
|
| 479 |
"id": "962743cc",
|
| 480 |
"metadata": {},
|
| 481 |
"outputs": [
|
|
|
|
| 603 |
},
|
| 604 |
{
|
| 605 |
"cell_type": "code",
|
| 606 |
+
"execution_count": 153,
|
| 607 |
"id": "5c9efd5b",
|
| 608 |
"metadata": {},
|
| 609 |
"outputs": [
|
|
|
|
| 636 |
},
|
| 637 |
{
|
| 638 |
"cell_type": "code",
|
| 639 |
+
"execution_count": 154,
|
| 640 |
"id": "ce01a75f",
|
| 641 |
"metadata": {},
|
| 642 |
"outputs": [
|
|
|
|
| 812 |
"display(tuning_df_14class)"
|
| 813 |
]
|
| 814 |
},
|
| 815 |
+
{
|
| 816 |
+
"cell_type": "markdown",
|
| 817 |
+
"id": "ba67bf51",
|
| 818 |
+
"metadata": {},
|
| 819 |
+
"source": [
|
| 820 |
+
"## Approach 2: Body Region Classification\n",
|
| 821 |
+
"\n",
|
| 822 |
+
"Due to the severe class imbalance in the 14-class problem, we explored grouping classes into 2 body regions (Upper Body / Lower Body) for more meaningful predictions."
|
| 823 |
+
]
|
| 824 |
+
},
|
| 825 |
{
|
| 826 |
"cell_type": "code",
|
| 827 |
+
"execution_count": 155,
|
| 828 |
"id": "3e5e5e9b",
|
| 829 |
"metadata": {},
|
| 830 |
"outputs": [
|
|
|
|
| 859 |
},
|
| 860 |
{
|
| 861 |
"cell_type": "code",
|
| 862 |
+
"execution_count": 156,
|
| 863 |
"id": "4de69063",
|
| 864 |
"metadata": {},
|
| 865 |
"outputs": [
|
|
|
|
| 895 |
},
|
| 896 |
{
|
| 897 |
"cell_type": "code",
|
| 898 |
+
"execution_count": 157,
|
| 899 |
"id": "a994b1af",
|
| 900 |
"metadata": {},
|
| 901 |
"outputs": [
|
|
|
|
| 1045 |
},
|
| 1046 |
{
|
| 1047 |
"cell_type": "code",
|
| 1048 |
+
"execution_count": 158,
|
| 1049 |
"id": "00f3eda4",
|
| 1050 |
"metadata": {},
|
| 1051 |
"outputs": [
|
|
|
|
| 1077 |
},
|
| 1078 |
{
|
| 1079 |
"cell_type": "code",
|
| 1080 |
+
"execution_count": 159,
|
| 1081 |
"id": "6b03902f",
|
| 1082 |
"metadata": {},
|
| 1083 |
"outputs": [
|
|
|
|
| 1259 |
"source": [
|
| 1260 |
"## 8. Why we did not use polynomial features\n",
|
| 1261 |
"\n",
|
| 1262 |
+
"We tested polynomial interaction features which created 820 new features from the original 41. However, this approach was not used in the final model because the F1-score improvement was negligible, 820 features vs 41 original features makes it hard to interpret the model. Many more parameters to learn from the same amount of data, so the tuned body region model without polynomial features provides a good balance of accuracy and simplicity."
|
| 1263 |
]
|
| 1264 |
},
|
| 1265 |
{
|
|
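Review note on the feature count in section 8: 820 is the number of unordered pairs among 41 features, which suggests degree-2 interaction-only terms (products of two distinct features, no squares). That reading is an assumption, since the notebook cell that builds the features is not part of this diff. A minimal, dependency-free sketch of the count, using made-up sample values:

```python
from itertools import combinations
from math import comb

N_FEATURES = 41  # feature count stated in the report


def pairwise_interactions(row):
    """Return the products x_i * x_j for every unordered pair i < j."""
    return [a * b for a, b in combinations(row, 2)]


# Hypothetical sample: 41 made-up values standing in for one squat recording.
sample = [float(i + 1) for i in range(N_FEATURES)]
interactions = pairwise_interactions(sample)

n_new = len(interactions)       # number of interaction terms
n_total = N_FEATURES + n_new    # original features plus interaction terms
print(n_new, n_total)
```

If the original 41 columns are kept alongside the interaction terms, the model sees 861 inputs in total, which is the size the interpretability concern in section 8 is really about.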
|
|
| 1272 |
},
|
| 1273 |
{
|
| 1274 |
"cell_type": "code",
|
| 1275 |
+
"execution_count": 160,
|
| 1276 |
"id": "0b3e066a",
|
| 1277 |
"metadata": {},
|
| 1278 |
"outputs": [
|
|
|
|
| 1408 |
},
|
| 1409 |
{
|
| 1410 |
"cell_type": "code",
|
| 1411 |
+
"execution_count": 161,
|
| 1412 |
"id": "d21c037d",
|
| 1413 |
"metadata": {},
|
| 1414 |
"outputs": [
|
|
|
|
| 1486 |
},
|
| 1487 |
{
|
| 1488 |
"cell_type": "code",
|
| 1489 |
+
"execution_count": 162,
|
| 1490 |
"id": "4f01e27a",
|
| 1491 |
"metadata": {},
|
| 1492 |
"outputs": [
|
|
|
|
| 1540 |
"source": [
|
| 1541 |
"## 10. Deployment\n",
|
| 1542 |
"\n",
|
| 1543 |
+
"The classification endpoint is added to the existing Gradio app as a second tab. Tab 1 has Movement Scoring from A2. Tab 2 has Body Region Classification which takes 41 features as input and outputs the predicted body region (Upper Body or Lower Body). The deployed model is KNN (k=7) with StandardScaler preprocessing, achieving 82.8% F1-weighted score and 84% accuracy on the test set.\n",
|
| 1544 |
+
"\n",
|
| 1545 |
+
"[Due to last minute issues, the app currently downloads the pickle model from google drive, but we are working to automate this. The overall functionality has not suffered, the issue exists with some policy restrictions on huggingface, and not in the functionality of the app]\n",
|
| 1546 |
"\n",
|
| 1547 |
"Deployment URL: https://huggingface.co/spaces/Bachstelze/github_sync"
|
| 1548 |
]
|
|
|
|
| 1552 |
"id": "67013cc1",
|
| 1553 |
"metadata": {},
|
| 1554 |
"source": [
|
| 1555 |
+
"## 11. Environment & DevOps\n",
|
| 1556 |
"\n",
|
| 1557 |
+
"**Virtual environment setup:**\n",
|
| 1558 |
"```bash\n",
|
| 1559 |
"python -m venv venv\n",
|
| 1560 |
"source venv/bin/activate\n",
|
| 1561 |
"pip install -r requirements.txt\n",
|
| 1562 |
"\n",
|
| 1563 |
+
"```**CI/CD Pipeline:** GitHub Actions automatically syncs the repository to HuggingFace Spaces when pushed to main. The workflow file is located at `.github/workflows/push_to_hf_space.yml`.\n"
|
| 1564 |
]
|
| 1565 |
},
|
| 1566 |
{
|
|
|
|
| 1568 |
"id": "7a142abd",
|
| 1569 |
"metadata": {},
|
| 1570 |
"source": [
|
| 1571 |
+
"## 12. Contributions \n",
|
| 1572 |
"\n",
|
| 1573 |
"| Member | GitHub Issue | Tasks |\n",
|
| 1574 |
"|--------|--------------|-------|\n",
|
|
|
|
| 1583 |
"id": "f62680e2",
|
| 1584 |
"metadata": {},
|
| 1585 |
"source": [
|
| 1586 |
+
"## 13. Iterations\n",
|
| 1587 |
"\n",
|
| 1588 |
"| # | Iteration | Approach | Key change |\n",
|
| 1589 |
"|---|-----------|----------|------------|\n",
|
|
|
|
| 1592 |
"| 3 | Baseline | Body Regions | Grouped classes (Upper/Lower) |\n",
|
| 1593 |
"| 4 | Tuned | Body Regions | GridSearchCV (5-fold CV) |\n",
|
| 1594 |
"\n",
|
| 1595 |
+
"Note: Polynomial interaction features were tested but not included in final iterations due to minimal improvement and increased complexity (820 features vs 41)."
|
| 1596 |
]
|
| 1597 |
}
|
| 1598 |
],
|
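Review note on the deployed model: the Deployment section describes KNN (k=7) on standardized inputs. The sketch below illustrates that technique in plain Python so it runs without extra packages; it is not the notebook's code, which presumably uses scikit-learn's `StandardScaler` and `KNeighborsClassifier` (an assumption). The helper names `fit_scaler`, `transform`, and `knn_predict` and the toy data are hypothetical.

```python
from collections import Counter
from math import sqrt
from statistics import mean, pstdev


def fit_scaler(rows):
    """Compute per-column mean and population standard deviation."""
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    # Fall back to 1.0 so constant columns do not cause division by zero.
    stds = [pstdev(c) or 1.0 for c in cols]
    return means, stds


def transform(rows, means, stds):
    """Standardize each column to zero mean and unit variance."""
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def knn_predict(train_x, train_y, query, k=7):
    """Majority vote among the k nearest training points (Euclidean)."""
    dists = sorted(
        (sqrt(sum((a - b) ** 2 for a, b in zip(row, query))), label)
        for row, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data with 2 features instead of the report's 41.
X = [[0.1 * i, 100.0 + i] for i in range(8)]
X += [[5.0 + 0.1 * i, 200.0 + i] for i in range(8)]
y = ["Upper Body"] * 8 + ["Lower Body"] * 8

means, stds = fit_scaler(X)
X_scaled = transform(X, means, stds)

# The query must be scaled with the statistics fitted on the training data.
query = transform([[5.2, 203.0]], means, stds)[0]
prediction = knn_predict(X_scaled, y, query, k=7)
print(prediction)
```

Scaling matters here because KNN relies on raw distances: without it, the second toy column (values in the hundreds) would dominate the first, much as features measured on different scales would in the 41-feature input.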