{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# A4 Report — DevOps, CI/CD, and Quality Assurance\n",
    "\n",
    "This notebook documents the DevOps and quality assurance improvements implemented in the project, including:\n",
    "\n",
    "- CI/CD pipeline development\n",
    "- Automated linting and notebook quality checks\n",
    "- Unit testing integration\n",
    "- Deployment safeguards for HuggingFace\n",
    "- Adoption of Git LFS for model storage\n",
    "- Team development and coding practices\n",
    "\n",
    "The goal is to improve reliability, reproducibility, and deployment stability of the machine learning system.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Project Context\n",
    "\n",
    "The application is deployed via HuggingFace Spaces using Python and Gradio.\n",
    "\n",
    "Key challenges before improvements:\n",
    "\n",
    "- No CI/CD quality gates\n",
    "- Direct pushes to main branch\n",
    "- Deployment failures caused by incompatible files\n",
    "- Models stored externally (Google Drive), causing version inconsistencies\n",
    "- Lack of automated testing\n",
    "- Notebook-heavy workflow without linting support\n",
    "\n",
    "The improvements documented here address these issues.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## CI/CD Pipeline Implementation\n",
    "\n",
    "The GitHub Actions pipeline was extended to introduce quality assurance barriers before deployment.\n",
    "\n",
    "### Previous pipeline\n",
    "- Only synchronized repository with HuggingFace\n",
    "- No linting\n",
    "- No testing\n",
    "- No deployment safety checks\n",
    "\n",
    "### Updated pipeline flow\n",
    "\n",
    "1. Repository checkout (with Git LFS enabled)\n",
    "2. Python environment setup\n",
    "3. Dependency installation\n",
    "4. Linting for Python scripts\n",
    "5. Notebook linting using nbQA\n",
    "6. File restriction checks\n",
    "7. Unit test execution\n",
    "8. Deployment to HuggingFace\n",
    "\n",
    "Deployment only occurs if all quality checks pass.\n"
   ]
  },
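  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The gating behaviour described above can be sketched in plain Python. This is only an illustration, not the project's GitHub Actions workflow: the `run_gated` helper, the step names, and the stand-in checks are hypothetical.\n",
    "\n",
    "```python\n",
    "from typing import Callable, List, Tuple\n",
    "\n",
    "Step = Tuple[str, Callable[[], bool]]\n",
    "\n",
    "\n",
    "def run_gated(steps: List[Step]) -> Tuple[bool, List[str]]:\n",
    "    '''Run steps in order and stop at the first failure.\n",
    "\n",
    "    Each step is a (name, check) pair where check() returns True on success.\n",
    "    Returns (all_passed, names_of_steps_that_ran).\n",
    "    '''\n",
    "    ran = []\n",
    "    for name, check in steps:\n",
    "        ran.append(name)\n",
    "        if not check():\n",
    "            # A failed gate aborts the run, so later steps never execute.\n",
    "            return False, ran\n",
    "    return True, ran\n",
    "\n",
    "\n",
    "# Hypothetical stand-ins for the real checks; here the unit tests fail.\n",
    "steps = [\n",
    "    ('lint', lambda: True),\n",
    "    ('file-check', lambda: True),\n",
    "    ('tests', lambda: False),\n",
    "    ('deploy', lambda: True),\n",
    "]\n",
    "ok, ran = run_gated(steps)\n",
    "print(ok, ran)\n",
    "```\n",
    "\n",
    "Because the deploy step is last in the list, it is reached only when every earlier check returns True.\n"
   ]
  },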
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## CI/CD Workflow Design\n",
    "\n",
    "The GitHub Actions workflow enforces code quality and deployment stability.\n",
    "\n",
    "Key components:\n",
    "\n",
    "### Linting\n",
    "- flake8 for Python scripts\n",
    "- nbQA + flake8 for Jupyter notebooks\n",
    "\n",
    "### Deployment safeguards\n",
    "- CI fails if .pdf or .xlsx files are committed\n",
    "- Prevents HuggingFace sync crashes\n",
    "\n",
    "### Unit testing\n",
    "- pytest integrated into CI\n",
    "- Tests run before deployment\n",
    "\n",
    "\n",
    "The implemented tests validate the full ML pipeline, including:\n",
    "- Regression model loading\n",
    "- Regression prediction functionality\n",
    "- Classification model loading\n",
    "- Classification prediction functionality\n",
    "- Model artifact structure validation\n",
    "- Error handling for incorrect inputs and failures\n",
    "\n",
    "\n",
    "### Git LFS support\n",
    "- Models tracked using Git LFS\n",
    "- Ensures version-controlled model artifacts\n",
    "\n",
    "This transforms the pipeline into a quality-gated deployment system.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Notebook Linting with nbQA\n",
    "\n",
    "The project relies heavily on Jupyter notebooks for:\n",
    "\n",
    "- Model experimentation\n",
    "- Evaluation\n",
    "- Feature engineering\n",
    "\n",
    "Linters such as flake8 cannot read .ipynb files directly.\n",
    "\n",
    "nbQA enables:\n",
    "\n",
    "- Running flake8 on notebooks\n",
    "- Detecting unused imports\n",
    "- Detecting syntax errors\n",
    "- Improving notebook readability\n",
    "\n",
    "This ensures notebooks meet the same quality standards as Python scripts.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Unit Testing Integration\n",
    "\n",
    "Unit testing was introduced using pytest.\n",
    "\n",
    "The CI pipeline executes:\n",
    "\n",
    "`pytest A4/ -v --tb=short`\n",
    "\n",
    "Purpose:\n",
    "\n",
    "- Validate model behavior\n",
    "- Prevent regression errors\n",
    "- Verify preprocessing and prediction logic\n",
    "- Support reproducibility\n",
    "\n",
    "One example is test_model.py, which evaluates model predictions and generates diagnostic plots.\n",
    "\n",
    "Testing will expand as more components stabilize.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Model Versioning with Git LFS\n",
    "\n",
    "Originally, models were stored on Google Drive, leading to:\n",
    "\n",
    "- Version inconsistencies\n",
    "- Difficulty reproducing results\n",
    "- Deployment mismatches\n",
    "\n",
    "Git LFS was introduced to store models directly in the repository.\n",
    "\n",
    "Benefits:\n",
    "\n",
    "- Version-controlled model artifacts\n",
    "- Consistent deployment models\n",
    "- Easier collaboration\n",
    "- Improved reproducibility\n",
    "\n",
    "In CI, the checkout step sets lfs: true.\n",
    "\n",
    "This ensures models are downloaded correctly during pipeline execution.\n"
   ]
  },
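  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to confirm that LFS content was actually fetched is to check that a model file is not still a Git LFS pointer. Pointer files are small text files whose first line begins with `version https://git-lfs.github.com/spec/v1`. The sketch below is an assumption about how such a check could look; the helper name `is_lfs_pointer` and the file names are hypothetical and not taken from the project.\n",
    "\n",
    "```python\n",
    "import os\n",
    "import tempfile\n",
    "\n",
    "LFS_POINTER_PREFIX = b'version https://git-lfs.github.com/spec/v1'\n",
    "\n",
    "\n",
    "def is_lfs_pointer(path: str) -> bool:\n",
    "    '''Return True if the file starts with the Git LFS pointer header.'''\n",
    "    with open(path, 'rb') as fh:\n",
    "        head = fh.read(len(LFS_POINTER_PREFIX))\n",
    "    return head == LFS_POINTER_PREFIX\n",
    "\n",
    "\n",
    "# Demonstration with two temporary files (hypothetical names).\n",
    "with tempfile.TemporaryDirectory() as tmp:\n",
    "    pointer = os.path.join(tmp, 'pointer.pkl')\n",
    "    real = os.path.join(tmp, 'real.pkl')\n",
    "    with open(pointer, 'wb') as fh:\n",
    "        # Only the header line matters for this check.\n",
    "        fh.write(LFS_POINTER_PREFIX)\n",
    "    with open(real, 'wb') as fh:\n",
    "        # Arbitrary binary bytes standing in for a serialized model.\n",
    "        fh.write(bytes([0x80, 0x04, 0x95]))\n",
    "    results = (is_lfs_pointer(pointer), is_lfs_pointer(real))\n",
    "\n",
    "print(results)\n",
    "```\n",
    "\n",
    "A check like this could run before the unit tests, so that a missing LFS download fails early rather than surfacing as a model-loading error.\n"
   ]
  },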
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Deployment Stability Improvements\n",
    "\n",
    "The pipeline now prevents common failure scenarios.\n",
    "\n",
    "### Restricted files\n",
    "CI blocks:\n",
    "- .pdf\n",
    "- .xlsx\n",
    "\n",
    "These previously caused HuggingFace sync crashes.\n",
    "\n",
    "### Dependency consistency\n",
    "- scikit-learn version pinned\n",
    "- Prevents InconsistentVersionWarning when loading models saved with a different scikit-learn version\n"
   ]
  },
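  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The restricted-file check can be sketched as a small Python function. This is a minimal illustration under stated assumptions, not the check used in the project's workflow: `find_blocked`, `BLOCKED_SUFFIXES`, and the example file list are hypothetical.\n",
    "\n",
    "```python\n",
    "from pathlib import PurePosixPath\n",
    "from typing import Iterable, List\n",
    "\n",
    "BLOCKED_SUFFIXES = {'.pdf', '.xlsx'}\n",
    "\n",
    "\n",
    "def find_blocked(paths: Iterable[str]) -> List[str]:\n",
    "    '''Return the paths whose extension is blocked (case-insensitive).'''\n",
    "    return [\n",
    "        p for p in paths\n",
    "        if PurePosixPath(p).suffix.lower() in BLOCKED_SUFFIXES\n",
    "    ]\n",
    "\n",
    "\n",
    "# Hypothetical list of tracked files, e.g. the output of git ls-files.\n",
    "tracked = [\n",
    "    'app.py',\n",
    "    'docs/report.PDF',\n",
    "    'data/scores.xlsx',\n",
    "    'A4/test_model.py',\n",
    "]\n",
    "blocked = find_blocked(tracked)\n",
    "print(blocked)\n",
    "```\n",
    "\n",
    "In CI, a non-empty result would be turned into a non-zero exit code so that the job fails before deployment.\n"
   ]
  },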
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## DevOps and QA Process Improvements\n",
    "\n",
    "The project transitioned from ad-hoc development to structured DevOps practices.\n",
    "\n",
    "Improvements include:\n",
    "\n",
    "- Automated linting\n",
    "- Notebook quality enforcement\n",
    "- Unit testing integration\n",
    "- Deployment safeguards\n",
    "- Git LFS model management\n",
    "- CI quality gates before deployment\n",
    "\n",
    "These changes improve:\n",
    "\n",
    "- reliability\n",
    "- collaboration\n",
    "- reproducibility\n",
    "- deployment stability\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Design and Coding Rules\n",
    "\n",
    "The team defined shared development practices.\n",
    "\n",
    "### Code structure\n",
    "- Modular Python scripts\n",
    "- Separation of experimentation and production logic\n",
    "\n",
    "### Notebook standards\n",
    "- Executable cells\n",
    "- Clear documentation\n",
    "- Reduced unused code\n",
    "\n",
    "### Deployment awareness\n",
    "- Avoid large or incompatible files\n",
    "- Maintain compatibility with HuggingFace environment\n",
    "\n",
    "### Quality enforcement\n",
    "- CI linting\n",
    "- Automated tests\n",
    "- Dependency control\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Future Work\n",
    "\n",
    "Planned DevOps enhancements:\n",
    "\n",
    "- Full PR-based workflow\n",
    "- Automated model evaluation metrics in CI\n",
    "- Continuous training pipelines\n",
    "- Model version tracking dashboards\n",
    "- Automated notebook formatting\n",
    "\n",
    "The current pipeline provides the foundation for these improvements.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## A4 – Classification Task\n",
    "\n",
    "Two datasets were merged into a single dataset containing 41 features (including movement angles and weak-link indicators). For each data point, the weakest link was identified by selecting the column with the maximum score.\n",
    "\n",
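    "The weakest-link rule above amounts to an argmax over the score columns of each row. A minimal sketch, with hypothetical column names and scores that are not taken from the dataset:\n",
    "\n",
    "```python\n",
    "from typing import Dict\n",
    "\n",
    "\n",
    "def weakest_link(scores: Dict[str, float]) -> str:\n",
    "    '''Return the name of the column with the maximum score.'''\n",
    "    return max(scores, key=scores.get)\n",
    "\n",
    "\n",
    "# Hypothetical weak-link score columns for one data point.\n",
    "row = {'LeftKnee': 0.12, 'RightShoulder': 0.47, 'Trunk': 0.31}\n",
    "label = weakest_link(row)\n",
    "print(label)\n",
    "```\n",
    "\n",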
    "Initially, a single 14-class classifier was used. Following lab guidance and feedback, an alternative approach was explored: the features were split into upper-body and lower-body groups, separate models were trained for each body region, and the region models were then combined to evaluate whether performance improved.\n",
    "\n",
    "- 5-fold cross-validation was applied  \n",
    "- Weighted averages were used due to class imbalance  \n",
    "\n",
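    "Assuming that weighted averages here means support-weighted F1 (the quantity scikit-learn reports with average='weighted'), the calculation can be sketched in plain Python. The function name `weighted_f1` and the labels below are hypothetical and only illustrate the arithmetic.\n",
    "\n",
    "```python\n",
    "from collections import Counter\n",
    "from typing import Sequence\n",
    "\n",
    "\n",
    "def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:\n",
    "    '''Per-class F1, averaged with weights equal to each class's support.'''\n",
    "    support = Counter(y_true)\n",
    "    pairs = list(zip(y_true, y_pred))\n",
    "    total = 0.0\n",
    "    for cls, n in support.items():\n",
    "        tp = sum(1 for t, p in pairs if t == cls and p == cls)\n",
    "        fp = sum(1 for t, p in pairs if t != cls and p == cls)\n",
    "        fn = n - tp\n",
    "        denom = 2 * tp + fp + fn\n",
    "        f1 = 2 * tp / denom if denom else 0.0\n",
    "        total += n * f1\n",
    "    return total / len(y_true)\n",
    "\n",
    "\n",
    "# Hypothetical imbalanced labels: class 'a' has 4 samples, class 'b' has 1.\n",
    "y_true = ['a', 'a', 'a', 'a', 'b']\n",
    "y_pred = ['a', 'a', 'a', 'b', 'b']\n",
    "score = weighted_f1(y_true, y_pred)\n",
    "print(round(score, 4))\n",
    "```\n",
    "\n",
    "With this weighting, the majority class contributes more to the final score than it would under an unweighted (macro) average.\n",
    "\n",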
    "Body-region classification models tested:\n",
    "- Logistic Regression  \n",
    "- LDA  \n",
    "- QDA  \n",
    "- Naive Bayes  \n",
    "- KNN (k = 5, 7, 10)\n",
    "\n",
    "Champion model: KNN with k = 7.\n",
    "\n",
    "For the 14-class weak-link classification:\n",
    "- Logistic Regression  \n",
    "- LDA  \n",
    "- Naive Bayes  \n",
    "- KNN (k = 5, 7, 10)\n",
    "\n",
    "Champion model: LDA (F1 = 0.57).\n",
    "\n",
    "Following feedback, a two-step approach was tested:\n",
    "1. Predict body region using KNN  \n",
    "2. Apply LDA within the predicted region (upper or lower body) to classify the weak link  \n",
    "\n",
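    "The two-step routing can be sketched as follows. The stand-in functions are hypothetical placeholders for the fitted KNN region model and the per-region LDA models; the real notebook's objects and class names are not shown here.\n",
    "\n",
    "```python\n",
    "from typing import Callable, Dict, Sequence\n",
    "\n",
    "Features = Sequence[float]\n",
    "Model = Callable[[Features], str]\n",
    "\n",
    "\n",
    "def two_step_predict(\n",
    "    x: Features,\n",
    "    region_model: Model,\n",
    "    link_models: Dict[str, Model],\n",
    ") -> str:\n",
    "    '''Predict the body region, then the weak link with that region's model.'''\n",
    "    region = region_model(x)\n",
    "    return link_models[region](x)\n",
    "\n",
    "\n",
    "def region_model(x: Features) -> str:\n",
    "    # Hypothetical stand-in for the fitted KNN region classifier.\n",
    "    return 'upper' if x[0] > 0.5 else 'lower'\n",
    "\n",
    "\n",
    "def upper_model(x: Features) -> str:\n",
    "    # Hypothetical stand-in for the LDA model over upper-body classes.\n",
    "    return 'upper_link_A'\n",
    "\n",
    "\n",
    "def lower_model(x: Features) -> str:\n",
    "    # Hypothetical stand-in for the LDA model over lower-body classes.\n",
    "    return 'lower_link_B'\n",
    "\n",
    "\n",
    "link_models = {'upper': upper_model, 'lower': lower_model}\n",
    "pred = two_step_predict([0.9, 0.1], region_model, link_models)\n",
    "print(pred)\n",
    "```\n",
    "\n",
    "In a design like this, a sample routed to the wrong region is passed to a model that cannot output its true class, so step 1 errors carry through to the final prediction.\n",
    "\n",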
    "This did not improve performance (F1 ≈ 0.54).\n",
    "\n",
    "A t-test comparing the two-step approach with the single LDA model gave t-statistic = -0.661 and p-value = 0.5447, so the difference is not statistically significant (p > 0.05). There is therefore no evidence that the two-step pipeline performs better than the single LDA model.\n",
    "\n",
    "Applying Random Forest Classifier improved results:\n",
    "\n",
    "- Baseline (LDA): F1 = 0.57  \n",
    "- Two-step approach (KNN then LDA): F1 ≈ 0.54  \n",
    "- Random Forest: F1 = 0.61 (best performance)\n",
    "\n",
    "The A4_Classification notebook extends A3 with these improvements.\n",
    "\n",
    "---\n",
    "\n",
    "## A4 – Regression Task\n",
    "\n",
    "The regression setup remains consistent with A2, with Random Forest Regressor introduced to improve performance. \n",
    "\n",
    "- Baseline model R²: 0.54  \n",
    "- Random Forest R²: 0.65  \n",
    "\n",
    "This represents a direct improvement over the earlier regression pipeline.\n",
    "\n",
    "A t-test comparing Random Forest with the previous champion model (feature_selection_lasso) gave t-statistic = 11.76 and p-value = 0.00029, so the improvement is statistically significant (p < 0.05). The 95% confidence interval for the improvement, [0.0935, 0.1435], lies entirely above zero, which supports the conclusion that Random Forest is reliably better than feature_selection_lasso.\n",
    "\n",
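    "A comparison of this kind can be sketched as a paired t-test over per-fold scores. This is an assumption: the text does not state which t-test variant was used or how the interval was computed. The helper `paired_t`, the constant `T_CRIT_95_DF4`, and the scores below are hypothetical and are not the values from the notebook.\n",
    "\n",
    "```python\n",
    "import math\n",
    "import statistics\n",
    "from typing import Sequence, Tuple\n",
    "\n",
    "# Two-sided 95% critical value of Student's t with 4 degrees of freedom.\n",
    "T_CRIT_95_DF4 = 2.776\n",
    "\n",
    "\n",
    "def paired_t(\n",
    "    a: Sequence[float], b: Sequence[float]\n",
    ") -> Tuple[float, Tuple[float, float]]:\n",
    "    '''Return the paired t-statistic and 95% CI for mean(a - b), 5 pairs only.'''\n",
    "    if len(a) != 5 or len(b) != 5:\n",
    "        raise ValueError('critical value is hard-coded for 5 paired scores')\n",
    "    diffs = [x - y for x, y in zip(a, b)]\n",
    "    mean = statistics.mean(diffs)\n",
    "    se = statistics.stdev(diffs) / math.sqrt(len(diffs))\n",
    "    margin = T_CRIT_95_DF4 * se\n",
    "    return mean / se, (mean - margin, mean + margin)\n",
    "\n",
    "\n",
    "# Hypothetical per-fold R^2 scores, NOT the values from the notebook.\n",
    "rf_scores = [0.66, 0.64, 0.67, 0.63, 0.65]\n",
    "lasso_scores = [0.55, 0.52, 0.56, 0.53, 0.54]\n",
    "t_stat, (low, high) = paired_t(rf_scores, lasso_scores)\n",
    "print(round(t_stat, 2), round(low, 3), round(high, 3))\n",
    "```\n",
    "\n",
    "If the lower bound of the interval is above zero, the improvement is positive at the 95% level, which is the same reading applied to the reported interval above.\n",
    "\n",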
    "The A4_Regression notebook is an enhanced version of A2_ModelBuilding.ipynb, while A4_Classification extends A3 based on feedback and model experimentation.\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}