Space: Bachstelze/github_sync

Reem committed 27 days ago

Commit

91ea454

1 Parent(s): 7b41208

a4-report

Files changed (1)

A4/report.ipynb +286 -0

	@@ -0,0 +1,286 @@

+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# A4 Report — DevOps, CI/CD, and Quality Assurance\n",
+    "\n",
+    "This notebook documents the DevOps and quality assurance improvements implemented in the project, including:\n",
+    "\n",
+    "- CI/CD pipeline development\n",
+    "- Automated linting and notebook quality checks\n",
+    "- Unit testing integration\n",
+    "- Deployment safeguards for HuggingFace\n",
+    "- Adoption of Git LFS for model storage\n",
+    "- Team development and coding practices\n",
+    "\n",
+    "The goal is to improve reliability, reproducibility, and deployment stability of the machine learning system.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Project Context\n",
+    "\n",
+    "The application is deployed via HuggingFace Spaces using Python and Gradio.\n",
+    "\n",
+    "Key challenges before improvements:\n",
+    "\n",
+    "- No CI/CD quality gates\n",
+    "- Direct pushes to main branch\n",
+    "- Deployment failures caused by incompatible files\n",
+    "- Models stored externally (Google Drive), causing version inconsistencies\n",
+    "- Lack of automated testing\n",
+    "- Notebook-heavy workflow without linting support\n",
+    "\n",
+    "The improvements documented here address these issues.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## CI/CD Pipeline Implementation\n",
+    "\n",
+    "The GitHub Actions pipeline was extended to add quality gates before deployment.\n",
+    "\n",
+    "### Previous pipeline\n",
+    "- Only synchronized the repository with HuggingFace\n",
+    "- No linting\n",
+    "- No testing\n",
+    "- No deployment safety checks\n",
+    "\n",
+    "### Updated pipeline flow\n",
+    "\n",
+    "1. Repository checkout (with Git LFS enabled)\n",
+    "2. Python environment setup\n",
+    "3. Dependency installation\n",
+    "4. Linting Python scripts with flake8\n",
+    "5. Linting notebooks with nbQA (running flake8)\n",
+    "6. File restriction checks (no .pdf or .xlsx files)\n",
+    "7. Unit test execution\n",
+    "8. Deployment to HuggingFace\n",
+    "\n",
+    "Deployment only occurs if all quality checks pass.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## CI/CD Workflow Design\n",
+    "\n",
+    "The GitHub Actions workflow enforces code quality and deployment stability.\n",
+    "\n",
+    "Key components:\n",
+    "\n",
+    "### Linting\n",
+    "- flake8 for Python scripts\n",
+    "- nbQA + flake8 for Jupyter notebooks\n",
+    "\n",
+    "### Deployment safeguards\n",
+    "- CI fails if .pdf or .xlsx files are committed\n",
+    "- Prevents HuggingFace sync crashes\n",
+    "\n",
+    "### Unit testing\n",
+    "- pytest integrated into CI\n",
+    "- Tests run before deployment\n",
+    "\n",
+    "### Git LFS support\n",
+    "- Models tracked using Git LFS\n",
+    "- Ensures version-controlled model artifacts\n",
+    "\n",
+    "This transforms the pipeline into a quality-gated deployment system.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Notebook Linting with nbQA\n",
+    "\n",
+    "The project relies heavily on Jupyter notebooks for:\n",
+    "\n",
+    "- Model experimentation\n",
+    "- Evaluation\n",
+    "- Feature engineering\n",
+    "\n",
+    "Linters such as flake8 operate on Python source files and cannot read the JSON-based .ipynb format directly.\n",
+    "\n",
+    "nbQA enables:\n",
+    "\n",
+    "- Running flake8 on notebooks\n",
+    "- Detecting unused imports\n",
+    "- Detecting syntax errors\n",
+    "- Improving notebook readability\n",
+    "\n",
+    "This ensures notebooks meet the same quality standards as Python scripts.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Unit Testing Integration\n",
+    "\n",
+    "Unit testing was introduced using pytest.\n",
+    "\n",
+    "The CI pipeline executes:\n",
+    "\n",
+    "`pytest A4/ -v --tb=short`\n",
+    "\n",
+    "Purpose:\n",
+    "\n",
+    "- Validate model behavior\n",
+    "- Prevent regression errors\n",
+    "- Verify preprocessing and prediction logic\n",
+    "- Support reproducibility\n",
+    "\n",
+    "One example is test_model.py, which evaluates model predictions and generates diagnostic plots.\n",
+    "\n",
+    "Testing will expand as more components stabilize.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Model Versioning with Git LFS\n",
+    "\n",
+    "Originally, models were stored on Google Drive, leading to:\n",
+    "\n",
+    "- Version inconsistencies\n",
+    "- Difficulty reproducing results\n",
+    "- Deployment mismatches\n",
+    "\n",
+    "Git LFS was introduced to store models directly in the repository.\n",
+    "\n",
+    "Benefits:\n",
+    "\n",
+    "- Version-controlled model artifacts\n",
+    "- Consistent deployment models\n",
+    "- Easier collaboration\n",
+    "- Improved reproducibility\n",
+    "\n",
+    "The CI checkout step is configured with `lfs: true`.\n",
+    "\n",
+    "This ensures the pipeline downloads the actual model files rather than Git LFS pointer files.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Deployment Stability Improvements\n",
+    "\n",
+    "The pipeline now prevents common failure scenarios.\n",
+    "\n",
+    "### Restricted files\n",
+    "CI blocks:\n",
+    "- .pdf\n",
+    "- .xlsx\n",
+    "\n",
+    "These previously caused HuggingFace sync crashes.\n",
+    "\n",
+    "### Dependency consistency\n",
+    "- scikit-learn version pinned so models load under the version that serialized them\n",
+    "- Prevents InconsistentVersionWarning when loading saved models\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## DevOps and QA Process Improvements\n",
+    "\n",
+    "The project transitioned from ad-hoc development to structured DevOps practices.\n",
+    "\n",
+    "Improvements include:\n",
+    "\n",
+    "- Automated linting\n",
+    "- Notebook quality enforcement\n",
+    "- Unit testing integration\n",
+    "- Deployment safeguards\n",
+    "- Git LFS model management\n",
+    "- CI quality gates before deployment\n",
+    "\n",
+    "These changes improve:\n",
+    "\n",
+    "- Reliability\n",
+    "- Collaboration\n",
+    "- Reproducibility\n",
+    "- Deployment stability\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Design and Coding Rules\n",
+    "\n",
+    "The team defined shared development practices.\n",
+    "\n",
+    "### Code structure\n",
+    "- Modular Python scripts\n",
+    "- Separation of experimentation and production logic\n",
+    "\n",
+    "### Notebook standards\n",
+    "- Executable cells\n",
+    "- Clear documentation\n",
+    "- Minimal unused code\n"
+    "\n",
+    "### Deployment awareness\n",
+    "- Avoid large or incompatible files\n",
+    "- Maintain compatibility with HuggingFace environment\n",
+    "\n",
+    "### Quality enforcement\n",
+    "- CI linting\n",
+    "- Automated tests\n",
+    "- Dependency control\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Future Work\n",
+    "\n",
+    "Planned DevOps enhancements:\n",
+    "\n",
+    "- Full PR-based workflow\n",
+    "- Automated model evaluation metrics in CI\n",
+    "- Continuous training pipelines\n",
+    "- Model version tracking dashboards\n",
+    "- Automated notebook formatting\n",
+    "\n",
+    "The current pipeline provides the foundation for these improvements.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "language_info": {
+   "name": "python"
+  },
+  "orig_nbformat": 4
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
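
The report states that CI fails if .pdf or .xlsx files are committed, but the check itself is not part of this commit. Below is a minimal sketch of how such a file restriction check could look in Python. The names `BLOCKED_EXTENSIONS`, `find_blocked_files`, and `check` are hypothetical, and the sketch assumes the CI step supplies the list of tracked paths; the project's actual workflow may implement this differently.

```python
from pathlib import PurePosixPath

# Extensions the report says CI should reject (the constant name is hypothetical).
BLOCKED_EXTENSIONS = {".pdf", ".xlsx"}


def find_blocked_files(paths):
    """Return the paths whose extension is blocked, compared case-insensitively."""
    return [
        path
        for path in paths
        if PurePosixPath(path).suffix.lower() in BLOCKED_EXTENSIONS
    ]


def check(paths):
    """Return an exit code for CI: 1 if any blocked file is present, otherwise 0."""
    blocked = find_blocked_files(paths)
    for path in blocked:
        print(f"Blocked file type: {path}")
    return 1 if blocked else 0
```

In a workflow step, the path list could come from the output of `git ls-files`, with the return value of `check` passed to `sys.exit` so that a blocked file stops the pipeline before the deployment step.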