# Milestone Summaries

This document provides a comprehensive overview of all six project milestones, documenting the evolution of the Hopcroft Skill Classification system from requirements engineering through production monitoring.

---

## Milestone 1: Requirements Engineering

**Objective:** Define the problem space, stakeholders, and success criteria using the Machine Learning Canvas framework.

### Key Deliverables

| Component | Description |
|-----------|-------------|
| **Prediction Task** | Multi-label classification of 217 technical skills from GitHub issue/PR text |
| **Stakeholders** | Project managers, team leads, developers |
| **Data Source** | SkillScope DB with 7,245 merged PRs from 11 Java repositories |
| **Success Metrics** | Micro-F1 score improvement over baseline, precision/recall balance |

### ML Canvas Framework

The complete ML Canvas is documented in [ML Canvas.md](./ML%20Canvas.md), covering:

- **Value Proposition**: Automated task assignment optimization
- **Decisions**: Resource allocation for issue resolution
- **Data Collection**: Automated labeling via API call detection
- **Impact Simulation**: Outperform the SkillScope RF + TF-IDF baseline
- **Monitoring**: Continuous evaluation with drift detection

### Identified Risks & Mitigations

| Risk | Mitigation Strategy |
|------|---------------------|
| Label imbalance (217 classes) | SMOTE, MLSMOTE, ADASYN oversampling |
| Text noise (URLs, HTML, code) | Custom preprocessing pipeline |
| Multi-label complexity | MultiOutputClassifier with stratified splits |
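
The mitigation table mentions `MultiOutputClassifier`; as a minimal sketch of that multi-label setup with scikit-learn, here is a toy feature matrix with three hypothetical labels rather than the project's 217-label matrix (the base estimator and data are illustrative only):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

# Toy stand-in for the TF-IDF feature matrix: 8 samples, 5 features.
rng = np.random.default_rng(0)
X = rng.random((8, 5))

# Binary indicator matrix: each column is one (hypothetical) skill label.
Y = np.array([
    [1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1],
    [1, 0, 0], [0, 1, 1], [1, 1, 1], [0, 0, 0],
])

# MultiOutputClassifier fits one independent classifier per label column.
clf = MultiOutputClassifier(LogisticRegression(max_iter=1000))
clf.fit(X, Y)

pred = clf.predict(X)
print(pred.shape)  # (8, 3): one binary prediction per sample per label
```

Each label column is trained independently, which is why per-label support matters (hence the class-imbalance mitigations above).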
---

## Milestone 2: Data Management & Experiment Tracking

**Objective:** Establish end-to-end infrastructure for reproducible ML experiments.

### Data Pipeline

```
data/raw/            →  dataset.py     →  data/processed/
(SkillScope SQLite)     (HuggingFace)     (Clean CSV)
                                               ↓
                                          features.py
                                               ↓
                                        data/processed/
                                       (TF-IDF/Embeddings)
```

### Key Components

1. **Data Management**
   - DVC setup with DagsHub remote storage
   - Git-ignored data and model directories
   - Version-controlled `.dvc` files for reproducibility
2. **Data Ingestion**
   - `dataset.py`: Downloads SkillScope from Hugging Face
   - Extracts SQLite database with cleanup
3. **Feature Engineering**
   - `features.py`: Text cleaning pipeline
   - URL/HTML/Markdown removal
   - Normalization and Porter stemming
   - TF-IDF vectorization (uni- and bi-grams)
   - Sentence embedding generation
4. **Configuration**
   - `config.py`: Centralized paths, hyperparameters, MLflow URI
5. **Experiment Tracking**
   - MLflow with DagsHub remote
   - Logged metrics: precision, recall, F1-score
   - Artifact storage: models, vectorizers, scalers
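
The text-cleaning steps above can be approximated in a stdlib-only sketch; the regexes here are illustrative rather than the actual `features.py` implementation, and the Porter-stemming step is omitted:

```python
import re

def clean_issue_text(text: str) -> str:
    """Approximate the cleaning steps above: strip URLs, HTML, and Markdown
    punctuation, then lowercase and normalize whitespace (no stemming here)."""
    text = re.sub(r"https?://\S+", " ", text)         # URLs
    text = re.sub(r"<[^>]+>", " ", text)              # HTML tags
    text = re.sub(r"[#*_`>\[\]()]", " ", text)        # Markdown punctuation
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())  # lowercase alphanumerics only
    return re.sub(r"\s+", " ", text).strip()          # collapse whitespace

print(clean_issue_text("Fix <b>NPE</b> in `UserDao`, see https://example.com/bug"))
# → "fix npe in userdao see"
```

The cleaned string would then feed the TF-IDF vectorizer or embedding model.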
### Training Actions

| Action | Description |
|--------|-------------|
| `baseline` | Random Forest with TF-IDF |
| `mlsmote` | Multi-label SMOTE oversampling |
| `ros` | Random Oversampling |
| `adasyn-pca` | ADASYN + PCA dimensionality reduction |
| `lightgbm` | LightGBM classifier |
---

## Milestone 3: Quality Assurance

**Objective:** Implement a comprehensive testing and validation framework for data quality and model robustness.

### Data Cleaning Pipeline

| Metric | Before | After | Resolution |
|--------|--------|-------|------------|
| Total Samples | 7,154 | 6,673 | -481 duplicates |
| Duplicates | 481 | 0 | Exact match removal |
| Label Conflicts | 640 | 0 | Majority voting |
| Data Leakage | Present | 0 | Train/test separation |

### Validation Frameworks

#### Great Expectations (10 Tests)

| Test | Purpose | Status |
|------|---------|--------|
| Database Schema | Validate SQLite structure | ✓ Pass |
| TF-IDF Matrix | No NaN/Inf, sparsity checks | ✓ Pass |
| Binary Labels | Values in {0,1} | ✓ Pass |
| Feature-Label Alignment | Row count consistency | ✓ Pass |
| Label Distribution | Min 5 occurrences per label | ✓ Pass |
| SMOTE Compatibility | Min 10 non-zero features | ✓ Pass |
| Multi-Output Format | >50% multi-label samples | ✓ Pass |
| Duplicate Detection | No duplicate features | ✓ Pass |
| Train-Test Separation | Zero intersection | ✓ Pass |
| Label Consistency | Same features → same labels | ✓ Pass |
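
The train-test separation test, for instance, reduces to a zero-intersection check on feature rows. A stand-alone sketch of that idea (not the project's Great Expectations suite):

```python
def assert_no_leakage(train_rows, test_rows):
    """Raise if any feature row appears in both splits (zero-intersection check)."""
    overlap = {tuple(r) for r in train_rows} & {tuple(r) for r in test_rows}
    if overlap:
        raise AssertionError(f"{len(overlap)} feature rows appear in both splits")
    return True

train = [[0.1, 0.2], [0.3, 0.4]]
test = [[0.5, 0.6]]
print(assert_no_leakage(train, test))  # True
```

Running the same check with overlapping splits raises, which is what surfaced the leakage listed in the cleaning table above.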
#### Deepchecks (24 Checks)

- **Data Integrity Suite**: 92% score (12 checks)
- **Train-Test Validation Suite**: 100% score (12 checks)
- **Overall Status**: Production-ready (96% combined)

#### Behavioral Testing (36 Tests)

| Category | Tests | Description |
|----------|-------|-------------|
| Invariance | 9 | Typo, case, punctuation robustness |
| Directional | 10 | Keyword addition effects |
| Minimum Functionality | 17 | Basic skill predictions |
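
An invariance test can be sketched as follows, using a hypothetical keyword-based stand-in for the real model (the actual suite exercises the trained classifier):

```python
import string

def toy_predict(text: str) -> set[str]:
    """Hypothetical stand-in for the model: keyword lookup after normalization."""
    keywords = {"sql": "Databases", "regex": "Text Processing", "http": "Networking"}
    normalized = text.lower().translate(str.maketrans("", "", string.punctuation))
    return {skill for word, skill in keywords.items() if word in normalized.split()}

# Invariance: casing and punctuation should not change the predicted skills.
base = toy_predict("add sql migration script")
assert toy_predict("Add SQL migration script!") == base
assert toy_predict("ADD sql, migration script...") == base
print(base)  # {'Databases'}
```

Directional tests follow the same pattern but assert that adding a skill keyword *does* change the prediction in the expected direction.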
### Code Quality

- **Ruff Analysis**: 28 minor issues (100% fixable)
- **Standards**: PEP 8 compliant, Black compatible

Full details: [testing_and_validation.md](./testing_and_validation.md)

---

## Milestone 4: API Development

**Objective:** Implement a production-ready REST API for skill prediction with MLflow integration.

### Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/predict` | Single issue prediction |
| `POST` | `/predict/batch` | Batch predictions (max 100) |
| `GET` | `/predictions/{run_id}` | Retrieve by MLflow Run ID |
| `GET` | `/predictions` | List recent predictions |
| `GET` | `/health` | Service health check |
| `GET` | `/metrics` | Prometheus metrics |

### Features

- **FastAPI Framework**: Async request handling, auto-generated OpenAPI docs
- **MLflow Integration**: All predictions logged with metadata
- **Pydantic Validation**: Request/response schema enforcement
- **Prometheus Metrics**: Request counters, latency histograms, gauges
### Documentation Access

- Swagger UI: `/docs`
- ReDoc: `/redoc`
- OpenAPI JSON: `/openapi.json`

---

## Milestone 5: Deployment & Containerization

**Objective:** Implement containerized deployment with a CI/CD pipeline for production delivery.

### Docker Architecture

```
docker/docker-compose.yml
├── hopcroft-api (FastAPI Backend)
│   ├── Port: 8080
│   ├── Health Check: /health
│   └── Volumes: source code, logs
│
├── hopcroft-gui (Streamlit Frontend)
│   ├── Port: 8501
│   └── Depends on: hopcroft-api
│
└── hopcroft-net (Bridge Network)
```
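
A compose file matching that layout could look roughly like the following sketch; the Dockerfile names and build contexts are hypothetical, not the actual contents of `docker/docker-compose.yml`:

```yaml
# Illustrative sketch only; paths and Dockerfile names are assumptions.
services:
  hopcroft-api:
    build: { context: .., dockerfile: docker/Dockerfile.api }
    ports: ["8080:8080"]
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
    networks: [hopcroft-net]
  hopcroft-gui:
    build: { context: .., dockerfile: docker/Dockerfile.gui }
    ports: ["8501:8501"]
    depends_on: [hopcroft-api]
    networks: [hopcroft-net]
networks:
  hopcroft-net:
    driver: bridge
```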
### Hugging Face Spaces Deployment

| Component | Configuration |
|-----------|---------------|
| SDK | Docker |
| Port | 7860 |
| Startup Script | `docker/scripts/start_space.sh` |
| Secrets | `DAGSHUB_USERNAME`, `DAGSHUB_TOKEN` |

**Startup Flow:**

1. Configure DVC with secrets
2. Pull models from DagsHub
3. Start FastAPI (port 8000)
4. Start Streamlit (port 8501)
5. Start Nginx reverse proxy (port 7860)

### CI/CD Pipeline (GitHub Actions)

```yaml
Triggers: push/PR to main, feature/*
Jobs:
  1. unit-tests
     - Ruff linting
     - Pytest unit tests
     - HTML report generation
  2. build-image (requires unit-tests)
     - DVC model pull
     - Docker image build
```
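
As a sketch, the two-job dependency above could be expressed in a GitHub Actions workflow along these lines; file paths, action versions, and the Python version are illustrative assumptions:

```yaml
# Hypothetical workflow sketch; not the repository's actual workflow file.
name: ci
on:
  push: { branches: [main, "feature/*"] }
  pull_request: { branches: [main] }
jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.11" }
      - run: pip install -r requirements.txt
      - run: ruff check .
      - run: pytest --html=report.html
  build-image:
    needs: unit-tests   # build only runs after tests pass
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: dvc pull
      - run: docker build -t hopcroft-api .
```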
---

## Milestone 6: Monitoring & Observability

**Objective:** Implement comprehensive monitoring infrastructure with drift detection.

### Prometheus Metrics

| Metric | Type | Description |
|--------|------|-------------|
| `hopcroft_requests_total` | Counter | Total requests by method/endpoint |
| `hopcroft_request_duration_seconds` | Histogram | Request latency distribution |
| `hopcroft_in_progress_requests` | Gauge | Currently processing requests |
| `hopcroft_prediction_processing_seconds` | Summary | Model inference time |

### Grafana Dashboards

- **Request Rate**: Real-time requests per second
- **Request Latency (p50, p95)**: Response time percentiles
- **In-Progress Requests**: Currently processing requests
- **Error Rate (5xx)**: Failed request percentage
- **Model Prediction Time**: Inference latency
- **Requests by Endpoint**: Traffic distribution
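
These panels map to PromQL queries along the following lines; this is a sketch assuming the metric names from the table above, and the `status` label is an assumption (the counter is documented only as labeled by method/endpoint):

```promql
# Request rate (req/s), averaged over 5m
rate(hopcroft_requests_total[5m])

# p95 latency from the request-duration histogram
histogram_quantile(0.95,
  sum by (le) (rate(hopcroft_request_duration_seconds_bucket[5m])))

# 5xx error rate as a fraction of all requests (assumes a `status` label)
sum(rate(hopcroft_requests_total{status=~"5.."}[5m]))
  / sum(rate(hopcroft_requests_total[5m]))
```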
### Data Drift Detection

| Component | Details |
|-----------|---------|
| Algorithm | Kolmogorov-Smirnov Two-Sample Test |
| Baseline | 1,000 samples from training data |
| Threshold | p-value < 0.05 (Bonferroni corrected) |
| Metrics | `drift_detected`, `drift_p_value`, `drift_distance` |
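
The KS statistic itself is simple to state: the largest gap between the two empirical CDFs. A pure-Python sketch of the statistic follows; the production check would presumably use `scipy.stats.ks_2samp`, which also returns the p-value used against the threshold above:

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute gap
    between the empirical CDFs of the two samples."""
    def cdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)
    points = sorted(set(sample_a) | set(sample_b))
    return max(abs(cdf(sample_a, x) - cdf(sample_b, x)) for x in points)

baseline = [0.1, 0.2, 0.3, 0.4, 0.5]  # stand-in for the training reference window
shifted = [0.6, 0.7, 0.8, 0.9, 1.0]   # fully separated "production" window

print(ks_statistic(baseline, baseline))  # 0.0
print(ks_statistic(baseline, shifted))   # 1.0
```

A statistic near 0 means the feature distributions match; near 1 means they barely overlap, which is what the `drift_distance` metric exposes.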
### Alerting Rules

| Alert | Condition |
|-------|-----------|
| `ServiceDown` | Target unreachable for 5m |
| `HighErrorRate` | 5xx rate > 10% for 5m |
| `SlowRequests` | P95 latency > 2s |
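
Expressed as Prometheus alerting rules, these could look roughly like the following sketch; the job name and the `status` label are assumptions, not the project's actual rule file:

```yaml
# Illustrative rules only; expressions assume the metric names documented above.
groups:
  - name: hopcroft
    rules:
      - alert: ServiceDown
        expr: up{job="hopcroft-api"} == 0
        for: 5m
      - alert: HighErrorRate
        expr: >
          sum(rate(hopcroft_requests_total{status=~"5.."}[5m]))
            / sum(rate(hopcroft_requests_total[5m])) > 0.10
        for: 5m
      - alert: SlowRequests
        expr: >
          histogram_quantile(0.95,
            sum by (le) (rate(hopcroft_request_duration_seconds_bucket[5m]))) > 2
```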
### Load Testing (Locust)

| Task | Weight | Endpoint |
|------|--------|----------|
| Single Prediction | 60% | `POST /predict` |
| Batch Prediction | 20% | `POST /predict/batch` |
| Monitoring | 20% | `GET /health`, `GET /predictions` |

### HF Spaces Monitoring Access

Both Prometheus and Grafana are available on the production deployment:

| Service | URL |
|---------|-----|
| Prometheus | https://dacrow13-hopcroft-skill-classification.hf.space/prometheus/ |
| Grafana | https://dacrow13-hopcroft-skill-classification.hf.space/grafana/ |

### Uptime Monitoring (Better Stack)

- External monitoring from multiple locations
- Email notifications on failures
- Tracked endpoints: `/health`, `/openapi.json`, `/docs`