# Milestone Summaries

This document provides a comprehensive overview of all six project milestones, documenting the evolution of the Hopcroft Skill Classification system from requirements engineering through production monitoring.

---

## Milestone 1: Requirements Engineering

**Objective:** Define the problem space, stakeholders, and success criteria using the Machine Learning Canvas framework.

### Key Deliverables

| Component | Description |
|-----------|-------------|
| **Prediction Task** | Multi-label classification of 217 technical skills from GitHub issue/PR text |
| **Stakeholders** | Project managers, team leads, developers |
| **Data Source** | SkillScope DB with 7,245 merged PRs from 11 Java repositories |
| **Success Metrics** | Micro-F1 score improvement over baseline, precision/recall balance |

### ML Canvas Framework

The complete ML Canvas is documented in [ML Canvas.md](./ML%20Canvas.md), covering:

- **Value Proposition**: Automated task assignment optimization
- **Decisions**: Resource allocation for issue resolution
- **Data Collection**: Automated labeling via API call detection
- **Impact Simulation**: Outperform SkillScope RF + TF-IDF baseline
- **Monitoring**: Continuous evaluation with drift detection

### Identified Risks & Mitigations

| Risk | Mitigation Strategy |
|------|---------------------|
| Label imbalance (217 classes) | SMOTE, MLSMOTE, ADASYN oversampling |
| Text noise (URLs, HTML, code) | Custom preprocessing pipeline |
| Multi-label complexity | MultiOutputClassifier with stratified splits |

---

## Milestone 2: Data Management & Experiment Tracking

**Objective:** Establish end-to-end infrastructure for reproducible ML experiments.

### Data Pipeline

```
data/raw/            →  dataset.py     →  data/processed/
(SkillScope SQLite)     (HuggingFace)     (Clean CSV)
                                               ↓
                                          features.py
                                               ↓
                                         data/processed/
                                      (TF-IDF/Embeddings)
```

### Key Components
1. **Data Management**
   - DVC setup with DagsHub remote storage
   - Git-ignored data and model directories
   - Version-controlled `.dvc` files for reproducibility
2. **Data Ingestion**
   - `dataset.py`: Downloads SkillScope from Hugging Face
   - Extracts SQLite database with cleanup
3. **Feature Engineering**
   - `features.py`: Text cleaning pipeline
   - URL/HTML/Markdown removal
   - Normalization and Porter stemming
   - TF-IDF vectorization (uni+bi-grams)
   - Sentence embedding generation
4. **Configuration**
   - `config.py`: Centralized paths, hyperparameters, MLflow URI
5. **Experiment Tracking**
   - MLflow with DagsHub remote
   - Logged metrics: precision, recall, F1-score
   - Artifact storage: models, vectorizers, scalers

### Training Actions

| Action | Description |
|--------|-------------|
| `baseline` | Random Forest with TF-IDF |
| `mlsmote` | Multi-label SMOTE oversampling |
| `ros` | Random Oversampling |
| `adasyn-pca` | ADASYN + PCA dimensionality reduction |
| `lightgbm` | LightGBM classifier |

---

## Milestone 3: Quality Assurance

**Objective:** Implement a comprehensive testing and validation framework for data quality and model robustness.
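The cleaning step described in this milestone removes exact duplicates and resolves conflicting labels by majority vote. A minimal sketch of that voting idea, using illustrative `(text, label)` pairs rather than the project's actual multi-label schema:

```python
from collections import Counter

def resolve_label_conflicts(rows):
    """Majority-vote cleanup for (text, label) pairs.

    Duplicate annotations collapse to one row per text, and conflicting
    labels for the same text are resolved by keeping the most frequent one.
    """
    labels_by_text = {}
    for text, label in rows:
        labels_by_text.setdefault(text, []).append(label)
    cleaned = []
    for text, labels in labels_by_text.items():
        # Counter.most_common breaks ties by first-seen order
        majority_label, _ = Counter(labels).most_common(1)[0]
        cleaned.append((text, majority_label))
    return cleaned
```

In the real pipeline each label is a 217-dimensional binary skill vector rather than a single string, but the conflict-resolution principle is the same.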
### Data Cleaning Pipeline

| Metric | Before | After | Resolution |
|--------|--------|-------|------------|
| Total Samples | 7,154 | 6,673 | -481 duplicates |
| Duplicates | 481 | 0 | Exact match removal |
| Label Conflicts | 640 | 0 | Majority voting |
| Data Leakage | Present | 0 | Train/test separation |

### Validation Frameworks

#### Great Expectations (10 Tests)

| Test | Purpose | Status |
|------|---------|--------|
| Database Schema | Validate SQLite structure | ✅ Pass |
| TF-IDF Matrix | No NaN/Inf, sparsity checks | ✅ Pass |
| Binary Labels | Values in {0,1} | ✅ Pass |
| Feature-Label Alignment | Row count consistency | ✅ Pass |
| Label Distribution | Min 5 occurrences per label | ✅ Pass |
| SMOTE Compatibility | Min 10 non-zero features | ✅ Pass |
| Multi-Output Format | >50% multi-label samples | ✅ Pass |
| Duplicate Detection | No duplicate features | ✅ Pass |
| Train-Test Separation | Zero intersection | ✅ Pass |
| Label Consistency | Same features → same labels | ✅ Pass |

#### Deepchecks (24 Checks)

- **Data Integrity Suite**: 92% score (12 checks)
- **Train-Test Validation Suite**: 100% score (12 checks)
- **Overall Status**: Production-ready (96% combined)

#### Behavioral Testing (36 Tests)

| Category | Tests | Description |
|----------|-------|-------------|
| Invariance | 9 | Typo, case, punctuation robustness |
| Directional | 10 | Keyword addition effects |
| Minimum Functionality | 17 | Basic skill predictions |

### Code Quality

- **Ruff Analysis**: 28 minor issues (100% fixable)
- **Standards**: PEP 8 compliant, Black compatible

Full details: [testing_and_validation.md](./testing_and_validation.md)

---

## Milestone 4: API Development

**Objective:** Implement a production-ready REST API for skill prediction with MLflow integration.
### Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/predict` | Single issue prediction |
| `POST` | `/predict/batch` | Batch predictions (max 100) |
| `GET` | `/predictions/{run_id}` | Retrieve by MLflow Run ID |
| `GET` | `/predictions` | List recent predictions |
| `GET` | `/health` | Service health check |
| `GET` | `/metrics` | Prometheus metrics |

### Features

- **FastAPI Framework**: Async request handling, auto-generated OpenAPI docs
- **MLflow Integration**: All predictions logged with metadata
- **Pydantic Validation**: Request/response schema enforcement
- **Prometheus Metrics**: Request counters, latency histograms, gauges

### Documentation Access

- Swagger UI: `/docs`
- ReDoc: `/redoc`
- OpenAPI JSON: `/openapi.json`

---

## Milestone 5: Deployment & Containerization

**Objective:** Implement containerized deployment with a CI/CD pipeline for production delivery.

### Docker Architecture

```
docker/docker-compose.yml
├── hopcroft-api (FastAPI Backend)
│   ├── Port: 8080
│   ├── Health Check: /health
│   └── Volumes: source code, logs
│
├── hopcroft-gui (Streamlit Frontend)
│   ├── Port: 8501
│   └── Depends on: hopcroft-api
│
└── hopcroft-net (Bridge Network)
```

### Hugging Face Spaces Deployment

| Component | Configuration |
|-----------|---------------|
| SDK | Docker |
| Port | 7860 |
| Startup Script | `docker/scripts/start_space.sh` |
| Secrets | `DAGSHUB_USERNAME`, `DAGSHUB_TOKEN` |

**Startup Flow:**

1. Configure DVC with secrets
2. Pull models from DagsHub
3. Start FastAPI (port 8000)
4. Start Streamlit (port 8501)
5. Start Nginx reverse proxy (port 7860)

### CI/CD Pipeline (GitHub Actions)

```yaml
Triggers: push/PR to main, feature/*

Jobs:
  1. unit-tests
     - Ruff linting
     - Pytest unit tests
     - HTML report generation
  2. build-image (requires unit-tests)
     - DVC model pull
     - Docker image build
```

---

## Milestone 6: Monitoring & Observability

**Objective:** Implement comprehensive monitoring infrastructure with drift detection.

### Prometheus Metrics

| Metric | Type | Description |
|--------|------|-------------|
| `hopcroft_requests_total` | Counter | Total requests by method/endpoint |
| `hopcroft_request_duration_seconds` | Histogram | Request latency distribution |
| `hopcroft_in_progress_requests` | Gauge | Currently processing requests |
| `hopcroft_prediction_processing_seconds` | Summary | Model inference time |

### Grafana Dashboards

- **Request Rate**: Real-time requests per second
- **Request Latency (p50, p95)**: Response time percentiles
- **In-Progress Requests**: Currently processing requests
- **Error Rate (5xx)**: Failed request percentage
- **Model Prediction Time**: Inference latency
- **Requests by Endpoint**: Traffic distribution

### Data Drift Detection

| Component | Details |
|-----------|---------|
| Algorithm | Kolmogorov-Smirnov Two-Sample Test |
| Baseline | 1000 samples from training data |
| Threshold | p-value < 0.05 (Bonferroni corrected) |
| Metrics | `drift_detected`, `drift_p_value`, `drift_distance` |

### Alerting Rules

| Alert | Condition |
|-------|-----------|
| `ServiceDown` | Target unreachable for 5m |
| `HighErrorRate` | 5xx rate > 10% for 5m |
| `SlowRequests` | P95 latency > 2s |

### Load Testing (Locust)

| Task | Weight | Endpoint |
|------|--------|----------|
| Single Prediction | 60% | `POST /predict` |
| Batch Prediction | 20% | `POST /predict/batch` |
| Monitoring | 20% | `GET /health`, `/predictions` |

### HF Spaces Monitoring Access

Both Prometheus and Grafana are available on the production deployment:

| Service | URL |
|---------|-----|
| Prometheus | https://dacrow13-hopcroft-skill-classification.hf.space/prometheus/ |
| Grafana | https://dacrow13-hopcroft-skill-classification.hf.space/grafana/ |

### Uptime Monitoring (Better Stack)

- External monitoring from multiple locations
- Email notifications on failures
- Tracked endpoints: `/health`, `/openapi.json`, `/docs`
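The Prometheus metrics listed under Milestone 6 could be declared with `prometheus_client` roughly as follows; the label names and the toy handler wiring are illustrative assumptions, not the service's actual middleware:

```python
from prometheus_client import Counter, Gauge, Histogram, Summary, generate_latest

# Metric names mirror the Milestone 6 table; label names are illustrative.
REQUESTS = Counter("hopcroft_requests_total", "Total requests", ["method", "endpoint"])
LATENCY = Histogram("hopcroft_request_duration_seconds", "Request latency")
IN_PROGRESS = Gauge("hopcroft_in_progress_requests", "Currently processing requests")
PREDICT_TIME = Summary("hopcroft_prediction_processing_seconds", "Model inference time")

def handle_predict():
    IN_PROGRESS.inc()
    with LATENCY.time():          # observes wall-clock request duration
        with PREDICT_TIME.time():  # observes just the inference portion
            pass  # model inference would run here
    REQUESTS.labels(method="POST", endpoint="/predict").inc()
    IN_PROGRESS.dec()
```

`generate_latest()` serializes all registered metrics in the text exposition format, which is what the `/metrics` endpoint returns for Prometheus to scrape.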
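Milestone 6's drift check uses a two-sample Kolmogorov-Smirnov test. A self-contained sketch of the idea, using the large-sample 5% critical value rather than the p-value/Bonferroni machinery the table describes (the sample data and threshold constant here are illustrative):

```python
import math
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a) | set(b))

    def ecdf(sorted_vals, x):
        # fraction of sample values <= x (right-continuous ECDF)
        return bisect_right(sorted_vals, x) / len(sorted_vals)

    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)

def drift_detected(baseline, current, critical=1.358):
    """Flag drift when D exceeds the large-sample critical value
    c(alpha) * sqrt((n + m) / (n * m)), with c(0.05) approx. 1.358."""
    n, m = len(baseline), len(current)
    threshold = critical * math.sqrt((n + m) / (n * m))
    return ks_statistic(baseline, current) > threshold
```

In production one would run this per feature against the 1000-sample training baseline and correct the significance level for the number of features tested, as the Bonferroni note in the table indicates.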