# Milestone Summaries
This document provides a comprehensive overview of all six project milestones, documenting the evolution of the Hopcroft Skill Classification system from requirements engineering through production monitoring.
## Milestone 1: Requirements Engineering
Objective: Define the problem space, stakeholders, and success criteria using the Machine Learning Canvas framework.
### Key Deliverables
| Component | Description |
|---|---|
| Prediction Task | Multi-label classification of 217 technical skills from GitHub issue/PR text |
| Stakeholders | Project managers, team leads, developers |
| Data Source | SkillScope DB with 7,245 merged PRs from 11 Java repositories |
| Success Metrics | Micro-F1 score improvement over baseline, precision/recall balance |
### ML Canvas Framework
The complete ML Canvas is documented in `ML Canvas.md`, covering:
- Value Proposition: Automated task assignment optimization
- Decisions: Resource allocation for issue resolution
- Data Collection: Automated labeling via API call detection
- Impact Simulation: Outperform SkillScope RF + TF-IDF baseline
- Monitoring: Continuous evaluation with drift detection
### Identified Risks & Mitigations
| Risk | Mitigation Strategy |
|---|---|
| Label imbalance (217 classes) | SMOTE, MLSMOTE, ADASYN oversampling |
| Text noise (URLs, HTML, code) | Custom preprocessing pipeline |
| Multi-label complexity | MultiOutputClassifier with stratified splits |
## Milestone 2: Data Management & Experiment Tracking
Objective: Establish end-to-end infrastructure for reproducible ML experiments.
### Data Pipeline

    data/raw/             →   dataset.py       →   data/processed/
    (SkillScope SQLite)       (Hugging Face)       (Clean CSV)
                                                         ↓
                                                    features.py
                                                         ↓
                                                   data/processed/
                                                   (TF-IDF/Embeddings)
### Key Components

#### Data Management
- DVC setup with DagsHub remote storage
- Git-ignored data and model directories
- Version-controlled `.dvc` files for reproducibility
#### Data Ingestion
`dataset.py`:
- Downloads the SkillScope dataset from Hugging Face
- Extracts the SQLite database with cleanup
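The ingestion step can be sketched in miniature with the standard library; the table and column names below (`pull_requests`, `title`, `body`) are illustrative assumptions, not the actual SkillScope schema:

```python
import sqlite3
from pathlib import Path

def load_pull_requests(db_path: str) -> list[tuple[str, str]]:
    """Read (title, body) pairs from an extracted SQLite file.

    The table and column names are illustrative; the real schema lives in
    the SkillScope database that dataset.py pulls from Hugging Face.
    """
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("SELECT title, body FROM pull_requests").fetchall()
    finally:
        conn.close()

# Demo against a throwaway database with the assumed schema.
demo = Path("demo_skillscope.db")
conn = sqlite3.connect(demo)
with conn:
    conn.execute("CREATE TABLE pull_requests (title TEXT, body TEXT)")
    conn.execute(
        "INSERT INTO pull_requests VALUES (?, ?)",
        ("Fix NPE in parser", "Guards against null tokens."),
    )
conn.close()

prs = load_pull_requests(str(demo))
demo.unlink()  # cleanup, mirroring dataset.py's post-extraction cleanup
```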
#### Feature Engineering
`features.py`: Text cleaning pipeline
- URL/HTML/Markdown removal
- Normalization and Porter stemming
- TF-IDF vectorization (uni- and bi-grams)
- Sentence embedding generation
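The cleaning steps can be approximated with a few regular expressions; this is a sketch of the kind of pipeline `features.py` implements (the real version also applies Porter stemming):

```python
import re

def clean_text(text: str) -> str:
    """Strip URLs, HTML tags, and Markdown punctuation, then normalize.

    A sketch of the features.py cleaning stage, not the exact implementation.
    """
    text = re.sub(r"`{3}.*?`{3}", " ", text, flags=re.DOTALL)  # fenced code blocks
    text = re.sub(r"https?://\S+", " ", text)                  # URLs
    text = re.sub(r"<[^>]+>", " ", text)                       # HTML tags
    text = re.sub(r"[`*_#>\[\]()]", " ", text)                 # Markdown punctuation
    return re.sub(r"\s+", " ", text).strip().lower()

print(clean_text("Fix <code>NullPointerException</code> thrown at https://github.com/org/repo/issues/42"))
# → "fix nullpointerexception thrown at"
```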
#### Configuration
`config.py`: Centralized paths, hyperparameters, and MLflow URI
#### Experiment Tracking
- MLflow with DagsHub remote
- Logged metrics: precision, recall, F1-score
- Artifact storage: models, vectorizers, scalers
### Training Actions
| Action | Description |
|---|---|
| `baseline` | Random Forest with TF-IDF |
| `mlsmote` | Multi-label SMOTE oversampling |
| `ros` | Random oversampling |
| `adasyn-pca` | ADASYN + PCA dimensionality reduction |
| `lightgbm` | LightGBM classifier |
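The `baseline` action can be sketched with scikit-learn's `TfidfVectorizer` and `MultiOutputClassifier`; the toy corpus and two-skill label matrix below are illustrative stand-ins for the real 217-label data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import MultiOutputClassifier

# Toy corpus standing in for cleaned issue/PR text; labels form a binary
# indicator matrix with one column per skill, as in the 217-label setup.
texts = [
    "add rest endpoint for user login",
    "fix sql query in user repository",
    "tune rest api serialization of sql results",
    "refactor sql schema migration scripts",
]
labels = np.array([[1, 0], [0, 1], [1, 1], [0, 1]])  # columns: [rest, sql]

vectorizer = TfidfVectorizer(ngram_range=(1, 2))  # uni+bi-grams, as in features.py
X = vectorizer.fit_transform(texts)

clf = MultiOutputClassifier(RandomForestClassifier(n_estimators=50, random_state=0))
clf.fit(X, labels)

# Predict skill indicators for a new issue text.
pred = clf.predict(vectorizer.transform(["broken sql join in query"]))
```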
## Milestone 3: Quality Assurance
Objective: Implement comprehensive testing and validation framework for data quality and model robustness.
### Data Cleaning Pipeline
| Metric | Before | After | Resolution |
|---|---|---|---|
| Total Samples | 7,154 | 6,673 | -481 duplicates |
| Duplicates | 481 | 0 | Exact match removal |
| Label Conflicts | 640 | 0 | Majority voting |
| Data Leakage | Present | 0 | Train/test separation |
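The majority-voting resolution of label conflicts can be sketched as follows; the exact tie-breaking used in the real cleaning step may differ:

```python
from collections import Counter, defaultdict

def resolve_conflicts(samples: list[tuple[str, frozenset]]) -> dict:
    """Collapse duplicate texts; when copies disagree on labels, keep each
    label carried by a majority of the copies (simple majority voting).

    A sketch only; the real cleaning step also removes exact duplicates first.
    """
    grouped = defaultdict(list)
    for text, labels in samples:
        grouped[text].append(labels)
    resolved = {}
    for text, label_sets in grouped.items():
        votes = Counter(label for labels in label_sets for label in labels)
        quorum = len(label_sets) / 2
        resolved[text] = {label for label, n in votes.items() if n > quorum}
    return resolved

data = [
    ("fix db pool leak", frozenset({"sql"})),
    ("fix db pool leak", frozenset({"sql", "rest"})),
    ("fix db pool leak", frozenset({"sql"})),
]
print(resolve_conflicts(data))  # → {'fix db pool leak': {'sql'}}
```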
### Validation Frameworks

#### Great Expectations (10 Tests)
| Test | Purpose | Status |
|---|---|---|
| Database Schema | Validate SQLite structure | ✅ Pass |
| TF-IDF Matrix | No NaN/Inf, sparsity checks | ✅ Pass |
| Binary Labels | Values in {0,1} | ✅ Pass |
| Feature-Label Alignment | Row count consistency | ✅ Pass |
| Label Distribution | Min 5 occurrences per label | ✅ Pass |
| SMOTE Compatibility | Min 10 non-zero features | ✅ Pass |
| Multi-Output Format | >50% multi-label samples | ✅ Pass |
| Duplicate Detection | No duplicate features | ✅ Pass |
| Train-Test Separation | Zero intersection | ✅ Pass |
| Label Consistency | Same features → same labels | ✅ Pass |
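Two of these expectations translate directly into plain assertions. A minimal sketch, not the actual Great Expectations suite:

```python
import numpy as np

def check_binary_labels(Y: np.ndarray) -> bool:
    """Mirror of the 'Binary Labels' expectation: every value must be 0 or 1."""
    return bool(np.isin(Y, (0, 1)).all())

def check_train_test_separation(train_ids, test_ids) -> bool:
    """Mirror of the 'Train-Test Separation' expectation: zero intersection."""
    return not set(train_ids) & set(test_ids)

labels_ok = check_binary_labels(np.array([[0, 1], [1, 1]]))
labels_bad = check_binary_labels(np.array([[0, 2]]))
split_ok = check_train_test_separation(["pr-1", "pr-2"], ["pr-3"])
split_bad = check_train_test_separation(["pr-1"], ["pr-1"])
```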
#### Deepchecks (24 Checks)
- Data Integrity Suite: 92% score (12 checks)
- Train-Test Validation Suite: 100% score (12 checks)
- Overall Status: Production-ready (96% combined)
#### Behavioral Testing (36 Tests)
| Category | Tests | Description |
|---|---|---|
| Invariance | 9 | Typo, case, punctuation robustness |
| Directional | 10 | Keyword addition effects |
| Minimum Functionality | 17 | Basic skill predictions |
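An invariance test and a directional test can be sketched against a stub predictor; `predict_skills` below is a hypothetical keyword lookup standing in for the real model:

```python
def predict_skills(text: str) -> set[str]:
    """Hypothetical stand-in for the trained model: naive keyword lookup."""
    keywords = {"sql": "sql", "rest": "rest", "docker": "docker"}
    tokens = text.lower().split()
    return {skill for kw, skill in keywords.items() if kw in tokens}

def test_case_invariance():
    # Invariance: changing case must not change the predicted skills.
    assert predict_skills("Fix SQL injection") == predict_skills("fix sql injection")

def test_directional_keyword_addition():
    # Directional: adding a skill keyword should add labels, never remove them.
    base = predict_skills("update dependency versions")
    assert predict_skills("update dependency versions in docker image") >= base

test_case_invariance()
test_directional_keyword_addition()
```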
### Code Quality
- Ruff Analysis: 28 minor issues (100% fixable)
- Standards: PEP 8 compliant, Black compatible

Full details: `testing_and_validation.md`
## Milestone 4: API Development
Objective: Implement production-ready REST API for skill prediction with MLflow integration.
### Endpoints
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/predict` | Single issue prediction |
| `POST` | `/predict/batch` | Batch predictions (max 100) |
| `GET` | `/predictions/{run_id}` | Retrieve by MLflow Run ID |
| `GET` | `/predictions` | List recent predictions |
| `GET` | `/health` | Service health check |
| `GET` | `/metrics` | Prometheus metrics |
### Features
- FastAPI Framework: Async request handling, auto-generated OpenAPI docs
- MLflow Integration: All predictions logged with metadata
- Pydantic Validation: Request/response schema enforcement
- Prometheus Metrics: Request counters, latency histograms, gauges
### Documentation Access
- Swagger UI: `/docs`
- ReDoc: `/redoc`
- OpenAPI JSON: `/openapi.json`
## Milestone 5: Deployment & Containerization
Objective: Implement containerized deployment with CI/CD pipeline for production delivery.
### Docker Architecture

    docker/docker-compose.yml
    ├── hopcroft-api (FastAPI Backend)
    │   ├── Port: 8080
    │   ├── Health Check: /health
    │   └── Volumes: source code, logs
    │
    ├── hopcroft-gui (Streamlit Frontend)
    │   ├── Port: 8501
    │   └── Depends on: hopcroft-api
    │
    └── hopcroft-net (Bridge Network)
### Hugging Face Spaces Deployment
| Component | Configuration |
|---|---|
| SDK | Docker |
| Port | 7860 |
| Startup Script | `docker/scripts/start_space.sh` |
| Secrets | `DAGSHUB_USERNAME`, `DAGSHUB_TOKEN` |
Startup Flow:
1. Configure DVC with secrets
2. Pull models from DagsHub
3. Start FastAPI (port 8000)
4. Start Streamlit (port 8501)
5. Start Nginx reverse proxy (port 7860)
### CI/CD Pipeline (GitHub Actions)
Triggers: push/PR to `main`, `feature/*`
Jobs:
1. unit-tests
- Ruff linting
- Pytest unit tests
- HTML report generation
2. build-image (requires unit-tests)
- DVC model pull
- Docker image build
## Milestone 6: Monitoring & Observability
Objective: Implement comprehensive monitoring infrastructure with drift detection.
### Prometheus Metrics
| Metric | Type | Description |
|---|---|---|
| `hopcroft_requests_total` | Counter | Total requests by method/endpoint |
| `hopcroft_request_duration_seconds` | Histogram | Request latency distribution |
| `hopcroft_in_progress_requests` | Gauge | Currently processing requests |
| `hopcroft_prediction_processing_seconds` | Summary | Model inference time |
### Grafana Dashboards
- Request Rate: Real-time requests per second
- Request Latency (p50, p95): Response time percentiles
- In-Progress Requests: Currently processing requests
- Error Rate (5xx): Failed request percentage
- Model Prediction Time: Inference latency
- Requests by Endpoint: Traffic distribution
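Panels like these are typically backed by PromQL queries over the metrics above; the `endpoint` and `status` label names in this sketch are assumptions, not confirmed from the deployed dashboards:

```promql
# Request rate (per second, by endpoint)
sum(rate(hopcroft_requests_total[5m])) by (endpoint)

# p95 request latency from the duration histogram
histogram_quantile(0.95, sum(rate(hopcroft_request_duration_seconds_bucket[5m])) by (le))

# 5xx error rate as a fraction of all requests
sum(rate(hopcroft_requests_total{status=~"5.."}[5m]))
  / sum(rate(hopcroft_requests_total[5m]))
```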
### Data Drift Detection
| Component | Details |
|---|---|
| Algorithm | Kolmogorov-Smirnov Two-Sample Test |
| Baseline | 1000 samples from training data |
| Threshold | p-value < 0.05 (Bonferroni corrected) |
| Metrics | drift_detected, drift_p_value, drift_distance |
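The KS check can be sketched with `scipy.stats.ks_2samp`; the Gaussian samples and the per-feature Bonferroni divisor below are illustrative, not the production feature distributions:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
baseline = rng.normal(loc=0.0, scale=1.0, size=1000)    # training-time feature values
production = rng.normal(loc=0.8, scale=1.0, size=1000)  # shifted live-traffic values

# Bonferroni correction: divide alpha by the number of features tested.
n_features = 217  # illustrative; one test per monitored feature
alpha = 0.05 / n_features

stat, p_value = ks_2samp(baseline, production)
drift_detected = p_value < alpha
print(drift_detected, round(stat, 3))
```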
### Alerting Rules
| Alert | Condition |
|---|---|
| `ServiceDown` | Target unreachable for 5m |
| `HighErrorRate` | 5xx rate > 10% for 5m |
| `SlowRequests` | P95 latency > 2s |
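A rule such as `HighErrorRate` might look like the following Prometheus rule-file fragment; the `status` label and the severity value are assumptions, not the deployed configuration:

```yaml
groups:
  - name: hopcroft-alerts
    rules:
      - alert: HighErrorRate
        # Assumed label names; the deployed rules live in the Prometheus config.
        expr: |
          sum(rate(hopcroft_requests_total{status=~"5.."}[5m]))
            / sum(rate(hopcroft_requests_total[5m])) > 0.10
        for: 5m
        labels:
          severity: critical
```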
### Load Testing (Locust)
| Task | Weight | Endpoint |
|---|---|---|
| Single Prediction | 60% | POST /predict |
| Batch Prediction | 20% | POST /predict/batch |
| Monitoring | 20% | GET /health, /predictions |
### HF Spaces Monitoring Access
Both Prometheus and Grafana are available on the production deployment:
| Service | URL |
|---|---|
| Prometheus | https://dacrow13-hopcroft-skill-classification.hf.space/prometheus/ |
| Grafana | https://dacrow13-hopcroft-skill-classification.hf.space/grafana/ |
### Uptime Monitoring (Better Stack)
- External monitoring from multiple locations
- Email notifications on failures
- Tracked endpoints: `/health`, `/openapi.json`, `/docs`