# Milestone Summaries
This document provides a comprehensive overview of all six project milestones, documenting the evolution of the Hopcroft Skill Classification system from requirements engineering through production monitoring.
---
## Milestone 1: Requirements Engineering
**Objective:** Define the problem space, stakeholders, and success criteria using the Machine Learning Canvas framework.
### Key Deliverables
| Component | Description |
|-----------|-------------|
| **Prediction Task** | Multi-label classification of 217 technical skills from GitHub issue/PR text |
| **Stakeholders** | Project managers, team leads, developers |
| **Data Source** | SkillScope DB with 7,245 merged PRs from 11 Java repositories |
| **Success Metrics** | Micro-F1 score improvement over baseline, precision/recall balance |
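To make the success metric concrete, the sketch below computes micro-averaged F1 for multi-label predictions by pooling true positives, false positives, and false negatives across all labels before computing the score. It is a minimal standard-library illustration; the function name `micro_f1` and the list-of-lists input format are assumptions, not the project's evaluation code.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for binary indicator matrices (lists of 0/1 rows)."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1  # label correctly predicted
            elif t == 0 and p == 1:
                fp += 1  # label predicted but not present
            elif t == 1 and p == 0:
                fn += 1  # label present but missed
    denominator = 2 * tp + fp + fn
    # Convention chosen here: score is 0.0 when there is nothing to count.
    return 2 * tp / denominator if denominator else 0.0


if __name__ == "__main__":
    y_true = [[1, 0, 1], [0, 1, 0]]
    y_pred = [[1, 0, 0], [0, 1, 1]]
    print(round(micro_f1(y_true, y_pred), 4))
```

Because counts are pooled over every label, frequent skills dominate the score, which is why the table pairs micro-F1 with a precision/recall balance check.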
### ML Canvas Framework
The complete ML Canvas is documented in [ML Canvas.md](./ML%20Canvas.md), covering:
- **Value Proposition**: Automated task assignment optimization
- **Decisions**: Resource allocation for issue resolution
- **Data Collection**: Automated labeling via API call detection
- **Impact Simulation**: Outperform SkillScope RF + TF-IDF baseline
- **Monitoring**: Continuous evaluation with drift detection
### Identified Risks & Mitigations
| Risk | Mitigation Strategy |
|------|---------------------|
| Label imbalance (217 classes) | SMOTE, MLSMOTE, ADASYN oversampling |
| Text noise (URLs, HTML, code) | Custom preprocessing pipeline |
| Multi-label complexity | MultiOutputClassifier with stratified splits |
---
## Milestone 2: Data Management & Experiment Tracking
**Objective:** Establish end-to-end infrastructure for reproducible ML experiments.
### Data Pipeline
```
data/raw/             →  dataset.py     →  data/processed/
(SkillScope SQLite)      (HuggingFace)     (Clean CSV)
                                                 ↓
                                           features.py
                                                 ↓
                                           data/processed/
                                           (TF-IDF/Embeddings)
```
### Key Components
1. **Data Management**
   - DVC setup with DagsHub remote storage
   - Git-ignored data and model directories
   - Version-controlled `.dvc` files for reproducibility
2. **Data Ingestion**
   - `dataset.py`: Downloads SkillScope from Hugging Face
   - Extracts SQLite database with cleanup
3. **Feature Engineering**
   - `features.py`: Text cleaning pipeline
   - URL/HTML/Markdown removal
   - Normalization and Porter stemming
   - TF-IDF vectorization (uni+bi-grams)
   - Sentence embedding generation
4. **Configuration**
   - `config.py`: Centralized paths, hyperparameters, MLflow URI
5. **Experiment Tracking**
   - MLflow with DagsHub remote
   - Logged metrics: precision, recall, F1-score
   - Artifact storage: models, vectorizers, scalers
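The text cleaning step listed under Feature Engineering can be sketched roughly as follows. This is an illustrative standard-library version, not the contents of `features.py`: the regular expressions, their order, and the function name `clean_text` are assumptions, and Porter stemming is left out because it needs a third-party stemmer.

```python
import html
import re

# All patterns below are illustrative assumptions, not the project's own.
CODE_FENCE_RE = re.compile(r"`{3}.*?`{3}", re.DOTALL)  # fenced code blocks
INLINE_CODE_RE = re.compile(r"`[^`]*`")                # inline code spans
MD_LINK_RE = re.compile(r"\[([^\]]+)\]\([^)]*\)")      # [text](target)
URL_RE = re.compile(r"https?://\S+|www\.\S+")          # bare URLs
HTML_TAG_RE = re.compile(r"<[^>]+>")                   # HTML tags
MD_SYMBOLS_RE = re.compile(r"[#*>~]+")                 # headings, emphasis, quotes


def clean_text(text):
    """Strip code, links, URLs, HTML and Markdown markers, then normalize."""
    text = CODE_FENCE_RE.sub(" ", text)
    text = INLINE_CODE_RE.sub(" ", text)
    text = MD_LINK_RE.sub(r"\1", text)  # keep the link text, drop the target
    text = URL_RE.sub(" ", text)
    text = HTML_TAG_RE.sub(" ", text)
    text = html.unescape(text)          # decode entities such as &amp;
    text = MD_SYMBOLS_RE.sub(" ", text)
    # Normalization: collapse whitespace and lowercase.
    return re.sub(r"\s+", " ", text).lower().strip()


if __name__ == "__main__":
    raw = "## Fix **NPE** in <b>parser</b> see https://example.com/x and [docs](http://a.b)"
    print(clean_text(raw))
```

Markdown links are unwrapped before bare URLs are removed so that the visible link text survives while the target is discarded.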
### Training Actions
| Action | Description |
|--------|-------------|
| `baseline` | Random Forest with TF-IDF |
| `mlsmote` | Multi-label SMOTE oversampling |
| `ros` | Random Oversampling |
| `adasyn-pca` | ADASYN + PCA dimensionality reduction |
| `lightgbm` | LightGBM classifier |
---
## Milestone 3: Quality Assurance
**Objective:** Implement a comprehensive testing and validation framework for data quality and model robustness.
### Data Cleaning Pipeline
| Metric | Before | After | Resolution |
|--------|--------|-------|------------|
| Total Samples | 7,154 | 6,673 | -481 duplicates |
| Duplicates | 481 | 0 | Exact match removal |
| Label Conflicts | 640 | 0 | Majority voting |
| Data Leakage | Present | 0 | Train/test separation |
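The duplicate removal and majority-voting steps in the table can be illustrated with a small sketch. It assumes each sample is a `(text, labels)` pair and that voting happens over whole label vectors, with ties going to the vector seen first; the project may instead vote per label, so treat `resolve_conflicts` as a hypothetical helper rather than the actual cleaning code.

```python
from collections import Counter, defaultdict


def resolve_conflicts(samples):
    """Collapse exact-duplicate texts and settle label conflicts by majority vote.

    `samples` is a list of (text, labels) pairs, where labels is a tuple of 0/1.
    """
    groups = defaultdict(list)
    for text, labels in samples:
        groups[text].append(tuple(labels))

    resolved = []
    for text, label_vectors in groups.items():
        # most_common(1) picks the most frequent label vector; on a tie,
        # the vector encountered first wins.
        winner, _count = Counter(label_vectors).most_common(1)[0]
        resolved.append((text, winner))
    return resolved


if __name__ == "__main__":
    samples = [
        ("fix sql timeout", (1, 0)),
        ("fix sql timeout", (1, 0)),
        ("fix sql timeout", (0, 1)),  # conflicting label for the same text
        ("add http retry", (0, 1)),
    ]
    print(resolve_conflicts(samples))
```

Each distinct text appears once in the output, so exact duplicates and label conflicts are removed in the same pass.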
### Validation Frameworks
#### Great Expectations (10 Tests)
| Test | Purpose | Status |
|------|---------|--------|
| Database Schema | Validate SQLite structure | ✅ Pass |
| TF-IDF Matrix | No NaN/Inf, sparsity checks | ✅ Pass |
| Binary Labels | Values in {0,1} | ✅ Pass |
| Feature-Label Alignment | Row count consistency | ✅ Pass |
| Label Distribution | Min 5 occurrences per label | ✅ Pass |
| SMOTE Compatibility | Min 10 non-zero features | ✅ Pass |
| Multi-Output Format | >50% multi-label samples | ✅ Pass |
| Duplicate Detection | No duplicate features | ✅ Pass |
| Train-Test Separation | Zero intersection | ✅ Pass |
| Label Consistency | Same features → same labels | ✅ Pass |
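Three of the checks in the table (binary labels, label distribution, train-test separation) are simple enough to express in plain Python. The sketch below shows the logic only; it does not use the Great Expectations API, and the function names are hypothetical.

```python
def check_binary_labels(labels):
    """Every entry of the label matrix must be 0 or 1."""
    return all(value in (0, 1) for row in labels for value in row)


def check_min_label_occurrences(labels, minimum=5):
    """Every label column must be positive in at least `minimum` samples."""
    column_counts = [sum(column) for column in zip(*labels)]
    return all(count >= minimum for count in column_counts)


def check_train_test_separation(train_features, test_features):
    """No feature row may appear in both the training and the test set."""
    train_rows = {tuple(row) for row in train_features}
    test_rows = {tuple(row) for row in test_features}
    return not (train_rows & test_rows)


if __name__ == "__main__":
    labels = [[1, 0], [1, 1], [0, 1]]
    print(check_binary_labels(labels))
    print(check_min_label_occurrences(labels, minimum=2))
    print(check_train_test_separation([[0.1, 0.2]], [[0.3, 0.4]]))
```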
#### Deepchecks (24 Checks)
- **Data Integrity Suite**: 92% score (12 checks)
- **Train-Test Validation Suite**: 100% score (12 checks)
- **Overall Status**: Production-ready (96% combined)
#### Behavioral Testing (36 Tests)
| Category | Tests | Description |
|----------|-------|-------------|
| Invariance | 9 | Typo, case, punctuation robustness |
| Directional | 10 | Keyword addition effects |
| Minimum Functionality | 17 | Basic skill predictions |
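The three behavioral test categories can be illustrated against a toy predictor. Everything below is hypothetical: `predict_skills` is a keyword-matching stand-in for the real model, and the skill names and keywords are invented for the example. Only the shape of each test category is the point.

```python
import re

# Hypothetical keyword table standing in for the trained classifier.
SKILL_KEYWORDS = {
    "database": {"sql", "jdbc"},
    "networking": {"socket", "http"},
}


def predict_skills(text):
    """Toy predictor: a skill fires when any of its keywords appears."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {skill for skill, keywords in SKILL_KEYWORDS.items() if tokens & keywords}


def test_invariance():
    # Case and punctuation changes should not alter the prediction.
    assert predict_skills("Fix SQL query timeout") == predict_skills("fix sql query timeout!!!")


def test_directional():
    # Adding a skill keyword should add that skill and keep the existing ones.
    before = predict_skills("Fix SQL query timeout")
    after = predict_skills("Fix SQL query timeout on socket close")
    assert before <= after and "networking" in after


def test_minimum_functionality():
    # An unambiguous input should yield the obvious skill.
    assert "database" in predict_skills("Add JDBC connection pool")


if __name__ == "__main__":
    test_invariance()
    test_directional()
    test_minimum_functionality()
    print("all behavioral checks passed")
```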
### Code Quality
- **Ruff Analysis**: 28 minor issues (100% fixable)
- **Standards**: PEP 8 compliant, Black compatible
Full details: [testing_and_validation.md](./testing_and_validation.md)
---
## Milestone 4: API Development
**Objective:** Implement a production-ready REST API for skill prediction with MLflow integration.
### Endpoints
| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/predict` | Single issue prediction |
| `POST` | `/predict/batch` | Batch predictions (max 100) |
| `GET` | `/predictions/{run_id}` | Retrieve by MLflow Run ID |
| `GET` | `/predictions` | List recent predictions |
| `GET` | `/health` | Service health check |
| `GET` | `/metrics` | Prometheus metrics |
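Since `/predict/batch` accepts at most 100 items per request, a client with more issues has to split them into several requests. The sketch below shows one way to do that. The payload field names (`issues`, `issue_text`) are assumptions for illustration; the actual request schema is defined by the API's Pydantic models and can be checked in `/openapi.json`.

```python
import json

MAX_BATCH_SIZE = 100  # limit stated for POST /predict/batch


def chunk_issues(issue_texts, size=MAX_BATCH_SIZE):
    """Yield consecutive slices of at most `size` issues."""
    for start in range(0, len(issue_texts), size):
        yield issue_texts[start:start + size]


def build_batch_payloads(issue_texts):
    """Return one JSON body per batch request (field names are hypothetical)."""
    return [
        json.dumps({"issues": [{"issue_text": text} for text in chunk]})
        for chunk in chunk_issues(issue_texts)
    ]


if __name__ == "__main__":
    texts = [f"Issue number {i}" for i in range(250)]
    payloads = build_batch_payloads(texts)
    print(len(payloads))
```

Each returned string can be sent as the body of a separate `POST /predict/batch` request.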
### Features
- **FastAPI Framework**: Async request handling, auto-generated OpenAPI docs
- **MLflow Integration**: All predictions logged with metadata
- **Pydantic Validation**: Request/response schema enforcement
- **Prometheus Metrics**: Request counters, latency histograms, gauges
### Documentation Access
- Swagger UI: `/docs`
- ReDoc: `/redoc`
- OpenAPI JSON: `/openapi.json`
---
## Milestone 5: Deployment & Containerization
**Objective:** Implement containerized deployment with a CI/CD pipeline for production delivery.
### Docker Architecture
```
docker/docker-compose.yml
├── hopcroft-api (FastAPI Backend)
│   ├── Port: 8080
│   ├── Health Check: /health
│   └── Volumes: source code, logs
│
├── hopcroft-gui (Streamlit Frontend)
│   ├── Port: 8501
│   └── Depends on: hopcroft-api
│
└── hopcroft-net (Bridge Network)
```
### Hugging Face Spaces Deployment
| Component | Configuration |
|-----------|---------------|
| SDK | Docker |
| Port | 7860 |
| Startup Script | `docker/scripts/start_space.sh` |
| Secrets | `DAGSHUB_USERNAME`, `DAGSHUB_TOKEN` |
**Startup Flow:**
1. Configure DVC with secrets
2. Pull models from DagsHub
3. Start FastAPI (port 8000)
4. Start Streamlit (port 8501)
5. Start Nginx reverse proxy (port 7860)
### CI/CD Pipeline (GitHub Actions)
```yaml
Triggers: push/PR to main, feature/*
Jobs:
  unit-tests:
    - Ruff linting
    - Pytest unit tests
    - HTML report generation
  build-image:  # requires unit-tests
    - DVC model pull
    - Docker image build
```
---
## Milestone 6: Monitoring & Observability
**Objective:** Implement comprehensive monitoring infrastructure with drift detection.
### Prometheus Metrics
| Metric | Type | Description |
|--------|------|-------------|
| `hopcroft_requests_total` | Counter | Total requests by method/endpoint |
| `hopcroft_request_duration_seconds` | Histogram | Request latency distribution |
| `hopcroft_in_progress_requests` | Gauge | Currently processing requests |
| `hopcroft_prediction_processing_seconds` | Summary | Model inference time |
### Grafana Dashboards
- **Request Rate**: Real-time requests per second
- **Request Latency (p50, p95)**: Response time percentiles
- **In-Progress Requests**: Currently processing requests
- **Error Rate (5xx)**: Failed request percentage
- **Model Prediction Time**: Inference latency
- **Requests by Endpoint**: Traffic distribution
### Data Drift Detection
| Component | Details |
|-----------|---------|
| Algorithm | Kolmogorov-Smirnov Two-Sample Test |
| Baseline | 1000 samples from training data |
| Threshold | p-value < 0.05 (Bonferroni corrected) |
| Metrics | `drift_detected`, `drift_p_value`, `drift_distance` |
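The drift check in the table can be sketched with the standard library alone. The code below computes the two-sample Kolmogorov-Smirnov distance per feature, derives an approximate p-value from the asymptotic Kolmogorov series, and applies a Bonferroni-corrected threshold of 0.05 divided by the number of features. Applying the test feature by feature, the function names, and the p-value approximation are all assumptions; a production implementation would more likely call a statistics library.

```python
import math
from bisect import bisect_right

ALPHA = 0.05  # significance level from the table above


def ks_statistic(baseline, current):
    """Largest gap between the two empirical CDFs (the KS distance)."""
    a, b = sorted(baseline), sorted(current)
    distance = 0.0
    for x in a + b:  # the maximum gap is always reached at a data point
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        distance = max(distance, abs(cdf_a - cdf_b))
    return distance


def ks_p_value(distance, n, m):
    """Approximate two-sided p-value from the asymptotic Kolmogorov series."""
    root = math.sqrt(n * m / (n + m))
    lam = (root + 0.12 + 0.11 / root) * distance
    if lam < 0.2:  # the series converges slowly here; the p-value is ~1
        return 1.0
    series = sum(
        (-1) ** (j - 1) * math.exp(-2.0 * (j * lam) ** 2) for j in range(1, 101)
    )
    return min(1.0, max(0.0, 2.0 * series))


def detect_drift(baseline_features, current_features, alpha=ALPHA):
    """Run the KS test per feature with a Bonferroni-corrected threshold."""
    threshold = alpha / len(baseline_features)
    report = {}
    for name, baseline in baseline_features.items():
        current = current_features[name]
        distance = ks_statistic(baseline, current)
        p_value = ks_p_value(distance, len(baseline), len(current))
        report[name] = {
            "drift_detected": p_value < threshold,
            "drift_p_value": p_value,
            "drift_distance": distance,
        }
    return report


if __name__ == "__main__":
    baseline = {"f1": list(range(100)), "f2": list(range(100))}
    current = {"f1": list(range(100)), "f2": list(range(100, 200))}
    for feature, result in detect_drift(baseline, current).items():
        print(feature, result)
```

Dividing the significance level by the number of tested features keeps the overall false-alarm rate near 5% even when many features are checked at once.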
### Alerting Rules
| Alert | Condition |
|-------|-----------|
| `ServiceDown` | Target unreachable for 5m |
| `HighErrorRate` | 5xx rate > 10% for 5m |
| `SlowRequests` | P95 latency > 2s |
### Load Testing (Locust)
| Task | Weight | Endpoint |
|------|--------|----------|
| Single Prediction | 60% | `POST /predict` |
| Batch Prediction | 20% | `POST /predict/batch` |
| Monitoring | 20% | `GET /health`, `/predictions` |
### HF Spaces Monitoring Access
Both Prometheus and Grafana are available on the production deployment:
| Service | URL |
|---------|-----|
| Prometheus | https://dacrow13-hopcroft-skill-classification.hf.space/prometheus/ |
| Grafana | https://dacrow13-hopcroft-skill-classification.hf.space/grafana/ |
### Uptime Monitoring (Better Stack)
- External monitoring from multiple locations
- Email notifications on failures
- Tracked endpoints: `/health`, `/openapi.json`, `/docs`