Hopcroft-Skill-Classification / docs /milestone_summaries.md

maurocarlu

nginx endpoints addition - grafana documentation update

70cbf15 22 days ago

Milestone Summaries

This document summarizes the six project milestones and traces the evolution of the Hopcroft Skill Classification system from requirements engineering through production monitoring.

Milestone 1: Requirements Engineering

Objective: Define the problem space, stakeholders, and success criteria using the Machine Learning Canvas framework.

Key Deliverables

Component	Description
Prediction Task	Multi-label classification of 217 technical skills from GitHub issue/PR text
Stakeholders	Project managers, team leads, developers
Data Source	SkillScope DB with 7,245 merged PRs from 11 Java repositories
Success Metrics	Micro-F1 score improvement over baseline, precision/recall balance
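
Since Micro-F1 is the headline success metric, a small worked example may help make it concrete. The sketch below pools true positives, false positives, and false negatives across all labels before computing F1; the function name `micro_f1` and the toy label matrices are illustrative assumptions, not code from the project.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for binary label matrices given as lists of rows."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1  # label correctly predicted
            elif t == 0 and p == 1:
                fp += 1  # label predicted but not present
            elif t == 1 and p == 0:
                fn += 1  # label present but missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data: 2 samples, 3 labels (hypothetical values).
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1]]
score = micro_f1(y_true, y_pred)  # tp=2, fp=1, fn=1 -> 4 / 6
```

Because counts are pooled over every label, frequent skills weigh more than rare ones, which is worth keeping in mind given the label imbalance across 217 classes.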

ML Canvas Framework

The complete ML Canvas is documented in ML Canvas.md, covering:

Value Proposition: Automated task assignment optimization
Decisions: Resource allocation for issue resolution
Data Collection: Automated labeling via API call detection
Impact Simulation: Outperform SkillScope RF + TF-IDF baseline
Monitoring: Continuous evaluation with drift detection

Identified Risks & Mitigations

Risk	Mitigation Strategy
Label imbalance (217 classes)	SMOTE, MLSMOTE, ADASYN oversampling
Text noise (URLs, HTML, code)	Custom preprocessing pipeline
Multi-label complexity	MultiOutputClassifier with stratified splits

Milestone 2: Data Management & Experiment Tracking

Objective: Establish end-to-end infrastructure for reproducible ML experiments.

Data Pipeline

data/raw/           → dataset.py       → data/processed/
(SkillScope SQLite)   (HuggingFace)       (Clean CSV)
                           ↓
                      features.py
                           ↓
                    data/processed/
                    (TF-IDF/Embeddings)

Key Components

Data Management
- DVC setup with DagsHub remote storage
- Git-ignored data and model directories
- Version-controlled .dvc files for reproducibility

Data Ingestion
- dataset.py: Downloads SkillScope from Hugging Face
- Extracts SQLite database with cleanup

Feature Engineering
- features.py: Text cleaning pipeline
  - URL/HTML/Markdown removal
  - Normalization and Porter stemming
  - TF-IDF vectorization (uni+bi-grams)
  - Sentence embedding generation

Configuration
- config.py: Centralized paths, hyperparameters, MLflow URI

Experiment Tracking
- MLflow with DagsHub remote
- Logged metrics: precision, recall, F1-score
- Artifact storage: models, vectorizers, scalers
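
The text cleaning steps listed under Feature Engineering can be sketched with the standard library alone. This is a minimal illustration, not the contents of features.py: the function names `clean_text` and `ngrams`, the regular expressions, and the order of the steps are all assumptions, and Porter stemming is left out because it needs a third-party stemmer.

```python
import html
import re

# All patterns below are illustrative assumptions.
CODE_FENCE_RE = re.compile(r"`{3}.*?`{3}", re.DOTALL)   # fenced code blocks
MD_LINK_RE = re.compile(r"\[([^\]]*)\]\([^)]*\)")        # [text](url) -> text
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HTML_TAG_RE = re.compile(r"<[^>]+>")
NON_ALNUM_RE = re.compile(r"[^a-z0-9]+")


def clean_text(raw):
    """Remove code fences, Markdown links, URLs, and HTML, then normalize."""
    text = CODE_FENCE_RE.sub(" ", raw)
    text = MD_LINK_RE.sub(r"\1", text)   # keep the link text, drop the target
    text = URL_RE.sub(" ", text)
    text = HTML_TAG_RE.sub(" ", text)
    text = html.unescape(text).lower()
    text = NON_ALNUM_RE.sub(" ", text)   # also strips leftover Markdown symbols
    return " ".join(text.split())


def ngrams(tokens, n_max=2):
    """Return unigrams followed by bigrams (up to n_max) as strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


raw = "Fix <b>NPE</b> in parser, see [the docs](https://example.com/x) &amp; https://example.com/y"
cleaned = clean_text(raw)
terms = ngrams(cleaned.split()[:3])
```

The uni+bi-gram terms produced this way are the kind of vocabulary a TF-IDF vectorizer would then weight.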

Training Actions

Action	Description
`baseline`	Random Forest with TF-IDF
`mlsmote`	Multi-label SMOTE oversampling
`ros`	Random Oversampling
`adasyn-pca`	ADASYN + PCA dimensionality reduction
`lightgbm`	LightGBM classifier

Milestone 3: Quality Assurance

Objective: Implement a comprehensive testing and validation framework for data quality and model robustness.

Data Cleaning Pipeline

Metric	Before	After	Resolution
Total Samples	7,154	6,673	-481 duplicates
Duplicates	481	0	Exact match removal
Label Conflicts	640	0	Majority voting
Data Leakage	Present	0	Train/test separation
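
The two resolution strategies in the table, exact match removal and majority voting, can be illustrated together. The sketch below groups samples by identical text and votes per label across the conflicting copies; the function name `resolve_duplicates`, the per-label voting scheme, and the tie-breaking rule (ties resolve to 0) are assumptions, since the summary does not specify them.

```python
from collections import defaultdict


def resolve_duplicates(samples):
    """Collapse exact-duplicate texts into one sample per text.

    samples: list of (text, labels), where labels is a tuple of 0/1 values.
    Conflicting label vectors are merged by per-label majority vote.
    """
    groups = defaultdict(list)
    for text, labels in samples:
        groups[text].append(labels)

    cleaned = []
    for text, label_rows in groups.items():
        n = len(label_rows)
        # A label is kept only with a strict majority; ties resolve to 0 (assumption).
        voted = tuple(1 if 2 * sum(col) > n else 0 for col in zip(*label_rows))
        cleaned.append((text, voted))
    return cleaned


# Hypothetical samples: three copies of one issue with conflicting labels.
samples = [
    ("fix parser bug", (1, 0, 0)),
    ("fix parser bug", (1, 1, 0)),
    ("fix parser bug", (1, 1, 0)),
    ("add rest endpoint", (0, 0, 1)),
]
cleaned_samples = resolve_duplicates(samples)
```

Running the deduplication before the train/test split is what keeps identical texts from landing on both sides of it.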

Validation Frameworks

Great Expectations (10 Tests)

Test	Purpose	Status
Database Schema	Validate SQLite structure	✅ Pass
TF-IDF Matrix	No NaN/Inf, sparsity checks	✅ Pass
Binary Labels	Values in {0,1}	✅ Pass
Feature-Label Alignment	Row count consistency	✅ Pass
Label Distribution	Min 5 occurrences per label	✅ Pass
SMOTE Compatibility	Min 10 non-zero features	✅ Pass
Multi-Output Format	>50% multi-label samples	✅ Pass
Duplicate Detection	No duplicate features	✅ Pass
Train-Test Separation	Zero intersection	✅ Pass
Label Consistency	Same features → same labels	✅ Pass
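
To show what several of these expectations assert, here is a plain-Python equivalent of five of them. It deliberately does not use the Great Expectations API; the function names and the toy data are hypothetical, and only the thresholds (values in {0,1}, at least 5 occurrences per label, zero train/test intersection) come from the table.

```python
def check_binary_labels(label_rows):
    """Binary Labels: every value is 0 or 1."""
    return all(v in (0, 1) for row in label_rows for v in row)


def check_alignment(feature_rows, label_rows):
    """Feature-Label Alignment: same number of rows."""
    return len(feature_rows) == len(label_rows)


def check_min_label_occurrences(label_rows, min_count=5):
    """Label Distribution: each label column has at least min_count positives."""
    return all(sum(col) >= min_count for col in zip(*label_rows))


def check_train_test_separation(train_rows, test_rows):
    """Train-Test Separation: no feature row appears in both sets."""
    return not (set(map(tuple, train_rows)) & set(map(tuple, test_rows)))


def check_label_consistency(feature_rows, label_rows):
    """Label Consistency: identical feature rows carry identical labels."""
    seen = {}
    for feats, labs in zip(feature_rows, label_rows):
        key, value = tuple(feats), tuple(labs)
        if seen.setdefault(key, value) != value:
            return False
    return True


# Toy data (hypothetical): 5 samples, 2 features, 2 labels.
features = [(0.1, 0.0), (0.0, 0.3), (0.2, 0.2), (0.5, 0.0), (0.0, 0.9)]
labels = [[1, 1], [1, 1], [1, 1], [1, 1], [1, 1]]

results = {
    "binary_labels": check_binary_labels(labels),
    "alignment": check_alignment(features, labels),
    "label_distribution": check_min_label_occurrences(labels),
    "train_test_separation": check_train_test_separation(features[:4], features[4:]),
    "label_consistency": check_label_consistency(features, labels),
}
```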

Deepchecks (24 Checks)

Data Integrity Suite: 92% score (12 checks)
Train-Test Validation Suite: 100% score (12 checks)
Overall Status: Production-ready (96% combined)

Behavioral Testing (36 Tests)

Category	Tests	Description
Invariance	9	Typo, case, punctuation robustness
Directional	10	Keyword addition effects
Minimum Functionality	17	Basic skill predictions
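
The three categories can be illustrated with one test each. The sketch uses a toy keyword matcher, `predict_skills`, as a stand-in for the real model; the keyword map and skill names are hypothetical and exist only so the tests can run on their own.

```python
import string

# Hypothetical keyword-to-skill map standing in for the trained classifier.
KEYWORD_SKILLS = {"sql": "Database", "thread": "Multithreading", "rest": "Web API"}


def predict_skills(text):
    """Toy predictor: lowercase, strip punctuation, match keywords."""
    normalized = text.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = normalized.split()
    return {skill for keyword, skill in KEYWORD_SKILLS.items() if keyword in tokens}


def test_invariance_case_and_punctuation():
    # Invariance: case and punctuation changes must not alter the prediction.
    base = predict_skills("Fix SQL query timeout")
    assert predict_skills("fix sql query timeout!!!") == base


def test_directional_keyword_addition():
    # Directional: adding a skill keyword may add skills but never remove any.
    before = predict_skills("Fix query timeout")
    after = predict_skills("Fix query timeout in REST handler")
    assert before <= after
    assert "Web API" in after


def test_minimum_functionality():
    # Minimum functionality: an obvious case yields the expected skill.
    assert "Database" in predict_skills("Optimize SQL index")
```

Written this way, the functions are collected by Pytest automatically; against the real model, `predict_skills` would wrap the inference call instead.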

Code Quality

Ruff Analysis: 28 minor issues (100% fixable)
Standards: PEP 8 compliant, Black compatible

Full details: testing_and_validation.md

Milestone 4: API Development

Objective: Implement a production-ready REST API for skill prediction with MLflow integration.

Endpoints

Method	Endpoint	Description
`POST`	`/predict`	Single issue prediction
`POST`	`/predict/batch`	Batch predictions (max 100)
`GET`	`/predictions/{run_id}`	Retrieve by MLflow Run ID
`GET`	`/predictions`	List recent predictions
`GET`	`/health`	Service health check
`GET`	`/metrics`	Prometheus metrics

Features

FastAPI Framework: Async request handling, auto-generated OpenAPI docs
MLflow Integration: All predictions logged with metadata
Pydantic Validation: Request/response schema enforcement
Prometheus Metrics: Request counters, latency histograms, gauges
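
The schema enforcement described above, including the batch limit of 100, can be approximated with standard-library dataclasses. This is a stand-in for illustration only, not the project's Pydantic models: the class names, the `issue_text` and `issues` field names, and the error messages are assumptions.

```python
from dataclasses import dataclass

MAX_BATCH_SIZE = 100  # limit stated for POST /predict/batch


@dataclass(frozen=True)
class IssueInput:
    issue_text: str  # hypothetical field name

    def __post_init__(self):
        if not isinstance(self.issue_text, str) or not self.issue_text.strip():
            raise ValueError("issue_text must be a non-empty string")


@dataclass(frozen=True)
class BatchInput:
    issues: tuple

    def __post_init__(self):
        if not self.issues:
            raise ValueError("batch must contain at least one issue")
        if len(self.issues) > MAX_BATCH_SIZE:
            raise ValueError(f"batch exceeds {MAX_BATCH_SIZE} issues")


def parse_batch(payload):
    """Validate a decoded JSON body and return a BatchInput."""
    items = payload.get("issues")
    if not isinstance(items, list):
        raise ValueError("'issues' must be a list")
    issues = []
    for item in items:
        if not isinstance(item, dict):
            raise ValueError("each issue must be an object")
        issues.append(IssueInput(issue_text=item.get("issue_text")))
    return BatchInput(tuple(issues))


batch = parse_batch({"issues": [{"issue_text": "Fix SQL timeout"},
                                {"issue_text": "Add REST endpoint"}]})
```

With Pydantic, the same constraints would be declared on the model fields and FastAPI would reject invalid requests before the handler runs.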

Documentation Access

Swagger UI: /docs
ReDoc: /redoc
OpenAPI JSON: /openapi.json

Milestone 5: Deployment & Containerization

Objective: Implement containerized deployment with a CI/CD pipeline for production delivery.

Docker Architecture

docker/docker-compose.yml
├── hopcroft-api (FastAPI Backend)
│   ├── Port: 8080
│   ├── Health Check: /health
│   └── Volumes: source code, logs
│
├── hopcroft-gui (Streamlit Frontend)
│   ├── Port: 8501
│   └── Depends on: hopcroft-api
│
└── hopcroft-net (Bridge Network)

Hugging Face Spaces Deployment

Component	Configuration
SDK	Docker
Port	7860
Startup Script	`docker/scripts/start_space.sh`
Secrets	`DAGSHUB_USERNAME`, `DAGSHUB_TOKEN`

Startup Flow:

1. Configure DVC with secrets
2. Pull models from DagsHub
3. Start FastAPI (port 8000)
4. Start Streamlit (port 8501)
5. Start Nginx reverse proxy (port 7860)

CI/CD Pipeline (GitHub Actions)

Triggers: push/PR to main, feature/*
Jobs:
  1. unit-tests
     - Ruff linting
     - Pytest unit tests
     - HTML report generation
  
  2. build-image (requires unit-tests)
     - DVC model pull
     - Docker image build

Milestone 6: Monitoring & Observability

Objective: Implement comprehensive monitoring infrastructure with drift detection.

Prometheus Metrics

Metric	Type	Description
`hopcroft_requests_total`	Counter	Total requests by method/endpoint
`hopcroft_request_duration_seconds`	Histogram	Request latency distribution
`hopcroft_in_progress_requests`	Gauge	Currently processing requests
`hopcroft_prediction_processing_seconds`	Summary	Model inference time

Grafana Dashboards

Request Rate: Real-time requests per second
Request Latency (p50, p95): Response time percentiles
In-Progress Requests: Currently processing requests
Error Rate (5xx): Failed request percentage
Model Prediction Time: Inference latency
Requests by Endpoint: Traffic distribution
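
The latency percentiles on the dashboard are estimated from the cumulative buckets of a histogram such as `hopcroft_request_duration_seconds`. The sketch below is a simplified re-implementation of that estimate (linear interpolation inside the bucket that contains the target rank), offered to show the idea; the bucket boundaries and counts are made-up values, not measurements from the service.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count), sorted by upper_bound,
    with math.inf as the last upper bound.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan

    rank = q * total
    prev_bound, prev_count = 0.0, 0  # lowest bucket is assumed to start at 0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper):
                # The rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            # Interpolate linearly within the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (upper - prev_bound) * fraction
        prev_bound, prev_count = upper, count
    return prev_bound


# Hypothetical cumulative counts for 100 requests.
buckets = [(0.1, 50), (0.5, 90), (1.0, 98), (2.0, 100), (math.inf, 100)]
p50 = histogram_quantile(0.50, buckets)
p95 = histogram_quantile(0.95, buckets)
```

Because the estimate interpolates within a bucket, its precision depends on how finely the bucket boundaries are chosen around the latencies of interest.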

Data Drift Detection

Component	Details
Algorithm	Kolmogorov-Smirnov Two-Sample Test
Baseline	1000 samples from training data
Threshold	p-value < 0.05 (Bonferroni corrected)
Metrics	`drift_detected`, `drift_p_value`, `drift_distance`
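
A dependency-free sketch of the drift check may help clarify how the pieces in the table fit together. It computes the two-sample Kolmogorov-Smirnov distance per feature, approximates the p-value with the asymptotic Kolmogorov series, and compares it to a Bonferroni-corrected threshold. The function names, the dict-of-lists input format, and the choice to divide alpha by the number of features tested are assumptions; only the output keys mirror the metric names above.

```python
import math
from bisect import bisect_right


def ks_distance(baseline, current):
    """Largest gap between the two empirical CDFs."""
    a, b = sorted(baseline), sorted(current)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def ks_p_value(distance, n, m):
    """Asymptotic p-value approximation for the two-sample KS test."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * distance
    if lam < 0.2:
        return 1.0  # the series is numerically 1 for very small lam
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2) for k in range(1, 101)
    )
    return min(1.0, max(0.0, 2.0 * total))


def detect_drift(baseline_features, current_features, alpha=0.05):
    """Run one KS test per feature with a Bonferroni-corrected threshold."""
    threshold = alpha / len(baseline_features)  # assumption: one test per feature
    report = {}
    for name, baseline in baseline_features.items():
        current = current_features[name]
        distance = ks_distance(baseline, current)
        p_value = ks_p_value(distance, len(baseline), len(current))
        report[name] = {
            "drift_detected": p_value < threshold,
            "drift_p_value": p_value,
            "drift_distance": distance,
        }
    return report


# Hypothetical features: f1 is unchanged, f2 is shifted far from the baseline.
values = [i / 100 for i in range(100)]
baseline = {"f1": values, "f2": values}
current = {"f1": list(values), "f2": [5 + v for v in values]}
report = detect_drift(baseline, current)
```

In practice a statistics library would typically supply the test itself; the point here is only the per-feature comparison against the corrected threshold.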

Alerting Rules

Alert	Condition
`ServiceDown`	Target unreachable for 5m
`HighErrorRate`	5xx rate > 10% for 5m
`SlowRequests`	P95 latency > 2s

Load Testing (Locust)

Task	Weight	Endpoint
Single Prediction	60%	`POST /predict`
Batch Prediction	20%	`POST /predict/batch`
Monitoring	20%	`GET /health`, `GET /predictions`

HF Spaces Monitoring Access

Both Prometheus and Grafana are available on the production deployment:

Service	URL
Prometheus	https://dacrow13-hopcroft-skill-classification.hf.space/prometheus/
Grafana	https://dacrow13-hopcroft-skill-classification.hf.space/grafana/

Uptime Monitoring (Better Stack)

External monitoring from multiple locations
Email notifications on failures
Tracked endpoints: /health, /openapi.json, /docs