
Milestone Summaries

This document provides a comprehensive overview of all six project milestones, documenting the evolution of the Hopcroft Skill Classification system from requirements engineering through production monitoring.


Milestone 1: Requirements Engineering

Objective: Define the problem space, stakeholders, and success criteria using the Machine Learning Canvas framework.

Key Deliverables

| Component | Description |
|---|---|
| Prediction Task | Multi-label classification of 217 technical skills from GitHub issue/PR text |
| Stakeholders | Project managers, team leads, developers |
| Data Source | SkillScope DB with 7,245 merged PRs from 11 Java repositories |
| Success Metrics | Micro-F1 score improvement over baseline, precision/recall balance |
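
Micro-F1, the headline success metric, pools true positives, false positives and false negatives over every (sample, skill) pair before computing F1. As a result, frequent skills weigh more than they would under macro averaging. The following is a minimal stdlib sketch. It assumes each sample's labels are given as a set of skill names, and the names in the example are hypothetical.

```python
def micro_f1(y_true: list[set[str]], y_pred: list[set[str]]) -> float:
    """Micro-averaged F1 for multi-label predictions given as label sets."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = 0
    for true, pred in zip(y_true, y_pred):
        tp += len(true & pred)  # skills predicted and present
        fp += len(pred - true)  # skills predicted but absent
        fn += len(true - pred)  # skills present but missed
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when no labels exist at all
    return 2 * tp / denom if denom else 0.0
```

For example, `micro_f1([{"java", "sql"}, {"testing"}], [{"java"}, {"testing", "sql"}])` counts TP = 2, FP = 1 and FN = 1, which gives 4/6 ≈ 0.667.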

ML Canvas Framework

The complete ML Canvas is documented in ML Canvas.md, covering:

  • Value Proposition: Automated task assignment optimization
  • Decisions: Resource allocation for issue resolution
  • Data Collection: Automated labeling via API call detection
  • Impact Simulation: Outperform SkillScope RF + TF-IDF baseline
  • Monitoring: Continuous evaluation with drift detection

Identified Risks & Mitigations

| Risk | Mitigation Strategy |
|---|---|
| Label imbalance (217 classes) | SMOTE, MLSMOTE, ADASYN oversampling |
| Text noise (URLs, HTML, code) | Custom preprocessing pipeline |
| Multi-label complexity | MultiOutputClassifier with stratified splits |

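
The oversampling mitigations share one core step: SMOTE synthesises a minority sample on the segment between an existing sample and one of its k nearest minority neighbours. The sketch below shows only plain SMOTE interpolation on dense vectors. Two refinements are not modelled here: MLSMOTE also derives a label set for each synthetic sample from its neighbours, and ADASYN varies how many samples each point generates. The function name and defaults are illustrative assumptions.

```python
import math
import random


def smote_samples(minority: list[list[float]], n_new: int, k: int = 5,
                  seed: int = 0) -> list[list[float]]:
    """Create n_new synthetic points by SMOTE interpolation.

    Each point is x + gap * (neighbour - x), where x is a random minority
    sample, neighbour is one of its k nearest minority samples (Euclidean)
    and gap is drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by distance to x and keep the k closest
        others = sorted((m for j, m in enumerate(minority) if j != i),
                        key=lambda m: math.dist(x, m))
        neighbour = rng.choice(others[:k])
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic
```

Because every synthetic point lies on a segment between two real minority samples, oversampling stays inside the region the minority class already occupies.
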

Milestone 2: Data Management & Experiment Tracking

Objective: Establish end-to-end infrastructure for reproducible ML experiments.

Data Pipeline

data/raw/           → dataset.py       → data/processed/
(SkillScope SQLite)   (HuggingFace)       (Clean CSV)
                           ↓
                      features.py
                           ↓
                    data/processed/
                    (TF-IDF/Embeddings)

Key Components

  1. Data Management

    • DVC setup with DagsHub remote storage
    • Git-ignored data and model directories
    • Version-controlled .dvc files for reproducibility
  2. Data Ingestion

    • dataset.py: Downloads SkillScope from Hugging Face
    • Extracts SQLite database with cleanup
  3. Feature Engineering

    • features.py: Text cleaning pipeline
      • URL/HTML/Markdown removal
      • Normalization and Porter stemming
      • TF-IDF vectorization (uni+bi-grams)
      • Sentence embedding generation
  4. Configuration

    • config.py: Centralized paths, hyperparameters, MLflow URI
  5. Experiment Tracking

    • MLflow with DagsHub remote
    • Logged metrics: precision, recall, F1-score
    • Artifact storage: models, vectorizers, scalers
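
The cleaning and TF-IDF steps can be sketched with the standard library alone. This is an approximation, not the contents of features.py, and it differs in three ways:

- The regular expressions are assumptions.
- Porter stemming is omitted; NLTK's `PorterStemmer` is a common choice for that step.
- The idf smoothing and L2 normalisation follow scikit-learn's `TfidfVectorizer` defaults, but its tokenizer does not.

```python
import math
import re
from collections import Counter

HTML_RE = re.compile(r"<[^>]+>")        # HTML tags
URL_RE = re.compile(r"https?://\S+")    # bare URLs
MD_RE = re.compile(r"[`*_#>\[\]()~|]")  # common Markdown punctuation
TOKEN_RE = re.compile(r"[a-z0-9]+")


def clean(text: str) -> list[str]:
    """Strip HTML tags, URLs and Markdown symbols, lowercase, then tokenize."""
    for pattern in (HTML_RE, URL_RE, MD_RE):
        text = pattern.sub(" ", text)
    return TOKEN_RE.findall(text.lower())


def ngrams(tokens: list[str]) -> list[str]:
    """Unigrams plus adjacent bigrams, like ngram_range=(1, 2)."""
    return tokens + [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def tfidf(docs: list[str]) -> list[dict[str, float]]:
    """Sparse TF-IDF rows: raw counts * smoothed idf, then L2-normalised."""
    grams = [Counter(ngrams(clean(doc))) for doc in docs]
    n = len(docs)
    df = Counter(term for counts in grams for term in counts)  # document frequency
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    rows = []
    for counts in grams:
        row = {t: tf * idf[t] for t, tf in counts.items()}
        norm = math.sqrt(sum(v * v for v in row.values())) or 1.0  # empty doc guard
        rows.append({t: v / norm for t, v in row.items()})
    return rows
```

In this sketch, a term shared by every document receives idf 1, the lowest possible weight, so generic words contribute little after vectorization.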

Training Actions

| Action | Description |
|---|---|
| baseline | Random Forest with TF-IDF |
| mlsmote | Multi-label SMOTE oversampling |
| ros | Random Oversampling |
| adasyn-pca | ADASYN + PCA dimensionality reduction |
| lightgbm | LightGBM classifier |

Milestone 3: Quality Assurance

Objective: Implement comprehensive testing and validation framework for data quality and model robustness.

Data Cleaning Pipeline

| Metric | Before | After | Resolution |
|---|---|---|---|
| Total Samples | 7,154 | 6,673 | -481 duplicates |
| Duplicates | 481 | 0 | Exact match removal |
| Label Conflicts | 640 | 0 | Majority voting |
| Data Leakage | Present | 0 | Train/test separation |
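
The two cleaning rules in the table can be combined in one pass: group rows by identical text, then keep each label only if a strict majority of the copies carry it. The per-label vote and the tie rule (ties drop the label) are assumptions, since this summary does not specify how conflicts were counted. Samples are modelled as hypothetical `(text, frozenset_of_labels)` pairs.

```python
from collections import Counter, defaultdict

Sample = tuple[str, frozenset[str]]


def dedupe_with_majority(samples: list[Sample]) -> list[Sample]:
    """Keep one row per distinct text; resolve label conflicts by majority vote.

    A label survives if more than half of the copies of that text carry it,
    so a tie drops the label (an assumed tie-break rule).
    """
    groups: dict[str, list[frozenset[str]]] = defaultdict(list)
    for text, labels in samples:
        groups[text].append(labels)
    cleaned = []
    for text, label_sets in groups.items():
        votes = Counter(label for labels in label_sets for label in labels)
        majority = frozenset(
            label for label, count in votes.items() if 2 * count > len(label_sets)
        )
        cleaned.append((text, majority))
    return cleaned
```

Each distinct text appears exactly once in the output, so exact duplicates and conflicting copies are handled by the same grouping step.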

Validation Frameworks

Great Expectations (10 Tests)

| Test | Purpose | Status |
|---|---|---|
| Database Schema | Validate SQLite structure | ✅ Pass |
| TF-IDF Matrix | No NaN/Inf, sparsity checks | ✅ Pass |
| Binary Labels | Values in {0,1} | ✅ Pass |
| Feature-Label Alignment | Row count consistency | ✅ Pass |
| Label Distribution | Min 5 occurrences per label | ✅ Pass |
| SMOTE Compatibility | Min 10 non-zero features | ✅ Pass |
| Multi-Output Format | >50% multi-label samples | ✅ Pass |
| Duplicate Detection | No duplicate features | ✅ Pass |
| Train-Test Separation | Zero intersection | ✅ Pass |
| Label Consistency | Same features → same labels | ✅ Pass |
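
Four of the expectations above reduce to simple predicates over the feature and label matrices: binary labels, minimum label support, train-test separation and label consistency. The functions below are plain-Python sketches of those conditions, not the Great Expectations API. The function names and the list-of-lists data layout are assumptions for illustration.

```python
def check_binary_labels(Y: list[list[int]]) -> bool:
    """Every label cell is 0 or 1."""
    return all(v in (0, 1) for row in Y for v in row)


def check_min_label_support(Y: list[list[int]], min_count: int = 5) -> bool:
    """Every label column has at least min_count positive samples."""
    return all(sum(column) >= min_count for column in zip(*Y))


def check_train_test_disjoint(train_X: list[list[float]],
                              test_X: list[list[float]]) -> bool:
    """No feature row appears in both the train and the test split."""
    return not (set(map(tuple, train_X)) & set(map(tuple, test_X)))


def check_label_consistency(X: list[list[float]], Y: list[list[int]]) -> bool:
    """Identical feature rows always carry identical label rows."""
    seen: dict[tuple, tuple] = {}
    for x, y in zip(X, Y):
        labels = tuple(y)
        # setdefault stores the first label row seen for this feature row
        if seen.setdefault(tuple(x), labels) != labels:
            return False
    return True
```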

Deepchecks (24 Checks)

  • Data Integrity Suite: 92% score (12 checks)
  • Train-Test Validation Suite: 100% score (12 checks)
  • Overall Status: Production-ready (96% combined)

Behavioral Testing (36 Tests)

| Category | Tests | Description |
|---|---|---|
| Invariance | 9 | Typo, case, punctuation robustness |
| Directional | 10 | Keyword addition effects |
| Minimum Functionality | 17 | Basic skill predictions |
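
Each behavioral category compares model outputs on related inputs rather than checking them against a gold label. The sketch below shows the shape of the three test types. It runs them against `predict_skills`, a hypothetical keyword-lookup stand-in for the trained classifier. The keywords, skill names and helper names are all illustrative assumptions.

```python
import string

# Hypothetical stand-in for the trained classifier: keyword -> skill lookup.
KEYWORDS = {"sql": "Databases", "junit": "Testing", "thread": "Concurrency"}
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def predict_skills(text: str) -> set[str]:
    """Toy predictor: lowercase, drop punctuation, match whole-word keywords."""
    words = set(text.lower().translate(_STRIP_PUNCT).split())
    return {skill for keyword, skill in KEYWORDS.items() if keyword in words}


def invariance(original: str, perturbed: str) -> bool:
    """Invariance: a label-preserving edit (typo, case, punctuation) changes nothing."""
    return predict_skills(original) == predict_skills(perturbed)


def directional(text: str, addition: str, expected: str) -> bool:
    """Directional: adding a skill keyword keeps old labels and adds the expected one."""
    before = predict_skills(text)
    after = predict_skills(f"{text} {addition}")
    return before <= after and expected in after


def minimum_functionality(text: str, expected: str) -> bool:
    """Minimum functionality: a simple, unambiguous input yields the expected skill."""
    return expected in predict_skills(text)
```

With the real model, `predict_skills` would be replaced by a call to the trained classifier, and the test functions would keep the same shape.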

Code Quality

  • Ruff Analysis: 28 minor issues, all fixable
  • Standards: PEP 8 compliant, Black compatible

Full details: testing_and_validation.md


Milestone 4: API Development

Objective: Implement production-ready REST API for skill prediction with MLflow integration.

Endpoints

| Method | Endpoint | Description |
|---|---|---|
| POST | /predict | Single issue prediction |
| POST | /predict/batch | Batch predictions (max 100) |
| GET | /predictions/{run_id} | Retrieve by MLflow Run ID |
| GET | /predictions | List recent predictions |
| GET | /health | Service health check |
| GET | /metrics | Prometheus metrics |
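
Because `/predict/batch` accepts at most 100 issues, a client with a larger backlog has to split it into chunks. The stdlib sketch below prepares (but does not send) one request per chunk. The `{"issues": [...]}` body shape and the `issue_text` field used in the example are assumptions, so check the real request schema in the Swagger UI at /docs before use.

```python
import json
import urllib.request

MAX_BATCH = 100  # documented limit of POST /predict/batch


def chunk(items: list, size: int = MAX_BATCH) -> list[list]:
    """Split items into consecutive chunks of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_batch_requests(base_url: str,
                         issues: list[dict]) -> list[urllib.request.Request]:
    """Prepare, without sending, one POST /predict/batch request per chunk.

    The {"issues": [...]} body shape is an assumption, not the confirmed schema.
    """
    url = f"{base_url.rstrip('/')}/predict/batch"
    prepared = []
    for part in chunk(issues):
        prepared.append(urllib.request.Request(
            url,
            data=json.dumps({"issues": part}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        ))
    return prepared
```

Each prepared request can then be sent with `urllib.request.urlopen(req)`. For example, 250 issues yield three requests carrying 100, 100 and 50 items.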

Features

  • FastAPI Framework: Async request handling, auto-generated OpenAPI docs
  • MLflow Integration: All predictions logged with metadata
  • Pydantic Validation: Request/response schema enforcement
  • Prometheus Metrics: Request counters, latency histograms, gauges

Documentation Access

  • Swagger UI: /docs
  • ReDoc: /redoc
  • OpenAPI JSON: /openapi.json

Milestone 5: Deployment & Containerization

Objective: Implement containerized deployment with CI/CD pipeline for production delivery.

Docker Architecture

docker/docker-compose.yml
├── hopcroft-api (FastAPI Backend)
│   ├── Port: 8080
│   ├── Health Check: /health
│   └── Volumes: source code, logs
│
├── hopcroft-gui (Streamlit Frontend)
│   ├── Port: 8501
│   └── Depends on: hopcroft-api
│
└── hopcroft-net (Bridge Network)

Hugging Face Spaces Deployment

| Component | Configuration |
|---|---|
| SDK | Docker |
| Port | 7860 |
| Startup Script | docker/scripts/start_space.sh |
| Secrets | DAGSHUB_USERNAME, DAGSHUB_TOKEN |

Startup Flow:

  1. Configure DVC with secrets
  2. Pull models from DagsHub
  3. Start FastAPI (port 8000)
  4. Start Streamlit (port 8501)
  5. Start Nginx reverse proxy (port 7860)

CI/CD Pipeline (GitHub Actions)

Triggers: push/PR to main, feature/*
Jobs:
  1. unit-tests
     - Ruff linting
     - Pytest unit tests
     - HTML report generation
  
  2. build-image (requires unit-tests)
     - DVC model pull
     - Docker image build

Milestone 6: Monitoring & Observability

Objective: Implement comprehensive monitoring infrastructure with drift detection.

Prometheus Metrics

| Metric | Type | Description |
|---|---|---|
| hopcroft_requests_total | Counter | Total requests by method/endpoint |
| hopcroft_request_duration_seconds | Histogram | Request latency distribution |
| hopcroft_in_progress_requests | Gauge | Currently processing requests |
| hopcroft_prediction_processing_seconds | Summary | Model inference time |

Grafana Dashboards

  • Request Rate: Real-time requests per second
  • Request Latency (p50, p95): Response time percentiles
  • In-Progress Requests: Currently processing requests
  • Error Rate (5xx): Failed request percentage
  • Model Prediction Time: Inference latency
  • Requests by Endpoint: Traffic distribution
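
Latency percentile panels like p50/p95 are typically driven by a PromQL query of the form `histogram_quantile(0.95, rate(hopcroft_request_duration_seconds_bucket[5m]))`. The exact queries behind these dashboards are not reproduced here.

The function below is a simplified sketch of how Prometheus estimates a quantile from cumulative histogram buckets. It finds the bucket containing the target rank and interpolates linearly inside it, so the result is only as precise as the bucket boundaries. The `(upper_bound, cumulative_count)` input format is an assumption.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    Simplified version of Prometheus' approach: find the first bucket whose
    cumulative count reaches rank = q * total, then interpolate linearly
    between that bucket's lower and upper bounds. The last bound must be +Inf.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least two buckets, the last one ending at +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower, below = 0.0, 0.0  # bound and cumulative count of the previous bucket
    for i, (upper, count) in enumerate(buckets):
        if count >= rank:
            if math.isinf(upper):
                # Rank falls in the +Inf bucket: report the highest finite bound
                return buckets[i - 1][0]
            if i == 0 and upper <= 0:
                return upper
            return lower + (upper - lower) * (rank - below) / (count - below)
        lower, below = upper, count
    return math.nan
```

With buckets `[(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]`, the 0.95 quantile lands in the (0.5, 1.0] bucket and interpolates to 0.75 s.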

Data Drift Detection

| Component | Details |
|---|---|
| Algorithm | Kolmogorov-Smirnov Two-Sample Test |
| Baseline | 1000 samples from training data |
| Threshold | p-value < 0.05 (Bonferroni corrected) |
| Metrics | drift_detected, drift_p_value, drift_distance |
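
Per feature, the two-sample KS test compares the empirical CDFs of the training baseline and the incoming data. With many features, the Bonferroni correction divides the significance level by the number of tests, so the probability of any false alarm across all features stays at or below 0.05.

The stdlib sketch below assumes numeric feature rows. It uses the large-sample asymptotic p-value, with the correction factor found in Numerical Recipes. In practice `scipy.stats.ks_2samp` would normally be preferred, since it can compute exact p-values for small samples. The mapping of `drift_p_value` to the smallest per-feature p-value and of `drift_distance` to the largest KS statistic is an assumption.

```python
import math
from bisect import bisect_right


def ks_statistic(a: list[float], b: list[float]) -> float:
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
               for x in a + b)


def ks_p_value(d: float, n: int, m: int) -> float:
    """Asymptotic two-sided p-value from the Kolmogorov distribution."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    total = 0.0
    for k in range(1, 1001):
        term = 2 * (-1) ** (k - 1) * math.exp(-2 * (k * lam) ** 2)
        total += term
        if abs(term) < 1e-10:
            return min(max(total, 0.0), 1.0)
    return 1.0  # series did not converge: lam is tiny, so no evidence of drift


def detect_drift(reference: list[list[float]], current: list[list[float]],
                 alpha: float = 0.05) -> dict:
    """Per-feature KS tests against a Bonferroni-corrected level alpha / n_features."""
    n_features = len(reference[0])
    threshold = alpha / n_features
    p_values, distances = [], []
    for j in range(n_features):
        ref_col = [row[j] for row in reference]
        cur_col = [row[j] for row in current]
        d = ks_statistic(ref_col, cur_col)
        distances.append(d)
        p_values.append(ks_p_value(d, len(ref_col), len(cur_col)))
    return {
        "drift_detected": min(p_values) < threshold,
        "drift_p_value": min(p_values),
        "drift_distance": max(distances),
    }
```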

Alerting Rules

| Alert | Condition |
|---|---|
| ServiceDown | Target unreachable for 5m |
| HighErrorRate | 5xx rate > 10% for 5m |
| SlowRequests | P95 latency > 2s |
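
In Prometheus, a `for: 5m` clause means the condition must hold at every evaluation for five minutes before the alert moves from pending to firing, which filters out short spikes. The sketch below models that state machine together with a 5xx error-rate helper. It is a simplification, since real rules are PromQL expressions evaluated by Prometheus, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertRule:
    """Simplified model of a Prometheus alerting rule with a `for` duration."""
    name: str
    threshold: float      # alert when the value is strictly above this
    for_seconds: float    # how long the breach must persist before firing
    breach_started: Optional[float] = None

    def evaluate(self, now: float, value: float) -> str:
        """Return 'inactive', 'pending' or 'firing' for one evaluation at time now."""
        if value <= self.threshold:
            self.breach_started = None  # any recovery resets the pending timer
            return "inactive"
        if self.breach_started is None:
            self.breach_started = now
        if now - self.breach_started >= self.for_seconds:
            return "firing"
        return "pending"


def error_rate(status_counts: dict[int, int]) -> float:
    """Fraction of responses whose HTTP status is 5xx."""
    total = sum(status_counts.values())
    errors = sum(n for code, n in status_counts.items() if 500 <= code <= 599)
    return errors / total if total else 0.0
```

Feeding `error_rate(...)` samples into `AlertRule("HighErrorRate", 0.10, 300)` at each evaluation interval mirrors the table's "above 10% for 5m" condition.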

Load Testing (Locust)

| Task | Weight | Endpoint |
|---|---|---|
| Single Prediction | 60% | POST /predict |
| Batch Prediction | 20% | POST /predict/batch |
| Monitoring | 20% | GET /health, /predictions |
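
Locust chooses each simulated user's next task at random, with probability proportional to the task's weight. The 60/20/20 split therefore corresponds to relative weights of 3:1:1, for example `@task(3)` on the single-prediction task and `@task(1)` on the other two. The stdlib simulation below illustrates that weighted selection. It does not use Locust itself, and the task names are hypothetical.

```python
import random
from collections import Counter

# Relative task weights; 3:1:1 gives the 60/20/20 split in the table.
TASK_WEIGHTS = {"single_prediction": 3, "batch_prediction": 1, "monitoring": 1}


def simulate_task_mix(n: int, seed: int = 42) -> dict[str, float]:
    """Pick n tasks with weight-proportional probability; return observed shares."""
    rng = random.Random(seed)
    names = list(TASK_WEIGHTS)
    picks = rng.choices(names, weights=[TASK_WEIGHTS[t] for t in names], k=n)
    counts = Counter(picks)
    return {name: counts[name] / n for name in names}
```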

HF Spaces Monitoring Access

Both Prometheus and Grafana are available on the production deployment.

Uptime Monitoring (Better Stack)

  • External monitoring from multiple locations
  • Email notifications on failures
  • Tracked endpoints: /health, /openapi.json, /docs