Commit bba28e5 · Parent(s): 36092d6

add comprehensive project documentation including milestone summaries, a user guide, and design choices, and update the main README.

Files changed:

- README.md +130 -630
- docs/README.md +19 -7
- docs/design_choices.md +487 -0
- docs/docs/getting-started.md +0 -6
- docs/docs/index.md +0 -10
- docs/milestone_summaries.md +288 -0
- docs/user_guide.md +497 -0

## README.md (changed)
@@ -8,685 +8,185 @@ app_port: 7860

api_docs_url: /docs
---

The task involves analyzing the relationship between issue characteristics and required skills, developing effective feature-extraction methods that combine textual and code-context information, and implementing sophisticated multi-label classification approaches. Students may incorporate additional GitHub metadata to enhance model inputs, but must avoid using third-party classification engines or direct outputs from the provided database. The work requires careful attention to the multi-label nature of the problem, where each issue may require multiple different skills for resolution.
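To illustrate the multi-label setup (this is not the project's code): each issue maps to a binary vector over the skill labels, and a common baseline fits one classifier per label. A minimal scikit-learn sketch on toy data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

# Toy data: 6 issues x 4 numeric features, 3 skill labels per issue (binary indicators).
X = np.array([[1, 0, 2, 0], [0, 1, 0, 3], [2, 1, 0, 0],
              [0, 0, 1, 1], [1, 2, 0, 0], [0, 1, 3, 0]])
Y = np.array([[1, 0, 0], [0, 1, 1], [1, 0, 0],
              [0, 1, 0], [1, 0, 1], [0, 1, 1]])

# One random forest per label; predictions are again binary label vectors.
clf = MultiOutputClassifier(RandomForestClassifier(n_estimators=10, random_state=0))
clf.fit(X, Y)
Y_pred = clf.predict(X)
```

Each row of `Y_pred` is the set of skills predicted for one issue, which is why metrics like micro-F1 (used later in this README) are the natural evaluation choice.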

## Project Organization

```
├── LICENSE            <- Open-source license if one is chosen
├── Makefile           <- Makefile with convenience commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── docs               <- A default mkdocs project; see www.mkdocs.org for details
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── pyproject.toml     <- Project configuration file with package metadata for
│                         hopcroft_skill_classification_tool_competition and configuration
│                         for tools like black
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment,
│                         e.g. generated with `pip freeze > requirements.txt`
│
├── setup.cfg          <- Configuration file for flake8
│
└── hopcroft_skill_classification_tool_competition  <- Source code for use in this project.
    │
    ├── __init__.py    <- Makes hopcroft_skill_classification_tool_competition a Python module
    │
    ├── config.py      <- Store useful variables and configuration
    │
    ├── dataset.py     <- Scripts to download or generate data
    │
    ├── features.py    <- Code to create features for modeling
    │
    ├── modeling
    │   ├── __init__.py
    │   ├── predict.py <- Code to run model inference with trained models
    │   └── train.py   <- Code to train models
    │
    └── plots.py       <- Code to create visualizations
```

--------

## Setup

### MLflow Credentials Configuration

Set up DagsHub credentials for MLflow tracking.

**Get your token:** [DagsHub](https://dagshub.com) → Profile → Settings → Tokens

#### Option 1: Using `.env` file (Recommended for local development)

```bash
# Copy the template
cp .env.example .env

# Edit .env with your credentials
```

Your `.env` file should contain:

```
MLFLOW_TRACKING_URI=https://dagshub.com/se4ai2526-uniba/Hopcroft.mlflow
MLFLOW_TRACKING_USERNAME=your_username
MLFLOW_TRACKING_PASSWORD=your_token
```
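Variables in this format can be loaded with `python-dotenv`, or with a few lines of stdlib Python; a minimal sketch of the latter (illustrative only, not the project's actual config code):

```python
import os

def load_env(path=".env"):
    """Read KEY=VALUE lines from a .env-style file into os.environ."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            # Skip blank lines and comments.
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            # setdefault: variables already set in the environment win.
            os.environ.setdefault(key.strip(), value.strip())
```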

> [!NOTE]
> The `.env` file is git-ignored for security. Never commit credentials to version control.

#### Option 2: Using Docker Compose

When using Docker Compose, the `.env` file is automatically loaded via the `env_file` directive in `docker-compose.yml`.

```bash
# Start the service (credentials loaded from .env)
docker compose up --build
```

--------

## CI Configuration

[](https://github.com/se4ai2526-uniba/Hopcroft/actions/workflows/ci.yml)

### Secrets

To enable DVC model pulling, configure these Repository Secrets:

- `DAGSHUB_USERNAME`: DagsHub username.
- `DAGSHUB_TOKEN`: DagsHub access token.

## Milestone Summary

### Milestone 1

We compiled the ML Canvas and defined:

- Problem: multi-label classification of skills for PRs/issues.
- Stakeholders and business/research goals.
- Data sources (SkillScope DB) and constraints (no external classifiers).
- Success metrics (micro-F1, imbalance handling, experiment tracking).
- Risks (label imbalance, text noise, multi-label complexity) and mitigations.

### Milestone 2

We implemented the essential end-to-end infrastructure to go from data to tracked modeling experiments:

1. Data Management
   - DVC setup (raw dataset and TF-IDF features tracked) with a DagsHub remote; dedicated gitignores for data/models.

2. Data Ingestion & EDA
   - `dataset.py` to download/extract SkillScope from Hugging Face (zip → SQLite) with cleanup.
   - Initial exploration notebook `notebooks/1.0-initial-data-exploration.ipynb` (schema, text stats, label distribution).

3. Feature Engineering
   - `features.py`: GitHub text cleaning (URL/HTML/markdown removal, normalization, Porter stemming) and TF-IDF (uni+bi-grams) saved as NumPy arrays (`features_tfidf.npy`, `labels_tfidf.npy`).

4. Central Config
   - `config.py` with project paths, training settings, RF param grid, MLflow URI/experiments, PCA/ADASYN, and feature constants.

5. Modeling & Experiments
   - Unified `modeling/train.py` with actions: baseline RF, MLSMOTE, ROS, ADASYN+PCA, LightGBM, LightGBM+MLSMOTE, and inference.
   - GridSearchCV (micro-F1), MLflow logging, removal of all-zero labels, multilabel-stratified splits (with fallback).

6. Imbalance Handling
   - Local `mlsmote.py` (multi-label oversampling) with fallback to `RandomOverSampler`; dedicated ADASYN+PCA pipeline.

7. Tracking & Reproducibility
   - Remote MLflow (DagsHub) with README credential setup; DVC-tracked models and auxiliary artifacts (e.g., PCA, kept label indices).

8. Tooling
   - Updated `requirements.txt` (lightgbm, imbalanced-learn, iterative-stratification, huggingface-hub, dvc, mlflow, nltk, seaborn, etc.) and extended Makefile targets (`data`, `features`).
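The cleaning and TF-IDF steps described above can be sketched roughly as follows (a simplified illustration, not the project's `features.py`; the real pipeline also strips HTML/markdown and applies Porter stemming via nltk):

```python
import re
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def clean_github_text(text: str) -> str:
    """Minimal GitHub text cleaning: strip URLs and code spans, normalize case/whitespace."""
    text = re.sub(r"https?://\S+", " ", text)    # remove URLs
    text = re.sub(r"`[^`]*`", " ", text)         # remove inline code spans
    text = re.sub(r"[^a-zA-Z0-9\s]", " ", text)  # drop punctuation
    return re.sub(r"\s+", " ", text).strip().lower()

docs = [
    "Fix NullPointerException in login, see https://github.com/org/repo/issues/42",
    "Add `DataFrame` export endpoint to the REST API",
]
cleaned = [clean_github_text(d) for d in docs]

# Uni- and bi-grams, as in the project's TF-IDF setup.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
features = vectorizer.fit_transform(cleaned).toarray().astype(np.float32)
```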

### Milestone 3 (QA)

We implemented a comprehensive testing and validation framework to ensure data quality and model robustness:

1. **Data Cleaning Pipeline**
   - `data_cleaning.py`: Removes duplicates (481 samples), resolves label conflicts via majority voting (640 samples), filters sparse samples incompatible with SMOTE, and ensures train-test separation without leakage.
   - Final cleaned dataset: 6,673 samples (from 7,154 original), 80/20 stratified split.

2. **Great Expectations Validation** (10 tests)
   - Database integrity, feature matrix validation (no NaN/Inf, sparsity checks), label format validation (binary {0,1}), feature-label consistency.
   - Label distribution for stratification (min 5 occurrences), SMOTE compatibility (min 10 non-zero features), duplicate detection, train-test separation, label consistency.
   - All 10 tests pass on the cleaned data; comprehensive JSON reports in `reports/great_expectations/`.

3. **Deepchecks Validation** (24 checks across 2 suites)
   - Data Integrity Suite (92% score): validates duplicates, label conflicts, nulls, data types, feature correlation.
   - Train-Test Validation Suite (100% score): **zero data leakage**, proper train/test split, feature/label drift analysis.
   - Cleaned data achieved production-ready status (96% overall score).

4. **Behavioral Testing** (36 tests)
   - Invariance tests (9): typo robustness, synonym substitution, case insensitivity, punctuation/URL noise tolerance.
   - Directional tests (10): keyword addition effects, impact of technical detail on predictions.
   - Minimum Functionality Tests (17): basic skill predictions on clear examples (bug fixes, database work, API development, testing, DevOps).
   - All tests passed; comprehensive report in `reports/behavioral/`.

5. **Code Quality Analysis**
   - Ruff static analysis: 28 minor issues identified (unsorted imports, unused variables, f-strings), 100% fixable.
   - PEP 8 compliant, Black compatible (line length 88).

6. **Documentation**
   - Comprehensive `docs/testing_and_validation.md` with detailed test descriptions, execution commands, and analysis results.
   - Behavioral testing README with test categories, usage examples, and an extension guide.

7. **Tooling**
   - Makefile targets: `validate-gx`, `validate-deepchecks`, `test-behavioral`, `test-complete`.
   - Automated test execution and report generation.
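The invariance idea behind the behavioral tests can be sketched like this (a toy illustration with a stub predictor; the function names are hypothetical and the real tests exercise the trained model):

```python
import re

def predict_skills(text: str) -> set:
    """Stub predictor: keyword lookup standing in for the trained model."""
    normalized = re.sub(r"[^a-z\s]", " ", text.lower())
    skills = set()
    if "database" in normalized or "sql" in normalized:
        skills.add("databases")
    if "bug" in normalized or "crash" in normalized:
        skills.add("debugging")
    return skills

def check_invariance(original: str, perturbed: str) -> bool:
    """Invariance test: a label-preserving perturbation must not change predictions."""
    return predict_skills(original) == predict_skills(perturbed)

# Case changes and punctuation noise should leave predictions unchanged.
ok_case = check_invariance("Fix database bug", "FIX DATABASE BUG")
ok_noise = check_invariance("Fix database bug", "Fix database bug!!! ???")
```

Directional tests follow the same pattern in reverse: adding a strong keyword (e.g. "SQL migration") is expected to *change* the prediction in a specific direction.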

### Milestone 4 (API)

We implemented a production-ready FastAPI service for skill prediction with MLflow integration:

#### Features

- **REST API Endpoints**:
  - `POST /predict` - Predict skills for a GitHub issue (logs to MLflow)
  - `GET /predictions/{run_id}` - Retrieve a prediction by MLflow run ID
  - `GET /predictions` - List recent predictions with pagination
  - `GET /health` - Health check endpoint
- **Model Management**: Loads the trained Random Forest + TF-IDF vectorizer from `models/`
- **MLflow Tracking**: All predictions logged with metadata, probabilities, and timestamps
- **Input Validation**: Pydantic models for request/response validation
- **Interactive Docs**: Auto-generated Swagger UI and ReDoc
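The Pydantic request/response validation might look roughly like this. The field names and class names below are assumptions for illustration; this README does not show the service's actual schemas:

```python
from typing import Dict, Optional
from pydantic import BaseModel

class PredictionRequest(BaseModel):
    """Hypothetical request body for POST /predict."""
    text: str
    repo_name: Optional[str] = None  # optional metadata, as in the GUI's detailed mode

class PredictionResponse(BaseModel):
    """Hypothetical response: skill name -> predicted probability, plus the MLflow run."""
    probabilities: Dict[str, float]
    run_id: str  # usable with GET /predictions/{run_id}

req = PredictionRequest(text="Fix connection pooling bug in the SQL layer")
resp = PredictionResponse(probabilities={"databases": 0.91, "debugging": 0.84}, run_id="abc123")
```

FastAPI uses such models to reject malformed requests with a 422 before they reach the model.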

#### API Usage

```bash
# Development mode (auto-reload)
make api-dev

make api-run
```

Server starts at: [http://127.0.0.1:8000](http://127.0.0.1:8000)
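Once the server is up, `POST /predict` can be exercised from Python with only the stdlib. The `"text"` field name is an assumption about the request schema, and the request itself is commented out because it needs a running server:

```python
import json

# Hypothetical request payload for POST /predict.
payload = {"text": "Fix null pointer exception in the database connection pool"}
body = json.dumps(payload)

# To actually send it (requires the server from `make api-dev` to be running):
# import urllib.request
# req = urllib.request.Request(
#     "http://127.0.0.1:8000/predict",
#     data=body.encode("utf-8"),
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```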

```bash
# Test all endpoints
make test-api-all

# Individual endpoints
make test-api-health   # Health check
make test-api-predict  # Single prediction
make test-api-list     # List predictions
```

#### Prerequisites

- Trained model: `models/random_forest_tfidf_gridsearch.pkl`
- TF-IDF vectorizer: `models/tfidf_vectorizer.pkl` (auto-saved during feature creation)
- Label names: `models/label_names.pkl` (auto-saved during feature creation)

#### MLflow Integration

- All predictions logged to: `https://dagshub.com/se4ai2526-uniba/Hopcroft.mlflow`
- Experiment: `skill_prediction_api`
- Tracked: input text, predictions, probabilities, metadata

Endpoints:

- Swagger UI: [http://localhost:8080/docs](http://localhost:8080/docs)
- Health check: [http://localhost:8080/health](http://localhost:8080/health)

### Milestone 5 (Deployment)

We implemented a complete containerized deployment pipeline for production-ready delivery:

1. **Docker Containerization**
   - `docker/Dockerfile`: Multi-stage Python 3.10 slim image with a non-root user, system dependencies (git, nginx, curl), DVC integration, and an automated startup script.
   - `docker/Dockerfile.streamlit`: Lightweight container for the Streamlit GUI with minimal dependencies.
   - `docker/.dockerignore`: Optimized build context excluding unnecessary files.

2. **Docker Compose Orchestration**
   - Multi-service architecture: API backend (`hopcroft-api`), Streamlit frontend (`hopcroft-gui`), and monitoring stack.
   - Bridge network (`hopcroft-net`) for inter-service communication.
   - Health checks with automatic restart policies.
   - Bind mounts for development hot-reload, named volumes for persistent storage (`hopcroft-logs`).

3. **Hugging Face Spaces Deployment**
   - Docker SDK configuration with port 7860.
   - `docker/scripts/start_space.sh`: Automated startup script that configures DVC credentials, pulls models from DagsHub, and starts FastAPI + Streamlit + Nginx.
   - Secrets management via HF Spaces Variables (`DAGSHUB_USERNAME`, `DAGSHUB_TOKEN`).
   - Live deployment: `https://huggingface.co/spaces/se4ai2526-uniba/Hopcroft`

4. **Nginx Reverse Proxy**
   - `docker/nginx.conf`: Routes traffic to the API (port 8000) and Streamlit (port 8501) on the single port 7860.
   - Path-based routing for API docs, metrics, and the web interface.

5. **Environment Configuration**
   - `.env.example` template with MLflow and DagsHub credentials.
   - Automatic environment variable injection via the `env_file` directive.
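Given the ports listed above, the path-based routing in `docker/nginx.conf` plausibly looks something like the following. This is an illustrative fragment under those assumptions, not the actual file:

```nginx
# Nginx listens on the single HF Spaces port and fans out by path (illustrative).
server {
    listen 7860;

    # API docs and endpoints -> FastAPI backend
    location /docs    { proxy_pass http://127.0.0.1:8000; }
    location /predict { proxy_pass http://127.0.0.1:8000; }
    location /health  { proxy_pass http://127.0.0.1:8000; }

    # Everything else -> Streamlit web interface
    location / { proxy_pass http://127.0.0.1:8501; }
}
```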

### Milestone 6 (Monitoring)

We implemented comprehensive observability and load-testing infrastructure:

1. **Prometheus Metrics Collection**
   - `prometheus.yml`: Scrape configuration for API metrics (10s interval), self-monitoring, and Pushgateway.
   - Custom metrics: `hopcroft_requests_total`, `hopcroft_request_duration_seconds`, `hopcroft_in_progress_requests`, `hopcroft_prediction_processing_seconds`.
   - PromQL queries for request rate, latency percentiles, and in-progress tracking.
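For example, the rate and latency queries above might be written as follows, assuming `hopcroft_request_duration_seconds` is exported as a Prometheus histogram (hence the `_bucket` suffix):

```
# Requests per second over the last 5 minutes
rate(hopcroft_requests_total[5m])

# 95th-percentile request latency
histogram_quantile(0.95, rate(hopcroft_request_duration_seconds_bucket[5m]))

# Currently in-flight requests
hopcroft_in_progress_requests
```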

2. **Grafana Dashboards**
   - Auto-provisioned datasources and dashboards via the `provisioning/` directory.
   - `hopcroft_dashboard.json`: Real-time visualization of API request rate, latency, drift status, and p-value metrics.
   - Credentials: `admin/admin` on port 3000.

3. **Alerting System**
   - `alert_rules.yml`: Prometheus alert rules for `ServiceDown`, `HighErrorRate` (>10% 5xx), `SlowRequests` (p95 > 2s).
   - Alertmanager configuration with severity-based routing and inhibition rules.
   - Webhook integration for alert notifications.

4. **Data Drift Detection**
   - `prepare_baseline.py`: Extracts a 1000-sample reference dataset from the training data.
   - `run_drift_check.py`: Kolmogorov-Smirnov two-sample test with Bonferroni correction (p < 0.05).
   - Metrics pushed to Pushgateway: `drift_detected`, `drift_p_value`, `drift_distance`, `drift_check_timestamp`.
   - JSON reports saved to `monitoring/drift/reports/`.
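The KS-based check can be sketched like this (a simplified version of the idea, not the project's `run_drift_check.py`, which additionally pushes its results to the Pushgateway):

```python
import numpy as np
from scipy.stats import ks_2samp

def check_drift(reference: np.ndarray, current: np.ndarray, alpha: float = 0.05) -> dict:
    """Per-feature two-sample KS test with a Bonferroni-corrected threshold."""
    n_features = reference.shape[1]
    threshold = alpha / n_features  # Bonferroni correction across features
    p_values = [ks_2samp(reference[:, j], current[:, j]).pvalue
                for j in range(n_features)]
    return {
        "drift_detected": any(p < threshold for p in p_values),
        "drift_p_value": min(p_values),  # most suspicious feature
    }
```

Because each feature is tested separately, the Bonferroni division keeps the overall false-alarm rate near `alpha` even with many features.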

5. **Locust Load Testing**
   - `locustfile.py`: Simulated user behavior with weighted tasks (60% single prediction, 20% batch, 20% monitoring).
   - Configurable wait times (1-5s) for realistic traffic simulation.
   - Web UI on port 8089, headless-mode support, CSV export for results.
   - Pre-configured for HF Spaces and local Docker environments.

6. **Uptime Monitoring (Better Stack)**
   - External monitoring of production endpoints (`/health`, `/openapi.json`, `/docs`).
   - Multi-location checks with email notifications.
   - Incident tracking and resolution screenshots in `monitoring/screenshots/`.

7. **CI/CD Pipeline**
   - `.github/workflows/ci.yml`: GitHub Actions workflow triggered on push/PR to main and feature branches.
   - Jobs: Ruff linting, pytest unit tests with HTML reports, DVC model pulling, Docker image build.
   - Secrets: `DAGSHUB_USERNAME`, `DAGSHUB_TOKEN` for model access.
   - Disk-space optimization for the CI runner.

8. **Pushgateway Integration**
   - Collects metrics from short-lived jobs (drift detection scripts).
   - Persistent storage with 5-minute intervals.
   - Scraped by Prometheus for long-term storage and Grafana visualization.
---

## Docker Compose

Docker Compose orchestrates both the **API backend** and **Streamlit GUI** services with proper networking and configuration.

Create a `.env` file from the template:

```bash
cp .env.example .env
```

```
MLFLOW_TRACKING_USERNAME=your_dagshub_username
MLFLOW_TRACKING_PASSWORD=your_dagshub_token
```

Get your token from: [https://dagshub.com/user/settings/tokens](https://dagshub.com/user/settings/tokens)

### Usage

#### 1. Build and Start All Services

Build both images and start the containers:

```bash
docker compose -f docker/docker-compose.yml up -d --build
```

Once the services are running, open:

- **API (FastAPI):** [http://localhost:8080/docs](http://localhost:8080/docs)
- **GUI (Streamlit):** [http://localhost:8501](http://localhost:8501)
- **Health Check:** [http://localhost:8080/health](http://localhost:8080/health)

#### 2. Stop All Services

Stop and remove containers and networks:

```bash
docker compose -f docker/docker-compose.yml down
```

| Flag | Description |
|------|-------------|
| `-v` | Also remove named volumes (e.g., `hopcroft-logs`): `docker-compose down -v` |
| `--rmi all` | Also remove images: `docker-compose down --rmi all` |

#### 3. Restart Services

```bash
docker compose -f docker/docker-compose.yml restart
```

Or recreate the containers from scratch:

```bash
docker compose -f docker/docker-compose.yml down
docker compose -f docker/docker-compose.yml up -d
```

#### 4. Check Service Status

View the status of all running services:

```bash
docker compose -f docker/docker-compose.yml ps
```

Or use Docker commands:

```bash
docker ps
```

#### 5. View Logs

```bash
docker compose -f docker/docker-compose.yml logs -f hopcroft-api
docker compose -f docker/docker-compose.yml logs -f hopcroft-gui
```

| Flag | Description |
|------|-------------|
| `-f` | Follow log output (stream new logs) |
| `--tail 100` | Show only last 100 lines: `docker-compose logs --tail 100` |

#### 6. Debug Inside the Container

Run commands inside the running container, e.g.:

```bash
python
ls -la /app/models/
printenv | grep MLFLOW
```

### Architecture

```
docker/docker-compose.yml
├── hopcroft-api (FastAPI Backend)
│   ├── Build: docker/Dockerfile
│   ├── Port: 8080:8080
│   ├── Network: hopcroft-net
│   ├── Environment: .env (MLflow credentials)
│   ├── Volumes:
│   │   ├── ./hopcroft_skill_classification_tool_competition (hot reload)
│   │   └── hopcroft-logs:/app/logs (persistent logs)
│   └── Health Check: /health endpoint
│
├── hopcroft-gui (Streamlit Frontend)
│   ├── Build: docker/Dockerfile.streamlit
│   ├── Port: 8501:8501
│   ├── Network: hopcroft-net
│   ├── Environment: API_BASE_URL=http://hopcroft-api:8080
│   ├── Volumes:
│   │   └── ./hopcroft_skill_classification_tool_competition/streamlit_app.py (hot reload)
│   └── Depends on: hopcroft-api (waits for health check)
│
└── hopcroft-net (bridge network)
```

**External Access:**

- API: http://localhost:8080
- GUI: http://localhost:8501

**Internal Communication:**

- GUI → API: http://hopcroft-api:8080 (via the Docker network)

### Services Description

**hopcroft-api (FastAPI Backend)**

- Purpose: FastAPI backend serving the ML model for skill classification
- Image: Built from `docker/Dockerfile`
- Port: 8080 (maps to host 8080)
- Features:
  - Random Forest model with embedding features
  - MLflow experiment tracking
  - Auto-reload in development mode
  - Health check endpoint

**hopcroft-gui (Streamlit Frontend)**

- Purpose: Streamlit web interface for interactive predictions
- Image: Built from `docker/Dockerfile.streamlit`
- Port: 8501 (maps to host 8501)
- Features:
  - User-friendly interface for skill prediction
  - Real-time communication with the API
  - Automatic reconnection on API restart
  - Depends on API health before starting

### Development vs Production

**Development (default):**

- Auto-reload enabled (`--reload`)
- Source code mounted with bind mounts
- Custom command with hot reload
- GUI → API via the Docker network

**Production:**

- Auto-reload disabled
- Use the built image only
- Use the Dockerfile's CMD
- GUI → API via the Docker network

For **production deployment**, modify `docker/docker-compose.yml` to remove bind mounts and disable reload.

### Troubleshooting

#### Issue: GUI shows "API is not available"

**Solution:**

1. Wait 30-60 seconds for the API to fully initialize and become healthy
2. Refresh the GUI page (F5)
3. Check API health: `curl http://localhost:8080/health`
4. Check logs: `docker compose -f docker/docker-compose.yml logs hopcroft-api`

#### Issue: "500 Internal Server Error" on predictions

**Solution:**

1. Verify the MLflow credentials in `.env` are correct
2. Restart services: `docker compose -f docker/docker-compose.yml down && docker compose -f docker/docker-compose.yml up -d`
3. Check environment variables: `docker exec hopcroft-api printenv | grep MLFLOW`

#### Issue: Changes to code not reflected

**Solution:**

- For Python code changes: auto-reload is enabled, wait a few seconds
- For Dockerfile changes: rebuild with `docker compose -f docker/docker-compose.yml up -d --build`
- For `.env` changes: restart with `docker compose -f docker/docker-compose.yml down && docker compose -f docker/docker-compose.yml up -d`

#### Issue: Port already in use

**Solution:**

```bash
#
#
#
```

## Hugging Face Spaces Deployment

This project is configured to run on [Hugging Face Spaces](https://huggingface.co/spaces) using Docker.

### 1. Setup Space

1. Create a new Space on Hugging Face.
2. Select **Docker** as the SDK.
3. Choose the **Blank** template or upload your code.

### 2. Configure Secrets

To enable the application to pull models from DagsHub via DVC, you must configure the following **Variables and Secrets** in your Space settings:

| Name | Type | Description |
|------|------|-------------|
| `DAGSHUB_USERNAME` | Secret | Your DagsHub username. |
| `DAGSHUB_TOKEN` | Secret | Your DagsHub access token (Settings -> Tokens). |

> [!IMPORTANT]
> These secrets are injected into the container at runtime. The `docker/scripts/start_space.sh` script uses them to authenticate DVC and pull the required model files (`.pkl`) before starting the API and GUI.

### 3. Automated Startup

The deployment follows this automated flow:

1. **docker/Dockerfile**: Builds the environment, installs dependencies, and sets up Nginx.
2. **docker/scripts/start_space.sh**:
   - Configures DVC with your secrets.
   - Pulls models from the DagsHub remote.
   - Starts the **FastAPI** backend (port 8000).
   - Starts the **Streamlit** frontend (port 8501).
   - Starts **Nginx** (port 7860) as a reverse proxy to route traffic.

### 4. Direct Access

Once deployed, your Space will be available at:
`https://huggingface.co/spaces/se4ai2526-uniba/Hopcroft`

The API documentation will be accessible at:
`https://huggingface.co/spaces/se4ai2526-uniba/Hopcroft/docs`

--------

## Demo UI (Streamlit)

The Streamlit GUI provides an interactive web interface for the skill classification API.

### Features

- Real-time skill prediction from GitHub issue text
- Top-5 predicted skills with confidence scores
- Full predictions table with all skills
- API connection status indicator
- Responsive design

### Usage

1. Ensure both services are running: `docker compose -f docker/docker-compose.yml up -d`
2. Open the GUI in your browser: [http://localhost:8501](http://localhost:8501)
3. Enter a GitHub issue description in the text area
4. Click "Predict Skills" to get predictions
5. View results in the predictions table

### Architecture

- **Frontend**: Streamlit (Python web framework)
- **Communication**: HTTP requests to the FastAPI backend via the Docker network
- **Independence**: GUI and API run in separate containers
- **Auto-reload**: GUI code changes are reflected immediately (bind mount)

> Both must run **simultaneously** in different terminals/containers.

### Quick Start

1. **Start the FastAPI backend:**

   ```bash
   fastapi dev hopcroft_skill_classification_tool_competition/main.py
   ```

2. **In a new terminal, start Streamlit:**

   ```bash
   streamlit run streamlit_app.py
   ```

3. **Open your browser:**
   - Streamlit UI: http://localhost:8501
   - FastAPI Docs: http://localhost:8000/docs

### Features

- Interactive web interface for skill prediction
- Real-time predictions with confidence scores
- Adjustable confidence threshold
- Multiple input modes (quick/detailed/examples)
- Visual result display
- API health monitoring

### Demo Walkthrough

#### Main Dashboard



- **Sidebar**: API health status, confidence threshold slider, model info
- **Three input modes**: Quick Input, Detailed Input, Examples

#### Quick Input Mode



Simply paste your GitHub issue text and click "Predict Skills"!

#### Prediction Results



View:

- **Top predictions** with confidence scores
- **Full predictions table** with filtering
- **Processing metrics** (time, model version)
- **Raw JSON response** (expandable)

#### Detailed Input Mode



Add optional metadata:

- Repository name
- PR number
- Detailed description

#### Example Gallery



Test with pre-loaded examples:

- Authentication bugs
- ML features
- Database issues
- UI enhancements

### Usage

1. Enter GitHub issue/PR text in the input area
2. (Optional) Add description, repo name, PR number
3. Click "Predict Skills"
4. View results with confidence scores
5. Adjust the threshold slider to filter predictions

---

# Hopcroft Skill Classification
|
| 12 |
|
| 13 |
[](https://github.com/se4ai2526-uniba/Hopcroft/actions/workflows/ci.yml)
|
| 14 |
+
[](https://huggingface.co/spaces/se4ai2526-uniba/Hopcroft)
|
| 15 |
+
[](https://dagshub.com/se4ai2526-uniba/Hopcroft.mlflow)
|
| 16 |
|
| 17 |
+
**Multi-label skill classification for GitHub issues and pull requests**: automatically identify the technical skills required to resolve software issues using machine learning.
|
| 18 |
|
| 19 |
+
---
|
| 20 |
|
| 21 |
+
## Overview
|
| 22 |
|
| 23 |
+
Hopcroft is an ML-enabled system that classifies GitHub issues into 217 technical skill categories, enabling automated developer assignment and optimized resource allocation. Built following professional MLOps and Software Engineering standards.
|
| 24 |
|
| 25 |
+
### Key Features
|
| 26 |
|
| 27 |
+
- π― **Multi-label Classification**: Predict multiple skills per issue
|
| 28 |
+
- π **REST API**: FastAPI with Swagger documentation
|
| 29 |
+
- π₯οΈ **Web Interface**: Streamlit GUI for interactive predictions
|
| 30 |
+
- π **Monitoring**: Prometheus/Grafana dashboards with drift detection
|
| 31 |
+
- π **CI/CD**: GitHub Actions with Docker deployment
|
| 32 |
+
- π **Experiment Tracking**: MLflow on DagsHub
|
| 33 |
|
| 34 |
+
---
|
| 35 |
|
| 36 |
+
## Architecture
|
| 37 |
+
|
| 38 |
+
```mermaid
|
| 39 |
+
graph TB
|
| 40 |
+
subgraph "Data Layer"
|
| 41 |
+
A[(SkillScope DB)] --> B[Feature Engineering]
|
| 42 |
+
B --> C[TF-IDF / Embeddings]
|
| 43 |
+
end
|
| 44 |
+
|
| 45 |
+
subgraph "ML Pipeline"
|
| 46 |
+
C --> D[Model Training]
|
| 47 |
+
D --> E[(MLflow Tracking)]
|
| 48 |
+
D --> F[Random Forest Model]
|
| 49 |
+
end
|
| 50 |
+
|
| 51 |
+
subgraph "Serving Layer"
|
| 52 |
+
F --> G[FastAPI Service]
|
| 53 |
+
G --> H[/predict]
|
| 54 |
+
G --> I[/predictions]
|
| 55 |
+
G --> J[/health]
|
| 56 |
+
end
|
| 57 |
+
|
| 58 |
+
subgraph "Frontend"
|
| 59 |
+
G --> K[Streamlit GUI]
|
| 60 |
+
end
|
| 61 |
+
|
| 62 |
+
subgraph "Monitoring"
|
| 63 |
+
G --> L[Prometheus]
|
| 64 |
+
L --> M[Grafana]
|
| 65 |
+
N[Drift Detection] --> L
|
| 66 |
+
end
|
| 67 |
+
|
| 68 |
+
subgraph "Deployment"
|
| 69 |
+
O[GitHub Actions] --> P[Docker Build]
|
| 70 |
+
P --> Q[HF Spaces]
|
| 71 |
+
end
|
| 72 |
```
|
| 73 |
|
| 74 |
---
|
| 75 |
|
| 76 |
+
## Documentation
|
| 77 |
|
| 78 |
+
| Document | Description |
|
| 79 |
+
|----------|-------------|
|
| 80 |
+
| π [Milestone Summaries](docs/milestone_summaries.md) | All 6 project phases documented |
|
| 81 |
+
| π [User Guide](docs/user_guide.md) | Setup, API, GUI, testing, monitoring |
|
| 82 |
+
| ποΈ [Design Choices](docs/design_choices.md) | Technical decisions & rationale |
|
| 83 |
+
| π― [ML Canvas](docs/ML%20Canvas.md) | Requirements engineering framework |
|
| 84 |
+
| β [Testing & Validation](docs/testing_and_validation.md) | QA strategy & results |
|
| 85 |
|
| 86 |
+
---
|
| 87 |
|
| 88 |
+
## Quick Start
|
| 89 |
|
| 90 |
+
### Docker (Recommended)
|
| 91 |
|
| 92 |
```bash
|
| 93 |
+
# Clone and configure
|
| 94 |
+
git clone https://github.com/se4ai2526-uniba/Hopcroft.git
|
| 95 |
+
cd Hopcroft
|
| 96 |
+
cp .env.example .env
|
| 97 |
+
# Edit .env with your DagsHub credentials
|
| 98 |
+
|
| 99 |
+
# Start services
|
| 100 |
docker compose -f docker/docker-compose.yml up -d --build
|
| 101 |
```
|
| 102 |
|
| 103 |
+
**Access:**
|
| 104 |
+
- π **API Docs**: http://localhost:8080/docs
|
| 105 |
+
- π₯οΈ **GUI**: http://localhost:8501
|
| 106 |
+
- β€οΈ **Health**: http://localhost:8080/health
|
| 107 |
|
| 108 |
+
### Local Development
|
| 109 |
|
| 110 |
```bash
|
| 111 |
+
# Setup environment
|
| 112 |
+
python -m venv venv && source venv/bin/activate # or venv\Scripts\activate on Windows
|
| 113 |
+
pip install -r requirements.txt && pip install -e .
|
| 114 |
|
| 115 |
+
# Start API
|
| 116 |
+
make api-dev
|
| 117 |
|
| 118 |
+
# Start GUI (new terminal)
|
| 119 |
+
streamlit run hopcroft_skill_classification_tool_competition/streamlit_app.py
|
| 120 |
```
|
| 121 |
|
| 122 |
+
---
|
| 123 |
|
| 124 |
+
## Project Structure
|
| 125 |
|
| 126 |
```
|
| 127 |
+
βββ hopcroft_skill_classification_tool_competition/
|
| 128 |
+
β βββ main.py # FastAPI application
|
| 129 |
+
β βββ streamlit_app.py # Streamlit GUI
|
| 130 |
+
β βββ features.py # Feature engineering
|
| 131 |
+
β βββ modeling/ # Training & prediction
|
| 132 |
+
β βββ config.py # Configuration
|
| 133 |
+
βββ data/ # DVC-tracked datasets
|
| 134 |
+
βββ models/ # DVC-tracked models
|
| 135 |
+
βββ tests/ # Pytest test suites
|
| 136 |
+
βββ monitoring/ # Prometheus, Grafana, Locust
|
| 137 |
+
βββ docker/ # Docker configurations
|
| 138 |
+
βββ docs/ # Documentation
|
| 139 |
+
βββ .github/workflows/ # CI/CD pipelines
|
| 140 |
```
|
| 141 |
|
| 142 |
+
---
|
| 143 |
|
| 144 |
+
## API Endpoints
|
| 145 |
|
| 146 |
+
| Method | Endpoint | Description |
|
| 147 |
+
|--------|----------|-------------|
|
| 148 |
+
| `POST` | `/predict` | Classify single issue |
|
| 149 |
+
| `POST` | `/predict/batch` | Batch classification |
|
| 150 |
+
| `GET` | `/predictions` | List recent predictions |
|
| 151 |
+
| `GET` | `/predictions/{id}` | Get by MLflow run ID |
|
| 152 |
+
| `GET` | `/health` | Health check |
|
| 153 |
+
| `GET` | `/metrics` | Prometheus metrics |
|
| 154 |
|
| 155 |
+
**Example:**
|
| 156 |
```bash
|
| 157 |
+
curl -X POST "http://localhost:8080/predict" \
|
| 158 |
+
-H "Content-Type: application/json" \
|
| 159 |
+
-d '{"issue_text": "Fix OAuth2 authentication bug"}'
|
| 160 |
+
```
|
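For programmatic access, the single-issue call above generalizes to the `/predict/batch` endpoint (max 100 issues per request). A minimal client sketch using only the standard library; the exact batch payload field names are an assumption based on the single-issue schema, so check `/docs` for the authoritative schema:

```python
import json
from urllib import request

API_URL = "http://localhost:8080"  # local Docker deployment


def build_batch_payload(issues):
    """Build a /predict/batch request body.

    Field names here are an assumption mirroring the single-issue
    {"issue_text": ...} schema; verify against the Swagger docs.
    """
    if len(issues) > 100:
        raise ValueError("batch endpoint accepts at most 100 issues")
    return {"issues": [{"issue_text": text} for text in issues]}


def predict_batch(issues):
    """POST the batch to the API and return the parsed JSON response."""
    body = json.dumps(build_batch_payload(issues)).encode()
    req = request.Request(
        f"{API_URL}/predict/batch",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:  # requires a running API
        return json.load(resp)


payload = build_batch_payload(["Fix OAuth2 authentication bug", "Add caching layer"])
print(len(payload["issues"]))  # prints 2
```

The 100-issue guard mirrors the documented batch limit; `predict_batch` is only exercised when the API is up.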
| 161 |
|
| 162 |
+
---
|
| 163 |
|
| 164 |
+
## Live Deployment
|
| 165 |
|
| 166 |
+
- **Application**: https://huggingface.co/spaces/se4ai2526-uniba/Hopcroft
|
| 167 |
+
- **API Docs**: https://huggingface.co/spaces/se4ai2526-uniba/Hopcroft/docs
|
| 168 |
+
- **MLflow**: https://dagshub.com/se4ai2526-uniba/Hopcroft.mlflow
|
| 169 |
|
| 170 |
+
---
|
| 171 |
|
| 172 |
+
## Development
|
| 173 |
|
| 174 |
```bash
|
| 175 |
+
# Run tests
|
| 176 |
+
make test-all # All tests
|
| 177 |
+
make test-behavioral # ML behavioral tests
|
| 178 |
+
make validate-deepchecks # Data validation
|
| 179 |
|
| 180 |
+
# Lint & format
|
| 181 |
+
make lint # Check code style
|
| 182 |
+
make format # Auto-fix issues
|
| 183 |
|
| 184 |
+
# Training
|
| 185 |
+
make train-baseline-tfidf # Train baseline model
|
| 186 |
```
|
| 187 |
|
| 188 |
+
---
|
| 189 |
|
| 190 |
+
## License
|
| 191 |
|
| 192 |
+
This project was developed as part of the SE4AI 2025-26 course at the University of Bari.
|
docs/README.md
CHANGED
|
@@ -1,12 +1,24 @@
|
|
| 1 |
-
|
| 2 |
-
----------
|
| 3 |
|
| 4 |
-
|
| 5 |
|
| 6 |
-
|
| 7 |
|
| 8 |
-
|
|
| 9 |
|
| 10 |
-
|
| 11 |
|
| 12 |
-
|
| 1 |
+
# Documentation
|
| 2 |
|
| 3 |
+
This directory contains comprehensive documentation for the Hopcroft Skill Classification system.
|
| 4 |
|
| 5 |
+
## Contents
|
| 6 |
|
| 7 |
+
| Document | Description |
|
| 8 |
+
|----------|-------------|
|
| 9 |
+
| [Milestone Summaries](milestone_summaries.md) | Overview of all 6 project development phases |
|
| 10 |
+
| [User Guide](user_guide.md) | Setup, API, GUI, load testing, and monitoring instructions |
|
| 11 |
+
| [Design Choices](design_choices.md) | Technical justifications and architectural decisions |
|
| 12 |
+
| [ML Canvas](ML%20Canvas.md) | Machine Learning Canvas requirements framework |
|
| 13 |
+
| [Testing & Validation](testing_and_validation.md) | QA strategy with test results and commands |
|
| 14 |
|
| 15 |
+
## Quick Links
|
| 16 |
|
| 17 |
+
- **Getting Started**: See [User Guide - System Setup](user_guide.md#1-system-setup)
|
| 18 |
+
- **API Reference**: See [User Guide - API Usage](user_guide.md#2-api-usage)
|
| 19 |
+
- **Architecture**: See [Design Choices](design_choices.md)
|
| 20 |
+
- **Project History**: See [Milestone Summaries](milestone_summaries.md)
|
| 21 |
+
|
| 22 |
+
## Images
|
| 23 |
+
|
| 24 |
+
The `img/` directory contains screenshots for GUI documentation.
|
docs/design_choices.md
ADDED
|
@@ -0,0 +1,487 @@
|
| 1 |
+
# Design Choices
|
| 2 |
+
|
| 3 |
+
This document records the technical justification for the architectural and engineering decisions made during the Hopcroft project's development, following professional MLOps and software engineering standards.
|
| 4 |
+
|
| 5 |
+
---
|
| 6 |
+
|
| 7 |
+
## Table of Contents
|
| 8 |
+
|
| 9 |
+
1. [Inception (Requirements Engineering)](#1-inception-requirements-engineering)
|
| 10 |
+
2. [Reproducibility (Versioning & Pipelines)](#2-reproducibility-versioning--pipelines)
|
| 11 |
+
3. [Quality Assurance](#3-quality-assurance)
|
| 12 |
+
4. [API (Inference Service)](#4-api-inference-service)
|
| 13 |
+
5. [Deployment (Containerization & CI/CD)](#5-deployment-containerization--cicd)
|
| 14 |
+
6. [Monitoring](#6-monitoring)
|
| 15 |
+
|
| 16 |
+
---
|
| 17 |
+
|
| 18 |
+
## 1. Inception (Requirements Engineering)
|
| 19 |
+
|
| 20 |
+
### Machine Learning Canvas
|
| 21 |
+
|
| 22 |
+
The project adopted the **Machine Learning Canvas** framework to systematically define the problem space before implementation. This structured approach ensures alignment between business objectives and technical solutions.
|
| 23 |
+
|
| 24 |
+
| Canvas Section | Application |
|
| 25 |
+
|----------------|-------------|
|
| 26 |
+
| **Prediction Task** | Multi-label classification of 217 technical skills from GitHub issue text |
|
| 27 |
+
| **Decisions** | Automated developer assignment based on predicted skill requirements |
|
| 28 |
+
| **Value Proposition** | Reduced issue resolution time, optimized resource allocation |
|
| 29 |
+
| **Data Sources** | SkillScope DB (7,245 PRs from 11 Java repositories) |
|
| 30 |
+
| **Making Predictions** | Real-time classification upon issue creation |
|
| 31 |
+
| **Building Models** | Iterative improvement over RF+TF-IDF baseline |
|
| 32 |
+
| **Monitoring** | Continuous evaluation with drift detection |
|
| 33 |
+
|
| 34 |
+
The complete ML Canvas is documented in [ML Canvas.md](./ML%20Canvas.md).
|
| 35 |
+
|
| 36 |
+
### Functional vs Non-Functional Requirements
|
| 37 |
+
|
| 38 |
+
#### Functional Requirements
|
| 39 |
+
|
| 40 |
+
| Requirement | Target | Metric |
|
| 41 |
+
|-------------|--------|--------|
|
| 42 |
+
| **Precision** | β₯ Baseline | True positives / Predicted positives |
|
| 43 |
+
| **Recall** | β₯ Baseline | True positives / Actual positives |
|
| 44 |
+
| **Micro-F1** | > Baseline | Harmonic mean across all labels |
|
| 45 |
+
| **Multi-label Support** | 217 skills | Simultaneous prediction of multiple labels |
|
| 46 |
+
|
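Micro-F1 pools true positives, false positives, and false negatives across all 217 labels before computing precision, recall, and their harmonic mean. A minimal plain-Python sketch of the metric, for illustration only:

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over a multi-label binary matrix.

    y_true, y_pred: lists of equal-length 0/1 rows (samples x labels).
    TP/FP/FN are pooled across every label before computing P, R, F1.
    """
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            tp += 1 if (t and p) else 0
            fp += 1 if (not t and p) else 0
            fn += 1 if (t and not p) else 0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Two samples, three labels: 2 TP, 1 FP, 1 FN -> P = R = F1 = 2/3
print(micro_f1([[1, 0, 1], [0, 1, 0]], [[1, 0, 0], [0, 1, 1]]))
```

Because counts are pooled, frequent labels dominate micro-F1, which suits the skewed skill distribution here better than macro averaging.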
| 47 |
+
#### Non-Functional Requirements
|
| 48 |
+
|
| 49 |
+
| Category | Requirement | Implementation |
|
| 50 |
+
|----------|-------------|----------------|
|
| 51 |
+
| **Reproducibility** | Auditable experiments | MLflow tracking, DVC versioning |
|
| 52 |
+
| **Explainability** | Interpretable predictions | Confidence scores per skill |
|
| 53 |
+
| **Performance** | Low latency inference | FastAPI async, model caching |
|
| 54 |
+
| **Scalability** | Batch processing | `/predict/batch` endpoint (max 100) |
|
| 55 |
+
| **Maintainability** | Clean code | Ruff linting, type hints, docstrings |
|
| 56 |
+
|
| 57 |
+
### System-First vs Model-First Development
|
| 58 |
+
|
| 59 |
+
The project adopted a **System-First** approach, prioritizing infrastructure and pipeline development before model optimization:
|
| 60 |
+
|
| 61 |
+
```
|
| 62 |
+
Timeline:
|
| 63 |
+
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
| 64 |
+
β Phase 1: Infrastructure β Phase 2: Model Development β
|
| 65 |
+
β - DVC/MLflow setup β - Feature engineering β
|
| 66 |
+
β - CI/CD pipeline β - Hyperparameter tuning β
|
| 67 |
+
β - Docker containers β - SMOTE/ADASYN experiments β
|
| 68 |
+
β - API skeleton β - Performance optimization β
|
| 69 |
+
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
| 70 |
+
```
|
| 71 |
+
|
| 72 |
+
**Rationale:**
|
| 73 |
+
- Enables rapid iteration once infrastructure is stable
|
| 74 |
+
- Ensures reproducibility from day one
|
| 75 |
+
- Reduces technical debt during model development
|
| 76 |
+
- Facilitates team collaboration with shared tooling
|
| 77 |
+
|
| 78 |
+
---
|
| 79 |
+
|
| 80 |
+
## 2. Reproducibility (Versioning & Pipelines)
|
| 81 |
+
|
| 82 |
+
### Code Versioning (Git)
|
| 83 |
+
|
| 84 |
+
Standard Git workflow with branch protection:
|
| 85 |
+
|
| 86 |
+
| Branch | Purpose |
|
| 87 |
+
|--------|---------|
|
| 88 |
+
| `main` | Production-ready code |
|
| 89 |
+
| `feature/*` | New development |
|
| 90 |
+
| `milestone/*` | Integration branch that groups features before merging into `main` |
|
| 91 |
+
|
| 92 |
+
### Data & Model Versioning (DVC)
|
| 93 |
+
|
| 94 |
+
**Design Decision:** Use DVC (Data Version Control) with DagsHub remote storage for large file management.
|
| 95 |
+
|
| 96 |
+
```
|
| 97 |
+
.dvc/config
|
| 98 |
+
βββ remote: origin
|
| 99 |
+
βββ url: https://dagshub.com/se4ai2526-uniba/Hopcroft.dvc
|
| 100 |
+
βββ auth: basic (credentials via environment)
|
| 101 |
+
```
|
| 102 |
+
|
| 103 |
+
**Tracked Artifacts:**
|
| 104 |
+
|
| 105 |
+
| File | Purpose |
|
| 106 |
+
|------|---------|
|
| 107 |
+
| `data/raw/skillscope_data.db` | Original SQLite database |
|
| 108 |
+
| `data/processed/*.npy` | TF-IDF and embedding features |
|
| 109 |
+
| `models/*.pkl` | Trained models and vectorizers |
|
| 110 |
+
|
| 111 |
+
**Versioning Workflow:**
|
| 112 |
+
```bash
|
| 113 |
+
# Track new data
|
| 114 |
+
dvc add data/raw/new_dataset.db
|
| 115 |
+
git add data/raw/.gitignore data/raw/new_dataset.db.dvc
|
| 116 |
+
|
| 117 |
+
# Push to remote
|
| 118 |
+
dvc push
|
| 119 |
+
git commit -m "Add new dataset version"
|
| 120 |
+
git push
|
| 121 |
+
```
|
| 122 |
+
|
| 123 |
+
### Experiment Tracking (MLflow)
|
| 124 |
+
|
| 125 |
+
**Design Decision:** Remote MLflow instance on DagsHub for collaborative experiment tracking.
|
| 126 |
+
|
| 127 |
+
| Configuration | Value |
|
| 128 |
+
|---------------|-------|
|
| 129 |
+
| Tracking URI | `https://dagshub.com/se4ai2526-uniba/Hopcroft.mlflow` |
|
| 130 |
+
| Experiments | `skill_classification`, `skill_prediction_api` |
|
| 131 |
+
|
| 132 |
+
**Logged Metrics:**
|
| 133 |
+
- Training: precision, recall, F1-score, training time
|
| 134 |
+
- Inference: prediction latency, confidence scores, timestamps
|
| 135 |
+
|
| 136 |
+
**Artifact Storage:**
|
| 137 |
+
- Model binaries (`.pkl`)
|
| 138 |
+
- Vectorizers and scalers
|
| 139 |
+
- Hyperparameter configurations
|
| 140 |
+
|
| 141 |
+
### Auditable ML Pipeline
|
| 142 |
+
|
| 143 |
+
The pipeline is designed for complete reproducibility:
|
| 144 |
+
|
| 145 |
+
```
|
| 146 |
+
ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ
|
| 147 |
+
β dataset.py βββββΆβ features.py βββββΆβ train.py β
|
| 148 |
+
β (DVC pull) β β (TF-IDF) β β (MLflow) β
|
| 149 |
+
ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ
|
| 150 |
+
β β β
|
| 151 |
+
βΌ βΌ βΌ
|
| 152 |
+
.dvc files .dvc files MLflow Run
|
| 153 |
+
```
|
| 154 |
+
|
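The three stages above can be wired together in a `dvc.yaml` so that `dvc repro` re-runs only what changed. This is an illustrative sketch, not the repository's actual configuration; stage names and output paths are assumptions based on the files listed elsewhere in this document:

```yaml
# Illustrative dvc.yaml sketch of the pipeline above.
stages:
  features:
    cmd: python hopcroft_skill_classification_tool_competition/features.py
    deps:
      - data/raw/skillscope_data.db
    outs:
      - data/processed/tfidf_features.npy
  train:
    cmd: python hopcroft_skill_classification_tool_competition/modeling/train.py
    deps:
      - data/processed/tfidf_features.npy
    outs:
      - models/baseline_rf.pkl
```

DVC hashes each `deps`/`outs` entry, so any change to the raw database invalidates downstream stages automatically.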
| 155 |
+
---
|
| 156 |
+
|
| 157 |
+
## 3. Quality Assurance
|
| 158 |
+
|
| 159 |
+
### Testing Strategy
|
| 160 |
+
|
| 161 |
+
#### Static Analysis (Ruff)
|
| 162 |
+
|
| 163 |
+
**Design Decision:** Use Ruff as the primary linter for speed and comprehensive rule coverage.
|
| 164 |
+
|
| 165 |
+
| Configuration | Value |
|
| 166 |
+
|---------------|-------|
|
| 167 |
+
| Line Length | 88 (Black compatible) |
|
| 168 |
+
| Target Python | 3.10+ |
|
| 169 |
+
| Rule Sets | PEP 8, isort, pyflakes |
|
| 170 |
+
|
| 171 |
+
**CI Integration:**
|
| 172 |
+
```yaml
|
| 173 |
+
- name: Lint with Ruff
|
| 174 |
+
run: make lint
|
| 175 |
+
```
|
| 176 |
+
|
| 177 |
+
#### Dynamic Testing (Pytest)
|
| 178 |
+
|
| 179 |
+
**Test Organization:**
|
| 180 |
+
|
| 181 |
+
```
|
| 182 |
+
tests/
|
| 183 |
+
βββ unit/ # Isolated function tests
|
| 184 |
+
βββ integration/ # Component interaction tests
|
| 185 |
+
βββ system/ # End-to-end tests
|
| 186 |
+
βββ behavioral/ # ML-specific tests
|
| 187 |
+
βββ deepchecks/ # Data validation
|
| 188 |
+
βββ great expectations/ # Schema validation
|
| 189 |
+
```
|
| 190 |
+
|
| 191 |
+
**Markers for Selective Execution:**
|
| 192 |
+
```python
|
| 193 |
+
@pytest.mark.unit
|
| 194 |
+
@pytest.mark.integration
|
| 195 |
+
@pytest.mark.system
|
| 196 |
+
@pytest.mark.slow
|
| 197 |
+
```
|
| 198 |
+
|
| 199 |
+
### Model Validation vs Model Verification
|
| 200 |
+
|
| 201 |
+
| Concept | Definition | Implementation |
|
| 202 |
+
|---------|------------|----------------|
|
| 203 |
+
| **Validation** | Does the model fit user needs? | Micro-F1 vs baseline comparison |
|
| 204 |
+
| **Verification** | Is the model correctly built? | Unit tests, behavioral tests |
|
| 205 |
+
|
| 206 |
+
### Behavioral Testing
|
| 207 |
+
|
| 208 |
+
**Design Decision:** Implement CheckList-inspired behavioral tests to evaluate model robustness beyond accuracy metrics.
|
| 209 |
+
|
| 210 |
+
| Test Type | Count | Purpose |
|
| 211 |
+
|-----------|-------|---------|
|
| 212 |
+
| **Invariance** | 9 | Stability under perturbations (typos, case changes) |
|
| 213 |
+
| **Directional** | 10 | Expected behavior with keyword additions |
|
| 214 |
+
| **Minimum Functionality** | 17 | Basic sanity checks on clear examples |
|
| 215 |
+
|
| 216 |
+
**Example Invariance Test:**
|
| 217 |
+
```python
|
| 218 |
+
def test_case_insensitivity():
|
| 219 |
+
"""Model should predict same skills regardless of case."""
|
| 220 |
+
assert predict("Fix BUG") == predict("fix bug")
|
| 221 |
+
```
|
| 222 |
+
|
| 223 |
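A directional test checks that adding a strongly indicative keyword does not lower the corresponding skill's confidence. Sketched here against a stand-in `predict_confidence` scorer (the real model interface may differ; the keyword map is purely illustrative):

```python
def predict_confidence(text, skill):
    """Stand-in scorer for illustration: keyword frequency as confidence."""
    keywords = {"database": ("sql", "database", "query")}
    words = text.lower().split()
    hits = sum(words.count(k) for k in keywords[skill])
    return hits / max(len(words), 1)


def test_directional_database_keyword():
    """Appending 'SQL query' should not lower the 'database' confidence."""
    base = predict_confidence("Fix slow page load", "database")
    boosted = predict_confidence("Fix slow page load SQL query", "database")
    assert boosted >= base


test_directional_database_keyword()
print("directional test passed")
```

The same pattern applies to the real model by swapping in its prediction function and asserting on the returned per-skill confidence scores.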
+
### Data Quality Checks
|
| 224 |
+
|
| 225 |
+
#### Great Expectations (10 Tests)
|
| 226 |
+
|
| 227 |
+
**Design Decision:** Validate data at pipeline boundaries to catch quality issues early.
|
| 228 |
+
|
| 229 |
+
| Validation Point | Tests |
|
| 230 |
+
|------------------|-------|
|
| 231 |
+
| Raw Database | Schema, row count, required columns |
|
| 232 |
+
| Feature Matrix | No NaN/Inf, sparsity, SMOTE compatibility |
|
| 233 |
+
| Label Matrix | Binary format, distribution, consistency |
|
| 234 |
+
| Train/Test Split | No leakage, stratification |
|
| 235 |
+
|
| 236 |
+
#### Deepchecks (24 Checks)
|
| 237 |
+
|
| 238 |
+
**Suites:**
|
| 239 |
+
- **Data Integrity Suite** (12 checks): Duplicates, nulls, correlations
|
| 240 |
+
- **Train-Test Validation Suite** (12 checks): Leakage, drift, distribution
|
| 241 |
+
|
| 242 |
+
**Status:** Production-ready (96% overall score)
|
| 243 |
+
|
| 244 |
+
---
|
| 245 |
+
|
| 246 |
+
## 4. API (Inference Service)
|
| 247 |
+
|
| 248 |
+
### FastAPI Implementation
|
| 249 |
+
|
| 250 |
+
**Design Decision:** Use FastAPI for async request handling, automatic OpenAPI generation, and native Pydantic validation.
|
| 251 |
+
|
| 252 |
+
**Key Features:**
|
| 253 |
+
- Async lifespan management for model loading
|
| 254 |
+
- Middleware for Prometheus metrics collection
|
| 255 |
+
- Structured exception handling
|
| 256 |
+
|
| 257 |
+
### RESTful Principles
|
| 258 |
+
|
| 259 |
+
**Design Decision:** Follow REST best practices for intuitive API design.
|
| 260 |
+
|
| 261 |
+
| Principle | Implementation |
|
| 262 |
+
|-----------|----------------|
|
| 263 |
+
| **Nouns, not verbs** | `/predictions` instead of `/getPrediction` |
|
| 264 |
+
| **Plural resources** | `/predictions`, `/issues` |
|
| 265 |
+
| **HTTP methods** | GET (retrieve), POST (create) |
|
| 266 |
+
| **Status codes** | 200 (OK), 201 (Created), 404 (Not Found), 500 (Error) |
|
| 267 |
+
|
| 268 |
+
**Endpoint Design:**
|
| 269 |
+
|
| 270 |
+
| Method | Endpoint | Action |
|
| 271 |
+
|--------|----------|--------|
|
| 272 |
+
| `POST` | `/predict` | Create new prediction |
|
| 273 |
+
| `POST` | `/predict/batch` | Create batch predictions |
|
| 274 |
+
| `GET` | `/predictions` | List predictions |
|
| 275 |
+
| `GET` | `/predictions/{run_id}` | Get specific prediction |
|
| 276 |
+
|
| 277 |
+
### OpenAPI/Swagger Documentation
|
| 278 |
+
|
| 279 |
+
**Auto-generated documentation at runtime:**
|
| 280 |
+
- Swagger UI: `/docs`
|
| 281 |
+
- ReDoc: `/redoc`
|
| 282 |
+
- OpenAPI JSON: `/openapi.json`
|
| 283 |
+
|
| 284 |
+
**Pydantic Models for Schema Enforcement:**
|
| 285 |
+
```python
|
| 286 |
+
class IssueInput(BaseModel):
|
| 287 |
+
issue_text: str
|
| 288 |
+
repo_name: Optional[str] = None
|
| 289 |
+
pr_number: Optional[int] = None
|
| 290 |
+
|
| 291 |
+
class PredictionResponse(BaseModel):
|
| 292 |
+
run_id: str
|
| 293 |
+
predictions: List[SkillPrediction]
|
| 294 |
+
model_version: str
|
| 295 |
+
```
|
| 296 |
+
|
| 297 |
+
---
|
| 298 |
+
|
| 299 |
+
## 5. Deployment (Containerization & CI/CD)
|
| 300 |
+
|
| 301 |
+
### Docker Containerization
|
| 302 |
+
|
| 303 |
+
**Design Decision:** Multi-stage Docker builds with security best practices.
|
| 304 |
+
|
| 305 |
+
**Dockerfile Features:**
|
| 306 |
+
- Python 3.10 slim base image (minimal footprint)
|
| 307 |
+
- Non-root user for security
|
| 308 |
+
- DVC integration for model pulling
|
| 309 |
+
- Health check endpoint configuration
|
| 310 |
+
|
| 311 |
+
**Multi-Service Architecture:**
|
| 312 |
+
|
| 313 |
+
```
|
| 314 |
+
docker-compose.yml
|
| 315 |
+
βββ hopcroft-api (FastAPI)
|
| 316 |
+
β βββ Port: 8080
|
| 317 |
+
β βββ Volumes: source code, logs
|
| 318 |
+
β βββ Health check: /health
|
| 319 |
+
β
|
| 320 |
+
βββ hopcroft-gui (Streamlit)
|
| 321 |
+
β βββ Port: 8501
|
| 322 |
+
β βββ Depends on: hopcroft-api
|
| 323 |
+
β βββ Environment: API_BASE_URL
|
| 324 |
+
β
|
| 325 |
+
βββ hopcroft-net (Bridge network)
|
| 326 |
+
```
|
| 327 |
+
|
| 328 |
+
**Design Rationale:**
|
| 329 |
+
- Separation of concerns (API vs GUI)
|
| 330 |
+
- Independent scaling
|
| 331 |
+
- Health-based dependency management
|
| 332 |
+
- Shared network for internal communication
|
| 333 |
+
|
| 334 |
+
### CI/CD Pipeline (GitHub Actions)
|
| 335 |
+
|
| 336 |
+
**Design Decision:** Implement Continuous Delivery for ML (CD4ML) with automated testing and image builds.
|
| 337 |
+
|
| 338 |
+
**Pipeline Stages:**
|
| 339 |
+
|
| 340 |
+
```yaml
|
| 341 |
+
Jobs:
|
| 342 |
+
unit-tests:
|
| 343 |
+
- Checkout code
|
| 344 |
+
- Setup Python 3.10
|
| 345 |
+
- Install dependencies
|
| 346 |
+
- Ruff linting
|
| 347 |
+
- Pytest unit tests
|
| 348 |
+
- Upload test report (on failure)
|
| 349 |
+
|
| 350 |
+
build-image:
|
| 351 |
+
- Needs: unit-tests
|
| 352 |
+
- Configure DVC credentials
|
| 353 |
+
- Pull models
|
| 354 |
+
- Build Docker image
|
| 355 |
+
```
|
| 356 |
+
|
| 357 |
+
**Triggers:**
|
| 358 |
+
- Push to `main`, `feature/*`
|
| 359 |
+
- Pull requests to `main`
|
| 360 |
+
|
| 361 |
+
**Secrets Management:**
|
| 362 |
+
- `DAGSHUB_USERNAME`: DagsHub authentication
|
| 363 |
+
- `DAGSHUB_TOKEN`: DagsHub access token
|
| 364 |
+
|
| 365 |
+
### Hugging Face Spaces Hosting
|
| 366 |
+
|
| 367 |
+
**Design Decision:** Deploy on HF Spaces for free GPU-enabled hosting with Docker SDK support.
|
| 368 |
+
|
| 369 |
+
**Configuration:**
|
| 370 |
+
```yaml
|
| 371 |
+
---
|
| 372 |
+
title: Hopcroft Skill Classification
|
| 373 |
+
sdk: docker
|
| 374 |
+
app_port: 7860
|
| 375 |
+
---
|
| 376 |
+
```
|
| 377 |
+
|
| 378 |
+
**Startup Flow:**
|
| 379 |
+
1. `start_space.sh` configures DVC credentials
|
| 380 |
+
2. Pull models from DagsHub
|
| 381 |
+
3. Start FastAPI (port 8000)
|
| 382 |
+
4. Start Streamlit (port 8501)
|
| 383 |
+
5. Start Nginx (port 7860) for routing
|
| 384 |
+
|
| 385 |
+
**Nginx Reverse Proxy:**
|
| 386 |
+
- `/` β Streamlit GUI
|
| 387 |
+
- `/docs`, `/predict`, `/predictions` β FastAPI
|
| 388 |
+
- `/prometheus` β Prometheus metrics
|
| 389 |
+
|
| 390 |
+
---
|
| 391 |
+
|
| 392 |
+
## 6. Monitoring
|
| 393 |
+
|
| 394 |
+
### Resource-Level Monitoring
|
| 395 |
+
|
| 396 |
+
**Design Decision:** Implement Prometheus metrics for real-time observability.
|
| 397 |
+
|
| 398 |
+
| Metric | Type | Purpose |
|
| 399 |
+
|--------|------|---------|
|
| 400 |
+
| `hopcroft_requests_total` | Counter | Request volume by endpoint |
|
| 401 |
+
| `hopcroft_request_duration_seconds` | Histogram | Latency distribution (P50, P90, P99) |
|
| 402 |
+
| `hopcroft_in_progress_requests` | Gauge | Concurrent request load |
|
| 403 |
+
| `hopcroft_prediction_processing_seconds` | Summary | Model inference time |
|
| 404 |
+
|
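The four metrics in the table might be declared with `prometheus_client` as follows. This is a sketch: metric names match the table, but the label sets are assumptions inferred from the middleware snippet below:

```python
from prometheus_client import Counter, Gauge, Histogram, Summary

# Metric names mirror the table above; label sets are illustrative.
REQUESTS_TOTAL = Counter(
    "hopcroft_requests_total",
    "Total HTTP requests",
    ["method", "endpoint", "status"],
)
REQUEST_LATENCY = Histogram(
    "hopcroft_request_duration_seconds",
    "Request latency in seconds",
    ["method", "endpoint"],
)
IN_PROGRESS = Gauge(
    "hopcroft_in_progress_requests",
    "Requests currently being handled",
)
PREDICTION_TIME = Summary(
    "hopcroft_prediction_processing_seconds",
    "Model inference time in seconds",
)
```

Counters only go up, gauges move both ways, and histograms bucket observations so Prometheus can derive the P50/P90/P99 percentiles mentioned above.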
| 405 |
+
**Middleware Implementation:**
|
| 406 |
+
```python
|
| 407 |
+
@app.middleware("http")
|
| 408 |
+
async def monitor_requests(request, call_next):
|
| 409 |
+
IN_PROGRESS.inc()
|
| 410 |
+
with REQUEST_LATENCY.labels(method, endpoint).time():
|
| 411 |
+
response = await call_next(request)
|
| 412 |
+
REQUESTS_TOTAL.labels(method, endpoint, status).inc()
|
| 413 |
+
IN_PROGRESS.dec()
|
| 414 |
+
return response
|
| 415 |
+
```
|
| 416 |
+
|
| 417 |
+
### Performance-Level Monitoring

**Model Staleness Indicators:**
- Prediction confidence trends over time
- Drift detection alerts
- Error rate monitoring

### Drift Detection Strategy

**Design Decision:** Implement statistical drift detection using the Kolmogorov-Smirnov test with Bonferroni correction.

| Component | Details |
|-----------|---------|
| **Algorithm** | KS Two-Sample Test |
| **Baseline** | 1000 samples from training data |
| **Threshold** | p-value < 0.05 (Bonferroni corrected) |
| **Execution** | Scheduled via cron or manual trigger |

**Drift Types Monitored:**

| Type | Definition | Detection Method |
|------|------------|------------------|
| **Data Drift** | Feature distribution shift | KS test on input features |
| **Target Drift** | Label distribution shift | Chi-square test on predictions |
| **Concept Drift** | Relationship change | Performance degradation monitoring |

**Metrics Published to Pushgateway:**
- `drift_detected`: Binary indicator (0/1)
- `drift_p_value`: Statistical significance
- `drift_distance`: KS distance metric
- `drift_check_timestamp`: Last check time

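To make the table concrete, here is a small pure-Python sketch of the two ingredients: the KS distance between empirical CDFs and the per-feature Bonferroni-corrected threshold. The function names (`ks_two_sample`, `drift_report`) are illustrative, not the project's actual module; the deployed job can obtain the same numbers from `scipy.stats.ks_2samp`.

```python
import math

def _kolmogorov_q(lam):
    """Asymptotic KS survival function Q(lam); returns 1.0 when the
    alternating series does not converge (lam is very small)."""
    if lam < 1e-8:
        return 1.0
    total, sign = 0.0, 1.0
    for k in range(1, 101):
        term = sign * 2.0 * math.exp(-2.0 * (k * lam) ** 2)
        total += term
        if abs(term) < 1e-12:
            return min(1.0, max(0.0, total))
        sign = -sign
    return 1.0  # no convergence: treat as "no evidence of drift"

def ks_two_sample(baseline, current):
    """KS distance between the empirical CDFs of two samples, plus p-value."""
    xs, ys = sorted(baseline), sorted(current)
    n, m = len(xs), len(ys)
    d = 0.0
    for v in xs + ys:
        cdf_x = sum(1 for x in xs if x <= v) / n
        cdf_y = sum(1 for y in ys if y <= v) / m
        d = max(d, abs(cdf_x - cdf_y))
    en = math.sqrt(n * m / (n + m))
    return d, _kolmogorov_q((en + 0.12 + 0.11 / en) * d)

def drift_report(baseline_features, current_features, alpha=0.05):
    """One KS test per feature, with a Bonferroni-corrected significance level."""
    corrected = alpha / len(baseline_features)
    report = {}
    for name, baseline in baseline_features.items():
        d, p = ks_two_sample(baseline, current_features[name])
        report[name] = {"drift_distance": d, "drift_p_value": p,
                        "drift_detected": int(p < corrected)}
    return report
```

The three values per feature map directly onto the `drift_distance`, `drift_p_value`, and `drift_detected` metrics pushed to the Pushgateway.
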
### Alerting Configuration

**Prometheus Alert Rules:**

| Alert | Condition | Severity |
|-------|-----------|----------|
| `ServiceDown` | Target down for 5m | Critical |
| `HighErrorRate` | 5xx rate > 10% | Warning |
| `SlowRequests` | P95 latency > 2s | Warning |
| `DriftDetected` | drift_detected = 1 | Warning |

**Alertmanager Integration:**
- Severity-based routing
- Email notifications
- Inhibition rules to prevent alert storms

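As an illustration of the `HighErrorRate` rule's semantics (in production this is a PromQL expression over `hopcroft_requests_total`, not application code), the check reduces to comparing the share of 5xx responses within the evaluation window against the 10% threshold:

```python
def error_rate(window):
    """window: list of (status_code, count_delta) samples scraped over 5 minutes."""
    total = sum(count for _, count in window)
    errors = sum(count for status, count in window if 500 <= status < 600)
    return errors / total if total else 0.0

def high_error_rate_firing(window, threshold=0.10):
    """True when the 5xx share of traffic exceeds the alert threshold."""
    return error_rate(window) > threshold
```
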
### Grafana Visualization

**Dashboard Panels:**

1. API Request Rate (time series)
2. API Latency Percentiles (heatmap)
3. Drift Detection Status (stat panel)
4. Drift P-Value Trend (time series)
5. Error Rate (gauge)

**Data Sources:**
- Prometheus: Real-time metrics
- Pushgateway: Batch job metrics (drift detection)

### HF Spaces Deployment

Both Prometheus and Grafana are deployed on Hugging Face Spaces behind an Nginx reverse proxy:

| Service | Production URL |
|---------|----------------|
| Prometheus | `https://dacrow13-hopcroft-skill-classification.hf.space/prometheus/` |
| Grafana | `https://dacrow13-hopcroft-skill-classification.hf.space/grafana/` |

This enables real-time monitoring of the production deployment without additional infrastructure.

docs/docs/getting-started.md DELETED

@@ -1,6 +0,0 @@

Getting started
===============

This is where you describe how to get set up on a clean install, including the
commands necessary to get the raw data (using the `sync_data_from_s3` command,
for example), and then how to make the cleaned, final data sets.

docs/docs/index.md DELETED

@@ -1,10 +0,0 @@

# Hopcroft_Skill-Classification-Tool-Competition documentation!

## Description

The task involves analyzing the relationship between issue characteristics and required skills, developing effective feature extraction methods that combine textual and code-context information, and implementing sophisticated multi-label classification approaches. Students may incorporate additional GitHub metadata to enhance model inputs, but must avoid using third-party classification engines or direct outputs from the provided database. The work requires careful attention to the multi-label nature of the problem, where each issue may require multiple different skills for resolution.

## Commands

The Makefile contains the central entry points for common tasks related to this project.

docs/milestone_summaries.md ADDED

@@ -0,0 +1,288 @@

# Milestone Summaries

This document provides a comprehensive overview of all six project milestones, documenting the evolution of the Hopcroft Skill Classification system from requirements engineering through production monitoring.

---

## Milestone 1: Requirements Engineering

**Objective:** Define the problem space, stakeholders, and success criteria using the Machine Learning Canvas framework.

### Key Deliverables

| Component | Description |
|-----------|-------------|
| **Prediction Task** | Multi-label classification of 217 technical skills from GitHub issue/PR text |
| **Stakeholders** | Project managers, team leads, developers |
| **Data Source** | SkillScope DB with 7,245 merged PRs from 11 Java repositories |
| **Success Metrics** | Micro-F1 score improvement over baseline, precision/recall balance |

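Since micro-F1 is the headline success metric, a minimal sketch of how it pools true/false positives across all (sample, label) slots before computing F1 (illustrative; the project presumably uses `sklearn.metrics.f1_score(average="micro")`):

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for multi-label data.

    y_true, y_pred: lists of binary label vectors (one vector per sample).
    Counts are pooled over every (sample, label) slot before computing F1."""
    tp = fp = fn = 0
    for true_vec, pred_vec in zip(y_true, y_pred):
        for t, p in zip(true_vec, pred_vec):
            tp += 1 if (t and p) else 0
            fp += 1 if (p and not t) else 0
            fn += 1 if (t and not p) else 0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```
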
### ML Canvas Framework

The complete ML Canvas is documented in [ML Canvas.md](./ML%20Canvas.md), covering:

- **Value Proposition**: Automated task assignment optimization
- **Decisions**: Resource allocation for issue resolution
- **Data Collection**: Automated labeling via API call detection
- **Impact Simulation**: Outperform SkillScope RF + TF-IDF baseline
- **Monitoring**: Continuous evaluation with drift detection

### Identified Risks & Mitigations

| Risk | Mitigation Strategy |
|------|---------------------|
| Label imbalance (217 classes) | SMOTE, MLSMOTE, ADASYN oversampling |
| Text noise (URLs, HTML, code) | Custom preprocessing pipeline |
| Multi-label complexity | MultiOutputClassifier with stratified splits |

---

## Milestone 2: Data Management & Experiment Tracking

**Objective:** Establish end-to-end infrastructure for reproducible ML experiments.

### Data Pipeline

```
data/raw/  →  dataset.py  →  data/processed/
(SkillScope SQLite)  (HuggingFace)  (Clean CSV)
         ↓
     features.py
         ↓
  data/processed/
(TF-IDF/Embeddings)
```

### Key Components

1. **Data Management**
   - DVC setup with DagsHub remote storage
   - Git-ignored data and model directories
   - Version-controlled `.dvc` files for reproducibility

2. **Data Ingestion**
   - `dataset.py`: Downloads SkillScope from Hugging Face
   - Extracts SQLite database with cleanup

3. **Feature Engineering**
   - `features.py`: Text cleaning pipeline
   - URL/HTML/Markdown removal
   - Normalization and Porter stemming
   - TF-IDF vectorization (uni+bi-grams)
   - Sentence embedding generation

4. **Configuration**
   - `config.py`: Centralized paths, hyperparameters, MLflow URI

5. **Experiment Tracking**
   - MLflow with DagsHub remote
   - Logged metrics: precision, recall, F1-score
   - Artifact storage: models, vectorizers, scalers

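The cleaning steps above can be sketched with stdlib regexes. This is an illustrative approximation, not `features.py` verbatim; the exact rules and their ordering in the project may differ:

```python
import re

def clean_issue_text(text: str) -> str:
    """Illustrative cleaning pass: strip code blocks, URLs, and HTML,
    then drop punctuation, normalize whitespace, and lowercase."""
    text = re.sub(r"`{3}[\s\S]*?`{3}", " ", text)   # fenced code blocks
    text = re.sub(r"`[^`]*`", " ", text)            # inline code
    text = re.sub(r"https?://\S+", " ", text)       # URLs
    text = re.sub(r"<[^>]+>", " ", text)            # HTML tags
    text = re.sub(r"[^a-zA-Z0-9\s]", " ", text)     # punctuation
    return re.sub(r"\s+", " ", text).strip().lower()
```
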
### Training Actions

| Action | Description |
|--------|-------------|
| `baseline` | Random Forest with TF-IDF |
| `mlsmote` | Multi-label SMOTE oversampling |
| `ros` | Random Oversampling |
| `adasyn-pca` | ADASYN + PCA dimensionality reduction |
| `lightgbm` | LightGBM classifier |

---

## Milestone 3: Quality Assurance

**Objective:** Implement a comprehensive testing and validation framework for data quality and model robustness.

### Data Cleaning Pipeline

| Metric | Before | After | Resolution |
|--------|--------|-------|------------|
| Total Samples | 7,154 | 6,673 | -481 duplicates |
| Duplicates | 481 | 0 | Exact match removal |
| Label Conflicts | 640 | 0 | Majority voting |
| Data Leakage | Present | 0 | Train/test separation |

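The majority-voting step used to resolve conflicting labels can be sketched as follows (illustrative names, not the project's actual code): group rows by identical text, then keep the most frequent label per group.

```python
from collections import Counter, defaultdict

def resolve_label_conflicts(samples):
    """samples: list of (text, label) pairs where the same text may appear
    with conflicting labels. Keeps one entry per text, choosing the majority
    label; ties fall back to first-seen order via Counter.most_common."""
    votes = defaultdict(list)
    for text, label in samples:
        votes[text].append(label)
    return {text: Counter(labels).most_common(1)[0][0]
            for text, labels in votes.items()}
```
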
### Validation Frameworks

#### Great Expectations (10 Tests)

| Test | Purpose | Status |
|------|---------|--------|
| Database Schema | Validate SQLite structure | ✅ Pass |
| TF-IDF Matrix | No NaN/Inf, sparsity checks | ✅ Pass |
| Binary Labels | Values in {0,1} | ✅ Pass |
| Feature-Label Alignment | Row count consistency | ✅ Pass |
| Label Distribution | Min 5 occurrences per label | ✅ Pass |
| SMOTE Compatibility | Min 10 non-zero features | ✅ Pass |
| Multi-Output Format | >50% multi-label samples | ✅ Pass |
| Duplicate Detection | No duplicate features | ✅ Pass |
| Train-Test Separation | Zero intersection | ✅ Pass |
| Label Consistency | Same features → same labels | ✅ Pass |

#### Deepchecks (24 Checks)

- **Data Integrity Suite**: 92% score (12 checks)
- **Train-Test Validation Suite**: 100% score (12 checks)
- **Overall Status**: Production-ready (96% combined)

#### Behavioral Testing (36 Tests)

| Category | Tests | Description |
|----------|-------|-------------|
| Invariance | 9 | Typo, case, punctuation robustness |
| Directional | 10 | Keyword addition effects |
| Minimum Functionality | 17 | Basic skill predictions |

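An invariance test from the behavioral suite boils down to asserting that predictions stay stable under benign perturbations. A self-contained sketch with a stand-in keyword model (hypothetical; the real suite calls the trained classifier):

```python
import string

def predict_skills(text):
    """Stand-in for the real model: keyword lookup on normalized text."""
    normalized = text.lower().translate(str.maketrans("", "", string.punctuation))
    keywords = {"authentication": "authentication", "database": "database", "ui": "ui"}
    return sorted({skill for word, skill in keywords.items() if word in normalized.split()})

def check_invariance(base_text, perturbed_texts):
    """Invariance check: predictions must not change under benign perturbations
    such as case changes or added punctuation."""
    expected = predict_skills(base_text)
    return all(predict_skills(t) == expected for t in perturbed_texts)
```
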
### Code Quality

- **Ruff Analysis**: 28 minor issues (100% fixable)
- **Standards**: PEP 8 compliant, Black compatible

Full details: [testing_and_validation.md](./testing_and_validation.md)

---

## Milestone 4: API Development

**Objective:** Implement a production-ready REST API for skill prediction with MLflow integration.

### Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/predict` | Single issue prediction |
| `POST` | `/predict/batch` | Batch predictions (max 100) |
| `GET` | `/predictions/{run_id}` | Retrieve by MLflow Run ID |
| `GET` | `/predictions` | List recent predictions |
| `GET` | `/health` | Service health check |
| `GET` | `/metrics` | Prometheus metrics |

### Features

- **FastAPI Framework**: Async request handling, auto-generated OpenAPI docs
- **MLflow Integration**: All predictions logged with metadata
- **Pydantic Validation**: Request/response schema enforcement
- **Prometheus Metrics**: Request counters, latency histograms, gauges

### Documentation Access

- Swagger UI: `/docs`
- ReDoc: `/redoc`
- OpenAPI JSON: `/openapi.json`

---

## Milestone 5: Deployment & Containerization

**Objective:** Implement containerized deployment with a CI/CD pipeline for production delivery.

### Docker Architecture

```
docker/docker-compose.yml
├── hopcroft-api (FastAPI Backend)
│   ├── Port: 8080
│   ├── Health Check: /health
│   └── Volumes: source code, logs
│
├── hopcroft-gui (Streamlit Frontend)
│   ├── Port: 8501
│   └── Depends on: hopcroft-api
│
└── hopcroft-net (Bridge Network)
```

### Hugging Face Spaces Deployment

| Component | Configuration |
|-----------|---------------|
| SDK | Docker |
| Port | 7860 |
| Startup Script | `docker/scripts/start_space.sh` |
| Secrets | `DAGSHUB_USERNAME`, `DAGSHUB_TOKEN` |

**Startup Flow:**

1. Configure DVC with secrets
2. Pull models from DagsHub
3. Start FastAPI (port 8000)
4. Start Streamlit (port 8501)
5. Start Nginx reverse proxy (port 7860)

### CI/CD Pipeline (GitHub Actions)

```yaml
Triggers: push/PR to main, feature/*
Jobs:
  1. unit-tests
     - Ruff linting
     - Pytest unit tests
     - HTML report generation

  2. build-image (requires unit-tests)
     - DVC model pull
     - Docker image build
```

---

## Milestone 6: Monitoring & Observability

**Objective:** Implement comprehensive monitoring infrastructure with drift detection.

### Prometheus Metrics

| Metric | Type | Description |
|--------|------|-------------|
| `hopcroft_requests_total` | Counter | Total requests by method/endpoint |
| `hopcroft_request_duration_seconds` | Histogram | Request latency distribution |
| `hopcroft_in_progress_requests` | Gauge | Currently processing requests |
| `hopcroft_prediction_processing_seconds` | Summary | Model inference time |

### Grafana Dashboards

- **API Request Rate**: Real-time requests per second
- **API Latency**: P50, P90, P99 percentiles
- **Drift Detection Status**: Binary indicator (0/1)
- **Drift P-Value**: Statistical significance metric

### Data Drift Detection

| Component | Details |
|-----------|---------|
| Algorithm | Kolmogorov-Smirnov Two-Sample Test |
| Baseline | 1000 samples from training data |
| Threshold | p-value < 0.05 (Bonferroni corrected) |
| Metrics | `drift_detected`, `drift_p_value`, `drift_distance` |

### Alerting Rules

| Alert | Condition |
|-------|-----------|
| `ServiceDown` | Target unreachable for 5m |
| `HighErrorRate` | 5xx rate > 10% for 5m |
| `SlowRequests` | P95 latency > 2s |

### Load Testing (Locust)

| Task | Weight | Endpoint |
|------|--------|----------|
| Single Prediction | 60% | `POST /predict` |
| Batch Prediction | 20% | `POST /predict/batch` |
| Monitoring | 20% | `GET /health`, `/predictions` |

### HF Spaces Monitoring Access

Both Prometheus and Grafana are available on the production deployment:

| Service | URL |
|---------|-----|
| Prometheus | https://dacrow13-hopcroft-skill-classification.hf.space/prometheus/ |
| Grafana | https://dacrow13-hopcroft-skill-classification.hf.space/grafana/ |

### Uptime Monitoring (Better Stack)

- External monitoring from multiple locations
- Email notifications on failures
- Tracked endpoints: `/health`, `/openapi.json`, `/docs`

docs/user_guide.md ADDED

@@ -0,0 +1,497 @@

# User Guide

Complete operational guide for the Hopcroft Skill Classification system covering all components: API, GUI, load testing, and monitoring.

---

## Table of Contents

1. [System Setup](#1-system-setup)
2. [API Usage](#2-api-usage)
3. [GUI (Streamlit)](#3-gui-streamlit)
4. [Load Testing (Locust)](#4-load-testing-locust)
5. [Monitoring (Prometheus & Grafana)](#5-monitoring-prometheus--grafana)

---

## 1. System Setup

### Prerequisites

| Requirement | Version | Purpose |
|-------------|---------|---------|
| Python | 3.10+ | Runtime environment |
| Docker | 20.10+ | Containerization |
| Docker Compose | 2.0+ | Multi-service orchestration |
| Git | 2.30+ | Version control |

### Option A: Docker Setup (Recommended)

**1. Clone and Configure**

```bash
git clone https://github.com/se4ai2526-uniba/Hopcroft.git
cd Hopcroft

# Create environment file
cp .env.example .env
```

**2. Edit `.env` with Your Credentials**

```env
MLFLOW_TRACKING_URI=https://dagshub.com/se4ai2526-uniba/Hopcroft.mlflow
MLFLOW_TRACKING_USERNAME=your_dagshub_username
MLFLOW_TRACKING_PASSWORD=your_dagshub_token
```

> [!TIP]
> Get your DagsHub token at: https://dagshub.com/user/settings/tokens

**3. Start All Services**

```bash
docker compose -f docker/docker-compose.yml up -d --build
```

**4. Verify Services**

| Service | URL | Purpose |
|---------|-----|---------|
| API (Swagger) | http://localhost:8080/docs | Interactive API documentation |
| GUI (Streamlit) | http://localhost:8501 | Web interface |
| Health Check | http://localhost:8080/health | Service status |

### Option B: Virtual Environment Setup

**1. Create Virtual Environment**

```bash
python -m venv venv

# Windows
venv\Scripts\activate

# Linux/macOS
source venv/bin/activate
```

**2. Install Dependencies**

```bash
pip install -r requirements.txt
pip install -e .
```

**3. Configure DVC (for Model Access)**

```bash
dvc remote modify origin --local auth basic
dvc remote modify origin --local user YOUR_DAGSHUB_USERNAME
dvc remote modify origin --local password YOUR_DAGSHUB_TOKEN
dvc pull
```

**4. Start Services Manually**

```bash
# Terminal 1: Start API
make api-dev

# Terminal 2: Start Streamlit
streamlit run hopcroft_skill_classification_tool_competition/streamlit_app.py
```

### Docker Compose Commands Reference

| Command | Description |
|---------|-------------|
| `docker compose -f docker/docker-compose.yml up -d` | Start in background |
| `docker compose -f docker/docker-compose.yml down` | Stop all services |
| `docker compose -f docker/docker-compose.yml logs -f` | Stream logs |
| `docker compose -f docker/docker-compose.yml ps` | Check status |
| `docker compose -f docker/docker-compose.yml restart` | Restart services |

---

## 2. API Usage

### Base URLs

| Environment | URL |
|-------------|-----|
| Local (Docker) | http://localhost:8080 |
| Local (Dev) | http://localhost:8000 |
| Production (HF Spaces) | https://se4ai2526-uniba-hopcroft.hf.space |

### Endpoints Overview

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/predict` | Predict skills for single issue |
| `POST` | `/predict/batch` | Batch prediction (max 100) |
| `GET` | `/predictions` | List recent predictions |
| `GET` | `/predictions/{run_id}` | Get prediction by ID |
| `GET` | `/health` | Health check |
| `GET` | `/metrics` | Prometheus metrics |

### Interactive Documentation

Access Swagger UI for interactive testing:
- **Swagger**: http://localhost:8080/docs
- **ReDoc**: http://localhost:8080/redoc

### Example Requests

#### Single Prediction

```bash
curl -X POST "http://localhost:8080/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "issue_text": "Fix authentication bug in OAuth2 login flow",
    "repo_name": "my-project",
    "pr_number": 42
  }'
```

**Response:**
```json
{
  "run_id": "abc123...",
  "predictions": [
    {"skill": "authentication", "confidence": 0.92},
    {"skill": "security", "confidence": 0.78},
    {"skill": "oauth", "confidence": 0.65}
  ],
  "model_version": "1.0.0",
  "timestamp": "2025-01-05T15:00:00Z"
}
```

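The same request can be issued from Python with the standard library. A sketch of building the payload and post-processing the response (helper names are illustrative; send the payload with `urllib.request` against the base URL above):

```python
import json

def build_predict_payload(issue_text, repo_name=None, pr_number=None):
    """Build the JSON body for POST /predict; optional fields are omitted when unset."""
    payload = {"issue_text": issue_text}
    if repo_name is not None:
        payload["repo_name"] = repo_name
    if pr_number is not None:
        payload["pr_number"] = pr_number
    return json.dumps(payload)

def skills_above(response_json, threshold=0.7):
    """Keep only predicted skills whose confidence meets the threshold."""
    data = json.loads(response_json)
    return [p["skill"] for p in data["predictions"] if p["confidence"] >= threshold]
```
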
#### Batch Prediction

```bash
curl -X POST "http://localhost:8080/predict/batch" \
  -H "Content-Type: application/json" \
  -d '{
    "issues": [
      {"issue_text": "Database connection timeout"},
      {"issue_text": "UI button not responding"}
    ]
  }'
```

#### List Predictions

```bash
curl "http://localhost:8080/predictions?limit=10&skip=0"
```

#### Health Check

```bash
curl "http://localhost:8080/health"
```

**Response:**
```json
{
  "status": "healthy",
  "model_loaded": true,
  "model_version": "1.0.0"
}
```

### Makefile Shortcuts

```bash
make test-api-health   # Test health endpoint
make test-api-predict  # Test prediction
make test-api-list     # List predictions
make test-api-all      # Run all API tests
```

---

## 3. GUI (Streamlit)

### Access Points

| Environment | URL |
|-------------|-----|
| Local (Docker) | http://localhost:8501 |
| Production | https://se4ai2526-uniba-hopcroft.hf.space |

### Features

- **Real-time Prediction**: Instant skill classification
- **Confidence Scores**: Probability for each predicted skill
- **Multiple Input Modes**: Quick input, detailed input, examples
- **API Health Indicator**: Connection status in sidebar

### User Interface

#### Main Dashboard

![Streamlit Home](figures/streamlit_home.png)

The sidebar displays:
- API connection status
- Confidence threshold slider
- Model information

#### Quick Input Mode

![Quick Input](figures/streamlit_quick_input.png)

1. Paste GitHub issue text
2. Click "Predict Skills"
3. View results instantly

#### Detailed Input Mode

![Detailed Input](figures/streamlit_detailed_input.png)

Optional metadata fields:
- Repository name
- PR number
- Extended description

#### Prediction Results

![Prediction Results](figures/streamlit_results.png)

Results display:
- Top-5 predicted skills with confidence bars
- Full predictions table with filtering
- Processing time metrics
- Raw JSON response (expandable)

#### Example Gallery

![Examples](figures/streamlit_examples.png)

Pre-loaded test cases:
- Authentication bugs
- ML feature requests
- Database issues
- UI enhancements

---

## 4. Load Testing (Locust)

### Installation

```bash
pip install locust
```

### Configuration

The Locust configuration is in `monitoring/locust/locustfile.py`:

| Task | Weight | Endpoint |
|------|--------|----------|
| Single Prediction | 60% (weight: 3) | `POST /predict` |
| Batch Prediction | 20% (weight: 1) | `POST /predict/batch` |
| Monitoring | 20% (weight: 1) | `GET /health`, `/predictions` |

### Running Load Tests

#### Web UI Mode

```bash
cd monitoring/locust
locust
```

Then open: http://localhost:8089

Configure in the Web UI:
- **Number of users**: Total concurrent users
- **Spawn rate**: Users per second to add
- **Host**: Target URL (e.g., `http://localhost:8080`)

#### Headless Mode

```bash
locust --headless \
  --users 50 \
  --spawn-rate 10 \
  --run-time 5m \
  --host http://localhost:8080 \
  --csv results
```

### Target URLs

| Environment | Host URL |
|-------------|----------|
| Local Docker | `http://localhost:8080` |
| Local Dev | `http://localhost:8000` |
| HF Spaces | `https://dacrow13-hopcroft-skill-classification.hf.space` |

### Interpreting Results

| Metric | Description | Target |
|--------|-------------|--------|
| RPS | Requests per second | Higher = better |
| Median Response Time | 50th percentile latency | < 500ms |
| 95th Percentile | Worst-case latency | < 2s |
| Failure Rate | Percentage of errors | < 1% |

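The median and 95th-percentile figures can be approximately reproduced from raw per-request latencies with the stdlib, which is handy when post-processing the `--csv` output (illustrative sketch; Locust itself uses its own percentile approximation):

```python
import statistics

def latency_summary(latencies_ms):
    """Return (median, p95) over per-request latencies in milliseconds,
    using inclusive quantiles over the observed sample."""
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return statistics.median(latencies_ms), cuts[94]  # 95th percentile
```
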
|
| 345 |
+
---
|
| 346 |
+
|
| 347 |
+
## 5. Monitoring (Prometheus & Grafana)
|
| 348 |
+
|
| 349 |
+
### Access Points
|
| 350 |
+
|
| 351 |
+
**Local Development:**
|
| 352 |
+
|
| 353 |
+
| Service | URL |
|
| 354 |
+
|---------|-----|
|
| 355 |
+
| Prometheus | http://localhost:9090 |
|
| 356 |
+
| Grafana | http://localhost:3000 |
|
| 357 |
+
| Pushgateway | http://localhost:9091 |
|
| 358 |
+
|
| 359 |
+
**Hugging Face Spaces (Production):**
|
| 360 |
+
|
| 361 |
+
| Service | URL |
|
| 362 |
+
|---------|-----|
|
| 363 |
+
| Prometheus | https://dacrow13-hopcroft-skill-classification.hf.space/prometheus/ |
|
| 364 |
+
| Grafana | https://dacrow13-hopcroft-skill-classification.hf.space/grafana/ |
|
| 365 |
+
|
| 366 |
+
### Prometheus Metrics
|
| 367 |
+
|
| 368 |
+
Access the metrics endpoint: http://localhost:8080/metrics
|
| 369 |
+
|
| 370 |
+
#### Available Metrics
|
| 371 |
+
|
| 372 |
+
| Metric | Type | Description |
|
| 373 |
+
|--------|------|-------------|
|
| 374 |
+
| `hopcroft_requests_total` | Counter | Total requests by method/endpoint |
|
| 375 |
+
| `hopcroft_request_duration_seconds` | Histogram | Request latency distribution |
|
| 376 |
+
| `hopcroft_in_progress_requests` | Gauge | Currently processing requests |
|
| 377 |
+
| `hopcroft_prediction_processing_seconds` | Summary | Model inference time |
|
| 378 |
+
|
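
When scraped, each metric appears in Prometheus' text exposition format as one line per label combination. A stdlib sketch of picking values out of that payload (the sample payload below is fabricated; real label sets may differ):

```python
# Parse Prometheus text exposition format -- the payload GET /metrics returns.
# Skips HELP/TYPE comment lines; assumes samples carry no timestamps.
def parse_metrics(payload: str) -> dict:
    """Map 'name{labels}' -> float value, skipping comments and blanks."""
    samples = {}
    for line in payload.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.rpartition(" ")
        samples[key] = float(value)
    return samples


# Fabricated sample of what a scrape of the metrics above might return:
payload = """\
# HELP hopcroft_requests_total Total requests by method/endpoint
# TYPE hopcroft_requests_total counter
hopcroft_requests_total{method="POST",endpoint="/predict"} 42.0
hopcroft_in_progress_requests 3.0
"""
metrics = parse_metrics(payload)
```
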
#### Useful PromQL Queries

**Request Rate (per second)**
```promql
rate(hopcroft_requests_total[1m])
```

**Average Latency**
```promql
rate(hopcroft_request_duration_seconds_sum[5m]) / rate(hopcroft_request_duration_seconds_count[5m])
```

**In-Progress Requests**
```promql
hopcroft_in_progress_requests
```

**Model Prediction Time (P90)**
```promql
hopcroft_prediction_processing_seconds{quantile="0.9"}
```

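
Conceptually, `rate()` is just the per-second increase of a counter between scrapes, averaged over the window. In miniature (the sample numbers are made up):

```python
# What rate() computes, in plain Python: the per-second increase of a
# monotonically increasing counter between two scrapes.
def counter_rate(prev_value: float, prev_ts: float,
                 curr_value: float, curr_ts: float) -> float:
    """Per-second increase of a counter between two scrape timestamps."""
    return (curr_value - prev_value) / (curr_ts - prev_ts)


# e.g. hopcroft_requests_total going from 1200 to 1500 over a 60s window
# is a rate of 5 requests/second
assert counter_rate(1200, 0.0, 1500, 60.0) == 5.0
```
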
+
### Grafana Dashboards
|
| 402 |
+
|
| 403 |
+
The pre-configured dashboard includes:
|
| 404 |
+
|
| 405 |
+
| Panel | Description |
|
| 406 |
+
|-------|-------------|
|
| 407 |
+
| API Request Rate | Real-time requests per endpoint |
|
| 408 |
+
| API Latency | Response time distribution |
|
| 409 |
+
| Drift Detection Status | Binary indicator (0=No Drift, 1=Drift) |
|
| 410 |
+
| Drift P-Value | Statistical significance |
|
| 411 |
+
| Drift Distance | KS test distance metric |
|
| 412 |
+
|
| 413 |
+
### Data Drift Detection
|
| 414 |
+
|
| 415 |
+
#### Prepare Baseline (One-time)
|
| 416 |
+
|
| 417 |
+
```bash
|
| 418 |
+
cd monitoring/drift/scripts
|
| 419 |
+
python prepare_baseline.py
|
| 420 |
+
```
|
| 421 |
+
|
| 422 |
+
#### Run Drift Check
|
| 423 |
+
|
| 424 |
+
```bash
|
| 425 |
+
python run_drift_check.py
|
| 426 |
+
```
|
| 427 |
+
|
| 428 |
+
#### Verify Results
|
| 429 |
+
|
| 430 |
+
```bash
|
| 431 |
+
# Check Pushgateway
|
| 432 |
+
curl http://localhost:9091/metrics | grep drift
|
| 433 |
+
|
| 434 |
+
# PromQL queries
|
| 435 |
+
drift_detected
|
| 436 |
+
drift_p_value
|
| 437 |
+
drift_distance
|
| 438 |
+
```
|
| 439 |
+
|
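
The `drift_distance` metric is the two-sample Kolmogorov-Smirnov statistic: the largest gap between the empirical CDFs of the baseline and current feature distributions. `run_drift_check.py` presumably computes it with a stats library; the hand-rolled sketch below is only to illustrate the statistic itself.

```python
# Hand-rolled two-sample Kolmogorov-Smirnov distance: the statistic behind
# drift_distance. Illustrative only; real drift checks should use a stats
# library (e.g. scipy's ks_2samp).
def ks_distance(baseline, current) -> float:
    """Max gap between the two empirical CDFs over all observed points."""
    xs = sorted(set(baseline) | set(current))

    def ecdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)

    return max(abs(ecdf(baseline, x) - ecdf(current, x)) for x in xs)


baseline = [0.1, 0.2, 0.2, 0.3, 0.4]
shifted = [0.6, 0.7, 0.7, 0.8, 0.9]
assert ks_distance(baseline, baseline) == 0.0
# fully separated samples give the maximum possible distance of 1.0
assert ks_distance(baseline, shifted) == 1.0
```
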
### Alerting Rules

Pre-configured alerts in `monitoring/prometheus/alert_rules.yml`:

| Alert | Condition | Severity |
|-------|-----------|----------|
| `ServiceDown` | Target down for 5m | Critical |
| `HighErrorRate` | 5xx > 10% for 5m | Warning |
| `SlowRequests` | P95 > 2s | Warning |

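
For reference, a Prometheus rule matching the `SlowRequests` row might be shaped like the sketch below; the thresholds come from the table, but the exact expression and group name are assumptions, not a copy of the repo's `alert_rules.yml`:

```yaml
groups:
  - name: hopcroft-api
    rules:
      - alert: SlowRequests
        # P95 latency over the last 5m, computed from the request histogram
        expr: histogram_quantile(0.95, rate(hopcroft_request_duration_seconds_bucket[5m])) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "95th percentile request latency above 2s"
```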
### Starting Monitoring Stack

```bash
# Start all monitoring services
docker compose up -d

# Verify containers
docker compose ps

# Check Prometheus targets
curl http://localhost:9090/targets
```

---

## Troubleshooting

### Common Issues

#### API Returns 500 Error

1. Check that the credentials in `.env` are correct
2. Restart services: `docker compose down && docker compose up -d`
3. Verify model files: `docker exec hopcroft-api ls -la /app/models/`

#### GUI Shows "API Unavailable"

1. Wait 30-60 seconds for API initialization
2. Check API health: `curl http://localhost:8080/health`
3. View logs: `docker compose logs hopcroft-api`

#### Port Already in Use

```bash
# Check port usage (Windows; on macOS/Linux use: lsof -i :8080)
netstat -ano | findstr :8080

# Stop conflicting containers
docker compose down
```

#### DVC Pull Fails

```bash
# Clean the cache and retry
rm -rf .dvc/cache
dvc pull
```