| # User Guide | |
| Complete operational guide for the Hopcroft Skill Classification system covering all components: API, GUI, load testing, and monitoring. | |
| --- | |
| ## Table of Contents | |
| 1. [System Setup (Local)](#1-system-setup-local) | |
| 2. [API Usage](#2-api-usage) | |
| 3. [GUI (Streamlit)](#3-gui-streamlit) | |
| 4. [Load Testing (Locust)](#4-load-testing-locust) | |
| 5. [Monitoring (Prometheus & Grafana)](#5-monitoring-prometheus--grafana) | |
| 6. [Troubleshooting](#troubleshooting) | |
| --- | |
| ## 1. System Setup (Local) | |
| ### Prerequisites | |
| | Requirement | Version | Purpose | | |
| |-------------|---------|---------| | |
| | Python | 3.10+ | Runtime environment | | |
| | Docker | 20.10+ | Containerization | | |
| | Docker Compose | 2.0+ | Multi-service orchestration | | |
| | Git | 2.30+ | Version control | | |
| ### Option A: Docker Setup | |
| **1. Clone and Configure** | |
| ```bash | |
| git clone https://github.com/se4ai2526-uniba/Hopcroft.git | |
| cd Hopcroft | |
| # Create environment file | |
| cp .env.example .env | |
| ``` | |
| **2. Edit `.env` with Your Credentials** | |
| ```env | |
| MLFLOW_TRACKING_URI=https://dagshub.com/se4ai2526-uniba/Hopcroft.mlflow | |
| MLFLOW_TRACKING_USERNAME=your_dagshub_username | |
| MLFLOW_TRACKING_PASSWORD=your_dagshub_token | |
| ``` | |
| > [!TIP] | |
| > Get your DagsHub token at: https://dagshub.com/user/settings/tokens | |
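Before starting the services, it can help to confirm that the placeholders in `.env` were actually replaced. The following is a minimal sketch, assuming the file uses plain `KEY=VALUE` lines as in the example above; the helper names `parse_env` and `find_problems` are hypothetical and not part of the project.

```python
# Keys taken from the .env example above.
REQUIRED_KEYS = (
    "MLFLOW_TRACKING_URI",
    "MLFLOW_TRACKING_USERNAME",
    "MLFLOW_TRACKING_PASSWORD",
)


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def find_problems(text: str) -> list:
    """Return required keys that are missing, empty, or still placeholders."""
    values = parse_env(text)
    return [
        key
        for key in REQUIRED_KEYS
        if not values.get(key) or values[key].startswith("your_")
    ]


# Usage (hypothetical): find_problems(open(".env").read()) returns [] when
# every required key has a non-placeholder value.
```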
| **3. Start All Services** | |
| ```bash | |
| docker compose -f docker/docker-compose.yml up -d --build | |
| ``` | |
| **4. Verify Services** | |
| | Service | URL | Purpose | | |
| |---------|-----|---------| | |
| | API (Swagger) | http://localhost:8080/docs | Interactive API documentation | | |
| | GUI (Streamlit) | http://localhost:8501 | Web interface | | |
| | Health Check | http://localhost:8080/health | Service status | | |
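The API may need some time to load the model after the containers start, so a single request to the health URL can fail even when nothing is wrong. The sketch below polls the endpoint until it reports a healthy status. It assumes the response contains the `status` and `model_loaded` fields shown in the Health Check example of Section 2; the function names, retry count, and delay are illustrative choices.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_health(url: str = "http://localhost:8080/health") -> dict:
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)


def wait_until_healthy(
    fetch: Callable[[], dict] = fetch_health,
    attempts: int = 12,
    delay_seconds: float = 5.0,
) -> bool:
    """Poll until the API reports a healthy status with a loaded model."""
    for attempt in range(attempts):
        try:
            body = fetch()
            if body.get("status") == "healthy" and body.get("model_loaded"):
                return True
        except (OSError, ValueError):
            pass  # API not reachable yet, or the body was not valid JSON
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```

With the defaults, `wait_until_healthy()` gives up after roughly one minute. Passing a custom `fetch` callable makes the loop easy to exercise without a running API.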
| ### Option B: Virtual Environment Setup | |
| **1. Create Virtual Environment** | |
| ```bash | |
| python -m venv venv | |
| # Windows | |
| venv\Scripts\activate | |
| # Linux/macOS | |
| source venv/bin/activate | |
| ``` | |
| **2. Install Dependencies** | |
| ```bash | |
| pip install -r requirements.txt | |
| pip install -e . | |
| ``` | |
| **3. Configure DVC (for Model Access)** | |
| ```bash | |
| dvc remote modify origin --local auth basic | |
| dvc remote modify origin --local user YOUR_DAGSHUB_USERNAME | |
| dvc remote modify origin --local password YOUR_DAGSHUB_TOKEN | |
| dvc pull | |
| ``` | |
| **4. Start Services Manually** | |
| ```bash | |
| # Terminal 1: Start API (the Local Dev base URL is http://localhost:8000) | |
| make api-dev | |
| # Terminal 2: Start Streamlit | |
| streamlit run hopcroft_skill_classification_tool_competition/streamlit_app.py | |
| ``` | |
| ### Docker Compose Commands Reference | |
| | Command | Description | | |
| |---------|-------------| | |
| | `docker compose -f docker/docker-compose.yml up -d` | Start in background | | |
| | `docker compose -f docker/docker-compose.yml down` | Stop all services | | |
| | `docker compose -f docker/docker-compose.yml logs -f` | Stream logs | | |
| | `docker compose -f docker/docker-compose.yml ps` | Check status | | |
| | `docker compose -f docker/docker-compose.yml restart` | Restart services | | |
| --- | |
| ## 2. API Usage | |
| ### Base URLs | |
| | Environment | URL | | |
| |-------------|-----| | |
| | Local (Docker) | http://localhost:8080 | | |
| | Local (Dev) | http://localhost:8000 | | |
| | Production (HF Spaces) | https://dacrow13-hopcroft-skill-classification.hf.space (Swagger UI at `/docs`) | | |
| ### Endpoints Overview | |
| | Method | Endpoint | Description | | |
| |--------|----------|-------------| | |
| | `POST` | `/predict` | Predict skills for single issue | | |
| | `POST` | `/predict/batch` | Batch prediction (max 100) | | |
| | `GET` | `/predictions` | List recent predictions | | |
| | `GET` | `/predictions/{run_id}` | Get prediction by ID | | |
| | `GET` | `/health` | Health check | | |
| | `GET` | `/metrics` | Prometheus metrics | | |
| ### Interactive Documentation | |
| The running API serves interactive documentation at: | |
| - **Swagger**: http://localhost:8080/docs | |
| - **ReDoc**: http://localhost:8080/redoc | |
| ### Example Requests | |
| #### Single Prediction | |
| ```bash | |
| curl -X POST "http://localhost:8080/predict" \ | |
| -H "Content-Type: application/json" \ | |
| -d '{ | |
| "issue_text": "Fix authentication bug in OAuth2 login flow", | |
| "repo_name": "my-project", | |
| "pr_number": 42 | |
| }' | |
| ``` | |
| **Response:** | |
| ```json | |
| { | |
| "run_id": "abc123...", | |
| "predictions": [ | |
| {"skill": "authentication", "confidence": 0.92}, | |
| {"skill": "security", "confidence": 0.78}, | |
| {"skill": "oauth", "confidence": 0.65} | |
| ], | |
| "model_version": "1.0.0", | |
| "timestamp": "2025-01-05T15:00:00Z" | |
| } | |
| ``` | |
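A client usually builds the request body programmatically and keeps only the skills above some confidence level. The sketch below shows both steps against the request and response shapes above. Treating `repo_name` and `pr_number` as optional is an assumption (the batch example sends only `issue_text`), and the helper names are hypothetical.

```python
from typing import Optional


def build_payload(
    issue_text: str,
    repo_name: Optional[str] = None,
    pr_number: Optional[int] = None,
) -> dict:
    """Build a /predict request body, omitting metadata that was not given."""
    payload = {"issue_text": issue_text}
    if repo_name is not None:
        payload["repo_name"] = repo_name
    if pr_number is not None:
        payload["pr_number"] = pr_number
    return payload


def skills_above(response: dict, threshold: float = 0.5) -> list:
    """Return skill names with confidence >= threshold, most confident first."""
    ranked = sorted(
        response["predictions"],
        key=lambda prediction: prediction["confidence"],
        reverse=True,
    )
    return [p["skill"] for p in ranked if p["confidence"] >= threshold]


# Response copied from the example above (run_id shortened there as well).
example_response = {
    "predictions": [
        {"skill": "authentication", "confidence": 0.92},
        {"skill": "security", "confidence": 0.78},
        {"skill": "oauth", "confidence": 0.65},
    ]
}
print(skills_above(example_response, threshold=0.7))
```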
| #### Batch Prediction | |
| ```bash | |
| curl -X POST "http://localhost:8080/predict/batch" \ | |
| -H "Content-Type: application/json" \ | |
| -d '{ | |
| "issues": [ | |
| {"issue_text": "Database connection timeout"}, | |
| {"issue_text": "UI button not responding"} | |
| ] | |
| }' | |
| ``` | |
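The endpoint table lists a maximum of 100 for batch prediction. Assuming that limit counts issues per request, a client with a longer list has to split it into several calls. A minimal sketch of that chunking, with the hypothetical helper `batch_payloads`:

```python
from typing import Iterator

MAX_BATCH_SIZE = 100  # from the endpoint table; assumed to count issues per request


def batch_payloads(issue_texts: list, size: int = MAX_BATCH_SIZE) -> Iterator[dict]:
    """Yield /predict/batch request bodies with at most `size` issues each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(issue_texts), size):
        chunk = issue_texts[start:start + size]
        yield {"issues": [{"issue_text": text} for text in chunk]}
```

Each yielded dictionary has the same shape as the request body in the curl example, so it can be serialized with `json.dumps` and posted as is.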
| #### List Predictions | |
| ```bash | |
| curl "http://localhost:8080/predictions?limit=10&skip=0" | |
| ``` | |
| #### Health Check | |
| ```bash | |
| curl "http://localhost:8080/health" | |
| ``` | |
| **Response:** | |
| ```json | |
| { | |
| "status": "healthy", | |
| "model_loaded": true, | |
| "model_version": "1.0.0" | |
| } | |
| ``` | |
| ### Makefile Shortcuts | |
| ```bash | |
| make test-api-health # Test health endpoint | |
| make test-api-predict # Test prediction | |
| make test-api-list # List predictions | |
| make test-api-all # Run all API tests | |
| ``` | |
| --- | |
| ## 3. GUI (Streamlit) | |
| ### Access Points | |
| | Environment | URL | | |
| |-------------|-----| | |
| | Local (Docker) | http://localhost:8501 | | |
| | Production | https://dacrow13-hopcroft-skill-classification.hf.space | | |
| ### Features | |
| - **Real-time Prediction**: Instant skill classification | |
| - **Confidence Scores**: Probability for each predicted skill | |
| - **Multiple Input Modes**: Quick input, detailed input, examples | |
| - **API Health Indicator**: Connection status in sidebar | |
| ### User Interface | |
| #### Main Dashboard | |
|  | |
| The sidebar displays: | |
| - API connection status | |
| - Confidence threshold slider | |
| - Model information | |
| #### Quick Input Mode | |
|  | |
| 1. Paste GitHub issue text | |
| 2. Click "Predict Skills" | |
| 3. View results instantly | |
| #### Detailed Input Mode | |
|  | |
| Optional metadata fields: | |
| - Repository name | |
| - PR number | |
| - Extended description | |
| #### Prediction Results | |
|  | |
| Results display: | |
| - Top-5 predicted skills with confidence bars | |
| - Full predictions table with filtering | |
| - Processing time metrics | |
| - Raw JSON response (expandable) | |
| #### Example Gallery | |
|  | |
| Pre-loaded test cases: | |
| - Authentication bugs | |
| - ML feature requests | |
| - Database issues | |
| - UI enhancements | |
| --- | |
| ## 4. Load Testing (Locust) | |
| ### Installation | |
| ```bash | |
| pip install locust | |
| ``` | |
| ### Configuration | |
| The Locust configuration is in `monitoring/locust/locustfile.py`: | |
| | Task | Weight | Endpoint | | |
| |------|--------|----------| | |
| | Single Prediction | 60% (weight: 3) | `POST /predict` | | |
| | Batch Prediction | 20% (weight: 1) | `POST /predict/batch` | | |
| | Monitoring | 20% (weight: 1) | `GET /health`, `GET /predictions` | | |
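The percentages in the table follow directly from the task weights: Locust picks tasks at random in proportion to their weight, so each share is the weight divided by the sum of all weights. A short worked example, using illustrative task names:

```python
def traffic_shares(weights: dict) -> dict:
    """Convert task weights into the fraction of requests each task receives."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Weights from the table above; the dictionary keys are illustrative labels.
shares = traffic_shares(
    {"single_prediction": 3, "batch_prediction": 1, "monitoring": 1}
)
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```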
| ### Running Load Tests | |
| #### Web UI Mode | |
| ```bash | |
| cd monitoring/locust | |
| locust | |
| ``` | |
| Then open: http://localhost:8089 | |
| Configure in the Web UI: | |
| - **Number of users**: Peak number of concurrent users | |
| - **Spawn rate**: Users started per second | |
| - **Host**: Target URL (e.g., `http://localhost:8080`) | |
| #### Headless Mode | |
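| ```bash | |
| # Run from monitoring/locust so Locust finds locustfile.py | |
| # --csv writes results_stats.csv, results_failures.csv, and related files | |
| locust --headless \ | |
| --users 50 \ | |
| --spawn-rate 10 \ | |
| --run-time 5m \ | |
| --host http://localhost:8080 \ | |
| --csv results | |
| ``` | |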
| ### Target URLs | |
| | Environment | Host URL | | |
| |-------------|----------| | |
| | Local Docker | `http://localhost:8080` | | |
| | Local Dev | `http://localhost:8000` | | |
| | HF Spaces | `https://dacrow13-hopcroft-skill-classification.hf.space` | | |
| ### Interpreting Results | |
| | Metric | Description | Target | | |
| |--------|-------------|--------| | |
| | RPS | Requests per second | Higher = better | | |
| | Median Response Time | 50th percentile latency | < 500ms | | |
| | 95th Percentile | Tail latency: 95% of requests complete within this time | < 2s | | |
| | Failure Rate | Percentage of errors | < 1% | | |
|  | |
| --- | |
| ## 5. Monitoring (Prometheus & Grafana) | |
| ### Access Points | |
| **Local Development:** | |
| | Service | URL | | |
| |---------|-----| | |
| | Prometheus | http://localhost:9090 | | |
| | Grafana | http://localhost:3000 | | |
| | Pushgateway | http://localhost:9091 | | |
| **Hugging Face Spaces (Production):** | |
| | Service | URL | | |
| |---------|-----| | |
| | Prometheus | https://dacrow13-hopcroft-skill-classification.hf.space/prometheus/ | | |
| | Grafana | https://dacrow13-hopcroft-skill-classification.hf.space/grafana/ | | |
| ### Prometheus Metrics | |
| Access the metrics endpoint: http://localhost:8080/metrics | |
| #### Available Metrics | |
| | Metric | Type | Description | | |
| |--------|------|-------------| | |
| | `hopcroft_requests_total` | Counter | Total requests by method/endpoint | | |
| | `hopcroft_request_duration_seconds` | Histogram | Request latency distribution | | |
| | `hopcroft_in_progress_requests` | Gauge | Currently processing requests | | |
| | `hopcroft_prediction_processing_seconds` | Summary | Model inference time | | |
| #### Useful PromQL Queries | |
| **Request Rate (per second)** | |
| ```promql | |
| rate(hopcroft_requests_total[1m]) | |
| ``` | |
| **Average Latency** | |
| ```promql | |
| rate(hopcroft_request_duration_seconds_sum[5m]) / rate(hopcroft_request_duration_seconds_count[5m]) | |
| ``` | |
| **In-Progress Requests** | |
| ```promql | |
| hopcroft_in_progress_requests | |
| ``` | |
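| **Model Prediction Time (P90)** | |
| This query returns data only if the Summary is exported with a `quantile` label. The Python `prometheus_client` library exposes only `_sum` and `_count` for Summaries. | |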
| ```promql | |
| hopcroft_prediction_processing_seconds{quantile="0.9"} | |
| ``` | |
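The average latency query above divides the growth of `_sum` by the growth of `_count`. The same ratio can be computed by hand from a single scrape of `/metrics`, which gives the average since the process started rather than over the last five minutes. The sketch below assumes the text exposition format without timestamps; the label names and sample values are made up for illustration.

```python
def sum_samples(metrics_text: str, metric_name: str) -> float:
    """Add up every sample of one metric across all label sets."""
    total = 0.0
    for line in metrics_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name_and_labels, _, value = line.rpartition(" ")
        if name_and_labels.split("{", 1)[0] == metric_name:
            total += float(value)
    return total


def average_latency_seconds(metrics_text: str) -> float:
    """Lifetime average request latency: total seconds / total requests."""
    seconds = sum_samples(metrics_text, "hopcroft_request_duration_seconds_sum")
    requests = sum_samples(metrics_text, "hopcroft_request_duration_seconds_count")
    return seconds / requests if requests else 0.0


# Hypothetical scrape output, for illustration only.
SAMPLE = """\
# TYPE hopcroft_request_duration_seconds histogram
hopcroft_request_duration_seconds_sum{method="POST",endpoint="/predict"} 12.5
hopcroft_request_duration_seconds_count{method="POST",endpoint="/predict"} 50
hopcroft_request_duration_seconds_sum{method="GET",endpoint="/health"} 0.5
hopcroft_request_duration_seconds_count{method="GET",endpoint="/health"} 50
"""
print(average_latency_seconds(SAMPLE))
```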
| ### Grafana Dashboards | |
| The pre-configured dashboard includes: | |
| | Panel | Description | | |
| |-------|-------------| | |
| | Request Rate | Real-time requests per second | | |
| | Request Latency (p50, p95) | Response time percentiles | | |
| | In-Progress Requests | Currently processing requests | | |
| | Error Rate (5xx) | Percentage of failed requests | | |
| | Model Prediction Time | Average model inference latency | | |
| | Requests by Endpoint | Traffic distribution per endpoint | | |
| ### Data Drift Detection | |
| #### Prepare Baseline (One-time) | |
| ```bash | |
| cd monitoring/drift/scripts | |
| python prepare_baseline.py | |
| ``` | |
| #### Run Drift Check | |
| ```bash | |
| # Run from monitoring/drift/scripts | |
| python run_drift_check.py | |
| ``` | |
| #### Verify Results | |
| ```bash | |
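| # Check that the drift metrics reached the Pushgateway | |
| curl -s http://localhost:9091/metrics | grep drift | |
| # Then query these metrics in the Prometheus UI (http://localhost:9090): | |
| #   drift_detected | |
| #   drift_p_value | |
| #   drift_distance | |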
| ``` | |
| ### Alerting Rules | |
| Pre-configured alerts in `monitoring/prometheus/alert_rules.yml`: | |
| | Alert | Condition | Severity | | |
| |-------|-----------|----------| | |
| | `ServiceDown` | Target down for 5m | Critical | | |
| | `HighErrorRate` | 5xx > 10% for 5m | Warning | | |
| | `SlowRequests` | P95 > 2s | Warning | | |
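Conditions such as "5xx > 10% for 5m" fire only when the threshold is exceeded continuously for the whole duration; a single spike is not enough. The sketch below is a simplified model of that behavior, not Prometheus's actual evaluation logic. It assumes samples arrive as time-ordered `(unix_time, value)` pairs, and the function name is hypothetical.

```python
def holds_for(samples: list, threshold: float, duration_s: float) -> bool:
    """True if value > threshold for every sample over the trailing duration.

    `samples` is a list of (unix_time, value) pairs sorted by time.
    """
    breach_start = None
    for timestamp, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = timestamp  # start of the current breach
        else:
            breach_start = None  # condition cleared, reset the timer
    if breach_start is None:
        return False
    return samples[-1][0] - breach_start >= duration_s


# HighErrorRate from the table: 5xx ratio above 0.10 for 300 seconds.
steady = [(t, 0.20) for t in range(0, 301, 60)]
print(holds_for(steady, threshold=0.10, duration_s=300))
```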
| ### Starting Monitoring Stack | |
| ```bash | |
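| # Start all monitoring services (run where the compose file lives, or pass -f docker/docker-compose.yml as in Section 1) | |
| docker compose up -d | |
| # Verify containers | |
| docker compose ps | |
| # Check Prometheus scrape targets (JSON from the HTTP API) | |
| curl http://localhost:9090/api/v1/targets | |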
| ``` | |
| --- | |
| ## Troubleshooting | |
| ### Common Issues | |
| #### API Returns 500 Error | |
| 1. Check that the credentials in `.env` are correct | |
| 2. Restart services: `docker compose -f docker/docker-compose.yml down && docker compose -f docker/docker-compose.yml up -d` | |
| 3. Verify model files: `docker exec hopcroft-api ls -la /app/models/` | |
| #### GUI Shows "API Unavailable" | |
| 1. Wait 30-60 seconds for API initialization | |
| 2. Check API health: `curl http://localhost:8080/health` | |
| 3. View logs: `docker compose -f docker/docker-compose.yml logs hopcroft-api` | |
| #### Port Already in Use | |
| ```bash | |
| # Check port usage (Windows) | |
| netstat -ano | findstr :8080 | |
| # Check port usage (Linux/macOS) | |
| lsof -i :8080 | |
| # Stop conflicting containers | |
| docker compose -f docker/docker-compose.yml down | |
| ``` | |
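A platform-independent way to check the same thing is to try connecting to the port. The sketch below reports whether something is already listening on a local port; the function name is hypothetical, and the port list simply collects the local URLs used in this guide.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception
        return probe.connect_ex((host, port)) == 0


# Ports used by the local services described in this guide.
GUIDE_PORTS = {"api": 8080, "gui": 8501, "prometheus": 9090,
               "grafana": 3000, "pushgateway": 9091}
```

Calling `port_in_use(8080)` before `docker compose up` shows whether the API port is free. After startup, the same call should return `True`.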
| #### DVC Pull Fails | |
| ```bash | |
| # Clean cache and retry | |
| rm -rf .dvc/cache | |
| dvc pull | |
| ``` | |