User Guide
Complete operational guide for the Hopcroft Skill Classification system, covering all components: API, GUI, load testing, and monitoring.
1. System Setup (Local)
Prerequisites
| Requirement | Version | Purpose |
|---|---|---|
| Python | 3.10+ | Runtime environment |
| Docker | 20.10+ | Containerization |
| Docker Compose | 2.0+ | Multi-service orchestration |
| Git | 2.30+ | Version control |
Option A: Docker Setup
1. Clone and Configure
git clone https://github.com/se4ai2526-uniba/Hopcroft.git
cd Hopcroft
# Create environment file
cp .env.example .env
2. Edit .env with Your Credentials
MLFLOW_TRACKING_URI=https://dagshub.com/se4ai2526-uniba/Hopcroft.mlflow
MLFLOW_TRACKING_USERNAME=your_dagshub_username
MLFLOW_TRACKING_PASSWORD=your_dagshub_token
Get your DagsHub token at: https://dagshub.com/user/settings/tokens
3. Start All Services
docker compose -f docker/docker-compose.yml up -d --build
4. Verify Services
| Service | URL | Purpose |
|---|---|---|
| API (Swagger) | http://localhost:8080/docs | Interactive API documentation |
| GUI (Streamlit) | http://localhost:8501 | Web interface |
| Health Check | http://localhost:8080/health | Service status |
Option B: Virtual Environment Setup
1. Create Virtual Environment
python -m venv venv
# Windows
venv\Scripts\activate
# Linux/macOS
source venv/bin/activate
2. Install Dependencies
pip install -r requirements.txt
pip install -e .
3. Configure DVC (for Model Access)
dvc remote modify origin --local auth basic
dvc remote modify origin --local user YOUR_DAGSHUB_USERNAME
dvc remote modify origin --local password YOUR_DAGSHUB_TOKEN
dvc pull
4. Start Services Manually
# Terminal 1: Start API
make api-dev
# Terminal 2: Start Streamlit
streamlit run hopcroft_skill_classification_tool_competition/streamlit_app.py
Docker Compose Commands Reference
| Command | Description |
|---|---|
| docker compose -f docker/docker-compose.yml up -d | Start in background |
| docker compose -f docker/docker-compose.yml down | Stop all services |
| docker compose -f docker/docker-compose.yml logs -f | Stream logs |
| docker compose -f docker/docker-compose.yml ps | Check status |
| docker compose -f docker/docker-compose.yml restart | Restart services |
2. API Usage
Base URLs
| Environment | URL |
|---|---|
| Local (Docker) | http://localhost:8080 |
| Local (Dev) | http://localhost:8000 |
| Production (HF Spaces) | https://dacrow13-hopcroft-skill-classification.hf.space |
Endpoints Overview
| Method | Endpoint | Description |
|---|---|---|
| POST | /predict | Predict skills for single issue |
| POST | /predict/batch | Batch prediction (max 100) |
| GET | /predictions | List recent predictions |
| GET | /predictions/{run_id} | Get prediction by ID |
| GET | /health | Health check |
| GET | /metrics | Prometheus metrics |
Interactive Documentation
Access Swagger UI for interactive testing:
- Swagger: http://localhost:8080/docs
- ReDoc: http://localhost:8080/redoc
Example Requests
Single Prediction
curl -X POST "http://localhost:8080/predict" \
-H "Content-Type: application/json" \
-d '{
"issue_text": "Fix authentication bug in OAuth2 login flow",
"repo_name": "my-project",
"pr_number": 42
}'
Response:
{
"run_id": "abc123...",
"predictions": [
{"skill": "authentication", "confidence": 0.92},
{"skill": "security", "confidence": 0.78},
{"skill": "oauth", "confidence": 0.65}
],
"model_version": "1.0.0",
"timestamp": "2025-01-05T15:00:00Z"
}
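On the client side, predictions are usually filtered by a confidence threshold before display (the same idea as the GUI's threshold slider). A minimal Python sketch; the `response` dict mirrors the example response above, and the helper name is illustrative:

```python
# Sample /predict response, matching the example shown above
response = {
    "run_id": "abc123",
    "predictions": [
        {"skill": "authentication", "confidence": 0.92},
        {"skill": "security", "confidence": 0.78},
        {"skill": "oauth", "confidence": 0.65},
    ],
}

def filter_skills(predictions, threshold=0.7):
    """Keep predictions at or above `threshold`, highest confidence first."""
    kept = [p for p in predictions if p["confidence"] >= threshold]
    return sorted(kept, key=lambda p: p["confidence"], reverse=True)

top = filter_skills(response["predictions"], threshold=0.7)
print([p["skill"] for p in top])  # ['authentication', 'security']
```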
Batch Prediction
curl -X POST "http://localhost:8080/predict/batch" \
-H "Content-Type: application/json" \
-d '{
"issues": [
{"issue_text": "Database connection timeout"},
{"issue_text": "UI button not responding"}
]
}'
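Because /predict/batch accepts at most 100 issues per call, larger workloads have to be split client-side before submission. A sketch of that chunking (the helper name is illustrative; the 100-issue limit comes from the endpoint description above):

```python
def chunk_issues(issues, max_batch=100):
    """Split a list of issue payloads into /predict/batch-sized chunks."""
    return [issues[i:i + max_batch] for i in range(0, len(issues), max_batch)]

# 250 hypothetical issues -> three requests of 100, 100, and 50
issues = [{"issue_text": f"Issue {i}"} for i in range(250)]
batches = chunk_issues(issues)
print([len(b) for b in batches])  # [100, 100, 50]
```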
List Predictions
curl "http://localhost:8080/predictions?limit=10&skip=0"
Health Check
curl "http://localhost:8080/health"
Response:
{
"status": "healthy",
"model_loaded": true,
"model_version": "1.0.0"
}
Makefile Shortcuts
make test-api-health # Test health endpoint
make test-api-predict # Test prediction
make test-api-list # List predictions
make test-api-all # Run all API tests
3. GUI (Streamlit)
Access Points
| Environment | URL |
|---|---|
| Local (Docker) | http://localhost:8501 |
| Production | https://dacrow13-hopcroft-skill-classification.hf.space |
Features
- Real-time Prediction: Instant skill classification
- Confidence Scores: Probability for each predicted skill
- Multiple Input Modes: Quick input, detailed input, examples
- API Health Indicator: Connection status in sidebar
User Interface
Main Dashboard
The sidebar displays:
- API connection status
- Confidence threshold slider
- Model information
Quick Input Mode
- Paste GitHub issue text
- Click "Predict Skills"
- View results instantly
Detailed Input Mode
Optional metadata fields:
- Repository name
- PR number
- Extended description
Prediction Results
Results display:
- Top-5 predicted skills with confidence bars
- Full predictions table with filtering
- Processing time metrics
- Raw JSON response (expandable)
Example Gallery
Pre-loaded test cases:
- Authentication bugs
- ML feature requests
- Database issues
- UI enhancements
4. Load Testing (Locust)
Installation
pip install locust
Configuration
The Locust configuration is in monitoring/locust/locustfile.py:
| Task | Weight | Endpoint |
|---|---|---|
| Single Prediction | 60% (weight: 3) | POST /predict |
| Batch Prediction | 20% (weight: 1) | POST /predict/batch |
| Monitoring | 20% (weight: 1) | GET /health, /predictions |
Running Load Tests
Web UI Mode
cd monitoring/locust
locust
Then open: http://localhost:8089
Configure in the Web UI:
- Number of users: Total concurrent users
- Spawn rate: Users per second to add
- Host: Target URL (e.g., http://localhost:8080)
Headless Mode
locust --headless \
--users 50 \
--spawn-rate 10 \
--run-time 5m \
--host http://localhost:8080 \
--csv results
Target URLs
| Environment | Host URL |
|---|---|
| Local Docker | http://localhost:8080 |
| Local Dev | http://localhost:8000 |
| HF Spaces | https://dacrow13-hopcroft-skill-classification.hf.space |
Interpreting Results
| Metric | Description | Target |
|---|---|---|
| RPS | Requests per second | Higher = better |
| Median Response Time | 50th percentile latency | < 500ms |
| 95th Percentile | Worst-case latency | < 2s |
| Failure Rate | Percentage of errors | < 1% |
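The latency targets in the table can be checked directly against raw response times (for example, from Locust's --csv output). A small sketch of the nearest-rank percentile computation, using made-up sample data:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile for p in (0, 100]."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]

# Hypothetical response times in milliseconds: 10, 20, ..., 1000
latencies_ms = list(range(10, 1010, 10))
print(percentile(latencies_ms, 50))  # 500 -> meets the < 500ms median target
print(percentile(latencies_ms, 95))  # 950 -> meets the < 2s p95 target
```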
5. Monitoring (Prometheus & Grafana)
Access Points
Local Development:
| Service | URL |
|---|---|
| Prometheus | http://localhost:9090 |
| Grafana | http://localhost:3000 |
| Pushgateway | http://localhost:9091 |
Hugging Face Spaces (Production):
| Service | URL |
|---|---|
| Prometheus | https://dacrow13-hopcroft-skill-classification.hf.space/prometheus/ |
| Grafana | https://dacrow13-hopcroft-skill-classification.hf.space/grafana/ |
Prometheus Metrics
Access the metrics endpoint: http://localhost:8080/metrics
Available Metrics
| Metric | Type | Description |
|---|---|---|
| hopcroft_requests_total | Counter | Total requests by method/endpoint |
| hopcroft_request_duration_seconds | Histogram | Request latency distribution |
| hopcroft_in_progress_requests | Gauge | Currently processing requests |
| hopcroft_prediction_processing_seconds | Summary | Model inference time |
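The /metrics endpoint serves Prometheus' plain-text exposition format, which is easy to scan without a client library. A sketch that extracts one metric's samples from a scrape (the sample text below is illustrative, not a real scrape):

```python
def metric_values(exposition_text, metric_name):
    """Return {series: value} for one metric in exposition-format text."""
    values = {}
    for line in exposition_text.splitlines():
        # Skip HELP/TYPE comments and unrelated metrics
        if line.startswith("#") or not line.startswith(metric_name):
            continue
        series, _, value = line.rpartition(" ")
        values[series] = float(value)
    return values

sample = """\
# HELP hopcroft_requests_total Total requests by method/endpoint
# TYPE hopcroft_requests_total counter
hopcroft_requests_total{endpoint="/predict",method="POST"} 42.0
hopcroft_requests_total{endpoint="/health",method="GET"} 7.0
"""
print(metric_values(sample, "hopcroft_requests_total"))
```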
Useful PromQL Queries
Request Rate (per second)
rate(hopcroft_requests_total[1m])
Average Latency
rate(hopcroft_request_duration_seconds_sum[5m]) / rate(hopcroft_request_duration_seconds_count[5m])
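This query divides the rate of the duration sum by the rate of the request count. The same arithmetic, worked through on two hypothetical scrapes taken 60 seconds apart:

```python
# Two scrapes of the histogram's _sum and _count series, 60s apart (made-up values)
scrape_interval_s = 60
duration_sum = (120.0, 150.0)   # cumulative seconds of request time
request_count = (1000, 1100)    # cumulative requests observed

rate_sum = (duration_sum[1] - duration_sum[0]) / scrape_interval_s      # 0.5 s of latency per second
rate_count = (request_count[1] - request_count[0]) / scrape_interval_s  # ~1.67 requests per second
avg_latency = rate_sum / rate_count
print(round(avg_latency, 3))  # 0.3 -> 300ms average latency over the window
```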
In-Progress Requests
hopcroft_in_progress_requests
Model Prediction Time (P90)
hopcroft_prediction_processing_seconds{quantile="0.9"}
Grafana Dashboards
The pre-configured dashboard includes:
| Panel | Description |
|---|---|
| Request Rate | Real-time requests per second |
| Request Latency (p50, p95) | Response time percentiles |
| In-Progress Requests | Currently processing requests |
| Error Rate (5xx) | Percentage of failed requests |
| Model Prediction Time | Average model inference latency |
| Requests by Endpoint | Traffic distribution per endpoint |
Data Drift Detection
Prepare Baseline (One-time)
cd monitoring/drift/scripts
python prepare_baseline.py
Run Drift Check
python run_drift_check.py
Verify Results
# Check Pushgateway
curl http://localhost:9091/metrics | grep drift
# PromQL queries
drift_detected
drift_p_value
drift_distance
Alerting Rules
Pre-configured alerts in monitoring/prometheus/alert_rules.yml:
| Alert | Condition | Severity |
|---|---|---|
| ServiceDown | Target down for 5m | Critical |
| HighErrorRate | 5xx > 10% for 5m | Warning |
| SlowRequests | P95 > 2s | Warning |
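For reference, a Prometheus rule in this shape could look like the fragment below. This is an illustrative sketch of the rule-file format, not a copy of the repository's monitoring/prometheus/alert_rules.yml; the actual expressions live there.

```yaml
groups:
  - name: hopcroft-api
    rules:
      - alert: ServiceDown
        # Illustrative condition: scrape target unreachable for 5 minutes
        expr: up == 0
        for: 5m
        labels:
          severity: critical
```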
Starting Monitoring Stack
# Start all monitoring services
docker compose up -d
# Verify containers
docker compose ps
# Check Prometheus targets
curl http://localhost:9090/targets
Troubleshooting
Common Issues
API Returns 500 Error
- Check that the .env credentials are correct
- Restart services: docker compose down && docker compose up -d
- Verify model files: docker exec hopcroft-api ls -la /app/models/
GUI Shows "API Unavailable"
- Wait 30-60 seconds for API initialization
- Check API health: curl http://localhost:8080/health
- View logs: docker compose logs hopcroft-api
Port Already in Use
# Check port usage (Windows)
netstat -ano | findstr :8080
# Check port usage (Linux/macOS)
lsof -i :8080
# Stop conflicting containers
docker compose down
DVC Pull Fails
# Clean cache and retry
rm -rf .dvc/cache
dvc pull