# User Guide
Complete operational guide for the Hopcroft Skill Classification system covering all components: API, GUI, load testing, and monitoring.
---
## Table of Contents
1. [System Setup](#1-system-setup-local)
2. [API Usage](#2-api-usage)
3. [GUI (Streamlit)](#3-gui-streamlit)
4. [Load Testing (Locust)](#4-load-testing-locust)
5. [Monitoring (Prometheus & Grafana)](#5-monitoring-prometheus--grafana)
---
## 1. System Setup (Local)
### Prerequisites
| Requirement | Version | Purpose |
|-------------|---------|---------|
| Python | 3.10+ | Runtime environment |
| Docker | 20.10+ | Containerization |
| Docker Compose | 2.0+ | Multi-service orchestration |
| Git | 2.30+ | Version control |
### Option A: Docker Setup
**1. Clone and Configure**
```bash
git clone https://github.com/se4ai2526-uniba/Hopcroft.git
cd Hopcroft
# Create environment file
cp .env.example .env
```
**2. Edit `.env` with Your Credentials**
```env
MLFLOW_TRACKING_URI=https://dagshub.com/se4ai2526-uniba/Hopcroft.mlflow
MLFLOW_TRACKING_USERNAME=your_dagshub_username
MLFLOW_TRACKING_PASSWORD=your_dagshub_token
```
> [!TIP]
> Get your DagsHub token at: https://dagshub.com/user/settings/tokens
**3. Start All Services**
```bash
docker compose -f docker/docker-compose.yml up -d --build
```
**4. Verify Services**
| Service | URL | Purpose |
|---------|-----|---------|
| API (Swagger) | http://localhost:8080/docs | Interactive API documentation |
| GUI (Streamlit) | http://localhost:8501 | Web interface |
| Health Check | http://localhost:8080/health | Service status |
### Option B: Virtual Environment Setup
**1. Create Virtual Environment**
```bash
python -m venv venv
# Windows
venv\Scripts\activate
# Linux/macOS
source venv/bin/activate
```
**2. Install Dependencies**
```bash
pip install -r requirements.txt
pip install -e .
```
**3. Configure DVC (for Model Access)**
```bash
dvc remote modify origin --local auth basic
dvc remote modify origin --local user YOUR_DAGSHUB_USERNAME
dvc remote modify origin --local password YOUR_DAGSHUB_TOKEN
dvc pull
```
**4. Start Services Manually**
```bash
# Terminal 1: Start API
make api-dev
# Terminal 2: Start Streamlit
streamlit run hopcroft_skill_classification_tool_competition/streamlit_app.py
```
### Docker Compose Commands Reference
| Command | Description |
|---------|-------------|
| `docker compose -f docker/docker-compose.yml up -d` | Start in background |
| `docker compose -f docker/docker-compose.yml down` | Stop all services |
| `docker compose -f docker/docker-compose.yml logs -f` | Stream logs |
| `docker compose -f docker/docker-compose.yml ps` | Check status |
| `docker compose -f docker/docker-compose.yml restart` | Restart services |
---
## 2. API Usage
### Base URLs
| Environment | URL |
|-------------|-----|
| Local (Docker) | http://localhost:8080 |
| Local (Dev) | http://localhost:8000 |
| Production (HF Spaces) | https://dacrow13-hopcroft-skill-classification.hf.space |
### Endpoints Overview
| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/predict` | Predict skills for a single issue |
| `POST` | `/predict/batch` | Batch prediction (max 100) |
| `GET` | `/predictions` | List recent predictions |
| `GET` | `/predictions/{run_id}` | Get prediction by ID |
| `GET` | `/health` | Health check |
| `GET` | `/metrics` | Prometheus metrics |
### Interactive Documentation
Access Swagger UI for interactive testing:
- **Swagger**: http://localhost:8080/docs
- **ReDoc**: http://localhost:8080/redoc
### Example Requests
#### Single Prediction
```bash
curl -X POST "http://localhost:8080/predict" \
-H "Content-Type: application/json" \
-d '{
"issue_text": "Fix authentication bug in OAuth2 login flow",
"repo_name": "my-project",
"pr_number": 42
}'
```
**Response:**
```json
{
"run_id": "abc123...",
"predictions": [
{"skill": "authentication", "confidence": 0.92},
{"skill": "security", "confidence": 0.78},
{"skill": "oauth", "confidence": 0.65}
],
"model_version": "1.0.0",
"timestamp": "2025-01-05T15:00:00Z"
}
```
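For programmatic use, the response is easy to post-process. A minimal Python sketch that filters the example response above by confidence (the schema follows the sample response; the 0.7 threshold is illustrative):

```python
import json

def top_skills(response: dict, threshold: float = 0.7) -> list:
    """Return skill names whose confidence meets the threshold."""
    return [p["skill"] for p in response["predictions"] if p["confidence"] >= threshold]

# Sample payload matching the /predict response schema shown above
sample = json.loads('''{
  "predictions": [
    {"skill": "authentication", "confidence": 0.92},
    {"skill": "security", "confidence": 0.78},
    {"skill": "oauth", "confidence": 0.65}
  ]
}''')
print(top_skills(sample))  # ['authentication', 'security']
```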
#### Batch Prediction
```bash
curl -X POST "http://localhost:8080/predict/batch" \
-H "Content-Type: application/json" \
-d '{
"issues": [
{"issue_text": "Database connection timeout"},
{"issue_text": "UI button not responding"}
]
}'
```
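Since batch requests are capped at 100 issues, larger workloads need to be split into multiple payloads. A sketch of that chunking (the payload shape follows the request above; the batch limit is taken from the endpoints table):

```python
def chunk_issues(texts, batch_size=100):
    """Split issue texts into /predict/batch payloads of at most batch_size items each."""
    return [
        {"issues": [{"issue_text": t} for t in texts[i:i + batch_size]]}
        for i in range(0, len(texts), batch_size)
    ]

payloads = chunk_issues([f"issue {n}" for n in range(250)])
print(len(payloads))  # 3 payloads: 100 + 100 + 50 issues
```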
#### List Predictions
```bash
curl "http://localhost:8080/predictions?limit=10&skip=0"
```
#### Health Check
```bash
curl "http://localhost:8080/health"
```
**Response:**
```json
{
"status": "healthy",
"model_loaded": true,
"model_version": "1.0.0"
}
```
### Makefile Shortcuts
```bash
make test-api-health # Test health endpoint
make test-api-predict # Test prediction
make test-api-list # List predictions
make test-api-all # Run all API tests
```
---
## 3. GUI (Streamlit)
### Access Points
| Environment | URL |
|-------------|-----|
| Local (Docker) | http://localhost:8501 |
| Production | https://dacrow13-hopcroft-skill-classification.hf.space |
### Features
- **Real-time Prediction**: Instant skill classification
- **Confidence Scores**: Probability for each predicted skill
- **Multiple Input Modes**: Quick input, detailed input, examples
- **API Health Indicator**: Connection status in sidebar
### User Interface
#### Main Dashboard
![Main Dashboard](./img/gui_main_dashboard.png)
The sidebar displays:
- API connection status
- Confidence threshold slider
- Model information
#### Quick Input Mode
![Quick Input](./img/gui_quick_input.png)
1. Paste GitHub issue text
2. Click "Predict Skills"
3. View results instantly
#### Detailed Input Mode
![Detailed Input](./img/gui_detailed_input.png)
Optional metadata fields:
- Repository name
- PR number
- Extended description
#### Prediction Results
![Results](./img/gui_detailed.png)
Results display:
- Top-5 predicted skills with confidence bars
- Full predictions table with filtering
- Processing time metrics
- Raw JSON response (expandable)
#### Example Gallery
![Examples](./img/gui_ex.png)
Pre-loaded test cases:
- Authentication bugs
- ML feature requests
- Database issues
- UI enhancements
---
## 4. Load Testing (Locust)
### Installation
```bash
pip install locust
```
### Configuration
The Locust configuration is in `monitoring/locust/locustfile.py`:
| Task | Weight | Endpoint |
|------|--------|----------|
| Single Prediction | 60% (weight: 3) | `POST /predict` |
| Batch Prediction | 20% (weight: 1) | `POST /predict/batch` |
| Monitoring | 20% (weight: 1) | `GET /health`, `/predictions` |
### Running Load Tests
#### Web UI Mode
```bash
cd monitoring/locust
locust
```
Then open: http://localhost:8089
Configure in the Web UI:
- **Number of users**: Total concurrent users
- **Spawn rate**: Users per second to add
- **Host**: Target URL (e.g., `http://localhost:8080`)
#### Headless Mode
```bash
locust --headless \
--users 50 \
--spawn-rate 10 \
--run-time 5m \
--host http://localhost:8080 \
--csv results
```
### Target URLs
| Environment | Host URL |
|-------------|----------|
| Local Docker | `http://localhost:8080` |
| Local Dev | `http://localhost:8000` |
| HF Spaces | `https://dacrow13-hopcroft-skill-classification.hf.space` |
### Interpreting Results
| Metric | Description | Target |
|--------|-------------|--------|
| RPS | Requests per second | Higher = better |
| Median Response Time | 50th percentile latency | < 500ms |
| 95th Percentile | Worst-case latency | < 2s |
| Failure Rate | Percentage of errors | < 1% |
![Locust Results](./img/locust.png)
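The targets in the table can be checked automatically after a headless run; a small sketch with the threshold values taken from the table above (the sample measurements are illustrative):

```python
def check_targets(median_ms, p95_ms, failure_pct):
    """Evaluate a load-test run against the latency and error-rate targets."""
    return {
        "median_ok": median_ms < 500,     # median < 500 ms
        "p95_ok": p95_ms < 2000,          # 95th percentile < 2 s
        "failures_ok": failure_pct < 1.0, # failure rate < 1%
    }

result = check_targets(median_ms=120, p95_ms=850, failure_pct=0.2)
print(result)  # {'median_ok': True, 'p95_ok': True, 'failures_ok': True}
```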
---
## 5. Monitoring (Prometheus & Grafana)
### Access Points
**Local Development:**
| Service | URL |
|---------|-----|
| Prometheus | http://localhost:9090 |
| Grafana | http://localhost:3000 |
| Pushgateway | http://localhost:9091 |
**Hugging Face Spaces (Production):**
| Service | URL |
|---------|-----|
| Prometheus | https://dacrow13-hopcroft-skill-classification.hf.space/prometheus/ |
| Grafana | https://dacrow13-hopcroft-skill-classification.hf.space/grafana/ |
### Prometheus Metrics
Access the metrics endpoint: http://localhost:8080/metrics
#### Available Metrics
| Metric | Type | Description |
|--------|------|-------------|
| `hopcroft_requests_total` | Counter | Total requests by method/endpoint |
| `hopcroft_request_duration_seconds` | Histogram | Request latency distribution |
| `hopcroft_in_progress_requests` | Gauge | Currently processing requests |
| `hopcroft_prediction_processing_seconds` | Summary | Model inference time |
#### Useful PromQL Queries
**Request Rate (per second)**
```promql
rate(hopcroft_requests_total[1m])
```
**Average Latency**
```promql
rate(hopcroft_request_duration_seconds_sum[5m]) / rate(hopcroft_request_duration_seconds_count[5m])
```
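This query divides the per-second increase of the `_sum` counter (total seconds spent) by the per-second increase of the `_count` counter (total requests) over the window. The same arithmetic in Python, using illustrative counter deltas:

```python
def avg_latency(sum_delta_s, count_delta):
    """Average request latency in seconds from histogram counter deltas."""
    return sum_delta_s / count_delta if count_delta else 0.0

# e.g. _sum grew by 15 s while _count grew by 60 requests over the window
print(avg_latency(15.0, 60))  # 0.25 seconds average
```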
**In-Progress Requests**
```promql
hopcroft_in_progress_requests
```
**Model Prediction Time (P90)**
```promql
hopcroft_prediction_processing_seconds{quantile="0.9"}
```
### Grafana Dashboards
The pre-configured dashboard includes:
| Panel | Description |
|-------|-------------|
| Request Rate | Real-time requests per second |
| Request Latency (p50, p95) | Response time percentiles |
| In-Progress Requests | Currently processing requests |
| Error Rate (5xx) | Percentage of failed requests |
| Model Prediction Time | Average model inference latency |
| Requests by Endpoint | Traffic distribution per endpoint |
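The Error Rate panel corresponds to the share of 5xx responses in total traffic; a minimal sketch of that computation (the status-code counts are illustrative):

```python
def error_rate(status_counts):
    """Fraction of requests that returned a 5xx status code."""
    total = sum(status_counts.values())
    errors = sum(c for status, c in status_counts.items() if 500 <= status < 600)
    return errors / total if total else 0.0

print(error_rate({200: 90, 500: 6, 503: 4}))  # 0.1 -> 10% error rate
```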
### Data Drift Detection
#### Prepare Baseline (One-time)
```bash
cd monitoring/drift/scripts
python prepare_baseline.py
```
#### Run Drift Check
```bash
python run_drift_check.py
```
#### Verify Results
```bash
# Check that drift metrics reached the Pushgateway
curl http://localhost:9091/metrics | grep drift
```
Then query the exported metrics in Prometheus:
```promql
drift_detected
drift_p_value
drift_distance
```
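The exported `drift_distance` can be understood as a distance between the baseline and the live feature distributions, for example a two-sample Kolmogorov-Smirnov statistic. A pure-Python sketch of that statistic (the actual `run_drift_check.py` may use a different test; this is illustrative):

```python
import bisect

def ks_distance(a, b):
    """Two-sample KS distance: largest gap between the empirical CDFs of a and b."""
    a, b = sorted(a), sorted(b)
    def ecdf(xs, v):
        return bisect.bisect_right(xs, v) / len(xs)  # fraction of xs <= v
    return max(abs(ecdf(a, v) - ecdf(b, v)) for v in a + b)

print(ks_distance([0.1, 0.2, 0.3], [0.1, 0.2, 0.3]))  # 0.0 (identical distributions)
print(ks_distance([0, 0, 0, 0], [1, 1, 1, 1]))        # 1.0 (complete separation)
```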
### Alerting Rules
Pre-configured alerts in `monitoring/prometheus/alert_rules.yml`:
| Alert | Condition | Severity |
|-------|-----------|----------|
| `ServiceDown` | Target down for 5m | Critical |
| `HighErrorRate` | 5xx > 10% for 5m | Warning |
| `SlowRequests` | P95 > 2s | Warning |
### Starting Monitoring Stack
```bash
# Start all monitoring services
docker compose up -d
# Verify containers
docker compose ps
# Check Prometheus targets
curl http://localhost:9090/targets
```
---
## Troubleshooting
### Common Issues
#### API Returns 500 Error
1. Check `.env` credentials are correct
2. Restart services: `docker compose down && docker compose up -d`
3. Verify model files: `docker exec hopcroft-api ls -la /app/models/`
#### GUI Shows "API Unavailable"
1. Wait 30-60 seconds for API initialization
2. Check API health: `curl http://localhost:8080/health`
3. View logs: `docker compose logs hopcroft-api`
#### Port Already in Use
```bash
# Check port usage (Windows)
netstat -ano | findstr :8080
# Check port usage (Linux/macOS)
lsof -i :8080
# Stop conflicting containers
docker compose down
```
#### DVC Pull Fails
```bash
# Clean cache and retry
rm -rf .dvc/cache
dvc pull
```