# User Guide

Complete operational guide for the Hopcroft Skill Classification system, covering all components: API, GUI, load testing, and monitoring.

---

## Table of Contents

1. [System Setup](#1-system-setup)
2. [API Usage](#2-api-usage)
3. [GUI (Streamlit)](#3-gui-streamlit)
4. [Load Testing (Locust)](#4-load-testing-locust)
5. [Monitoring (Prometheus & Grafana)](#5-monitoring-prometheus--grafana)

---

## 1. System Setup (Local)

### Prerequisites

| Requirement | Version | Purpose |
|-------------|---------|---------|
| Python | 3.10+ | Runtime environment |
| Docker | 20.10+ | Containerization |
| Docker Compose | 2.0+ | Multi-service orchestration |
| Git | 2.30+ | Version control |

### Option A: Docker Setup

**1. Clone and Configure**

```bash
git clone https://github.com/se4ai2526-uniba/Hopcroft.git
cd Hopcroft

# Create environment file
cp .env.example .env
```

**2. Edit `.env` with Your Credentials**

```env
MLFLOW_TRACKING_URI=https://dagshub.com/se4ai2526-uniba/Hopcroft.mlflow
MLFLOW_TRACKING_USERNAME=your_dagshub_username
MLFLOW_TRACKING_PASSWORD=your_dagshub_token
```

> [!TIP]
> Get your DagsHub token at: https://dagshub.com/user/settings/tokens

**3. Start All Services**

```bash
docker compose -f docker/docker-compose.yml up -d --build
```

**4. Verify Services**

| Service | URL | Purpose |
|---------|-----|---------|
| API (Swagger) | http://localhost:8080/docs | Interactive API documentation |
| GUI (Streamlit) | http://localhost:8501 | Web interface |
| Health Check | http://localhost:8080/health | Service status |

### Option B: Virtual Environment Setup

**1. Create Virtual Environment**

```bash
python -m venv venv

# Windows
venv\Scripts\activate

# Linux/macOS
source venv/bin/activate
```

**2. Install Dependencies**

```bash
pip install -r requirements.txt
pip install -e .
```

**3. Configure DVC (for Model Access)**

```bash
dvc remote modify origin --local auth basic
dvc remote modify origin --local user YOUR_DAGSHUB_USERNAME
dvc remote modify origin --local password YOUR_DAGSHUB_TOKEN
dvc pull
```

**4. Start Services Manually**

```bash
# Terminal 1: Start API
make api-dev

# Terminal 2: Start Streamlit
streamlit run hopcroft_skill_classification_tool_competition/streamlit_app.py
```

### Docker Compose Commands Reference

| Command | Description |
|---------|-------------|
| `docker compose -f docker/docker-compose.yml up -d` | Start in background |
| `docker compose -f docker/docker-compose.yml down` | Stop all services |
| `docker compose -f docker/docker-compose.yml logs -f` | Stream logs |
| `docker compose -f docker/docker-compose.yml ps` | Check status |
| `docker compose -f docker/docker-compose.yml restart` | Restart services |

---

## 2. API Usage

### Base URLs

| Environment | URL |
|-------------|-----|
| Local (Docker) | http://localhost:8080 |
| Local (Dev) | http://localhost:8000 |
| Production (HF Spaces) | https://dacrow13-hopcroft-skill-classification.hf.space/docs |

### Endpoints Overview

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/predict` | Predict skills for a single issue |
| `POST` | `/predict/batch` | Batch prediction (max 100) |
| `GET` | `/predictions` | List recent predictions |
| `GET` | `/predictions/{run_id}` | Get prediction by ID |
| `GET` | `/health` | Health check |
| `GET` | `/metrics` | Prometheus metrics |

### Interactive Documentation

Access Swagger UI for interactive testing:

- **Swagger**: http://localhost:8080/docs
- **ReDoc**: http://localhost:8080/redoc

### Example Requests

#### Single Prediction

```bash
curl -X POST "http://localhost:8080/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "issue_text": "Fix authentication bug in OAuth2 login flow",
    "repo_name": "my-project",
    "pr_number": 42
  }'
```

**Response:**

```json
{
  "run_id": "abc123...",
  "predictions": [
    {"skill": "authentication", "confidence": 0.92},
    {"skill": "security", "confidence": 0.78},
    {"skill": "oauth", "confidence": 0.65}
  ],
  "model_version": "1.0.0",
  "timestamp": "2025-01-05T15:00:00Z"
}
```

#### Batch Prediction

```bash
curl -X POST "http://localhost:8080/predict/batch" \
  -H "Content-Type: application/json" \
  -d '{
    "issues": [
      {"issue_text": "Database connection timeout"},
      {"issue_text": "UI button not responding"}
    ]
  }'
```

#### List Predictions

```bash
curl "http://localhost:8080/predictions?limit=10&skip=0"
```

#### Health Check

```bash
curl "http://localhost:8080/health"
```

**Response:**

```json
{
  "status": "healthy",
  "model_loaded": true,
  "model_version": "1.0.0"
}
```

### Makefile Shortcuts

```bash
make test-api-health   # Test health endpoint
make test-api-predict  # Test prediction
make test-api-list     # List predictions
make test-api-all      # Run all API tests
```

---

## 3. GUI (Streamlit)

### Access Points

| Environment | URL |
|-------------|-----|
| Local (Docker) | http://localhost:8501 |
| Production | https://dacrow13-hopcroft-skill-classification.hf.space |

### Features

- **Real-time Prediction**: Instant skill classification
- **Confidence Scores**: Probability for each predicted skill
- **Multiple Input Modes**: Quick input, detailed input, examples
- **API Health Indicator**: Connection status in sidebar

### User Interface

#### Main Dashboard

![Main Dashboard](./img/gui_main_dashboard.png)

The sidebar displays:

- API connection status
- Confidence threshold slider
- Model information

#### Quick Input Mode

![Quick Input](./img/gui_quick_input.png)

1. Paste GitHub issue text
2. Click "Predict Skills"
3. View results instantly

#### Detailed Input Mode

![Detailed Input](./img/gui_detailed_input.png)

Optional metadata fields:

- Repository name
- PR number
- Extended description

#### Prediction Results

![Results](./img/gui_detailed.png)

Results display:

- Top-5 predicted skills with confidence bars
- Full predictions table with filtering
- Processing time metrics
- Raw JSON response (expandable)

#### Example Gallery

![Examples](./img/gui_ex.png)

Pre-loaded test cases:

- Authentication bugs
- ML feature requests
- Database issues
- UI enhancements

---

## 4. Load Testing (Locust)

### Installation

```bash
pip install locust
```

### Configuration

The Locust configuration is in `monitoring/locust/locustfile.py`:

| Task | Weight | Endpoint |
|------|--------|----------|
| Single Prediction | 60% (weight: 3) | `POST /predict` |
| Batch Prediction | 20% (weight: 1) | `POST /predict/batch` |
| Monitoring | 20% (weight: 1) | `GET /health`, `/predictions` |

### Running Load Tests

#### Web UI Mode

```bash
cd monitoring/locust
locust
```

Then open: http://localhost:8089

Configure in the Web UI:

- **Number of users**: Total concurrent users
- **Spawn rate**: Users added per second
- **Host**: Target URL (e.g., `http://localhost:8080`)

#### Headless Mode

```bash
locust --headless \
  --users 50 \
  --spawn-rate 10 \
  --run-time 5m \
  --host http://localhost:8080 \
  --csv results
```

### Target URLs

| Environment | Host URL |
|-------------|----------|
| Local Docker | `http://localhost:8080` |
| Local Dev | `http://localhost:8000` |
| HF Spaces | `https://dacrow13-hopcroft-skill-classification.hf.space` |

### Interpreting Results

| Metric | Description | Target |
|--------|-------------|--------|
| RPS | Requests per second | Higher is better |
| Median Response Time | 50th-percentile latency | < 500 ms |
| 95th Percentile | Worst-case latency | < 2 s |
| Failure Rate | Percentage of errors | < 1% |

![Locust Results](./img/locust.png)

---

## 5. Monitoring (Prometheus & Grafana)

### Access Points

**Local Development:**

| Service | URL |
|---------|-----|
| Prometheus | http://localhost:9090 |
| Grafana | http://localhost:3000 |
| Pushgateway | http://localhost:9091 |

**Hugging Face Spaces (Production):**

| Service | URL |
|---------|-----|
| Prometheus | https://dacrow13-hopcroft-skill-classification.hf.space/prometheus/ |
| Grafana | https://dacrow13-hopcroft-skill-classification.hf.space/grafana/ |

### Prometheus Metrics

Access the metrics endpoint: http://localhost:8080/metrics

#### Available Metrics

| Metric | Type | Description |
|--------|------|-------------|
| `hopcroft_requests_total` | Counter | Total requests by method/endpoint |
| `hopcroft_request_duration_seconds` | Histogram | Request latency distribution |
| `hopcroft_in_progress_requests` | Gauge | Currently processing requests |
| `hopcroft_prediction_processing_seconds` | Summary | Model inference time |

#### Useful PromQL Queries

**Request Rate (per second)**

```promql
rate(hopcroft_requests_total[1m])
```

**Average Latency**

```promql
rate(hopcroft_request_duration_seconds_sum[5m]) / rate(hopcroft_request_duration_seconds_count[5m])
```

**In-Progress Requests**

```promql
hopcroft_in_progress_requests
```

**Model Prediction Time (P90)**

```promql
hopcroft_prediction_processing_seconds{quantile="0.9"}
```

### Grafana Dashboards

The pre-configured dashboard includes:

| Panel | Description |
|-------|-------------|
| Request Rate | Real-time requests per second |
| Request Latency (p50, p95) | Response time percentiles |
| In-Progress Requests | Currently processing requests |
| Error Rate (5xx) | Percentage of failed requests |
| Model Prediction Time | Average model inference latency |
| Requests by Endpoint | Traffic distribution per endpoint |

### Data Drift Detection

#### Prepare Baseline (One-time)

```bash
cd monitoring/drift/scripts
python prepare_baseline.py
```

#### Run Drift Check

```bash
python run_drift_check.py
```

#### Verify Results

```bash
# Check that drift metrics reached the Pushgateway
curl http://localhost:9091/metrics | grep drift
```

Drift metrics available for PromQL queries:

```promql
drift_detected
drift_p_value
drift_distance
```

### Alerting Rules

Pre-configured alerts in `monitoring/prometheus/alert_rules.yml`:

| Alert | Condition | Severity |
|-------|-----------|----------|
| `ServiceDown` | Target down for 5m | Critical |
| `HighErrorRate` | 5xx > 10% for 5m | Warning |
| `SlowRequests` | P95 > 2s | Warning |

### Starting Monitoring Stack

```bash
# Start all monitoring services
docker compose up -d

# Verify containers
docker compose ps

# Check Prometheus targets
curl http://localhost:9090/targets
```

---

## Troubleshooting

### Common Issues

#### API Returns 500 Error

1. Check that the `.env` credentials are correct
2. Restart services: `docker compose down && docker compose up -d`
3. Verify model files: `docker exec hopcroft-api ls -la /app/models/`

#### GUI Shows "API Unavailable"

1. Wait 30-60 seconds for API initialization
2. Check API health: `curl http://localhost:8080/health`
3. View logs: `docker compose logs hopcroft-api`

#### Port Already in Use

```bash
# Check port usage (Windows)
netstat -ano | findstr :8080

# Check port usage (Linux/macOS)
lsof -i :8080

# Stop conflicting containers
docker compose down
```

#### DVC Pull Fails

```bash
# Clean the local cache and retry
rm -rf .dvc/cache
dvc pull
```
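
---

## Appendix: Client-Side Snippets

The `/predict` response can be post-processed client-side, for example to keep only skills above a confidence threshold (mirroring the GUI's confidence threshold slider). Below is a minimal sketch: the sample response follows the single-prediction example in Section 2, while `filter_skills` and the threshold value are illustrative and not part of the project's API.

```python
import json

# Sample /predict response, matching the example in Section 2 of this guide.
SAMPLE_RESPONSE = """
{
  "run_id": "abc123",
  "predictions": [
    {"skill": "authentication", "confidence": 0.92},
    {"skill": "security", "confidence": 0.78},
    {"skill": "oauth", "confidence": 0.65}
  ],
  "model_version": "1.0.0",
  "timestamp": "2025-01-05T15:00:00Z"
}
"""

def filter_skills(response_json: str, min_confidence: float = 0.7) -> list[str]:
    """Return skill names whose confidence meets the threshold, highest first."""
    payload = json.loads(response_json)
    kept = [p for p in payload["predictions"] if p["confidence"] >= min_confidence]
    kept.sort(key=lambda p: p["confidence"], reverse=True)
    return [p["skill"] for p in kept]

print(filter_skills(SAMPLE_RESPONSE, 0.7))  # ['authentication', 'security']
```

Lowering the threshold passed to `filter_skills` admits lower-confidence skills such as `oauth`, just as lowering the GUI slider does.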
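
Since `/predict/batch` accepts at most 100 issues per request (see the endpoints table in Section 2), a client holding more issues must split them into multiple requests. A minimal sketch of that chunking, assuming the request body shape from the batch example; `chunk_issues` is an illustrative helper, not part of the project:

```python
# Client-side chunking for POST /predict/batch, which accepts at most
# 100 issues per request. Each chunk is a ready-to-send request body.

BATCH_LIMIT = 100  # documented maximum issues per batch request

def chunk_issues(issue_texts: list[str], limit: int = BATCH_LIMIT) -> list[dict]:
    """Split issue texts into /predict/batch request bodies of at most `limit` issues."""
    bodies = []
    for start in range(0, len(issue_texts), limit):
        chunk = issue_texts[start:start + limit]
        bodies.append({"issues": [{"issue_text": text} for text in chunk]})
    return bodies

# Example: 250 issues become 3 request bodies (100 + 100 + 50).
bodies = chunk_issues([f"Issue {i}" for i in range(250)])
print([len(b["issues"]) for b in bodies])  # [100, 100, 50]
```

Each returned dict can then be sent as the JSON body of a separate `POST /predict/batch` call, exactly like the `curl` batch example above.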