# User Guide

Complete operational guide for the Hopcroft Skill Classification system covering all components: API, GUI, load testing, and monitoring.

---

## Table of Contents

1. [System Setup](#1-system-setup-local)
2. [API Usage](#2-api-usage)
3. [GUI (Streamlit)](#3-gui-streamlit)
4. [Load Testing (Locust)](#4-load-testing-locust)
5. [Monitoring (Prometheus & Grafana)](#5-monitoring-prometheus--grafana)

---

## 1. System Setup (Local)

### Prerequisites

| Requirement | Version | Purpose |
|-------------|---------|---------|
| Python | 3.10+ | Runtime environment |
| Docker | 20.10+ | Containerization |
| Docker Compose | 2.0+ | Multi-service orchestration |
| Git | 2.30+ | Version control |

### Option A: Docker Setup

**1. Clone and Configure**

```bash
git clone https://github.com/se4ai2526-uniba/Hopcroft.git
cd Hopcroft

# Create environment file
cp .env.example .env
```

**2. Edit `.env` with Your Credentials**

```env
MLFLOW_TRACKING_URI=https://dagshub.com/se4ai2526-uniba/Hopcroft.mlflow
MLFLOW_TRACKING_USERNAME=your_dagshub_username
MLFLOW_TRACKING_PASSWORD=your_dagshub_token
```

> [!TIP]
> Get your DagsHub token at: https://dagshub.com/user/settings/tokens

**3. Start All Services**

```bash
docker compose -f docker/docker-compose.yml up -d --build
```

**4. Verify Services**

| Service | URL | Purpose |
|---------|-----|---------|
| API (Swagger) | http://localhost:8080/docs | Interactive API documentation |
| GUI (Streamlit) | http://localhost:8501 | Web interface |
| Health Check | http://localhost:8080/health | Service status |

### Option B: Virtual Environment Setup

**1. Create Virtual Environment**

```bash
python -m venv venv

# Windows
venv\Scripts\activate

# Linux/macOS
source venv/bin/activate
```

**2. Install Dependencies**

```bash
pip install -r requirements.txt
pip install -e .
```

**3. Configure DVC (for Model Access)**

```bash
dvc remote modify origin --local auth basic
dvc remote modify origin --local user YOUR_DAGSHUB_USERNAME
dvc remote modify origin --local password YOUR_DAGSHUB_TOKEN
dvc pull
```

**4. Start Services Manually**

```bash
# Terminal 1: Start API
make api-dev

# Terminal 2: Start Streamlit
streamlit run hopcroft_skill_classification_tool_competition/streamlit_app.py
```

### Docker Compose Commands Reference

| Command | Description |
|---------|-------------|
| `docker compose -f docker/docker-compose.yml up -d` | Start in background |
| `docker compose -f docker/docker-compose.yml down` | Stop all services |
| `docker compose -f docker/docker-compose.yml logs -f` | Stream logs |
| `docker compose -f docker/docker-compose.yml ps` | Check status |
| `docker compose -f docker/docker-compose.yml restart` | Restart services |

---

## 2. API Usage

### Base URLs

| Environment | URL |
|-------------|-----|
| Local (Docker) | http://localhost:8080 |
| Local (Dev) | http://localhost:8000 |
| Production (HF Spaces) | https://dacrow13-hopcroft-skill-classification.hf.space |

### Endpoints Overview

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/predict` | Predict skills for single issue |
| `POST` | `/predict/batch` | Batch prediction (max 100) |
| `GET` | `/predictions` | List recent predictions |
| `GET` | `/predictions/{run_id}` | Get prediction by ID |
| `GET` | `/health` | Health check |
| `GET` | `/metrics` | Prometheus metrics |

### Interactive Documentation

Access the interactive API documentation for testing:
- **Swagger**: http://localhost:8080/docs
- **ReDoc**: http://localhost:8080/redoc

### Example Requests

#### Single Prediction

```bash
curl -X POST "http://localhost:8080/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "issue_text": "Fix authentication bug in OAuth2 login flow",
    "repo_name": "my-project",
    "pr_number": 42
  }'
```

**Response:**
```json
{
  "run_id": "abc123...",
  "predictions": [
    {"skill": "authentication", "confidence": 0.92},
    {"skill": "security", "confidence": 0.78},
    {"skill": "oauth", "confidence": 0.65}
  ],
  "model_version": "1.0.0",
  "timestamp": "2025-01-05T15:00:00Z"
}
```
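
The same request can be sent from Python. The following is a minimal sketch using only the standard library. The helper names (`build_predict_request`, `predict`, `skills_above`) and the 0.7 threshold are illustrative and not part of the project. It also assumes that `repo_name` and `pr_number` may be omitted and that the response has the shape shown above.

```python
import json
import urllib.request

API_URL = "http://localhost:8080"  # Local (Docker) base URL


def build_predict_request(issue_text, repo_name=None, pr_number=None):
    """Build a POST /predict request; unset optional fields are left out."""
    payload = {"issue_text": issue_text}
    if repo_name is not None:
        payload["repo_name"] = repo_name
    if pr_number is not None:
        payload["pr_number"] = pr_number
    return urllib.request.Request(
        f"{API_URL}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(issue_text, repo_name=None, pr_number=None):
    """Send the request to a running API and return the decoded JSON body."""
    request = build_predict_request(issue_text, repo_name, pr_number)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)


def skills_above(response, threshold):
    """Return the skill names whose confidence is at least `threshold`."""
    return [
        p["skill"]
        for p in response["predictions"]
        if p["confidence"] >= threshold
    ]
```

With the API running, `skills_above(predict("Fix authentication bug in OAuth2 login flow"), 0.7)` would keep only the higher-confidence skills from the response.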

#### Batch Prediction

```bash
curl -X POST "http://localhost:8080/predict/batch" \
  -H "Content-Type: application/json" \
  -d '{
    "issues": [
      {"issue_text": "Database connection timeout"},
      {"issue_text": "UI button not responding"}
    ]
  }'
```
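
Because the batch endpoint accepts at most 100 issues per request, larger workloads need to be split on the client side. The sketch below shows one way to do that. `chunk_issues` and `predict_all` are hypothetical helpers, and the payload shape is assumed to match the `curl` example above.

```python
import json
import urllib.request

MAX_BATCH_SIZE = 100  # limit stated in the endpoints table


def chunk_issues(issue_texts, size=MAX_BATCH_SIZE):
    """Yield /predict/batch payloads holding at most `size` issues each."""
    for start in range(0, len(issue_texts), size):
        batch = issue_texts[start:start + size]
        yield {"issues": [{"issue_text": text} for text in batch]}


def predict_all(issue_texts, base_url="http://localhost:8080"):
    """POST every chunk to /predict/batch and collect the decoded responses."""
    responses = []
    for payload in chunk_issues(issue_texts):
        request = urllib.request.Request(
            f"{base_url}/predict/batch",
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=60) as resp:
            responses.append(json.load(resp))
    return responses
```

For 250 issues, `chunk_issues` produces three payloads of 100, 100, and 50 issues.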

#### List Predictions

```bash
curl "http://localhost:8080/predictions?limit=10&skip=0"
```

#### Health Check

```bash
curl "http://localhost:8080/health"
```

**Response:**
```json
{
  "status": "healthy",
  "model_loaded": true,
  "model_version": "1.0.0"
}
```
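
Scripts that depend on the API can poll this endpoint until the model has finished loading. The following sketch assumes the response fields shown above. The function names, the 12 attempts, and the 5-second delay are illustrative choices, not project defaults.

```python
import json
import time
import urllib.error
import urllib.request


def is_ready(health):
    """True when the payload reports a healthy service with a loaded model."""
    return health.get("status") == "healthy" and health.get("model_loaded") is True


def fetch_health(base_url):
    """GET /health and decode the body; return {} if the API is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return {}


def wait_until_ready(base_url="http://localhost:8080", attempts=12,
                     delay=5.0, fetch=fetch_health):
    """Poll until the service is ready or `attempts` polls have failed."""
    for attempt in range(attempts):
        if is_ready(fetch(base_url)):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # no sleep after the final attempt
    return False
```

The `fetch` parameter exists so the polling logic can be exercised without a running server.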

### Makefile Shortcuts

```bash
make test-api-health      # Test health endpoint
make test-api-predict     # Test prediction
make test-api-list        # List predictions
make test-api-all         # Run all API tests
```

---

## 3. GUI (Streamlit)

### Access Points

| Environment | URL |
|-------------|-----|
| Local (Docker) | http://localhost:8501 |
| Production | https://dacrow13-hopcroft-skill-classification.hf.space |

### Features

- **Real-time Prediction**: Instant skill classification
- **Confidence Scores**: Probability for each predicted skill
- **Multiple Input Modes**: Quick input, detailed input, examples
- **API Health Indicator**: Connection status in sidebar

### User Interface

#### Main Dashboard

![Main Dashboard](./img/gui_main_dashboard.png)

The sidebar displays:
- API connection status
- Confidence threshold slider
- Model information

#### Quick Input Mode

![Quick Input](./img/gui_quick_input.png)

1. Paste GitHub issue text
2. Click "Predict Skills"
3. View results instantly

#### Detailed Input Mode

![Detailed Input](./img/gui_detailed_input.png)

Optional metadata fields:
- Repository name
- PR number
- Extended description

#### Prediction Results

![Results](./img/gui_detailed.png)

Results display:
- Top-5 predicted skills with confidence bars
- Full predictions table with filtering
- Processing time metrics
- Raw JSON response (expandable)

#### Example Gallery

![Examples](./img/gui_ex.png)

Pre-loaded test cases:
- Authentication bugs
- ML feature requests
- Database issues
- UI enhancements

---

## 4. Load Testing (Locust)

### Installation

```bash
pip install locust
```

### Configuration

The Locust configuration is in `monitoring/locust/locustfile.py`:

| Task | Weight | Endpoint |
|------|--------|----------|
| Single Prediction | 60% (weight: 3) | `POST /predict` |
| Batch Prediction | 20% (weight: 1) | `POST /predict/batch` |
| Monitoring | 20% (weight: 1) | `GET /health`, `/predictions` |
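
Locust picks tasks in proportion to their weights, so the percentages in the table follow directly from the weights 3, 1, and 1. A small sketch of that arithmetic is shown here. The task names are labels chosen for illustration, not identifiers taken from `locustfile.py`.

```python
def traffic_share(weights):
    """Convert integer task weights into percentage shares of the traffic."""
    total = sum(weights.values())
    return {task: 100 * weight / total for task, weight in weights.items()}


weights = {"single_prediction": 3, "batch_prediction": 1, "monitoring": 1}
print(traffic_share(weights))
# {'single_prediction': 60.0, 'batch_prediction': 20.0, 'monitoring': 20.0}
```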

### Running Load Tests

#### Web UI Mode

```bash
cd monitoring/locust
locust
```

Then open: http://localhost:8089

Configure in the Web UI:
- **Number of users**: Total concurrent users
- **Spawn rate**: Users per second to add
- **Host**: Target URL (e.g., `http://localhost:8080`)

#### Headless Mode

```bash
locust --headless \
  --users 50 \
  --spawn-rate 10 \
  --run-time 5m \
  --host http://localhost:8080 \
  --csv results
```

### Target URLs

| Environment | Host URL |
|-------------|----------|
| Local Docker | `http://localhost:8080` |
| Local Dev | `http://localhost:8000` |
| HF Spaces | `https://dacrow13-hopcroft-skill-classification.hf.space` |

### Interpreting Results

| Metric | Description | Target |
|--------|-------------|--------|
| RPS | Requests per second | Higher = better |
| Median Response Time | 50th percentile latency | < 500ms |
| 95th Percentile | Latency that 95% of requests stay under | < 2s |
| Failure Rate | Percentage of errors | < 1% |
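
The targets in the table can be checked automatically after a run. The sketch below takes the aggregate numbers as plain values. `RunStats` and `check_targets` are hypothetical names, and reading the values out of the Locust CSV output is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    requests: int      # total number of requests
    failures: int      # number of failed requests
    median_ms: float   # median response time in milliseconds
    p95_ms: float      # 95th percentile response time in milliseconds


def check_targets(stats):
    """Return the names of the targets from the table that the run missed."""
    missed = []
    if stats.median_ms >= 500:
        missed.append("median response time")
    if stats.p95_ms >= 2000:
        missed.append("95th percentile")
    failure_rate = stats.failures / stats.requests if stats.requests else 0.0
    if failure_rate >= 0.01:
        missed.append("failure rate")
    return missed
```

An empty list means every target was met. RPS has no fixed threshold in the table, so it is not checked.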

![Locust Results](./img/locust.png)

---

## 5. Monitoring (Prometheus & Grafana)

### Access Points

**Local Development:**

| Service | URL |
|---------|-----|
| Prometheus | http://localhost:9090 |
| Grafana | http://localhost:3000 |
| Pushgateway | http://localhost:9091 |

**Hugging Face Spaces (Production):**

| Service | URL |
|---------|-----|
| Prometheus | https://dacrow13-hopcroft-skill-classification.hf.space/prometheus/ |
| Grafana | https://dacrow13-hopcroft-skill-classification.hf.space/grafana/ |

### Prometheus Metrics

Access the metrics endpoint: http://localhost:8080/metrics

#### Available Metrics

| Metric | Type | Description |
|--------|------|-------------|
| `hopcroft_requests_total` | Counter | Total requests by method/endpoint |
| `hopcroft_request_duration_seconds` | Histogram | Request latency distribution |
| `hopcroft_in_progress_requests` | Gauge | Currently processing requests |
| `hopcroft_prediction_processing_seconds` | Summary | Model inference time |

#### Useful PromQL Queries

**Request Rate (per second)**
```promql
rate(hopcroft_requests_total[1m])
```

**Average Latency**
```promql
rate(hopcroft_request_duration_seconds_sum[5m]) / rate(hopcroft_request_duration_seconds_count[5m])
```

**In-Progress Requests**
```promql
hopcroft_in_progress_requests
```

**Model Prediction Time (P90)**
```promql
hopcroft_prediction_processing_seconds{quantile="0.9"}
```
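
These queries can also be run outside the web UI through the Prometheus HTTP API (`/api/v1/query`). The following is a minimal sketch with the standard library. The helper names are illustrative, and it assumes Prometheus is reachable at the local URL listed above.

```python
import json
import urllib.parse
import urllib.request

AVG_LATENCY = (
    "rate(hopcroft_request_duration_seconds_sum[5m])"
    " / rate(hopcroft_request_duration_seconds_count[5m])"
)


def query_url(expr, base_url="http://localhost:9090"):
    """Build an instant-query URL with the PromQL expression URL-encoded."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def instant_values(response):
    """Extract (labels, value) pairs from an instant-vector API response."""
    return [
        (sample["metric"], float(sample["value"][1]))
        for sample in response["data"]["result"]
    ]


def run_query(expr, base_url="http://localhost:9090"):
    """Execute the query against a running Prometheus and return its samples."""
    with urllib.request.urlopen(query_url(expr, base_url), timeout=10) as resp:
        return instant_values(json.load(resp))
```

For example, `run_query(AVG_LATENCY)` would return one `(labels, seconds)` pair per label combination.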

### Grafana Dashboards

The pre-configured dashboard includes:

| Panel | Description |
|-------|-------------|
| Request Rate | Real-time requests per second |
| Request Latency (p50, p95) | Response time percentiles |
| In-Progress Requests | Currently processing requests |
| Error Rate (5xx) | Percentage of failed requests |
| Model Prediction Time | Average model inference latency |
| Requests by Endpoint | Traffic distribution per endpoint |

### Data Drift Detection

#### Prepare Baseline (One-time)

```bash
cd monitoring/drift/scripts
python prepare_baseline.py
```

#### Run Drift Check

```bash
python run_drift_check.py
```

#### Verify Results

```bash
# Check Pushgateway
curl http://localhost:9091/metrics | grep drift

# PromQL queries (enter these in the Prometheus UI at http://localhost:9090, not in the shell)
# drift_detected
# drift_p_value
# drift_distance
```
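
The Pushgateway output is in the Prometheus text exposition format, so the drift values can also be extracted programmatically. The sketch below parses that text for metrics whose names start with `drift_`. The function name is hypothetical, and the parser assumes samples carry no trailing timestamp.

```python
def parse_drift_metrics(exposition_text):
    """Collect `drift_*` samples from Prometheus text-format output.

    If a metric name occurs with several label sets, the last one wins.
    """
    values = {}
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The value is the last space-separated token on the line.
        head, _, raw_value = line.rpartition(" ")
        name = head.split("{", 1)[0]  # drop the label set, if any
        if name.startswith("drift_"):
            values[name] = float(raw_value)
    return values
```

Passing the body returned by `http://localhost:9091/metrics` to this function yields a dictionary keyed by metric name.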

### Alerting Rules

Pre-configured alerts in `monitoring/prometheus/alert_rules.yml`:

| Alert | Condition | Severity |
|-------|-----------|----------|
| `ServiceDown` | Target down for 5m | Critical |
| `HighErrorRate` | 5xx > 10% for 5m | Warning |
| `SlowRequests` | P95 > 2s | Warning |

### Starting Monitoring Stack

```bash
# Start all monitoring services
docker compose up -d

# Verify containers
docker compose ps

# Check Prometheus targets (JSON from the HTTP API; /targets itself is an HTML page)
curl http://localhost:9090/api/v1/targets
```

---

## Troubleshooting

### Common Issues

#### API Returns 500 Error

1. Check `.env` credentials are correct
2. Restart services: `docker compose down && docker compose up -d`
3. Verify model files: `docker exec hopcroft-api ls -la /app/models/`

#### GUI Shows "API Unavailable"

1. Wait 30-60 seconds for API initialization
2. Check API health: `curl http://localhost:8080/health`
3. View logs: `docker compose logs hopcroft-api`

#### Port Already in Use

```bash
# Check port usage (Windows)
netstat -ano | findstr :8080

# Check port usage (Linux/macOS)
lsof -i :8080

# Stop conflicting containers
docker compose down
```

#### DVC Pull Fails

```bash
# Clean cache and retry
rm -rf .dvc/cache
dvc pull
```