# User Guide
Complete operational guide for the Hopcroft Skill Classification system covering all components: API, GUI, load testing, and monitoring.
---
## Table of Contents
1. [System Setup](#1-system-setup-local)
2. [API Usage](#2-api-usage)
3. [GUI (Streamlit)](#3-gui-streamlit)
4. [Load Testing (Locust)](#4-load-testing-locust)
5. [Monitoring (Prometheus & Grafana)](#5-monitoring-prometheus--grafana)
---
## 1. System Setup (Local)
### Prerequisites
| Requirement | Version | Purpose |
|-------------|---------|---------|
| Python | 3.10+ | Runtime environment |
| Docker | 20.10+ | Containerization |
| Docker Compose | 2.0+ | Multi-service orchestration |
| Git | 2.30+ | Version control |
### Option A: Docker Setup
**1. Clone and Configure**
```bash
git clone https://github.com/se4ai2526-uniba/Hopcroft.git
cd Hopcroft
# Create environment file
cp .env.example .env
```
**2. Edit `.env` with Your Credentials**
```env
MLFLOW_TRACKING_URI=https://dagshub.com/se4ai2526-uniba/Hopcroft.mlflow
MLFLOW_TRACKING_USERNAME=your_dagshub_username
MLFLOW_TRACKING_PASSWORD=your_dagshub_token
```
> [!TIP]
> Get your DagsHub token at: https://dagshub.com/user/settings/tokens
**3. Start All Services**
```bash
docker compose -f docker/docker-compose.yml up -d --build
```
**4. Verify Services**
| Service | URL | Purpose |
|---------|-----|---------|
| API (Swagger) | http://localhost:8080/docs | Interactive API documentation |
| GUI (Streamlit) | http://localhost:8501 | Web interface |
| Health Check | http://localhost:8080/health | Service status |
### Option B: Virtual Environment Setup
**1. Create Virtual Environment**
```bash
python -m venv venv
# Windows
venv\Scripts\activate
# Linux/macOS
source venv/bin/activate
```
**2. Install Dependencies**
```bash
pip install -r requirements.txt
pip install -e .
```
**3. Configure DVC (for Model Access)**
```bash
dvc remote modify origin --local auth basic
dvc remote modify origin --local user YOUR_DAGSHUB_USERNAME
dvc remote modify origin --local password YOUR_DAGSHUB_TOKEN
dvc pull
```
**4. Start Services Manually**
```bash
# Terminal 1: Start API
make api-dev
# Terminal 2: Start Streamlit
streamlit run hopcroft_skill_classification_tool_competition/streamlit_app.py
```
### Docker Compose Commands Reference
| Command | Description |
|---------|-------------|
| `docker compose -f docker/docker-compose.yml up -d` | Start in background |
| `docker compose -f docker/docker-compose.yml down` | Stop all services |
| `docker compose -f docker/docker-compose.yml logs -f` | Stream logs |
| `docker compose -f docker/docker-compose.yml ps` | Check status |
| `docker compose -f docker/docker-compose.yml restart` | Restart services |
---
## 2. API Usage
### Base URLs
| Environment | URL |
|-------------|-----|
| Local (Docker) | http://localhost:8080 |
| Local (Dev) | http://localhost:8000 |
| Production (HF Spaces) | https://dacrow13-hopcroft-skill-classification.hf.space |
### Endpoints Overview
| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/predict` | Predict skills for a single issue |
| `POST` | `/predict/batch` | Batch prediction (max 100) |
| `GET` | `/predictions` | List recent predictions |
| `GET` | `/predictions/{run_id}` | Get prediction by ID |
| `GET` | `/health` | Health check |
| `GET` | `/metrics` | Prometheus metrics |
### Interactive Documentation
Access Swagger UI for interactive testing:
- **Swagger**: http://localhost:8080/docs
- **ReDoc**: http://localhost:8080/redoc
### Example Requests
#### Single Prediction
```bash
curl -X POST "http://localhost:8080/predict" \
-H "Content-Type: application/json" \
-d '{
"issue_text": "Fix authentication bug in OAuth2 login flow",
"repo_name": "my-project",
"pr_number": 42
}'
```
**Response:**
```json
{
"run_id": "abc123...",
"predictions": [
{"skill": "authentication", "confidence": 0.92},
{"skill": "security", "confidence": 0.78},
{"skill": "oauth", "confidence": 0.65}
],
"model_version": "1.0.0",
"timestamp": "2025-01-05T15:00:00Z"
}
```
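For programmatic use, the response is easy to post-process. A minimal Python sketch that filters the example response above by confidence (the schema follows the sample response; the 0.7 threshold is illustrative):

```python
import json

def top_skills(response: dict, threshold: float = 0.7) -> list:
    """Return skill names whose confidence meets the threshold."""
    return [p["skill"] for p in response["predictions"] if p["confidence"] >= threshold]

# Sample payload matching the /predict response schema shown above
sample = json.loads('''{
  "predictions": [
    {"skill": "authentication", "confidence": 0.92},
    {"skill": "security", "confidence": 0.78},
    {"skill": "oauth", "confidence": 0.65}
  ]
}''')
print(top_skills(sample))  # ['authentication', 'security']
```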
#### Batch Prediction
```bash
curl -X POST "http://localhost:8080/predict/batch" \
-H "Content-Type: application/json" \
-d '{
"issues": [
{"issue_text": "Database connection timeout"},
{"issue_text": "UI button not responding"}
]
}'
```
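Since batch requests are capped at 100 issues, larger workloads need to be split into multiple payloads. A sketch of that chunking (the payload shape follows the request above; the batch limit is taken from the endpoints table):

```python
def chunk_issues(texts, batch_size=100):
    """Split issue texts into /predict/batch payloads of at most batch_size items each."""
    return [
        {"issues": [{"issue_text": t} for t in texts[i:i + batch_size]]}
        for i in range(0, len(texts), batch_size)
    ]

payloads = chunk_issues([f"issue {n}" for n in range(250)])
print(len(payloads))  # 3 payloads: 100 + 100 + 50 issues
```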
#### List Predictions
```bash
curl "http://localhost:8080/predictions?limit=10&skip=0"
```
#### Health Check
```bash
curl "http://localhost:8080/health"
```
**Response:**
```json
{
"status": "healthy",
"model_loaded": true,
"model_version": "1.0.0"
}
```
### Makefile Shortcuts
```bash
make test-api-health # Test health endpoint
make test-api-predict # Test prediction
make test-api-list # List predictions
make test-api-all # Run all API tests
```
---
## 3. GUI (Streamlit)
### Access Points
| Environment | URL |
|-------------|-----|
| Local (Docker) | http://localhost:8501 |
| Production | https://dacrow13-hopcroft-skill-classification.hf.space |
### Features
- **Real-time Prediction**: Instant skill classification
- **Confidence Scores**: Probability for each predicted skill
- **Multiple Input Modes**: Quick input, detailed input, examples
- **API Health Indicator**: Connection status in sidebar
### User Interface
#### Main Dashboard
![Main Dashboard](./img/gui_main_dashboard.png)
The sidebar displays:
- API connection status
- Confidence threshold slider
- Model information
#### Quick Input Mode
![Quick Input](./img/gui_quick_input.png)
1. Paste GitHub issue text
2. Click "Predict Skills"
3. View results instantly
#### Detailed Input Mode
![Detailed Input](./img/gui_detailed_input.png)
Optional metadata fields:
- Repository name
- PR number
- Extended description
#### Prediction Results
![Results](./img/gui_detailed.png)
Results display:
- Top-5 predicted skills with confidence bars
- Full predictions table with filtering
- Processing time metrics
- Raw JSON response (expandable)
#### Example Gallery
![Examples](./img/gui_ex.png)
Pre-loaded test cases:
- Authentication bugs
- ML feature requests
- Database issues
- UI enhancements
---
## 4. Load Testing (Locust)
### Installation
```bash
pip install locust
```
### Configuration
The Locust configuration is in `monitoring/locust/locustfile.py`:
| Task | Weight | Endpoint |
|------|--------|----------|
| Single Prediction | 60% (weight: 3) | `POST /predict` |
| Batch Prediction | 20% (weight: 1) | `POST /predict/batch` |
| Monitoring | 20% (weight: 1) | `GET /health`, `/predictions` |
### Running Load Tests
#### Web UI Mode
```bash
cd monitoring/locust
locust
```
Then open: http://localhost:8089
Configure in the Web UI:
- **Number of users**: Total concurrent users
- **Spawn rate**: Users per second to add
- **Host**: Target URL (e.g., `http://localhost:8080`)
#### Headless Mode
```bash
locust --headless \
--users 50 \
--spawn-rate 10 \
--run-time 5m \
--host http://localhost:8080 \
--csv results
```
### Target URLs
| Environment | Host URL |
|-------------|----------|
| Local Docker | `http://localhost:8080` |
| Local Dev | `http://localhost:8000` |
| HF Spaces | `https://dacrow13-hopcroft-skill-classification.hf.space` |
### Interpreting Results
| Metric | Description | Target |
|--------|-------------|--------|
| RPS | Requests per second | Higher = better |
| Median Response Time | 50th percentile latency | < 500ms |
| 95th Percentile | Worst-case latency | < 2s |
| Failure Rate | Percentage of errors | < 1% |
![Locust Results](./img/locust.png)
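The targets in the table can be checked automatically after a headless run; a small sketch with the threshold values taken from the table above (the sample measurements are illustrative):

```python
def check_targets(median_ms, p95_ms, failure_pct):
    """Evaluate a load-test run against the latency and error-rate targets."""
    return {
        "median_ok": median_ms < 500,     # median < 500 ms
        "p95_ok": p95_ms < 2000,          # 95th percentile < 2 s
        "failures_ok": failure_pct < 1.0, # failure rate < 1%
    }

result = check_targets(median_ms=120, p95_ms=850, failure_pct=0.2)
print(result)  # {'median_ok': True, 'p95_ok': True, 'failures_ok': True}
```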
---
## 5. Monitoring (Prometheus & Grafana)
### Access Points
**Local Development:**
| Service | URL |
|---------|-----|
| Prometheus | http://localhost:9090 |
| Grafana | http://localhost:3000 |
| Pushgateway | http://localhost:9091 |
**Hugging Face Spaces (Production):**
| Service | URL |
|---------|-----|
| Prometheus | https://dacrow13-hopcroft-skill-classification.hf.space/prometheus/ |
| Grafana | https://dacrow13-hopcroft-skill-classification.hf.space/grafana/ |
### Prometheus Metrics
Access the metrics endpoint: http://localhost:8080/metrics
#### Available Metrics
| Metric | Type | Description |
|--------|------|-------------|
| `hopcroft_requests_total` | Counter | Total requests by method/endpoint |
| `hopcroft_request_duration_seconds` | Histogram | Request latency distribution |
| `hopcroft_in_progress_requests` | Gauge | Currently processing requests |
| `hopcroft_prediction_processing_seconds` | Summary | Model inference time |
#### Useful PromQL Queries
**Request Rate (per second)**
```promql
rate(hopcroft_requests_total[1m])
```
**Average Latency**
```promql
rate(hopcroft_request_duration_seconds_sum[5m]) / rate(hopcroft_request_duration_seconds_count[5m])
```
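This query divides the per-second increase of the `_sum` counter (total seconds spent) by the per-second increase of the `_count` counter (total requests) over the window. The same arithmetic in Python, using illustrative counter deltas:

```python
def avg_latency(sum_delta_s, count_delta):
    """Average request latency in seconds from histogram counter deltas."""
    return sum_delta_s / count_delta if count_delta else 0.0

# e.g. _sum grew by 15 s while _count grew by 60 requests over the window
print(avg_latency(15.0, 60))  # 0.25 seconds average
```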
**In-Progress Requests**
```promql
hopcroft_in_progress_requests
```
**Model Prediction Time (P90)**
```promql
hopcroft_prediction_processing_seconds{quantile="0.9"}
```
### Grafana Dashboards
The pre-configured dashboard includes:
| Panel | Description |
|-------|-------------|
| Request Rate | Real-time requests per second |
| Request Latency (p50, p95) | Response time percentiles |
| In-Progress Requests | Currently processing requests |
| Error Rate (5xx) | Percentage of failed requests |
| Model Prediction Time | Average model inference latency |
| Requests by Endpoint | Traffic distribution per endpoint |
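The Error Rate panel corresponds to the share of 5xx responses in total traffic; a minimal sketch of that computation (the status-code counts are illustrative):

```python
def error_rate(status_counts):
    """Fraction of requests that returned a 5xx status code."""
    total = sum(status_counts.values())
    errors = sum(c for status, c in status_counts.items() if 500 <= status < 600)
    return errors / total if total else 0.0

print(error_rate({200: 90, 500: 6, 503: 4}))  # 0.1 -> 10% error rate
```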
### Data Drift Detection
#### Prepare Baseline (One-time)
```bash
cd monitoring/drift/scripts
python prepare_baseline.py
```
#### Run Drift Check
```bash
python run_drift_check.py
```
#### Verify Results
```bash
# Check that drift metrics reached the Pushgateway
curl http://localhost:9091/metrics | grep drift
```
Then query the exported metrics in Prometheus:
```promql
drift_detected
drift_p_value
drift_distance
```
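The exported `drift_distance` can be understood as a distance between the baseline and the live feature distributions, for example a two-sample Kolmogorov-Smirnov statistic. A pure-Python sketch of that statistic (the actual `run_drift_check.py` may use a different test; this is illustrative):

```python
import bisect

def ks_distance(a, b):
    """Two-sample KS distance: largest gap between the empirical CDFs of a and b."""
    a, b = sorted(a), sorted(b)
    def ecdf(xs, v):
        return bisect.bisect_right(xs, v) / len(xs)  # fraction of xs <= v
    return max(abs(ecdf(a, v) - ecdf(b, v)) for v in a + b)

print(ks_distance([0.1, 0.2, 0.3], [0.1, 0.2, 0.3]))  # 0.0 (identical distributions)
print(ks_distance([0, 0, 0, 0], [1, 1, 1, 1]))        # 1.0 (complete separation)
```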
### Alerting Rules
Pre-configured alerts in `monitoring/prometheus/alert_rules.yml`:
| Alert | Condition | Severity |
|-------|-----------|----------|
| `ServiceDown` | Target down for 5m | Critical |
| `HighErrorRate` | 5xx > 10% for 5m | Warning |
| `SlowRequests` | P95 > 2s | Warning |
### Starting Monitoring Stack
```bash
# Start all monitoring services
docker compose up -d
# Verify containers
docker compose ps
# Check Prometheus targets
curl http://localhost:9090/targets
```
---
## Troubleshooting
### Common Issues
#### API Returns 500 Error
1. Check `.env` credentials are correct
2. Restart services: `docker compose down && docker compose up -d`
3. Verify model files: `docker exec hopcroft-api ls -la /app/models/`
#### GUI Shows "API Unavailable"
1. Wait 30-60 seconds for API initialization
2. Check API health: `curl http://localhost:8080/health`
3. View logs: `docker compose logs hopcroft-api`
#### Port Already in Use
```bash
# Check port usage (Windows)
netstat -ano | findstr :8080
# Check port usage (Linux/macOS)
lsof -i :8080
# Stop conflicting containers
docker compose down
```
#### DVC Pull Fails
```bash
# Clean cache and retry
rm -rf .dvc/cache
dvc pull
```