---
title: Hopcroft Skill Classification
emoji: 🧠
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
api_docs_url: /docs
---
# Hopcroft Skill Classification
[![CI Pipeline](https://github.com/se4ai2526-uniba/Hopcroft/actions/workflows/ci.yml/badge.svg)](https://github.com/se4ai2526-uniba/Hopcroft/actions/workflows/ci.yml)
[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/se4ai2526-uniba/Hopcroft)
[![MLflow](https://img.shields.io/badge/MLflow-Tracking-blue)](https://dagshub.com/se4ai2526-uniba/Hopcroft.mlflow)
**Multi-label skill classification for GitHub issues and pull requests.** Automatically identify technical skills required to resolve software issues using machine learning.
---
## Overview
Hopcroft is an ML-enabled system that classifies GitHub issues into 217 technical skill categories, enabling automated developer assignment and optimized resource allocation. It is built following professional MLOps and software engineering standards.
### Key Features
- 🎯 **Multi-label Classification**: Predict multiple skills per issue
- 🚀 **REST API**: FastAPI with Swagger documentation
- 🖥️ **Web Interface**: Streamlit GUI for interactive predictions
- 📊 **Monitoring**: Prometheus/Grafana dashboards with drift detection
- 🔄 **CI/CD**: GitHub Actions with Docker deployment
- 📈 **Experiment Tracking**: MLflow on DagsHub
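To illustrate what "multi-label" means in practice, the sketch below turns per-skill scores into a set of predicted skills by thresholding. It is a minimal example, not the project's actual decision rule: the function name `select_skills`, the skill names, and the `0.5` threshold are all assumptions made for illustration.

```python
def select_skills(scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Return every label whose score meets the threshold, highest score first."""
    chosen = [(label, score) for label, score in scores.items() if score >= threshold]
    # Sort by descending score; break ties alphabetically for stable output.
    chosen.sort(key=lambda pair: (-pair[1], pair[0]))
    return [label for label, _ in chosen]


# Hypothetical scores for one issue; the real label set has 217 categories.
example_scores = {"Security": 0.91, "Networking": 0.62, "UI": 0.08}
print(select_skills(example_scores))  # ['Security', 'Networking']
```

Unlike single-label classification, zero, one, or several skills can pass the threshold for the same issue.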
---
## Architecture
```mermaid
graph TB
subgraph "Data Layer"
A[(SkillScope DB)] --> B[Feature Engineering]
B --> C[TF-IDF / Embeddings]
end
subgraph "ML Pipeline"
C --> D[Model Training]
D --> E[(MLflow Tracking)]
D --> F[Random Forest Model]
end
subgraph "Serving Layer"
F --> G[FastAPI Service]
G --> H[predict endpoint]
G --> I[predictions endpoint]
G --> J[health endpoint]
end
subgraph "Frontend"
G --> K[Streamlit GUI]
end
subgraph "Monitoring"
G --> L[Prometheus]
L --> M[Grafana]
N[Drift Detection] --> L
end
subgraph "Deployment"
O[GitHub Actions] --> P[Docker Build]
P --> Q[HF Spaces]
end
```
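The "TF-IDF" step in the data layer can be illustrated with a small, dependency-free sketch. This is the textbook weighting (term frequency times log inverse document frequency), written only to show the idea; the project's `features.py` may use a library implementation with different smoothing or normalization (scikit-learn, for example, uses a smoothed variant by default). The function name `tfidf` and the sample tokens are assumptions.

```python
import math
from collections import Counter


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Compute textbook TF-IDF weights for a list of tokenized documents."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append(
            {
                term: (count / total) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return vectors


# Two hypothetical, already-tokenized issue titles.
issues = [["fix", "oauth", "bug"], ["fix", "css", "bug"]]
for vector in tfidf(issues):
    print(vector)
```

Terms that occur in every document (here `fix` and `bug`) receive a weight of zero, while terms specific to one issue (`oauth`, `css`) receive a positive weight, which is what makes them useful features for a downstream classifier.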
---
## Documentation
| Document | Description |
|----------|-------------|
| 📋 [Milestone Summaries](docs/milestone_summaries.md) | All 6 project phases documented |
| 📖 [User Guide](docs/user_guide.md) | Setup, API, GUI, testing, monitoring |
| 🏗️ [Design Choices](docs/design_choices.md) | Technical decisions & rationale |
| 🎯 [ML Canvas](docs/ML%20Canvas.md) | Requirements engineering framework |
| ✅ [Testing & Validation](docs/testing_and_validation.md) | QA strategy & results |
| 📊 [Model Card](models/README.md) | Model details & performance |
| 📊 [Dataset Card](data/README.md) | Dataset details & preprocessing |
---
## Quick Start
### Docker (Recommended)
```bash
# Clone and configure
git clone https://github.com/se4ai2526-uniba/Hopcroft.git
cd Hopcroft
cp .env.example .env
# Edit .env with your DagsHub credentials
# Start services
docker compose -f docker/docker-compose.yml up -d --build
```
**Access (Local):**
- 🌐 **API Docs**: http://localhost:8080/docs
- 🖥️ **GUI**: http://localhost:8501
- ❤️ **Health**: http://localhost:8080/health
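After `docker compose up`, the containers may need a few seconds before the API answers. The sketch below polls the health URL listed above until it responds. It is a convenience script under stated assumptions, not part of the repository: the helper names `http_ok` and `wait_until_healthy`, the retry counts, and the assumption that a healthy service returns HTTP 200 are all illustrative.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

HEALTH_URL = "http://localhost:8080/health"  # local URL from the list above


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200 (assumed healthy)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_healthy(
    probe: Callable[[], bool], attempts: int = 30, delay: float = 1.0
) -> bool:
    """Call `probe` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A typical call would be `wait_until_healthy(lambda: http_ok(HEALTH_URL))`. Passing the probe as a callable keeps the retry loop testable without a running service.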
### Local Development
```bash
# Setup environment
python -m venv venv && source venv/bin/activate # or venv\Scripts\activate on Windows
pip install -r requirements.txt && pip install -e .
# Start API
make api-dev
# Start GUI (new terminal)
streamlit run hopcroft_skill_classification_tool_competition/streamlit_app.py
```
---
## Project Structure
```
├── hopcroft_skill_classification_tool_competition/
│   ├── main.py              # FastAPI application
│   ├── streamlit_app.py     # Streamlit GUI
│   ├── features.py          # Feature engineering
│   ├── modeling/            # Training & prediction
│   └── config.py            # Configuration
├── data/                    # DVC-tracked datasets
├── models/                  # DVC-tracked models
├── tests/                   # Pytest test suites
├── monitoring/              # Prometheus, Grafana, Locust
├── docker/                  # Docker configurations
├── docs/                    # Documentation
└── .github/workflows/       # CI/CD pipelines
```
---
## API Endpoints
| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/predict` | Classify single issue |
| `POST` | `/predict/batch` | Batch classification |
| `GET` | `/predictions` | List recent predictions |
| `GET` | `/predictions/{id}` | Get a prediction by MLflow run ID |
| `GET` | `/health` | Health check |
| `GET` | `/metrics` | Prometheus metrics |
**Example:**
```bash
curl -X POST "http://localhost:8080/predict" \
-H "Content-Type: application/json" \
-d '{"issue_text": "Fix OAuth2 authentication bug"}'
```
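The same request can be sent from Python using only the standard library. This is a minimal sketch mirroring the `curl` call above: the `issue_text` field and the URL come from that example, while the helper names `build_predict_request` and `predict` are hypothetical. The response schema is not documented here, so the sketch simply returns whatever JSON the service sends back.

```python
import json
import urllib.request
from typing import Any

API_URL = "http://localhost:8080/predict"  # local URL from the curl example


def build_predict_request(issue_text: str, url: str = API_URL) -> urllib.request.Request:
    """Build the POST request with the same JSON body as the curl example."""
    body = json.dumps({"issue_text": issue_text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(issue_text: str, url: str = API_URL, timeout: float = 10.0) -> Any:
    """Send the request and return the parsed JSON response (schema not assumed)."""
    request = build_predict_request(issue_text, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the services running, `predict("Fix OAuth2 authentication bug")` should return the parsed response; see the Swagger page at `/docs` for the authoritative request and response models.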
---
## Live Deployment
- **API**: https://dacrow13-hopcroft-skill-classification.hf.space/docs
- **GUI**: https://dacrow13-hopcroft-skill-classification.hf.space
- **MLflow**: https://dagshub.com/se4ai2526-uniba/Hopcroft/experiments
- **Prometheus**: https://dacrow13-hopcroft-skill-classification.hf.space/prometheus/
- **Grafana**: https://dacrow13-hopcroft-skill-classification.hf.space/grafana/
- **Betterstack**: Alerting configured. [Alert System Evidence](monitoring/screenshots)
---
## Development
```bash
# Run tests
make test-all # All tests
make test-behavioral # ML behavioral tests
make validate-deepchecks # Data validation
# Lint & format
make lint # Check code style
make format # Auto-fix issues
# Training
make train-baseline-tfidf # Train baseline model
```
---
## License
This project was developed as part of the SE4AI 2025-26 course at the University of Bari.