---
title: Hopcroft Skill Classification
emoji: 🧠
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
api_docs_url: /docs
---

# Hopcroft Skill Classification

[CI](https://github.com/se4ai2526-uniba/Hopcroft/actions/workflows/ci.yml)
[Hugging Face Space](https://huggingface.co/spaces/se4ai2526-uniba/Hopcroft)
[MLflow on DagsHub](https://dagshub.com/se4ai2526-uniba/Hopcroft.mlflow)

**Multi-label skill classification for GitHub issues and pull requests**: automatically identify the technical skills required to resolve software issues using machine learning.

---

## Overview

Hopcroft is an ML-enabled system that classifies GitHub issues into 217 technical skill categories, enabling automated developer assignment and optimized resource allocation. It is built following professional MLOps and software engineering standards.

### Key Features

- **Multi-label Classification**: Predict multiple skills per issue
- **REST API**: FastAPI with Swagger documentation
- **Web Interface**: Streamlit GUI for interactive predictions
- **Monitoring**: Prometheus/Grafana dashboards with drift detection
- **CI/CD**: GitHub Actions with Docker deployment
- **Experiment Tracking**: MLflow on DagsHub

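To make the multi-label idea concrete, the sketch below turns per-skill scores into a set of predicted skills by thresholding each score independently, so one issue can receive zero, one, or several skills. The skill names, the scores, and the 0.5 threshold are illustrative assumptions, not values taken from the Hopcroft model or its label set.

```python
from typing import Dict, List


def select_skills(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return every skill whose score meets the threshold, highest score first.

    Each label is decided independently of the others, which is what
    distinguishes multi-label from single-label (multi-class) prediction.
    """
    chosen = [(skill, score) for skill, score in scores.items() if score >= threshold]
    # Sort by descending score, then by name so that ties are deterministic.
    chosen.sort(key=lambda item: (-item[1], item[0]))
    return [skill for skill, _ in chosen]


# Hypothetical scores for a single issue (names and values are made up).
example_scores = {"Authentication": 0.91, "Networking": 0.64, "UI": 0.08}
print(select_skills(example_scores))  # ['Authentication', 'Networking']
```
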
---

## Architecture

```mermaid
graph TB
    subgraph "Data Layer"
        A[(SkillScope DB)] --> B[Feature Engineering]
        B --> C[TF-IDF / Embeddings]
    end

    subgraph "ML Pipeline"
        C --> D[Model Training]
        D --> E[(MLflow Tracking)]
        D --> F[Random Forest Model]
    end

    subgraph "Serving Layer"
        F --> G[FastAPI Service]
        G --> H[predict endpoint]
        G --> I[predictions endpoint]
        G --> J[health endpoint]
    end

    subgraph "Frontend"
        G --> K[Streamlit GUI]
    end

    subgraph "Monitoring"
        G --> L[Prometheus]
        L --> M[Grafana]
        N[Drift Detection] --> L
    end

    subgraph "Deployment"
        O[GitHub Actions] --> P[Docker Build]
        P --> Q[HF Spaces]
    end
```

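The TF-IDF step in the data layer can be illustrated with a small pure-Python sketch. It uses one common formulation (term frequency normalized by document length, multiplied by a smoothed inverse document frequency). This is an assumption made for illustration: the project's own feature code in `features.py` may use a different tokenizer, weighting, or library, and the issue titles below are made up.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf(documents: List[str]) -> List[Dict[str, float]]:
    """Compute one {term: weight} mapping per document.

    weight = tf * idf, with tf = count / document length and
    idf = ln((1 + N) / (1 + df)) + 1, where N is the number of documents
    and df is the number of documents that contain the term.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: count each term once per document.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


# Made-up issue titles, used only to show the weighting.
issues = ["Fix OAuth2 authentication bug", "Fix layout bug in settings page"]
weights = tfidf(issues)
# "oauth2" occurs in one issue only, so it outweighs "fix", which occurs in both.
print(weights[0]["oauth2"] > weights[0]["fix"])  # True
```
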
---

## Documentation

| Document | Description |
|----------|-------------|
| [Milestone Summaries](docs/milestone_summaries.md) | All 6 project phases documented |
| [User Guide](docs/user_guide.md) | Setup, API, GUI, testing, monitoring |
| [Design Choices](docs/design_choices.md) | Technical decisions & rationale |
| [ML Canvas](docs/ML%20Canvas.md) | Requirements engineering framework |
| [Testing & Validation](docs/testing_and_validation.md) | QA strategy & results |
| [Model Card](models/README.md) | Model details & performance |
| [Dataset Card](data/README.md) | Dataset details & preprocessing |

---

## Quick Start

### Docker (Recommended)

```bash
# Clone and configure
git clone https://github.com/se4ai2526-uniba/Hopcroft.git
cd Hopcroft
cp .env.example .env
# Edit .env with your DagsHub credentials

# Start services
docker compose -f docker/docker-compose.yml up -d --build
```

**Access (Local):**

- **API Docs**: http://localhost:8080/docs
- **GUI**: http://localhost:8501
- **Health**: http://localhost:8080/health

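After `docker compose up`, the containers may need a moment before they accept requests. The following is a minimal readiness-check sketch against the `/health` URL listed above. It assumes only that the endpoint answers with HTTP 200 once the API is up; the function names, the retry count, and the delay are illustrative choices, not part of the project.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code of a GET request, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status (for example 503).
        return error.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar problems.
        return 0


def wait_until_healthy(
    url: str = "http://localhost:8080/health",
    attempts: int = 10,
    delay_seconds: float = 1.0,
    get_status: Callable[[str], int] = http_status,
) -> bool:
    """Poll the URL until it returns 200 or the attempts run out."""
    for attempt in range(attempts):
        if get_status(url) == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```

Calling `wait_until_healthy()` with the defaults polls the local health endpoint for roughly ten seconds. The `get_status` parameter exists so the polling logic can be exercised without a running service.
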
### Local Development

```bash
# Setup environment
python -m venv venv && source venv/bin/activate  # or venv\Scripts\activate on Windows
pip install -r requirements.txt && pip install -e .

# Start API
make api-dev

# Start GUI (new terminal)
streamlit run hopcroft_skill_classification_tool_competition/streamlit_app.py
```

---

## Project Structure

```
├── hopcroft_skill_classification_tool_competition/
│   ├── main.py              # FastAPI application
│   ├── streamlit_app.py     # Streamlit GUI
│   ├── features.py          # Feature engineering
│   ├── modeling/            # Training & prediction
│   └── config.py            # Configuration
├── data/                    # DVC-tracked datasets
├── models/                  # DVC-tracked models
├── tests/                   # Pytest test suites
├── monitoring/              # Prometheus, Grafana, Locust
├── docker/                  # Docker configurations
├── docs/                    # Documentation
└── .github/workflows/       # CI/CD pipelines
```

---

## API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/predict` | Classify single issue |
| `POST` | `/predict/batch` | Batch classification |
| `GET` | `/predictions` | List recent predictions |
| `GET` | `/predictions/{id}` | Get by MLflow run ID |
| `GET` | `/health` | Health check |
| `GET` | `/metrics` | Prometheus metrics |

**Example:**

```bash
curl -X POST "http://localhost:8080/predict" \
  -H "Content-Type: application/json" \
  -d '{"issue_text": "Fix OAuth2 authentication bug"}'
```

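The same request can be issued from Python using only the standard library. This sketch mirrors the curl call above: the `/predict` path and the `issue_text` field come from that example, while the helper names are illustrative. The response is returned as parsed JSON without assuming which fields it contains.

```python
import json
import urllib.request
from typing import Any


def build_predict_request(
    issue_text: str, base_url: str = "http://localhost:8080"
) -> urllib.request.Request:
    """Build a POST request for /predict with a JSON body."""
    body = json.dumps({"issue_text": issue_text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(
    issue_text: str, base_url: str = "http://localhost:8080", timeout: float = 10.0
) -> Any:
    """Send the request and return the decoded JSON response."""
    request = build_predict_request(issue_text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires the API to be running locally):
# print(predict("Fix OAuth2 authentication bug"))
```
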
---

## Live Deployment

- **API**: https://dacrow13-hopcroft-skill-classification.hf.space/docs
- **GUI**: https://dacrow13-hopcroft-skill-classification.hf.space
- **MLflow**: https://dagshub.com/se4ai2526-uniba/Hopcroft/experiments
- **Prometheus**: https://dacrow13-hopcroft-skill-classification.hf.space/prometheus/
- **Grafana**: https://dacrow13-hopcroft-skill-classification.hf.space/grafana/
- **Betterstack**: Alerting configured. [Alert System Evidence](monitoring/screenshots)

---

## Development

```bash
# Run tests
make test-all              # All tests
make test-behavioral       # ML behavioral tests
make validate-deepchecks   # Data validation

# Lint & format
make lint                  # Check code style
make format                # Auto-fix issues

# Training
make train-baseline-tfidf  # Train baseline model
```

---

## License

This project was developed as part of the SE4AI 2025-26 course at the University of Bari.