---
title: Hopcroft Skill Classification
emoji: 🧠
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
api_docs_url: /docs
---

# Hopcroft Skill Classification

[![CI Pipeline](https://github.com/se4ai2526-uniba/Hopcroft/actions/workflows/ci.yml/badge.svg)](https://github.com/se4ai2526-uniba/Hopcroft/actions/workflows/ci.yml)
[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/se4ai2526-uniba/Hopcroft)
[![MLflow](https://img.shields.io/badge/MLflow-Tracking-blue)](https://dagshub.com/se4ai2526-uniba/Hopcroft.mlflow)

**Multi-label skill classification for GitHub issues and pull requests**: automatically identify technical skills required to resolve software issues using machine learning.

---

## Overview

Hopcroft is an ML-enabled system that classifies GitHub issues into 217 technical skill categories, enabling automated developer assignment and optimized resource allocation. Built following professional MLOps and Software Engineering standards.
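
Multi-label classification means an issue can receive zero, one, or several skill labels at once, rather than exactly one. The decision step can be sketched roughly as follows, assuming the model produces one independent score per label. The function name `select_skills`, the label names, the scores, and the 0.5 threshold are illustrative assumptions, not taken from the Hopcroft code base:

```python
from typing import Dict, List


def select_skills(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return every label whose score meets the threshold, highest score first."""
    chosen = [(label, score) for label, score in scores.items() if score >= threshold]
    chosen.sort(key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in chosen]


# Hypothetical per-label scores for a single issue (illustrative values only).
example_scores = {"Authentication": 0.91, "Networking": 0.62, "UI": 0.08}

print(select_skills(example_scores))  # → ['Authentication', 'Networking']
```

Because each label is thresholded independently, the result may be empty or contain many labels, which is what distinguishes this setup from single-label (multi-class) classification.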

### Key Features

- 🎯 **Multi-label Classification**: Predict multiple skills per issue
- 🚀 **REST API**: FastAPI with Swagger documentation
- 🖥️ **Web Interface**: Streamlit GUI for interactive predictions
- 📊 **Monitoring**: Prometheus/Grafana dashboards with drift detection
- 🔄 **CI/CD**: GitHub Actions with Docker deployment
- 📈 **Experiment Tracking**: MLflow on DagsHub

---

## Architecture

```mermaid
graph TB
    subgraph "Data Layer"
        A[(SkillScope DB)] --> B[Feature Engineering]
        B --> C[TF-IDF / Embeddings]
    end

    subgraph "ML Pipeline"
        C --> D[Model Training]
        D --> E[(MLflow Tracking)]
        D --> F[Random Forest Model]
    end

    subgraph "Serving Layer"
        F --> G[FastAPI Service]
        G --> H[predict endpoint]
        G --> I[predictions endpoint]
        G --> J[health endpoint]
    end

    subgraph "Frontend"
        G --> K[Streamlit GUI]
    end

    subgraph "Monitoring"
        G --> L[Prometheus]
        L --> M[Grafana]
        N[Drift Detection] --> L
    end

    subgraph "Deployment"
        O[GitHub Actions] --> P[Docker Build]
        P --> Q[HF Spaces]
    end
```

---

## Documentation

| Document | Description |
|----------|-------------|
| 📋 [Milestone Summaries](docs/milestone_summaries.md) | All 6 project phases documented |
| 📖 [User Guide](docs/user_guide.md) | Setup, API, GUI, testing, monitoring |
| 🏗️ [Design Choices](docs/design_choices.md) | Technical decisions & rationale |
| 🎯 [ML Canvas](docs/ML%20Canvas.md) | Requirements engineering framework |
| ✅ [Testing & Validation](docs/testing_and_validation.md) | QA strategy & results |
| 📊 [Model Card](models/README.md) | Model details & performance |
| 📊 [Dataset Card](data/README.md) | Dataset details & preprocessing |

---

## Quick Start

### Docker (Recommended)

```bash
# Clone and configure
git clone https://github.com/se4ai2526-uniba/Hopcroft.git
cd Hopcroft
cp .env.example .env
# Edit .env with your DagsHub credentials

# Start services
docker compose -f docker/docker-compose.yml up -d --build
```

**Access (Local):**

- 🌐 **API Docs**: http://localhost:8080/docs
- 🖥️ **GUI**: http://localhost:8501
- ❤️
**Health**: http://localhost:8080/health

### Local Development

```bash
# Setup environment
python -m venv venv && source venv/bin/activate  # or venv\Scripts\activate on Windows
pip install -r requirements.txt && pip install -e .

# Start API
make api-dev

# Start GUI (new terminal)
streamlit run hopcroft_skill_classification_tool_competition/streamlit_app.py
```

---

## Project Structure

```
├── hopcroft_skill_classification_tool_competition/
│   ├── main.py              # FastAPI application
│   ├── streamlit_app.py     # Streamlit GUI
│   ├── features.py          # Feature engineering
│   ├── modeling/            # Training & prediction
│   └── config.py            # Configuration
├── data/                    # DVC-tracked datasets
├── models/                  # DVC-tracked models
├── tests/                   # Pytest test suites
├── monitoring/              # Prometheus, Grafana, Locust
├── docker/                  # Docker configurations
├── docs/                    # Documentation
└── .github/workflows/       # CI/CD pipelines
```

---

## API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/predict` | Classify single issue |
| `POST` | `/predict/batch` | Batch classification |
| `GET` | `/predictions` | List recent predictions |
| `GET` | `/predictions/{id}` | Get by MLflow run ID |
| `GET` | `/health` | Health check |
| `GET` | `/metrics` | Prometheus metrics |

**Example:**

```bash
curl -X POST "http://localhost:8080/predict" \
  -H "Content-Type: application/json" \
  -d '{"issue_text": "Fix OAuth2 authentication bug"}'
```

---

## Live Deployment

- **API**: https://dacrow13-hopcroft-skill-classification.hf.space/docs
- **GUI**: https://dacrow13-hopcroft-skill-classification.hf.space
- **MLflow**: https://dagshub.com/se4ai2526-uniba/Hopcroft/experiments
- **Prometheus**: https://dacrow13-hopcroft-skill-classification.hf.space/prometheus/
- **Grafana**: https://dacrow13-hopcroft-skill-classification.hf.space/grafana/
- **Betterstack**: Alerting configured.
[Alert System Evidence](monitoring/screenshots)

---

## Development

```bash
# Run tests
make test-all              # All tests
make test-behavioral       # ML behavioral tests
make validate-deepchecks   # Data validation

# Lint & format
make lint                  # Check code style
make format                # Auto-fix issues

# Training
make train-baseline-tfidf  # Train baseline model
```

---

## License

This project was developed as part of the SE4AI 2025-26 course at the University of Bari.