---
title: Hopcroft Skill Classification
emoji: 🧠
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
api_docs_url: /docs
---
# Hopcroft Skill Classification
[CI](https://github.com/se4ai2526-uniba/Hopcroft/actions/workflows/ci.yml) | [Hugging Face Space](https://huggingface.co/spaces/se4ai2526-uniba/Hopcroft) | [MLflow on DagsHub](https://dagshub.com/se4ai2526-uniba/Hopcroft.mlflow)

**Multi-label skill classification for GitHub issues and pull requests**: automatically identify the technical skills required to resolve software issues using machine learning.

---
## Overview
Hopcroft is an ML-enabled system that classifies GitHub issues into 217 technical skill categories to support automated developer assignment and resource allocation. It was built following MLOps and software engineering best practices.
### Key Features
- **Multi-label Classification**: Predict multiple skills per issue
- **REST API**: FastAPI with Swagger documentation
- **Web Interface**: Streamlit GUI for interactive predictions
- **Monitoring**: Prometheus/Grafana dashboards with drift detection
- **CI/CD**: GitHub Actions with Docker deployment
- **Experiment Tracking**: MLflow on DagsHub
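
Multi-label classification means an issue can receive any subset of the skill labels, including several at once, rather than exactly one. The sketch below illustrates only the final decoding step, assuming the model produces an independent probability per label and that a single fixed threshold of 0.5 is applied; both are assumptions, and the label names are hypothetical rather than taken from the real 217-label set.

```python
from typing import Dict, List


def decode_multilabel(probabilities: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return every label whose independent probability meets the threshold.

    Unlike single-label (argmax) decoding, this can return several labels, or none.
    Assumption: one global threshold; per-label thresholds are a common alternative.
    """
    selected = [label for label, p in probabilities.items() if p >= threshold]
    # Sort by confidence (highest first) purely for presentation.
    return sorted(selected, key=lambda label: probabilities[label], reverse=True)


if __name__ == "__main__":
    # Hypothetical per-label probabilities for one issue.
    scores = {"authentication": 0.91, "security": 0.64, "ui": 0.12, "database": 0.48}
    print(decode_multilabel(scores))  # → ['authentication', 'security']
```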
---
## Architecture
```mermaid
graph TB
    subgraph "Data Layer"
        A[(SkillScope DB)] --> B[Feature Engineering]
        B --> C[TF-IDF / Embeddings]
    end
    subgraph "ML Pipeline"
        C --> D[Model Training]
        D --> E[(MLflow Tracking)]
        D --> F[Random Forest Model]
    end
    subgraph "Serving Layer"
        F --> G[FastAPI Service]
        G --> H[predict endpoint]
        G --> I[predictions endpoint]
        G --> J[health endpoint]
    end
    subgraph "Frontend"
        G --> K[Streamlit GUI]
    end
    subgraph "Monitoring"
        G --> L[Prometheus]
        L --> M[Grafana]
        N[Drift Detection] --> L
    end
    subgraph "Deployment"
        O[GitHub Actions] --> P[Docker Build]
        P --> Q[HF Spaces]
    end
```
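The feature-engineering stage in the diagram turns issue text into TF-IDF vectors. The pure-Python sketch below shows how such weights are computed; it assumes lowercase whitespace tokenization and the smoothed IDF used by scikit-learn's `TfidfVectorizer` defaults, idf = ln((1 + n) / (1 + df)) + 1 followed by L2 normalization. The project's actual vectorizer settings (tokenizer, n-grams, vocabulary size) are not described in this README and may differ.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(documents: List[str]) -> List[Dict[str, float]]:
    """Return one L2-normalized TF-IDF vector per document as a sparse {term: weight} dict."""
    # Assumption: lowercase + whitespace tokenization (TfidfVectorizer uses a regex tokenizer).
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF, matching TfidfVectorizer's default smooth_idf=True formula.
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}

    vectors: List[Dict[str, float]] = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts (no sublinear scaling)
        weights = {term: count * idf[term] for term, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # L2-normalize so issues of different lengths are comparable; empty docs stay empty.
        vectors.append({t: w / norm for t, w in weights.items()} if norm else weights)
    return vectors


if __name__ == "__main__":
    issues = ["Fix OAuth2 authentication bug", "Add OAuth2 login page"]
    # 'oauth2' appears in both issues, so it is weighted lower than the other terms of issue 0.
    print(tfidf(issues)[0])
```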
---
## Documentation
| Document | Description |
|----------|-------------|
| [Milestone Summaries](docs/milestone_summaries.md) | All 6 project phases documented |
| [User Guide](docs/user_guide.md) | Setup, API, GUI, testing, monitoring |
| [Design Choices](docs/design_choices.md) | Technical decisions & rationale |
| [ML Canvas](docs/ML%20Canvas.md) | Requirements engineering framework |
| [Testing & Validation](docs/testing_and_validation.md) | QA strategy & results |
| [Model Card](models/README.md) | Model details & performance |
| [Dataset Card](data/README.md) | Dataset details & preprocessing |
---
## Quick Start
### Docker (Recommended)
```bash
# Clone and configure
git clone https://github.com/se4ai2526-uniba/Hopcroft.git
cd Hopcroft
cp .env.example .env
# Edit .env with your DagsHub credentials
# Start services
docker compose -f docker/docker-compose.yml up -d --build
```
**Access (Local):**
- **API Docs**: http://localhost:8080/docs
- **GUI**: http://localhost:8501
- **Health**: http://localhost:8080/health
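
The containers may need some time to start and load the model after `docker compose up`. A minimal standard-library sketch that polls the health endpoint until it answers HTTP 200 is shown below; the timeout and polling interval are arbitrary assumptions, and the response body is not inspected because its schema is not documented here. For example, a smoke-test script could call `wait_for_health()` before sending requests to `/predict`.

```python
import time
import urllib.request


def wait_for_health(
    url: str = "http://localhost:8080/health",
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
) -> bool:
    """Poll `url` until it answers HTTP 200; return False if `timeout_s` elapses first."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as response:
                if response.status == 200:
                    return True
        except OSError:
            # URLError/HTTPError (both OSError subclasses), refused connections and
            # socket timeouts all mean "not ready yet", so keep polling.
            pass
        time.sleep(interval_s)
    return False
```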
### Local Development
```bash
# Setup environment
python -m venv venv && source venv/bin/activate # or venv\Scripts\activate on Windows
pip install -r requirements.txt && pip install -e .
# Start API
make api-dev
# Start GUI (new terminal)
streamlit run hopcroft_skill_classification_tool_competition/streamlit_app.py
```
---
## Project Structure
```
├── hopcroft_skill_classification_tool_competition/
│   ├── main.py             # FastAPI application
│   ├── streamlit_app.py    # Streamlit GUI
│   ├── features.py         # Feature engineering
│   ├── modeling/           # Training & prediction
│   └── config.py           # Configuration
├── data/                   # DVC-tracked datasets
├── models/                 # DVC-tracked models
├── tests/                  # Pytest test suites
├── monitoring/             # Prometheus, Grafana, Locust
├── docker/                 # Docker configurations
├── docs/                   # Documentation
└── .github/workflows/      # CI/CD pipelines
```
---
## API Endpoints
| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/predict` | Classify a single issue |
| `POST` | `/predict/batch` | Classify a batch of issues |
| `GET` | `/predictions` | List recent predictions |
| `GET` | `/predictions/{id}` | Get a prediction by MLflow run ID |
| `GET` | `/health` | Health check |
| `GET` | `/metrics` | Prometheus metrics |

**Example:**
```bash
curl -X POST "http://localhost:8080/predict" \
-H "Content-Type: application/json" \
-d '{"issue_text": "Fix OAuth2 authentication bug"}'
```
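The same request can be sent from Python with only the standard library. This is a minimal sketch: it assumes the local Docker URL and the `issue_text` field from the curl example above, and it returns the decoded JSON without assuming a response schema (the Swagger UI at `/docs` shows the actual request and response models). Pass a different `base_url`, such as the live Space URL, to target another deployment; `build_predict_request` and `predict` are hypothetical helper names, not part of the project.

```python
import json
import urllib.request
from typing import Any

API_URL = "http://localhost:8080"  # local Docker deployment (assumption: default port mapping)


def build_predict_request(issue_text: str, base_url: str = API_URL) -> urllib.request.Request:
    """Build a POST /predict request with a JSON body, mirroring the curl example."""
    body = json.dumps({"issue_text": issue_text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(issue_text: str, base_url: str = API_URL, timeout_s: float = 30.0) -> Any:
    """Send the request and return the decoded JSON response as-is."""
    request = build_predict_request(issue_text, base_url)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires a running API): predict("Fix OAuth2 authentication bug")
```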
---
## Live Deployment
- **API**: https://dacrow13-hopcroft-skill-classification.hf.space/docs
- **GUI**: https://dacrow13-hopcroft-skill-classification.hf.space
- **MLflow**: https://dagshub.com/se4ai2526-uniba/Hopcroft/experiments
- **Prometheus**: https://dacrow13-hopcroft-skill-classification.hf.space/prometheus/
- **Grafana**: https://dacrow13-hopcroft-skill-classification.hf.space/grafana/
- **Betterstack**: Alerting is configured; see the [alert system evidence](monitoring/screenshots).
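
The `/metrics` endpoint listed under API Endpoints is what Prometheus scrapes, presumably in the standard Prometheus text exposition format. To inspect those values without a Prometheus server, a minimal parser sketch follows; it handles only simple `name{labels} value [timestamp]` sample lines, skips `# HELP`/`# TYPE` comments, and ignores timestamps. The metric names in the comment below are hypothetical, not the ones Hopcroft actually exports. For example: `parse_prometheus_text(urllib.request.urlopen("http://localhost:8080/metrics").read().decode())`.

```python
from typing import Dict


def parse_prometheus_text(text: str) -> Dict[str, float]:
    """Parse simple Prometheus text-format samples into {series: value}.

    The series key keeps the label set verbatim, e.g. 'http_requests_total{method="POST"}'
    (a hypothetical metric). Not a full parser: escaped quotes or '}' inside label
    values are not handled.
    """
    samples: Dict[str, float] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or a '# HELP' / '# TYPE' comment
        if "{" in line:
            # 'name{labels} value [timestamp]': split right after the closing brace.
            name_and_labels, _, rest = line.partition("}")
            series = name_and_labels + "}"
        else:
            # 'name value [timestamp]'
            series, _, rest = line.partition(" ")
        # The first remaining field is the value ('+Inf' and 'NaN' parse as floats too).
        samples[series] = float(rest.split()[0])
    return samples
```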
---
## Development
```bash
# Run tests
make test-all # All tests
make test-behavioral # ML behavioral tests
make validate-deepchecks # Data validation
# Lint & format
make lint # Check code style
make format # Auto-fix issues
# Training
make train-baseline-tfidf # Train baseline model
```
---
## License
This project was developed as part of the SE4AI 2025-26 course at the University of Bari.