---
title: Hopcroft Skill Classification
emoji: 🧠
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
api_docs_url: /docs
---

# Hopcroft Skill Classification

[![CI Pipeline](https://github.com/se4ai2526-uniba/Hopcroft/actions/workflows/ci.yml/badge.svg)](https://github.com/se4ai2526-uniba/Hopcroft/actions/workflows/ci.yml)
[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/se4ai2526-uniba/Hopcroft)
[![MLflow](https://img.shields.io/badge/MLflow-Tracking-blue)](https://dagshub.com/se4ai2526-uniba/Hopcroft.mlflow)

**Multi-label skill classification for GitHub issues and pull requests**: automatically identify the technical skills required to resolve software issues using machine learning.

---

## Overview

Hopcroft is an ML-enabled system that classifies GitHub issues into 217 technical skill categories, enabling automated developer assignment and optimized resource allocation. It is built following professional MLOps and software engineering standards.

### Key Features

- 🎯 **Multi-label Classification**: Predict multiple skills per issue
- 🚀 **REST API**: FastAPI with Swagger documentation
- 🖥️ **Web Interface**: Streamlit GUI for interactive predictions
- 📊 **Monitoring**: Prometheus/Grafana dashboards with drift detection
- 🔄 **CI/CD**: GitHub Actions with Docker deployment
- 📈 **Experiment Tracking**: MLflow on DagsHub
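
To illustrate what "multiple skills per issue" means in practice, here is a minimal sketch of how per-skill scores could be turned into a set of predicted labels. The function name, the skill names, the scores, and the 0.5 threshold are all hypothetical and chosen for illustration; the project's actual decision rule may differ.

```python
from typing import Dict, List


def decode_skills(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return every skill whose score reaches the threshold, highest score first.

    Unlike single-label classification, no argmax is taken: zero, one,
    or many skills can be returned for the same issue.
    """
    selected = [(name, score) for name, score in scores.items() if score >= threshold]
    selected.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in selected]


# Hypothetical per-skill scores for one issue (not real model output).
example_scores = {"Authentication": 0.91, "Databases": 0.12, "Security": 0.67}
print(decode_skills(example_scores))  # ['Authentication', 'Security']
```

Lowering the threshold trades precision for recall, which is the usual tuning knob in a multi-label setting.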

---

## Architecture

```mermaid
graph TB
    subgraph "Data Layer"
        A[(SkillScope DB)] --> B[Feature Engineering]
        B --> C[TF-IDF / Embeddings]
    end
    
    subgraph "ML Pipeline"
        C --> D[Model Training]
        D --> E[(MLflow Tracking)]
        D --> F[Random Forest Model]
    end
    
    subgraph "Serving Layer"
        F --> G[FastAPI Service]
        G --> H[predict endpoint]
        G --> I[predictions endpoint]
        G --> J[health endpoint]
    end
    
    subgraph "Frontend"
        G --> K[Streamlit GUI]
    end
    
    subgraph "Monitoring"
        G --> L[Prometheus]
        L --> M[Grafana]
        N[Drift Detection] --> L
    end
    
    subgraph "Deployment"
        O[GitHub Actions] --> P[Docker Build]
        P --> Q[HF Spaces]
    end
```

---

## Documentation

| Document | Description |
|----------|-------------|
| 📋 [Milestone Summaries](docs/milestone_summaries.md) | All 6 project phases documented |
| 📖 [User Guide](docs/user_guide.md) | Setup, API, GUI, testing, monitoring |
| 🏗️ [Design Choices](docs/design_choices.md) | Technical decisions & rationale |
| 🎯 [ML Canvas](docs/ML%20Canvas.md) | Requirements engineering framework |
| ✅ [Testing & Validation](docs/testing_and_validation.md) | QA strategy & results |
| 📊 [Model Card](models/README.md) | Model details & performance |
| 📊 [Dataset Card](data/README.md) | Dataset details & preprocessing |

---

## Quick Start

### Docker (Recommended)

```bash
# Clone and configure
git clone https://github.com/se4ai2526-uniba/Hopcroft.git
cd Hopcroft
cp .env.example .env
# Edit .env with your DagsHub credentials

# Start services
docker compose -f docker/docker-compose.yml up -d --build
```

**Access (Local):**
- 🌐 **API Docs**: http://localhost:8080/docs
- 🖥️ **GUI**: http://localhost:8501
- ❤️ **Health**: http://localhost:8080/health
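
The containers may take a moment to become ready after `docker compose ... up`. A small polling helper along the following lines could be used to wait for the health endpoint listed above. This is a sketch using only the standard library; the function name, retry count, and delay are assumptions, and it only checks for an HTTP 200 status because the response body of `/health` is not described here.

```python
import time
import urllib.request


def wait_for_health(
    url: str = "http://localhost:8080/health",
    attempts: int = 10,
    delay: float = 1.0,
    opener=urllib.request.urlopen,
) -> bool:
    """Poll the health endpoint until it answers 200 or attempts run out."""
    for attempt in range(attempts):
        try:
            with opener(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Covers connection refused, timeouts, and urllib's URLError/HTTPError.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_for_health()` returns `True` as soon as the API responds and `False` if it never does within the given attempts. The `opener` parameter exists only so the function can be exercised without a running server.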

### Local Development

```bash
# Setup environment
python -m venv venv && source venv/bin/activate  # or venv\Scripts\activate on Windows
pip install -r requirements.txt && pip install -e .

# Start API
make api-dev

# Start GUI (new terminal)
streamlit run hopcroft_skill_classification_tool_competition/streamlit_app.py
```

---

## Project Structure

```
├── hopcroft_skill_classification_tool_competition/
│   ├── main.py              # FastAPI application
│   ├── streamlit_app.py     # Streamlit GUI
│   ├── features.py          # Feature engineering
│   ├── modeling/            # Training & prediction
│   └── config.py            # Configuration
├── data/                    # DVC-tracked datasets
├── models/                  # DVC-tracked models
├── tests/                   # Pytest test suites
├── monitoring/              # Prometheus, Grafana, Locust
├── docker/                  # Docker configurations
├── docs/                    # Documentation
└── .github/workflows/       # CI/CD pipelines
```

---

## API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/predict` | Classify single issue |
| `POST` | `/predict/batch` | Batch classification |
| `GET` | `/predictions` | List recent predictions |
| `GET` | `/predictions/{id}` | Get by MLflow run ID |
| `GET` | `/health` | Health check |
| `GET` | `/metrics` | Prometheus metrics |

**Example:**
```bash
curl -X POST "http://localhost:8080/predict" \
  -H "Content-Type: application/json" \
  -d '{"issue_text": "Fix OAuth2 authentication bug"}'
```
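
The same request can be made from Python. The sketch below mirrors the curl call using only the standard library; the helper names are hypothetical, and the response is assumed to be JSON because its schema is not documented in this README (see the Swagger page at `/docs` for the actual shape).

```python
import json
import urllib.request


def build_predict_request(
    issue_text: str, base_url: str = "http://localhost:8080"
) -> urllib.request.Request:
    """Build the POST /predict request with the same payload as the curl example."""
    body = json.dumps({"issue_text": issue_text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(issue_text: str, base_url: str = "http://localhost:8080"):
    """Send the request and decode the response, assumed here to be JSON."""
    request = build_predict_request(issue_text, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the services running locally, `predict("Fix OAuth2 authentication bug")` sends the same payload as the curl command above.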

---

## Live Deployment

- **API**: https://dacrow13-hopcroft-skill-classification.hf.space/docs
- **GUI**: https://dacrow13-hopcroft-skill-classification.hf.space
- **MLflow**: https://dagshub.com/se4ai2526-uniba/Hopcroft/experiments
- **Prometheus**: https://dacrow13-hopcroft-skill-classification.hf.space/prometheus/
- **Grafana**: https://dacrow13-hopcroft-skill-classification.hf.space/grafana/
- **Betterstack**: Alerting configured. [Alert System Evidence](monitoring/screenshots)

---

## Development

```bash
# Run tests
make test-all              # All tests
make test-behavioral       # ML behavioral tests
make validate-deepchecks   # Data validation

# Lint & format
make lint                  # Check code style
make format                # Auto-fix issues

# Training
make train-baseline-tfidf  # Train baseline model
```

---

## License

This project was developed as part of the SE4AI 2025-26 course at the University of Bari.