---
title: Hopcroft Skill Classification
emoji: 🧠
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
api_docs_url: /docs
---

# Hopcroft Skill Classification

[![CI Pipeline](https://github.com/se4ai2526-uniba/Hopcroft/actions/workflows/ci.yml/badge.svg)](https://github.com/se4ai2526-uniba/Hopcroft/actions/workflows/ci.yml)
[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/se4ai2526-uniba/Hopcroft)
[![MLflow](https://img.shields.io/badge/MLflow-Tracking-blue)](https://dagshub.com/se4ai2526-uniba/Hopcroft.mlflow)

**Multi-label skill classification for GitHub issues and pull requests** — Automatically identify technical skills required to resolve software issues using machine learning.

---

## Overview

Hopcroft is an ML-enabled system that classifies GitHub issues into 217 technical skill categories, enabling automated developer assignment and optimized resource allocation. It was built following professional MLOps and software engineering practices.
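
In a multi-label setup, each of the 217 skills gets its own score, and every skill that clears a decision threshold is returned. Below is a minimal sketch of that final step. It assumes the model exposes per-skill probabilities. The 0.5 threshold, the top-1 fallback, and the skill names are illustrative assumptions, not the project's documented behavior.

```python
from typing import Dict, List


def select_skills(
    probabilities: Dict[str, float],
    threshold: float = 0.5,
    min_labels: int = 1,
) -> List[str]:
    """Turn per-skill probabilities into a multi-label prediction.

    Every skill whose probability reaches `threshold` is kept. If fewer than
    `min_labels` skills qualify, the highest-scoring skills are used instead,
    so an issue never comes back with an empty label set (an assumed policy).
    """
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    selected = [skill for skill, p in ranked if p >= threshold]
    if len(selected) < min_labels:
        selected = [skill for skill, _ in ranked[:min_labels]]
    return selected


# Hypothetical scores for three of the 217 skill labels.
scores = {"Security": 0.91, "Web Development": 0.64, "Databases": 0.12}
print(select_skills(scores))  # ['Security', 'Web Development']
```

Per-label thresholds are a common refinement when some skills are much rarer than others.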

### Key Features

- 🎯 **Multi-label Classification**: Predict multiple skills per issue
- 🚀 **REST API**: FastAPI with Swagger documentation
- 🖥️ **Web Interface**: Streamlit GUI for interactive predictions
- 📊 **Monitoring**: Prometheus/Grafana dashboards with drift detection
- 🔄 **CI/CD**: GitHub Actions with Docker deployment
- 📈 **Experiment Tracking**: MLflow on DagsHub
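
Drift detection compares the distribution of incoming data against the training data. This README does not describe the project's actual detector or the features it monitors. One common option is the two-sample Kolmogorov-Smirnov statistic over a numeric feature (issue text length, for example, which is an assumed feature here), and it needs only the standard library. The 0.2 alert threshold is an arbitrary illustrative value, not a calibrated significance level.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    ref = sorted(reference)
    cur = sorted(current)
    statistic = 0.0
    # Both empirical CDFs only change at observed values, so the maximum
    # gap is always attained at one of them.
    for x in set(ref) | set(cur):
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        statistic = max(statistic, abs(cdf_ref - cdf_cur))
    return statistic


def has_drifted(
    reference: Sequence[float], current: Sequence[float], threshold: float = 0.2
) -> bool:
    """Flag drift when the KS statistic exceeds an (assumed) threshold."""
    return ks_statistic(reference, current) > threshold


# Hypothetical issue lengths: training data vs. recent production traffic.
training_lengths = [10, 20, 30, 40, 50]
production_lengths = [60, 70, 80, 90, 100]
print(ks_statistic(training_lengths, production_lengths))  # 1.0
```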

---

## Architecture

```mermaid
graph TB
    subgraph "Data Layer"
        A[(SkillScope DB)] --> B[Feature Engineering]
        B --> C[TF-IDF / Embeddings]
    end
    
    subgraph "ML Pipeline"
        C --> D[Model Training]
        D --> E[(MLflow Tracking)]
        D --> F[Random Forest Model]
    end
    
    subgraph "Serving Layer"
        F --> G[FastAPI Service]
        G --> H[predict endpoint]
        G --> I[predictions endpoint]
        G --> J[health endpoint]
    end
    
    subgraph "Frontend"
        G --> K[Streamlit GUI]
    end
    
    subgraph "Monitoring"
        G --> L[Prometheus]
        L --> M[Grafana]
        N[Drift Detection] --> L
    end
    
    subgraph "Deployment"
        O[GitHub Actions] --> P[Docker Build]
        P --> Q[HF Spaces]
    end
```
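
The diagram's feature-engineering step turns issue text into TF-IDF vectors before training. The dependency-free sketch below illustrates what those features encode. It uses the smoothed IDF `ln((1 + n) / (1 + df)) + 1` with L2 normalization, which mirrors scikit-learn's `TfidfVectorizer` defaults. Whether the project uses this exact variant, tokenizer, or normalization is an assumption.

```python
import math
import re
from collections import Counter
from typing import Dict, List

TOKEN_RE = re.compile(r"\b\w\w+\b")  # words of 2+ characters


def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text.lower())


def tfidf(documents: List[str]) -> List[Dict[str, float]]:
    """Return one L2-normalised {term: weight} vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors


issues = [
    "Fix OAuth2 authentication bug",
    "Add OAuth2 login page",
    "Optimize SQL query performance",
]
vecs = tfidf(issues)
# "oauth2" appears in two issues, so it weighs less than the rarer "bug".
print(vecs[0]["oauth2"] < vecs[0]["bug"])  # True
```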

---

## Documentation

| Document | Description |
|----------|-------------|
| 📋 [Milestone Summaries](docs/milestone_summaries.md) | All 6 project phases documented |
| 📖 [User Guide](docs/user_guide.md) | Setup, API, GUI, testing, monitoring |
| 🏗️ [Design Choices](docs/design_choices.md) | Technical decisions & rationale |
| 🎯 [ML Canvas](docs/ML%20Canvas.md) | Requirements engineering framework |
| ✅ [Testing & Validation](docs/testing_and_validation.md) | QA strategy & results |
| 📊 [Model Card](models/README.md) | Model details & performance |
| 📊 [Dataset Card](data/README.md) | Dataset details & preprocessing |

---

## Quick Start

### Docker (Recommended)

```bash
# Clone and configure
git clone https://github.com/se4ai2526-uniba/Hopcroft.git
cd Hopcroft
cp .env.example .env
# Edit .env with your DagsHub credentials

# Start services
docker compose -f docker/docker-compose.yml up -d --build
```

**Access (Local):**
- 🌐 **API Docs**: http://localhost:8080/docs
- 🖥️ **GUI**: http://localhost:8501
- ❤️ **Health**: http://localhost:8080/health
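
After `docker compose up`, the containers can take a while before the API answers. The stdlib-only helper below polls the health endpoint until it returns HTTP 200. The polling interval and timeout are assumptions. The `fetch_status` parameter exists only so the function can be exercised without a running server.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def http_status(url: str) -> Optional[int]:
    """Return the HTTP status code for a GET on `url`, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, DNS failure, timeout, ...


def wait_until_healthy(
    url: str = "http://localhost:8080/health",
    timeout: float = 120.0,
    interval: float = 2.0,
    fetch_status: Callable[[str], Optional[int]] = http_status,
) -> bool:
    """Poll `url` until it returns 200 or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if fetch_status(url) == 200:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Call `wait_until_healthy()` from a script or CI step after starting the stack. It returns `False` instead of raising, so the caller decides how to fail.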

### Local Development

```bash
# Setup environment
python -m venv venv && source venv/bin/activate  # or venv\Scripts\activate on Windows
pip install -r requirements.txt && pip install -e .

# Start API
make api-dev

# Start GUI (new terminal)
streamlit run hopcroft_skill_classification_tool_competition/streamlit_app.py
```

---

## Project Structure

```
├── hopcroft_skill_classification_tool_competition/
│   ├── main.py              # FastAPI application
│   ├── streamlit_app.py     # Streamlit GUI
│   ├── features.py          # Feature engineering
│   ├── modeling/            # Training & prediction
│   └── config.py            # Configuration
├── data/                    # DVC-tracked datasets
├── models/                  # DVC-tracked models
├── tests/                   # Pytest test suites
├── monitoring/              # Prometheus, Grafana, Locust
├── docker/                  # Docker configurations
├── docs/                    # Documentation
└── .github/workflows/       # CI/CD pipelines
```

---

## API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/predict` | Classify a single issue |
| `POST` | `/predict/batch` | Batch classification |
| `GET` | `/predictions` | List recent predictions |
| `GET` | `/predictions/{id}` | Get a prediction by MLflow run ID |
| `GET` | `/health` | Health check |
| `GET` | `/metrics` | Prometheus metrics |

**Example:**
```bash
curl -X POST "http://localhost:8080/predict" \
  -H "Content-Type: application/json" \
  -d '{"issue_text": "Fix OAuth2 authentication bug"}'
```
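
The same request can be sent from Python using only the standard library, as in the minimal sketch below. The `issue_text` field comes from the curl example. The shape of the JSON response is not documented here, so `predict` simply returns the decoded body. `build_predict_request` is a hypothetical helper, split out so the request can be inspected without a running server.

```python
import json
import urllib.request
from typing import Any


def build_predict_request(
    issue_text: str, base_url: str = "http://localhost:8080"
) -> urllib.request.Request:
    """Build a POST /predict request with a JSON body."""
    payload = json.dumps({"issue_text": issue_text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(issue_text: str, base_url: str = "http://localhost:8080") -> Any:
    """Send the request and return the decoded JSON response."""
    request = build_predict_request(issue_text, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the local stack running, `predict("Fix OAuth2 authentication bug")` should return the same JSON as the curl call.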

---

## Live Deployment

- **API**: https://dacrow13-hopcroft-skill-classification.hf.space/docs
- **GUI**: https://dacrow13-hopcroft-skill-classification.hf.space
- **MLflow**: https://dagshub.com/se4ai2526-uniba/Hopcroft/experiments
- **Prometheus**: https://dacrow13-hopcroft-skill-classification.hf.space/prometheus/
- **Grafana**: https://dacrow13-hopcroft-skill-classification.hf.space/grafana/
- **Betterstack**: Alerting configured. [Alert System Evidence](monitoring/screenshots)
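
The `/metrics` endpoint serves the Prometheus text exposition format, which the Prometheus instance above scrapes. Each sample is one `name{labels} value [timestamp]` line, and `#` lines carry HELP/TYPE metadata. The simplified parser below is meant for quick checks in scripts. It assumes label values contain no `}` or escaped quotes, and it ignores timestamps. The metric name in the example is hypothetical, since this README does not list the exported metrics. For anything beyond ad-hoc inspection, `prometheus_client.parser.text_string_to_metric_families` from the official client library is the safer choice.

```python
import re
from typing import Dict, List, Tuple

# name, optional {labels}, value, optional timestamp
SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>[^}]*)\})?"
    r"\s+(?P<value>\S+)"
    r"(?:\s+\S+)?$"
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text: str) -> List[Tuple[str, Dict[str, str], float]]:
    """Return (name, labels, value) for every sample line in `text`."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # outside the simplified grammar handled here
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


# Hypothetical scrape output.
exposition = """# HELP predictions_total Hypothetical prediction counter.
# TYPE predictions_total counter
predictions_total{endpoint="/predict"} 42.0
predictions_total{endpoint="/predict/batch"} 7.0
"""
for name, labels, value in parse_metrics(exposition):
    print(name, labels, value)
```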

---

## Development

```bash
# Run tests
make test-all              # All tests
make test-behavioral       # ML behavioral tests
make validate-deepchecks   # Data validation

# Lint & format
make lint                  # Check code style
make format                # Auto-fix issues

# Training
make train-baseline-tfidf  # Train baseline model
```
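
`make test-behavioral` runs the ML behavioral tests. One common kind is an invariance test, which checks that a perturbation that should not matter leaves the predicted skills unchanged. The pytest-style sketch below uses letter case and extra whitespace as the perturbations. `predict_skills` is a hypothetical stand-in for the project's real prediction function, implemented as a keyword stub so the example runs on its own. The project's actual test cases may differ.

```python
from typing import Callable, List, Set

# Hypothetical stand-in for the real model: maps keywords to skill labels.
KEYWORDS = {"oauth2": "Security", "sql": "Databases", "css": "Web Development"}


def predict_skills(issue_text: str) -> Set[str]:
    words = issue_text.lower().split()
    return {skill for keyword, skill in KEYWORDS.items() if keyword in words}


def perturbations(text: str) -> List[str]:
    """Label-preserving variants of an issue text."""
    return [text.upper(), text.lower(), f"  {text}  ", text.replace(" ", "  ")]


def check_invariance(predict: Callable[[str], Set[str]], text: str) -> List[str]:
    """Return the perturbed inputs whose prediction differs from the original."""
    expected = predict(text)
    return [variant for variant in perturbations(text) if predict(variant) != expected]


def test_prediction_is_invariant_to_case_and_whitespace():
    for issue in ["Fix OAuth2 authentication bug", "Slow SQL query on dashboard"]:
        assert check_invariance(predict_skills, issue) == []
```

Returning the failing variants, instead of a bare boolean, makes a failed assertion show exactly which perturbation changed the prediction.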

---

## License

This project was developed as part of the SE4AI 2025-26 course at the University of Bari.