	---
	title: Hopcroft Skill Classification
	emoji: 🧠
	colorFrom: blue
	colorTo: green
	sdk: docker
	app_port: 7860
	api_docs_url: /docs
	---

	# Hopcroft Skill Classification

	[![CI Pipeline](https://github.com/se4ai2526-uniba/Hopcroft/actions/workflows/ci.yml/badge.svg)](https://github.com/se4ai2526-uniba/Hopcroft/actions/workflows/ci.yml)
	[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/se4ai2526-uniba/Hopcroft)
	[![MLflow](https://img.shields.io/badge/MLflow-Tracking-blue)](https://dagshub.com/se4ai2526-uniba/Hopcroft.mlflow)

	Multi-label skill classification for GitHub issues and pull requests: automatically identify the technical skills required to resolve software issues using machine learning.

	---

	## Overview

	Hopcroft is an ML-enabled system that classifies GitHub issues into 217 technical skill categories, enabling automated developer assignment and optimized resource allocation. It is built following professional MLOps and software engineering practices.

	### Key Features

	- 🎯 Multi-label Classification: Predict multiple skills per issue
	- 🚀 REST API: FastAPI with Swagger documentation
	- 🖥️ Web Interface: Streamlit GUI for interactive predictions
	- 📊 Monitoring: Prometheus/Grafana dashboards with drift detection
	- 🔄 CI/CD: GitHub Actions with Docker deployment
	- 📈 Experiment Tracking: MLflow on DagsHub
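
To make the multi-label idea concrete: each skill label gets its own yes/no decision, so one issue can carry several skills or none at all. The following is a minimal sketch in plain Python, assuming the model emits one score per label. The function name, label names, scores, and the 0.5 threshold are illustrative and are not taken from the Hopcroft model.

```python
from typing import Dict, List


def select_skills(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return every label whose score reaches the threshold, highest score first.

    Each label is decided independently, so the result may contain
    zero, one, or many skills.
    """
    chosen = [(label, score) for label, score in scores.items() if score >= threshold]
    chosen.sort(key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in chosen]


# Hypothetical per-label scores for one issue (not real model output).
example_scores = {"Authentication": 0.91, "Networking": 0.62, "Databases": 0.08}
print(select_skills(example_scores))  # ['Authentication', 'Networking']
```

A single-label classifier would return only the top-scoring label here; the multi-label variant keeps every label that clears the threshold.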

	---

	## Architecture

	```mermaid
	graph TB
	subgraph "Data Layer"
	A[(SkillScope DB)] --> B[Feature Engineering]
	B --> C[TF-IDF / Embeddings]
	end

	subgraph "ML Pipeline"
	C --> D[Model Training]
	D --> E[(MLflow Tracking)]
	D --> F[Random Forest Model]
	end

	subgraph "Serving Layer"
	F --> G[FastAPI Service]
	G --> H[predict endpoint]
	G --> I[predictions endpoint]
	G --> J[health endpoint]
	end

	subgraph "Frontend"
	G --> K[Streamlit GUI]
	end

	subgraph "Monitoring"
	G --> L[Prometheus]
	L --> M[Grafana]
	N[Drift Detection] --> L
	end

	subgraph "Deployment"
	O[GitHub Actions] --> P[Docker Build]
	P --> Q[HF Spaces]
	end
	```

	---

	## Documentation

	| Document | Description |
	|----------|-------------|
	| 📋 [Milestone Summaries](docs/milestone_summaries.md) | All 6 project phases documented |
	| 📖 [User Guide](docs/user_guide.md) | Setup, API, GUI, testing, monitoring |
	| 🏗️ [Design Choices](docs/design_choices.md) | Technical decisions & rationale |
	| 🎯 [ML Canvas](docs/ML%20Canvas.md) | Requirements engineering framework |
	| ✅ [Testing & Validation](docs/testing_and_validation.md) | QA strategy & results |
	| 📊 [Model Card](models/README.md) | Model details & performance |
	| 📊 [Dataset Card](data/README.md) | Dataset details & preprocessing |

	---

	## Quick Start

	### Docker (Recommended)

	```bash
	# Clone and configure
	git clone https://github.com/se4ai2526-uniba/Hopcroft.git
	cd Hopcroft
	cp .env.example .env
	# Edit .env with your DagsHub credentials

	# Start services
	docker compose -f docker/docker-compose.yml up -d --build
	```

	Access (Local):
	- 🌐 API Docs: http://localhost:8080/docs
	- 🖥️ GUI: http://localhost:8501
	- ❤️ Health: http://localhost:8080/health
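
The containers can take a moment to come up after `docker compose up`, so scripts that call the API right away may fail. A small sketch of a readiness check against the local health URL listed above, using only the standard library. The function names, the retry count, and the delay are assumptions for illustration, and it treats any HTTP 200 from `/health` as "ready" without inspecting the response body, whose format this README does not specify.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 3.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_healthy(
    url: str = "http://localhost:8080/health",
    attempts: int = 30,
    delay: float = 2.0,
    probe: Callable[[str], bool] = http_ok,
) -> bool:
    """Poll `url` until `probe` succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # no sleep after the final failed attempt
    return False
```

Calling `wait_until_healthy()` with the defaults polls for up to about a minute. The `probe` parameter exists so the loop can be exercised without a running service.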

	### Local Development

	```bash
	# Setup environment
	python -m venv venv && source venv/bin/activate # or venv\Scripts\activate on Windows
	pip install -r requirements.txt && pip install -e .

	# Start API
	make api-dev

	# Start GUI (new terminal)
	streamlit run hopcroft_skill_classification_tool_competition/streamlit_app.py
	```

	---

	## Project Structure

	```
	├── hopcroft_skill_classification_tool_competition/
	│   ├── main.py            # FastAPI application
	│   ├── streamlit_app.py   # Streamlit GUI
	│   ├── features.py        # Feature engineering
	│   ├── modeling/          # Training & prediction
	│   └── config.py          # Configuration
	├── data/                  # DVC-tracked datasets
	├── models/                # DVC-tracked models
	├── tests/                 # Pytest test suites
	├── monitoring/            # Prometheus, Grafana, Locust
	├── docker/                # Docker configurations
	├── docs/                  # Documentation
	└── .github/workflows/     # CI/CD pipelines
	```

	---

	## API Endpoints

	| Method | Endpoint | Description |
	|--------|----------|-------------|
	| `POST` | `/predict` | Classify single issue |
	| `POST` | `/predict/batch` | Batch classification |
	| `GET` | `/predictions` | List recent predictions |
	| `GET` | `/predictions/{id}` | Get by MLflow run ID |
	| `GET` | `/health` | Health check |
	| `GET` | `/metrics` | Prometheus metrics |

	Example:
	```bash
	curl -X POST "http://localhost:8080/predict" \
	-H "Content-Type: application/json" \
	-d '{"issue_text": "Fix OAuth2 authentication bug"}'
	```
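
The same call can be made from Python. This is a sketch using only the standard library; it mirrors the `curl` command above (same URL, header, and `issue_text` field). The helper names are hypothetical, and because the response schema is not documented here, the result is returned as decoded JSON without assuming any particular keys.

```python
import json
import urllib.request
from typing import Any


def build_predict_request(
    issue_text: str, base_url: str = "http://localhost:8080"
) -> urllib.request.Request:
    """Build (but do not send) the POST /predict request for one issue."""
    body = json.dumps({"issue_text": issue_text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(issue_text: str, base_url: str = "http://localhost:8080") -> Any:
    """Send the request and return the decoded JSON response."""
    request = build_predict_request(issue_text, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the services running locally, `predict("Fix OAuth2 authentication bug")` sends the same payload as the `curl` example. Pass a different `base_url` to target another deployment.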

	---

	## Live Deployment
	- API: https://dacrow13-hopcroft-skill-classification.hf.space/docs
	- GUI: https://dacrow13-hopcroft-skill-classification.hf.space
	- MLflow: https://dagshub.com/se4ai2526-uniba/Hopcroft/experiments
	- Prometheus: https://dacrow13-hopcroft-skill-classification.hf.space/prometheus/
	- Grafana: https://dacrow13-hopcroft-skill-classification.hf.space/grafana/
	- Betterstack: Alerting configured. [Alert System Evidence](monitoring/screenshots)

	---

	## Development

	```bash
	# Run tests
	make test-all # All tests
	make test-behavioral # ML behavioral tests
	make validate-deepchecks # Data validation

	# Lint & format
	make lint # Check code style
	make format # Auto-fix issues

	# Training
	make train-baseline-tfidf # Train baseline model
	```

	---

	## License

	This project was developed as part of the SE4AI 2025-26 course at the University of Bari.