# Design Choices

Technical justification of the architectural and engineering decisions made during the Hopcroft project development, following professional MLOps and Software Engineering standards.

---

## Table of Contents

1. [Inception (Requirements Engineering)](#1-inception-requirements-engineering)
2. [Reproducibility (Versioning & Pipelines)](#2-reproducibility-versioning--pipelines)
3. [Quality Assurance](#3-quality-assurance)
4. [API (Inference Service)](#4-api-inference-service)
5. [Deployment (Containerization & CI/CD)](#5-deployment-containerization--cicd)
6. [Monitoring](#6-monitoring)

---

## 1. Inception (Requirements Engineering)

### Machine Learning Canvas

The project adopted the **Machine Learning Canvas** framework to systematically define the problem space before implementation. This structured approach ensures alignment between business objectives and technical solutions.

| Canvas Section | Application |
|----------------|-------------|
| **Prediction Task** | Multi-label classification of 217 technical skills from GitHub issue text |
| **Decisions** | Automated developer assignment based on predicted skill requirements |
| **Value Proposition** | Reduced issue resolution time, optimized resource allocation |
| **Data Sources** | SkillScope DB (7,245 PRs from 11 Java repositories) |
| **Making Predictions** | Real-time classification upon issue creation |
| **Building Models** | Iterative improvement over RF+TF-IDF baseline |
| **Monitoring** | Continuous evaluation with drift detection |

The complete ML Canvas is documented in [ML Canvas.md](./ML%20Canvas.md).

### Functional vs Non-Functional Requirements

#### Functional Requirements

| Requirement | Target | Metric |
|-------------|--------|--------|
| **Precision** | ≥ Baseline | True positives / Predicted positives |
| **Recall** | ≥ Baseline | True positives / Actual positives |
| **Micro-F1** | > Baseline | Harmonic mean of precision and recall, computed on counts pooled across all labels |
| **Multi-label Support** | 217 skills | Simultaneous prediction of multiple labels |
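
As a rough illustration of how the micro-averaged metrics above pool counts across labels, the sketch below computes precision, recall, and Micro-F1 for binary label matrices. The function name `micro_prf` and the toy matrices are hypothetical; the project may well rely on a library routine instead.

```python
from typing import List, Tuple


def micro_prf(
    y_true: List[List[int]], y_pred: List[List[int]]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 for binary multi-label matrices."""
    tp = fp = fn = 0
    # Pool the counts over every (sample, label) cell before computing ratios
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Toy example: 3 issues, 4 skills (illustrative data only)
y_true = [[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]]
y_pred = [[1, 0, 0, 0], [0, 1, 1, 0], [1, 1, 0, 1]]
precision, recall, f1 = micro_prf(y_true, y_pred)
print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.833 0.833 0.833
```

Because counts are pooled before the ratios are taken, frequent skills weigh more than rare ones in this score.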

#### Non-Functional Requirements

| Category | Requirement | Implementation |
|----------|-------------|----------------|
| **Reproducibility** | Auditable experiments | MLflow tracking, DVC versioning |
| **Explainability** | Interpretable predictions | Confidence scores per skill |
| **Performance** | Low latency inference | FastAPI async, model caching |
| **Scalability** | Batch processing | `/predict/batch` endpoint (max 100) |
| **Maintainability** | Clean code | Ruff linting, type hints, docstrings |
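
To make the batch limit concrete, here is a minimal client-side sketch that splits a long list of issue texts into slices that respect the 100-item cap of `/predict/batch`. The helper name `chunk_issues` and the constant `MAX_BATCH_SIZE` are hypothetical and not taken from the project code.

```python
from typing import Iterator, List, Sequence

MAX_BATCH_SIZE = 100  # limit stated for /predict/batch


def chunk_issues(
    issues: Sequence[str], size: int = MAX_BATCH_SIZE
) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` issue texts."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(issues), size):
        yield list(issues[start:start + size])


# Illustrative input: 250 placeholder issue texts
batches = list(chunk_issues([f"issue {i}" for i in range(250)]))
print([len(b) for b in batches])  # [100, 100, 50]
```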

### System-First vs Model-First Development

The project adopted a **System-First** approach, prioritizing infrastructure and pipeline development before model optimization:

```
Timeline:
┌──────────────────────────────────────────────────────────────┐
│ Phase 1: Infrastructure │ Phase 2: Model Development         │
│ - DVC/MLflow setup      │ - Feature engineering              │
│ - CI/CD pipeline        │ - Hyperparameter tuning            │
│ - Docker containers     │ - SMOTE/ADASYN experiments         │
│ - API skeleton          │ - Performance optimization         │
└──────────────────────────────────────────────────────────────┘
```

**Rationale:**
- Enables rapid iteration once infrastructure is stable
- Ensures reproducibility from day one
- Reduces technical debt during model development
- Facilitates team collaboration with shared tooling

---

## 2. Reproducibility (Versioning & Pipelines)

### Code Versioning (Git)

Standard Git workflow with branch protection:

| Branch | Purpose |
|--------|---------|
| `main` | Production-ready code |
| `feature/*` | New development |
| `milestone/*` | Groups all features before merging into `main` |

### Data & Model Versioning (DVC)

**Design Decision:** Use DVC (Data Version Control) with DagsHub remote storage for large file management.

```
.dvc/config
├── remote: origin
├── url: https://dagshub.com/se4ai2526-uniba/Hopcroft.dvc
└── auth: basic (credentials via environment)
```

**Tracked Artifacts:**

| File | Purpose |
|------|---------|
| `data/raw/skillscope_data.db` | Original SQLite database |
| `data/processed/*.npy` | TF-IDF and embedding features |
| `models/*.pkl` | Trained models and vectorizers |

**Versioning Workflow:**
```bash
# Track new data
dvc add data/raw/new_dataset.db
git add data/raw/.gitignore data/raw/new_dataset.db.dvc

# Push to remote
dvc push
git commit -m "Add new dataset version"
git push
```

### Experiment Tracking (MLflow)

**Design Decision:** Remote MLflow instance on DagsHub for collaborative experiment tracking.

| Configuration | Value |
|---------------|-------|
| Tracking URI | `https://dagshub.com/se4ai2526-uniba/Hopcroft.mlflow` |
| Experiments | `skill_classification`, `skill_prediction_api` |

**Logged Metrics:**
- Training: precision, recall, F1-score, training time
- Inference: prediction latency, confidence scores, timestamps

**Artifact Storage:**
- Model binaries (`.pkl`)
- Vectorizers and scalers
- Hyperparameter configurations

### Auditable ML Pipeline

The pipeline is designed for complete reproducibility:

```
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│   dataset.py │───▶│  features.py │───▶│   train.py   │
│   (DVC pull) │    │  (TF-IDF)    │    │  (MLflow)    │
└──────────────┘    └──────────────┘    └──────────────┘
       │                   │                   │
       ▼                   ▼                   ▼
    .dvc files         .dvc files          MLflow Run
```

---

## 3. Quality Assurance

### Testing Strategy

#### Static Analysis (Ruff)

**Design Decision:** Use Ruff as the primary linter for speed and comprehensive rule coverage.

| Configuration | Value |
|---------------|-------|
| Line Length | 88 (Black compatible) |
| Target Python | 3.10+ |
| Rule Sets | PEP 8, isort, pyflakes |

**CI Integration:**
```yaml
- name: Lint with Ruff
  run: make lint
```

#### Dynamic Testing (Pytest)

**Test Organization:**

```
tests/
├── unit/              # Isolated function tests
├── integration/       # Component interaction tests
├── system/            # End-to-end tests
├── behavioral/        # ML-specific tests
├── deepchecks/        # Data validation
└── great expectations/ # Schema validation
```

**Markers for Selective Execution:**
```python
@pytest.mark.unit
@pytest.mark.integration
@pytest.mark.system
@pytest.mark.slow
```

### Model Validation vs Model Verification

| Concept | Definition | Implementation |
|---------|------------|----------------|
| **Validation** | Does the model fit user needs? | Micro-F1 vs baseline comparison |
| **Verification** | Is the model correctly built? | Unit tests, behavioral tests |

### Behavioral Testing

**Design Decision:** Implement CheckList-inspired behavioral tests to evaluate model robustness beyond accuracy metrics.

| Test Type | Count | Purpose |
|-----------|-------|---------|
| **Invariance** | 9 | Stability under perturbations (typos, case changes) |
| **Directional** | 10 | Expected behavior with keyword additions |
| **Minimum Functionality** | 17 | Basic sanity checks on clear examples |

**Example Invariance Test:**
```python
def test_case_insensitivity():
    """Model should predict same skills regardless of case."""
    assert predict("Fix BUG") == predict("fix bug")
```
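
A directional test can be sketched in the same spirit. The example below uses a toy keyword-based `predict` as a stand-in for the real model, so the `KEYWORDS` table, the scoring rule, and the test names are all hypothetical; only the testing pattern is the point.

```python
from typing import Dict

# Hypothetical stand-in for the real model: keyword-based confidence scores
KEYWORDS = {"database": ["sql", "query", "index"], "ui": ["button", "layout"]}


def predict(text: str) -> Dict[str, float]:
    """Toy predictor: confidence grows with the number of matching keywords."""
    tokens = text.lower().split()
    return {
        skill: min(1.0, sum(tokens.count(k) for k in words) / 2)
        for skill, words in KEYWORDS.items()
    }


def test_directional_keyword_addition():
    """Adding skill-specific keywords should not lower that skill's score."""
    base = predict("Fix slow page")
    augmented = predict("Fix slow page caused by SQL query")
    assert augmented["database"] >= base["database"]


def test_invariance_to_case():
    """Changing letter case should leave the predictions unchanged."""
    assert predict("Fix SQL Query") == predict("fix sql query")


# Run directly here; under pytest the functions are collected automatically
test_directional_keyword_addition()
test_invariance_to_case()
```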

### Data Quality Checks

#### Great Expectations (10 Tests)

**Design Decision:** Validate data at pipeline boundaries to catch quality issues early.

| Validation Point | Tests |
|------------------|-------|
| Raw Database | Schema, row count, required columns |
| Feature Matrix | No NaN/Inf, sparsity, SMOTE compatibility |
| Label Matrix | Binary format, distribution, consistency |
| Train/Test Split | No leakage, stratification |
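
The kinds of checks listed above can be illustrated in plain Python. This is only a sketch of the underlying logic, not the Great Expectations API; the helper names and sample data are hypothetical.

```python
import math
from typing import Hashable, Iterable, List, Sequence


def has_only_finite_values(matrix: Sequence[Sequence[float]]) -> bool:
    """True when no cell is NaN or +/-Inf."""
    return all(math.isfinite(v) for row in matrix for v in row)


def is_binary(matrix: Sequence[Sequence[int]]) -> bool:
    """True when every label is exactly 0 or 1."""
    return all(v in (0, 1) for row in matrix for v in row)


def leaked_ids(
    train_ids: Iterable[Hashable], test_ids: Iterable[Hashable]
) -> List[Hashable]:
    """Identifiers present in both splits (an empty list means no leakage)."""
    return sorted(set(train_ids) & set(test_ids))


# Illustrative data only
features = [[0.0, 0.3], [0.7, float("nan")]]
labels = [[1, 0], [0, 1]]
print(has_only_finite_values(features))  # False
print(is_binary(labels))                 # True
print(leaked_ids([1, 2, 3], [3, 4]))     # [3]
```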

#### Deepchecks (24 Checks)

**Suites:**
- **Data Integrity Suite** (12 checks): Duplicates, nulls, correlations
- **Train-Test Validation Suite** (12 checks): Leakage, drift, distribution

**Status:** Production-ready (96% overall score)

---

## 4. API (Inference Service)

### FastAPI Implementation

**Design Decision:** Use FastAPI for async request handling, automatic OpenAPI generation, and native Pydantic validation.

**Key Features:**
- Async lifespan management for model loading
- Middleware for Prometheus metrics collection
- Structured exception handling

### RESTful Principles

**Design Decision:** Follow REST best practices for intuitive API design.

| Principle | Implementation |
|-----------|----------------|
| **Nouns, not verbs** | `/predictions` instead of `/getPrediction` |
| **Plural resources** | `/predictions`, `/issues` |
| **HTTP methods** | GET (retrieve), POST (create) |
| **Status codes** | 200 (OK), 201 (Created), 404 (Not Found), 500 (Error) |

**Endpoint Design:**

| Method | Endpoint | Action |
|--------|----------|--------|
| `POST` | `/predict` | Create new prediction |
| `POST` | `/predict/batch` | Create batch predictions |
| `GET` | `/predictions` | List predictions |
| `GET` | `/predictions/{run_id}` | Get specific prediction |

### OpenAPI/Swagger Documentation

**Auto-generated documentation at runtime:**
- Swagger UI: `/docs`
- ReDoc: `/redoc`
- OpenAPI JSON: `/openapi.json`

**Pydantic Models for Schema Enforcement:**
```python
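from typing import List, Optional

from pydantic import BaseModel


class IssueInput(BaseModel):
    issue_text: str
    repo_name: Optional[str] = None
    pr_number: Optional[int] = None


# SkillPrediction (not shown here) is another Pydantic model
class PredictionResponse(BaseModel):
    run_id: str
    predictions: List[SkillPrediction]
    model_version: str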

---

## 5. Deployment (Containerization & CI/CD)

### Docker Containerization

**Design Decision:** Multi-stage Docker builds with security best practices.

**Dockerfile Features:**
- Python 3.10 slim base image (minimal footprint)
- Non-root user for security
- DVC integration for model pulling
- Health check endpoint configuration

**Multi-Service Architecture:**

```
docker-compose.yml
├── hopcroft-api (FastAPI)
│   ├── Port: 8080
│   ├── Volumes: source code, logs
│   └── Health check: /health
│
├── hopcroft-gui (Streamlit)
│   ├── Port: 8501
│   ├── Depends on: hopcroft-api
│   └── Environment: API_BASE_URL
│
└── hopcroft-net (Bridge network)
```

**Design Rationale:**
- Separation of concerns (API vs GUI)
- Independent scaling
- Health-based dependency management
- Shared network for internal communication

### CI/CD Pipeline (GitHub Actions)

**Design Decision:** Implement Continuous Delivery for ML (CD4ML) with automated testing and image builds.

**Pipeline Stages:**

```yaml
Jobs:
  unit-tests:
    - Checkout code
    - Setup Python 3.10
    - Install dependencies
    - Ruff linting
    - Pytest unit tests
    - Upload test report (on failure)

  build-image:
    - Needs: unit-tests
    - Configure DVC credentials
    - Pull models
    - Build Docker image
```

**Triggers:**
- Push to `main`, `feature/*`
- Pull requests to `main`

**Secrets Management:**
- `DAGSHUB_USERNAME`: DagsHub authentication
- `DAGSHUB_TOKEN`: DagsHub access token

### Hugging Face Spaces Hosting

**Design Decision:** Deploy on HF Spaces for free GPU-enabled hosting with Docker SDK support.

**Configuration:**
```yaml
---
title: Hopcroft Skill Classification
sdk: docker
app_port: 7860
---
```

**Startup Flow:**
1. `start_space.sh` configures DVC credentials
2. Pull models from DagsHub
3. Start FastAPI (port 8000)
4. Start Streamlit (port 8501)
5. Start Nginx (port 7860) for routing

**Nginx Reverse Proxy:**
- `/` → Streamlit GUI
- `/docs`, `/predict`, `/predictions` → FastAPI
- `/prometheus` → Prometheus metrics

---

## 6. Monitoring

### Resource-Level Monitoring

**Design Decision:** Implement Prometheus metrics for real-time observability.

| Metric | Type | Purpose |
|--------|------|---------|
| `hopcroft_requests_total` | Counter | Request volume by endpoint |
| `hopcroft_request_duration_seconds` | Histogram | Latency distribution (P50, P90, P99) |
| `hopcroft_in_progress_requests` | Gauge | Concurrent request load |
| `hopcroft_prediction_processing_seconds` | Summary | Model inference time |

**Middleware Implementation:**
```python
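# `app` and the metric objects (IN_PROGRESS, REQUEST_LATENCY, REQUESTS_TOTAL)
# are assumed to be defined elsewhere in the service module.
@app.middleware("http")
async def monitor_requests(request, call_next):
    # Label values are taken from the incoming request
    method, endpoint = request.method, request.url.path
    IN_PROGRESS.inc()
    try:
        with REQUEST_LATENCY.labels(method, endpoint).time():
            response = await call_next(request)
    finally:
        # Always release the gauge, even if the handler raises
        IN_PROGRESS.dec()
    REQUESTS_TOTAL.labels(method, endpoint, str(response.status_code)).inc()
    return response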
```

### Performance-Level Monitoring

**Model Staleness Indicators:**
- Prediction confidence trends over time
- Drift detection alerts
- Error rate monitoring

### Drift Detection Strategy

**Design Decision:** Implement statistical drift detection using Kolmogorov-Smirnov test with Bonferroni correction.

| Component | Details |
|-----------|---------|
| **Algorithm** | KS Two-Sample Test |
| **Baseline** | 1000 samples from training data |
| **Threshold** | p-value < 0.05 (Bonferroni corrected) |
| **Execution** | Scheduled via cron or manual trigger |
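
The following is a minimal, dependency-free sketch of the idea: a two-sample KS statistic per feature, an asymptotic p-value, and a Bonferroni-corrected threshold. The function names, the feature names, and the series approximation for the p-value are assumptions made for illustration; in practice a library routine such as SciPy's `scipy.stats.ks_2samp` would normally compute the test.

```python
import math
from bisect import bisect_right
from typing import Dict, List, Sequence


def ks_statistic(baseline: Sequence[float], current: Sequence[float]) -> float:
    """Maximum distance between the two empirical CDFs."""
    a, b = sorted(baseline), sorted(current)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def ks_p_value(d: float, n: int, m: int, terms: int = 100) -> float:
    """Asymptotic two-sided p-value (Kolmogorov series approximation)."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        return 1.0  # the series converges slowly here; the p-value is ~1
    total = sum(
        (-1) ** (j - 1) * math.exp(-2 * (j * lam) ** 2)
        for j in range(1, terms + 1)
    )
    return max(0.0, min(1.0, 2 * total))


def detect_drift(
    baseline: Dict[str, List[float]],
    current: Dict[str, List[float]],
    alpha: float = 0.05,
) -> Dict[str, bool]:
    """Flag features whose p-value falls below the corrected threshold."""
    # Bonferroni: divide alpha by the number of features tested
    threshold = alpha / len(baseline)
    flags = {}
    for name, ref in baseline.items():
        d = ks_statistic(ref, current[name])
        flags[name] = ks_p_value(d, len(ref), len(current[name])) < threshold
    return flags


# Illustrative data: one unchanged feature, one shifted by 0.5
ref = [i / 100 for i in range(100)]
flags = detect_drift(
    {"stable": ref, "shifted": ref},
    {"stable": list(ref), "shifted": [x + 0.5 for x in ref]},
)
print(flags)  # {'stable': False, 'shifted': True}
```

Dividing the significance level by the number of features keeps the overall false-alarm rate near `alpha` when many features are tested at once, at the cost of lower sensitivity per feature.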

**Drift Types Monitored:**

| Type | Definition | Detection Method |
|------|------------|------------------|
| **Data Drift** | Feature distribution shift | KS test on input features |
| **Target Drift** | Label distribution shift | Chi-square test on predictions |
| **Concept Drift** | Relationship change | Performance degradation monitoring |
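
For target drift, one simple reading of the chi-square row is a 2x2 contingency test on how often a single skill is predicted in a reference window versus the current window. The sketch below assumes that framing; the function name and the counts are hypothetical.

```python
import math
from typing import Tuple


def chi_square_2x2(
    pos_ref: int, n_ref: int, pos_cur: int, n_cur: int
) -> Tuple[float, float]:
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table.

    Rows: reference window vs current window.
    Columns: skill predicted vs skill not predicted.
    """
    a, b = pos_ref, n_ref - pos_ref
    c, d = pos_cur, n_cur - pos_cur
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # a degenerate table carries no evidence of a shift
    stat = n * (a * d - b * c) ** 2 / denom
    # Survival function of chi-square with 1 dof: P(Z^2 > stat)
    return stat, math.erfc(math.sqrt(stat / 2))


# Illustrative counts: the skill is predicted in 10% of reference issues
# and in 20% of current issues
stat, p = chi_square_2x2(pos_ref=100, n_ref=1000, pos_cur=200, n_cur=1000)
print(round(stat, 2), p < 0.05)  # 39.22 True
```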

**Metrics Published to Pushgateway:**
- `drift_detected`: Binary indicator (0/1)
- `drift_p_value`: Statistical significance
- `drift_distance`: KS distance metric
- `drift_check_timestamp`: Last check time

### Alerting Configuration

**Prometheus Alert Rules:**

| Alert | Condition | Severity |
|-------|-----------|----------|
| `ServiceDown` | Target down for 5m | Critical |
| `HighErrorRate` | 5xx rate > 10% | Warning |
| `SlowRequests` | P95 latency > 2s | Warning |
| `DriftDetected` | drift_detected = 1 | Warning |

**Alertmanager Integration:**
- Severity-based routing
- Email notifications
- Inhibition rules to prevent alert storms

### Grafana Visualization

**Dashboard Panels:**
1. Request Rate (gauge)
2. Request Latency p50/p95 (time series)
3. In-Progress Requests (stat panel)
4. Error Rate 5xx (stat panel)
5. Model Prediction Time (time series)
6. Requests by Endpoint (bar chart)

**Data Sources:**
- Prometheus: Real-time metrics
- Pushgateway: Batch job metrics (drift detection)

### HF Spaces Deployment

Both Prometheus and Grafana are deployed on Hugging Face Spaces behind the Nginx reverse proxy:

| Service | Production URL |
|---------|----------------|
| Prometheus | `https://dacrow13-hopcroft-skill-classification.hf.space/prometheus/` |
| Grafana | `https://dacrow13-hopcroft-skill-classification.hf.space/grafana/` |

This enables real-time monitoring of the production deployment without additional infrastructure.