# API reference (FastAPI)

Base URL (local): `http://localhost:8000`  
Interactive docs: `/docs` (Swagger), `/redoc` (ReDoc)

Implementation: [`src/api/main.py`](../src/api/main.py)  
Inference: [`src/service/model_service.py`](../src/service/model_service.py)

---

## Endpoints

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/` | Health check and active model name |
| `GET` | `/model-info` | Metadata for the loaded model |
| `GET` | `/models` | List available models and active one |
| `PUT` | `/model/{model_name}` | Switch active model (lazy load on next predict) |
| `POST` | `/predict` | Classify one comment |
| `POST` | `/predict-batch` | Classify up to 100 comments |
| `POST` | `/predict-video` | Fetch a video's YouTube comments and classify them (uses `YOUTUBE_API_KEY`, otherwise a fallback) |

---

## `POST /predict`

**Request body**

```json
{
  "text": "Comment text here",
  "threshold": 0.5
}
```

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `text` | string | yes | 1–5000 characters; must be non-empty after trimming whitespace |
| `threshold` | float | no | Decision threshold: toxic if `probability >= threshold`. Reference values: **0.381** (production), **0.5** (LR baseline), **0.12** (frozen BERT baseline) |
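The constraints in this table can be checked on the client before a request is sent. A minimal sketch, assuming you want to fail fast locally; `PredictRequest` is a hypothetical dataclass, not the Pydantic schema in `main.py`. The default of 0.5 and the `[0.0, 1.0]` range check on `threshold` are assumptions of this sketch, since the table states neither.

```python
from dataclasses import dataclass

MAX_TEXT_LEN = 5000  # upper bound from the `text` row of the table


@dataclass
class PredictRequest:
    """Hypothetical client-side mirror of the /predict body (not the server schema)."""

    text: str
    threshold: float = 0.5  # assumed default for this sketch; the API's own may differ

    def __post_init__(self) -> None:
        # `text` must contain something other than whitespace.
        if not self.text.strip():
            raise ValueError("text must be non-empty after trimming whitespace")
        # Length limit checked on the raw string (assumption: server does the same).
        if len(self.text) > MAX_TEXT_LEN:
            raise ValueError(f"text must be at most {MAX_TEXT_LEN} characters")
        # Assumption: thresholds outside a probability range are meaningless.
        if not 0.0 <= self.threshold <= 1.0:
            raise ValueError("threshold must lie in [0.0, 1.0]")
```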

**Response**

```json
{
  "text": "Comment text here",
  "is_toxic": false,
  "probability": 0.0821,
  "labels": [],
  "model_used": "Meta-Feature Stacking (Production)",
  "latency_ms": 15.2
}
```

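| Field | Description |
|-------|-------------|
| `text` | The submitted comment text |
| `is_toxic` | `true` = **Toxic**, `false` = **Safe** |
| `probability` | P(toxic), 0.0–1.0 |
| `labels` | Category hints when toxic (from keyword heuristics or Hugging Face labels); may be empty, as in the example above |
| `model_used` | Name of the active model, as reported by `ModelService` |
| `latency_ms` | Processing latency, in milliseconds |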
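Because `is_toxic` is derived as `probability >= threshold`, the same probability can flip between Toxic and Safe depending on the threshold sent. A minimal sketch, assuming the three reference thresholds from the request table map to the three catalog models as their names suggest; `REFERENCE_THRESHOLDS` and `is_toxic` are hypothetical client-side names, not part of `ModelService`.

```python
# Reference thresholds from the `threshold` row of the request table,
# keyed by catalog model name (mapping inferred from the row's labels).
REFERENCE_THRESHOLDS = {
    "Meta-Feature Stacking (Production)": 0.381,
    "LR + TF-IDF (Baseline)": 0.5,
    "Frozen Toxic-BERT (Baseline)": 0.12,
}


def is_toxic(probability: float, threshold: float) -> bool:
    """Documented decision rule: toxic iff probability >= threshold."""
    return probability >= threshold


# Same score, different verdicts: 0.45 is Toxic at 0.381 but Safe at 0.5.
production = REFERENCE_THRESHOLDS["Meta-Feature Stacking (Production)"]
baseline = REFERENCE_THRESHOLDS["LR + TF-IDF (Baseline)"]
```

Sending a threshold tuned for one model to another can therefore shift the toxic rate noticeably, which is why the table lists a separate value per model.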

**curl**

```bash
curl -s -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "Thanks for the tutorial!", "threshold": 0.5}'
```

**Toxic example**

```bash
curl -s -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "You are worthless garbage", "threshold": 0.5}'
```
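The curl examples above can be reproduced with the Python standard library alone. A minimal sketch, assuming the API runs at the local base URL given at the top of this page; `build_predict_request` and `predict` are hypothetical helper names. Building the request separately from sending it keeps the payload easy to inspect without a running server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # local base URL from this page


def build_predict_request(
    text: str, threshold: float = 0.5, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build the same POST /predict call as the curl examples."""
    body = json.dumps({"text": text, "threshold": threshold}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text: str, threshold: float = 0.5, base_url: str = BASE_URL) -> dict:
    """Send the request and decode the JSON response (requires the API to be running)."""
    req = build_predict_request(text, threshold, base_url)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage, with the server running:
#   result = predict("You are worthless garbage")
#   result["is_toxic"], result["probability"], result["model_used"]
```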

---

## `POST /predict-batch`

```json
{
  "texts": ["Safe comment", "Another line"],
  "threshold": 0.5
}
```

`texts` may contain at most 100 comments. The response contains `results` (a list of objects shaped like the `/predict` response), `total`, `toxic_count`, and `latency_ms`.

```bash
curl -s -X POST http://localhost:8000/predict-batch \
  -H "Content-Type: application/json" \
  -d '{"texts": ["Nice video", "I hate you"], "threshold": 0.5}'
```
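Because `/predict-batch` accepts at most 100 comments per call, a client with a longer list has to split it and combine the per-call results. A minimal sketch of that bookkeeping; `chunk_texts` and `summarize` are hypothetical helpers, and `summarize` simply recomputes `total` and `toxic_count` over the merged `results` lists.

```python
from typing import Iterator, List, Sequence

MAX_BATCH = 100  # per-request limit from the endpoint table


def chunk_texts(texts: Sequence[str], size: int = MAX_BATCH) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` comments, one per /predict-batch call."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(texts), size):
        yield list(texts[start:start + size])


def summarize(results: Sequence[dict]) -> dict:
    """Recompute `total` and `toxic_count` over results merged from several calls."""
    return {
        "total": len(results),
        "toxic_count": sum(1 for r in results if r["is_toxic"]),
    }
```

Each chunk becomes the `texts` field of one request; concatenating every response's `results` and passing them to `summarize` gives totals for the whole list.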

---

## `POST /predict-video`

```json
{
  "url": "https://www.youtube.com/watch?v=VIDEO_ID",
  "max_comments": 50,
  "threshold": 0.5
}
```

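Set `YOUTUBE_API_KEY` in `.env` to fetch live comments through the YouTube Data API v3. Without a key, the API may fall back to a limited scraper or to demo data; see [`src/api/main.py`](../src/api/main.py) for the exact behavior. `max_comments` caps how many comments are fetched and classified.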
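Before posting a `url`, a client may want to confirm that it actually identifies a video. This page does not document how `main.py` parses the URL, so the following is a hypothetical client-side pre-check, assuming the common `watch?v=`, `youtu.be/`, `/shorts/`, and `/embed/` URL shapes; `extract_video_id` is an illustrative name.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from common YouTube URL shapes, or None if none is found."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host in {"youtu.be", "www.youtu.be"}:
        # Short links carry the ID as the first path segment: https://youtu.be/VIDEO_ID
        return parsed.path.lstrip("/").split("/")[0] or None
    if host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            # Standard links carry the ID in the `v` query parameter.
            return parse_qs(parsed.query).get("v", [None])[0]
        parts = parsed.path.strip("/").split("/")
        if len(parts) >= 2 and parts[0] in {"shorts", "embed"}:
            return parts[1]
    return None
```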

---

## `GET /models` and model switch

Models registered in [`configs/model_catalog.yaml`](../configs/model_catalog.yaml):

| Name | Type | Artifact / weights |
|------|------|-------------------|
| `Meta-Feature Stacking (Production)` | meta_stack | `models/production_final/meta_stack_final.joblib` |
| `LR + TF-IDF (Baseline)` | local | `models/baseline/lr_tfidf.joblib` |
| `Frozen Toxic-BERT (Baseline)` | hf_remote | Hugging Face `unitary/toxic-bert` |

```bash
# List available models and the active one
curl -s http://localhost:8000/models

# Switch the active model (the name is percent-encoded in the path)
curl -s -X PUT "http://localhost:8000/model/LR%20%2B%20TF-IDF%20%28Baseline%29"
```

The startup default is `Meta-Feature Stacking (Production)`; override it with `MODEL_NAME` in `.env`.
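The endpoint table lists the switch as `PUT /model/{model_name}`, and catalog names contain spaces, which are not valid in a raw URL path. A minimal sketch of building that URL, assuming the server percent-decodes the path parameter back to the catalog name; `model_switch_url` is a hypothetical helper. Encoding every reserved character (`safe=""`) is the conservative choice, since it also keeps a `/` from splitting the name into two path segments.

```python
from urllib.parse import quote

BASE_URL = "http://localhost:8000"  # local base URL from this page


def model_switch_url(model_name: str, base_url: str = BASE_URL) -> str:
    """Build the URL for PUT /model/{model_name} with the name percent-encoded."""
    # safe="" encodes spaces, "+", parentheses and "/" alike.
    return f"{base_url}/model/{quote(model_name, safe='')}"
```

With the standard library, the call itself would be `urllib.request.urlopen(urllib.request.Request(model_switch_url("LR + TF-IDF (Baseline)"), method="PUT"))`.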

---

## Environment variables

| Variable | Used by | Description |
|----------|---------|-------------|
| `MODEL_NAME` | API startup | Name of the model to activate at startup; must match an entry in `AVAILABLE_MODELS` |
| `YOUTUBE_API_KEY` | `/predict-video` | YouTube Data API v3 |
| `ENV` | logging / behavior | `development` or `production` |

Copy [`.env.example`](../.env.example) to `.env` and fill in the values.
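How these variables are read is defined in the service code and is not shown here. A minimal sketch of equivalent checks, assuming `.env` has already been loaded into the process environment; `KNOWN_MODELS` is a hypothetical stand-in for `AVAILABLE_MODELS`, and rejecting an unknown `MODEL_NAME` (rather than silently falling back) is this sketch's choice, not documented API behavior.

```python
import os
from typing import Mapping, Sequence

DEFAULT_MODEL = "Meta-Feature Stacking (Production)"  # documented startup default

# Hypothetical stand-in for AVAILABLE_MODELS, using the catalog names on this page.
KNOWN_MODELS = (
    "Meta-Feature Stacking (Production)",
    "LR + TF-IDF (Baseline)",
    "Frozen Toxic-BERT (Baseline)",
)


def resolve_model_name(
    env: Mapping[str, str] = os.environ, known: Sequence[str] = KNOWN_MODELS
) -> str:
    """Read MODEL_NAME, fall back to the default, and reject names not in the catalog."""
    name = env.get("MODEL_NAME", DEFAULT_MODEL).strip() or DEFAULT_MODEL
    if name not in known:
        raise ValueError(f"MODEL_NAME={name!r} is not one of {list(known)}")
    return name


def youtube_key_configured(env: Mapping[str, str] = os.environ) -> bool:
    """True when a non-empty YOUTUBE_API_KEY is set, i.e. live comment fetch is possible."""
    return bool(env.get("YOUTUBE_API_KEY", "").strip())
```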

---

## Errors

| Status | When |
|--------|------|
| `422` | Invalid body (e.g. empty `text`) |
| `503` | Model not loaded yet |
| `500` | Prediction failure |
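Of these statuses, only `503` describes a transient state: the model may still be loading, and model switches load lazily on the next predict. A minimal sketch of a client that retries `503` and fails fast on everything else; `call_with_retry` is hypothetical, and `send` is any zero-argument function you supply that performs the request and returns `(status_code, json_body)`. Treating `500` as non-retryable is this sketch's choice.

```python
import time
from typing import Callable, Tuple

# Status codes documented in the Errors table.
ERROR_MEANINGS = {
    422: "invalid request body (e.g. empty text)",
    503: "model not loaded yet",
    500: "prediction failure",
}


def call_with_retry(
    send: Callable[[], Tuple[int, dict]],
    attempts: int = 5,
    delay_s: float = 1.0,
) -> dict:
    """Call `send` until it returns 2xx; retry only on 503, fail fast otherwise."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        status, body = send()
        if 200 <= status < 300:
            return body
        if status == 503 and attempt < attempts:
            # Model still loading: wait a little longer on each retry (linear backoff).
            time.sleep(delay_s * attempt)
            continue
        raise RuntimeError(f"HTTP {status}: {ERROR_MEANINGS.get(status, 'unexpected error')}")
    raise RuntimeError("unreachable: every iteration returns, retries, or raises")
```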