# API reference (FastAPI)

Base URL (local): `http://localhost:8000`  
Interactive docs: `/docs` (Swagger), `/redoc` (ReDoc)

Implementation: [`src/api/main.py`](../src/api/main.py)  
Inference: [`src/service/model_service.py`](../src/service/model_service.py)

---

## Endpoints

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/` | Health check and active model name |
| `GET` | `/model-info` | Metadata for the loaded model |
| `GET` | `/models` | List available models and the active one |
| `PUT` | `/model/{model_name}` | Switch the active model (loaded lazily on the next prediction) |
| `POST` | `/predict` | Classify one comment |
| `POST` | `/predict-batch` | Classify up to 100 comments |
| `POST` | `/predict-video` | Fetch a YouTube video's comments and classify them (uses `YOUTUBE_API_KEY`, else a fallback or demo data) |

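To call these endpoints from Python without extra dependencies, a minimal client sketch using only the standard library might look like this. `build_request` and `call` are hypothetical helpers, not part of this project, and `BASE_URL` assumes the local default above. With the server running, `call("POST", "/predict", {"text": "Nice video"})` should return a dict shaped like the `/predict` response, and `call("GET", "/models")` the model list.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "http://localhost:8000"  # assumption: local dev server from this page


def build_request(method: str, path: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request; JSON-encode the body when one is given."""
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def call(method: str, path: str, payload: Optional[dict] = None, timeout: float = 10.0) -> Any:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(build_request(method, path, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```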
---

## `POST /predict`

**Request body**

```json
{
  "text": "Comment text here",
  "threshold": 0.5
}
```

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `text` | string | yes | 1–5000 characters; must be non-empty after trimming whitespace |
| `threshold` | float | no | Toxic if `probability >= threshold` (default `0.5`) |
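
The field rules above can be mirrored in plain Python as a sketch. The real service validates with its own FastAPI request models in `main.py`; `parse_predict_request` and `PredictRequest` here are hypothetical, and applying the 5000-character limit to the untrimmed text is an assumption. A `ValueError` corresponds to the case where the API answers `422`.

```python
from dataclasses import dataclass

MAX_TEXT_LENGTH = 5000   # from the field table
DEFAULT_THRESHOLD = 0.5  # documented default


@dataclass
class PredictRequest:
    text: str
    threshold: float = DEFAULT_THRESHOLD


def parse_predict_request(body: dict) -> PredictRequest:
    """Apply the documented /predict rules; raise ValueError where the API would return 422."""
    text = body.get("text")
    if not isinstance(text, str):
        raise ValueError("text is required and must be a string")
    # Assumption: the length limit is checked on the raw (untrimmed) text.
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(f"text exceeds {MAX_TEXT_LENGTH} characters")
    if not text.strip():
        raise ValueError("text must be non-empty after trimming whitespace")
    raw = body.get("threshold")
    threshold = DEFAULT_THRESHOLD if raw is None else float(raw)
    return PredictRequest(text=text, threshold=threshold)
```

For example, `parse_predict_request({"text": "   "})` raises, while `parse_predict_request({"text": "Nice video"})` gets the default threshold of `0.5`.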

**Response**

```json
{
  "text": "Comment text here",
  "is_toxic": false,
  "probability": 0.0821,
  "labels": [],
  "model_used": "LR + TF-IDF (local)",
  "latency_ms": 15.2
}
```

| Field | Description |
|-------|-------------|
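| `text` | The input text, echoed back |
| `is_toxic` | `true` = **Toxic**, `false` = **Safe** |
| `probability` | Predicted probability that the comment is toxic, P(toxic), from 0.0 to 1.0 |
| `labels` | Category hints when toxic, from keyword heuristics or Hugging Face model labels; may be empty |
| `model_used` | Name of the active model, as reported by `ModelService` |
| `latency_ms` | Processing time in milliseconds |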

**curl**

```bash
curl -s -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "Thanks for the tutorial!", "threshold": 0.5}'
```

**Toxic example**

```bash
curl -s -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "You are worthless garbage", "threshold": 0.5}'
```
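
How the response fields relate to each other can be sketched as follows. `classify`, `predict_proba`, and `labels_for` are hypothetical stand-ins for the model call in `model_service.py`; only the `probability >= threshold` rule comes from this page. Rounding `probability` to 4 decimals and returning labels only for toxic results are assumptions based on the example response.

```python
import time
from typing import Callable, List


def classify(text: str, threshold: float, model_used: str,
             predict_proba: Callable[[str], float],
             labels_for: Callable[[str], List[str]] = lambda _t: []) -> dict:
    """Build a /predict-style response dict (hypothetical sketch)."""
    start = time.perf_counter()
    probability = predict_proba(text)
    # Documented rule: toxic if and only if probability >= threshold.
    is_toxic = probability >= threshold
    # Assumption: category hints are only computed for toxic comments.
    labels = labels_for(text) if is_toxic else []
    latency_ms = (time.perf_counter() - start) * 1000.0
    return {
        "text": text,
        "is_toxic": is_toxic,
        "probability": round(probability, 4),
        "labels": labels,
        "model_used": model_used,
        "latency_ms": round(latency_ms, 1),
    }
```

Note that a probability exactly equal to the threshold counts as toxic.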

---

## `POST /predict-batch`

**Request body**

```json
{
  "texts": ["Safe comment", "Another line"],
  "threshold": 0.5
}
```

The response contains `results` (a list of objects shaped like the `/predict` response), `total` (number of texts classified), `toxic_count` (number classified as toxic), and `latency_ms`.

```bash
curl -s -X POST http://localhost:8000/predict-batch \
  -H "Content-Type: application/json" \
  -d '{"texts": ["Nice video", "I hate you"], "threshold": 0.5}'
```
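
The batch response can be read as a simple aggregation over single predictions. This is a hypothetical sketch: `predict_batch` and the injected `classify_one` callable are illustrative, and only the 100-item limit and the `results`/`total`/`toxic_count`/`latency_ms` fields come from this page.

```python
import time
from typing import Callable, List

MAX_BATCH = 100  # documented limit for /predict-batch


def predict_batch(texts: List[str], threshold: float,
                  classify_one: Callable[[str, float], dict]) -> dict:
    """Classify each text and summarize (hypothetical sketch of the batch response)."""
    if len(texts) > MAX_BATCH:
        raise ValueError(f"at most {MAX_BATCH} texts per request")
    start = time.perf_counter()
    results = [classify_one(t, threshold) for t in texts]
    return {
        "results": results,
        "total": len(results),
        "toxic_count": sum(1 for r in results if r["is_toxic"]),
        "latency_ms": round((time.perf_counter() - start) * 1000.0, 1),
    }
```

Order is preserved, so `results[i]` always corresponds to `texts[i]`.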

---

## `POST /predict-video`

**Request body**

```json
{
  "url": "https://www.youtube.com/watch?v=VIDEO_ID",
  "max_comments": 50,
  "threshold": 0.5
}
```

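`url` is the video URL, `max_comments` caps how many comments are fetched and classified, and `threshold` works as in `/predict`. Set `YOUTUBE_API_KEY` in `.env` to fetch live comments through the YouTube Data API. Without a key, the API may fall back to a limited scraper or demo data (see `main.py` for the exact behavior).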

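Before any comments can be fetched, the video ID has to be pulled out of `url`, and the service has to decide between the Data API and the fallback. The sketch below shows one way to do both; `extract_video_id` and `comment_source` are hypothetical, only `watch?v=` and `youtu.be` links are handled, and the actual parsing and fallback logic in `main.py` may differ. For example, `extract_video_id("https://www.youtube.com/watch?v=VIDEO_ID")` returns `"VIDEO_ID"`.

```python
import os
from typing import Mapping, Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID for watch?v= and youtu.be links, else None (hypothetical helper)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        vid = parsed.path.lstrip("/").split("/")[0]
        return vid or None
    if (host == "youtube.com" or host.endswith(".youtube.com")) and parsed.path == "/watch":
        return parse_qs(parsed.query).get("v", [None])[0]
    return None


def comment_source(env: Mapping[str, str] = os.environ) -> str:
    """Pick the Data API when a key is configured, otherwise the fallback (assumed logic)."""
    return "youtube_data_api" if env.get("YOUTUBE_API_KEY") else "fallback"
```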
---

## `GET /models` and model switch

```bash
curl -s http://localhost:8000/models

curl -s -X PUT "http://localhost:8000/model/LR%20%2B%20TF-IDF%20(local)"
```
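
Model names contain spaces, `+`, and parentheses, so they must be percent-encoded in the path. A small sketch with the standard library's `urllib.parse.quote` (the `model_switch_url` helper is hypothetical) produces the encoded URL. Encoding `+` as `%2B`, as the curl example does, keeps it from being read as a space by tools that apply form-style decoding.

```python
from urllib.parse import quote, unquote

BASE_URL = "http://localhost:8000"  # assumption: local dev server


def model_switch_url(model_name: str) -> str:
    # safe="" also encodes "/", so a name can never split into extra path segments
    return f"{BASE_URL}/model/{quote(model_name, safe='')}"
```

`quote` also encodes the parentheses (`%28`, `%29`); the curl example leaves them literal, and both forms decode to the same name.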

Valid model names are the keys of `AVAILABLE_MODELS` in `model_service.py`, for example:

- `LR + TF-IDF (local)` — default, `models/final_model.joblib`
- `DistilBERT Toxicity` — Hugging Face remote (requires `transformers`, `torch`)
- `toxic-bert (multilabel)`
- `RoBERTa Toxicity`
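
The lazy switch described for `PUT /model/{model_name}` can be sketched as a registry that only records the name on switch and loads the model on first use. `LazyModelRegistry` is hypothetical: its loaders stand in for the `AVAILABLE_MODELS` entries, and caching every model loaded so far is an assumption (the real `ModelService` may unload the previous one).

```python
from typing import Callable, Dict


class LazyModelRegistry:
    """Hypothetical sketch: switching records a name; loading happens on the next predict."""

    def __init__(self, loaders: Dict[str, Callable[[], object]], initial: str):
        self._loaders = loaders            # model name -> zero-arg loader
        self._cache: Dict[str, object] = {}  # assumption: loaded models stay cached
        self.active = ""
        self.switch(initial)

    def switch(self, name: str) -> None:
        if name not in self._loaders:
            raise KeyError(f"unknown model: {name!r}")
        self.active = name  # no loading here, so the switch itself is cheap

    def get(self) -> object:
        """Load the active model once, then reuse it (called from the predict path)."""
        if self.active not in self._cache:
            self._cache[self.active] = self._loaders[self.active]()
        return self._cache[self.active]
```

The trade-off is that the first prediction after a switch pays the load cost, which can be large for the Hugging Face models.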

---

## Environment variables

| Variable | Used by | Description |
|----------|---------|-------------|
| `MODEL_NAME` | API startup | Name of the model to activate at startup (a key of `AVAILABLE_MODELS`) |
| `YOUTUBE_API_KEY` | `/predict-video` | API key for the YouTube Data API v3 |
| `ENV` | logging / behavior | `development` or `production` |

Copy [`.env.example`](../.env.example) to `.env` and fill in the values.

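Reading these variables at startup can be sketched as below. `Settings` and `load_settings` are hypothetical; the `MODEL_NAME` default mirrors the documented default model, while defaulting `ENV` to `development` and rejecting other values are assumptions. This reads the process environment only; loading `.env` into it (for example with a dotenv library) is not shown.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

DEFAULT_MODEL = "LR + TF-IDF (local)"  # the documented default model


@dataclass(frozen=True)
class Settings:
    model_name: str
    youtube_api_key: Optional[str]
    env: str


def load_settings(environ: Mapping[str, str] = os.environ) -> Settings:
    """Collect the documented variables (hypothetical helper)."""
    env = environ.get("ENV", "development")  # assumption: development when unset
    if env not in ("development", "production"):
        raise ValueError(f"ENV must be 'development' or 'production', got {env!r}")
    return Settings(
        model_name=environ.get("MODEL_NAME", DEFAULT_MODEL),
        youtube_api_key=environ.get("YOUTUBE_API_KEY") or None,  # empty string -> no key
        env=env,
    )
```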
---

## Errors

| Status | When |
|--------|------|
| `422` | Invalid body (e.g. empty `text`) |
| `503` | Model not loaded yet |
| `500` | Prediction failure |
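
Because models load lazily, a client may see `503` right after startup or a model switch. A minimal client-side sketch retries only on `503` with exponential backoff and fails fast on `422` and `500`, since resending the same body is unlikely to fix those. `call_with_retry` is hypothetical; `send` is any function returning `(status, json_body)`. If it wraps `urllib`, it must catch `urllib.error.HTTPError` and return its `.code`, because `urlopen` raises on non-2xx responses.

```python
import time
from typing import Callable, Tuple

Response = Tuple[int, dict]  # (HTTP status, decoded JSON body)


def call_with_retry(send: Callable[[], Response], retries: int = 3,
                    base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Retry only on 503 (model not loaded yet); return 2xx; raise on anything else."""
    for attempt in range(retries + 1):
        status, body = send()
        if 200 <= status < 300:
            return status, body
        if status == 503 and attempt < retries:
            sleep(base_delay * (2 ** attempt))  # back off while the model loads
            continue
        if status == 422:
            raise ValueError(f"invalid request body: {body}")
        raise RuntimeError(f"request failed with HTTP {status}: {body}")
    raise AssertionError("unreachable: the last attempt always returns or raises")
```

With the defaults, the waits between attempts are 0.5 s, 1 s, and 2 s.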