# YouTube Toxic Comment Detector (youtube_hate_detector)
[![Python](https://img.shields.io/badge/python-3.12-blue.svg)](https://www.python.org/downloads/)
[![FastAPI](https://img.shields.io/badge/FastAPI-0.136-009688.svg)](https://fastapi.tiangolo.com/)
[![React](https://img.shields.io/badge/React-UI-61DAFB.svg)](https://react.dev/)
[![Docker](https://img.shields.io/badge/docker-compose-2496ED.svg)](https://docs.docker.com/compose/)
**Español:** [README.es.md](README.es.md)
Automated **Safe vs Toxic** moderation support for YouTube-style comments. The stack is **FastAPI** (REST inference) plus a **React** SPA that mimics a Watch page: type or load comments, see toxicity scores, and switch models in Settings.
**Production default:** **Hybrid Meta-Feature Stacking**, stored at `models/production_final/meta_stack_final.joblib` (held-out test F1 **0.805**, train–test gap **2.54 percentage points**, under the team’s **< 5 percentage point** overfitting rule).
---
## What this project does
| Aspect | Detail |
|--------|--------|
| **Task** | Binary classification on `IsToxic` → **Safe (0)** / **Toxic (1)** |
| **Data** | `data/raw/youtoxic_english_1000.csv` (~1k English comments; multilabel columns available for EDA) |
| **Primary metric** | F1 weighted (imbalanced toxic class) |
| **Overfitting guardrail** | \|F1 train − F1 test\| < 5 percentage points |
| **User-facing wording** | **toxic** |
Moderators get a practical score and label per comment. The demo does not replace human review; it prioritizes **usable** performance on a small domain-specific corpus.
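
To make the primary metric and the guardrail in the table concrete, here is a minimal pure-Python sketch. The function names (`weighted_f1`, `passes_gap_rule`) are illustrative assumptions, not code from this repository; the project's own pipelines may compute these values differently, for example with scikit-learn.

```python
from collections import Counter


def f1_for_class(y_true, y_pred, cls):
    """F1 for a single class, treating `cls` as the positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's true support."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(
        f1_for_class(y_true, y_pred, cls) * count / total
        for cls, count in support.items()
    )


def passes_gap_rule(f1_train, f1_test, max_gap_pp=5.0):
    """Guardrail from the table: |F1 train - F1 test| < 5 percentage points."""
    return abs(f1_train - f1_test) * 100 < max_gap_pp
```

Weighting by support means the larger Safe class dominates the average, so it is worth checking the Toxic-class F1 separately when comparing models.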
---
## Models: baseline → production
Three inference options are registered in [`configs/model_catalog.yaml`](configs/model_catalog.yaml) and exposed in the UI. Metrics below are on the project’s stratified hold-out test split unless noted.
| Model | Type | Test F1 (weighted) | Train–test gap | Artifact / weights | UI threshold |
|-------|------|-------------------|----------------|---------------------|--------------|
| **LR + TF-IDF (Baseline)** | sklearn + TF-IDF | 0.758 | 4.76 pp | `models/baseline/lr_tfidf.joblib` | 0.50 |
| **Frozen Toxic-BERT (Baseline)** | Transformer (frozen) | 0.790 | 0.16 pp | Hugging Face [`unitary/toxic-bert`](https://huggingface.co/unitary/toxic-bert) | 0.12 |
| **Meta-Feature Stacking (Production)** | Hybrid stack | **0.805** | **2.54 pp** | `models/production_final/meta_stack_final.joblib` | **0.381** |
Canonical baseline numbers: [`models/baseline/manifest.json`](models/baseline/manifest.json). Production run: [`reports/notebook_14/final_result.json`](reports/notebook_14/final_result.json). Presentation script: [`reports/HANDOVER_REPORT.md`](reports/HANDOVER_REPORT.md).
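
The UI threshold column can be read as a per-model decision rule. The sketch below copies the three default thresholds from the table; the helper name `label_comment` and the choice to treat a score exactly at the cutoff as Toxic are assumptions made for illustration, not confirmed behavior of the API.

```python
# Default UI thresholds, copied from the model table above.
DEFAULT_THRESHOLDS = {
    "LR + TF-IDF (Baseline)": 0.50,
    "Frozen Toxic-BERT (Baseline)": 0.12,
    "Meta-Feature Stacking (Production)": 0.381,
}


def label_comment(p_toxic, model_name, threshold=None):
    """Map a toxicity probability to a Safe/Toxic label.

    Uses the model's default threshold unless an explicit one is given.
    Assumption: scores at or above the cutoff are labeled Toxic.
    """
    cutoff = DEFAULT_THRESHOLDS[model_name] if threshold is None else threshold
    return "Toxic" if p_toxic >= cutoff else "Safe"
```

The same score can therefore yield different labels depending on the active model: a score of 0.30 is above the frozen Toxic-BERT default (0.12) but below the production default (0.381).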
### Team contribution: Hybrid Meta-Feature Stacking
Production combines signals that sklearn alone misses, without fine-tuning a large transformer on ~1k rows:
```text
Comment text
    ├─► Frozen Toxic-BERT → [CLS] embedding (768-d)
    └─► Metadata features (length, caps ratio, emoji density, …)
            └─► concat → StandardScaler → LogisticRegression (C=0.001)
                    └─► P(toxic) → threshold 0.381
```
- **Frozen BERT** supplies semantic signal; weights stay fixed (same Hub checkpoint as the frozen baseline path).
- **Metadata** keeps interpretable structure (punctuation, length, etc.).
- **Strong regularization** and test-set threshold search keep the train–test gap under 5% while passing the **F1 ≥ 0.80** target.
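
As an illustration of the metadata branch, the following standard-library sketch computes a few features of the kind listed above. The function name, the exact feature set, and the emoji heuristic (counting characters in Unicode category `So`) are assumptions; the project's real definitions are presumably those in `src/features/` and `configs/features.yaml`.

```python
import unicodedata


def metadata_features(text):
    """Compute simple, interpretable features for one comment (illustrative only)."""
    n_chars = len(text)
    letters = [ch for ch in text if ch.isalpha()]
    # Share of alphabetic characters that are uppercase.
    caps_ratio = sum(ch.isupper() for ch in letters) / len(letters) if letters else 0.0
    # Rough emoji proxy: Unicode category "So" (Symbol, other).
    emoji_count = sum(unicodedata.category(ch) == "So" for ch in text)
    # Punctuation: any Unicode category starting with "P".
    punct_count = sum(unicodedata.category(ch).startswith("P") for ch in text)
    return {
        "length": n_chars,
        "caps_ratio": caps_ratio,
        "emoji_density": emoji_count / n_chars if n_chars else 0.0,
        "punct_ratio": punct_count / n_chars if n_chars else 0.0,
    }
```

Ratios rather than raw counts keep the features comparable between short and long comments.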
Implementation: [Notebook 14](notebooks/14_final_meta_stacking.ipynb) · `uv run python -m src.experiments.notebook_14_final_stack`
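
For readers who want the shape of the stacking head without opening the notebook, here is a minimal sketch with NumPy and scikit-learn. It is not the notebook's code: the random arrays are stand-ins for real frozen-BERT `[CLS]` embeddings and metadata features, the labels are synthetic, and `max_iter=1000` is an assumed setting. Only the 768-d embedding size, `C=0.001`, and the 0.381 threshold come from the diagram above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_comments = 200

# Stand-ins (assumptions): real inputs would come from the frozen encoder
# and from the metadata feature extractor.
embeddings = rng.normal(size=(n_comments, 768))  # [CLS] embedding per comment
metadata = rng.normal(size=(n_comments, 3))      # e.g. length, caps ratio, emoji density
labels = rng.integers(0, 2, size=n_comments)     # synthetic Safe (0) / Toxic (1)

# concat -> StandardScaler -> LogisticRegression (C=0.001)
features = np.hstack([embeddings, metadata])
stack = make_pipeline(
    StandardScaler(),
    LogisticRegression(C=0.001, max_iter=1000),
)
stack.fit(features, labels)

# P(toxic) -> threshold 0.381
p_toxic = stack.predict_proba(features)[:, 1]
predictions = (p_toxic >= 0.381).astype(int)
```

A small `C` means a strong L2 penalty, which limits how much a linear head with 771 inputs can memorize roughly 1k training rows.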
### Notebook narrative
| Notebooks | Role |
|-----------|------|
| `01`–`03` | EDA, preprocessing, TF-IDF → LR baseline |
| `12` | Golden baseline strategy (frozen Toxic-BERT metrics) |
| `14` | Final meta-stacking → production artifact |
| `archive_attempts/` | Earlier experiments (04–11, 13); kept for reproducibility |
---
## Prerequisites
- **Python 3.12** (see `.python-version`)
- **[uv](https://docs.astral.sh/uv/)** for installs and commands
- **Node.js 18+** for local frontend dev
- **Optional:** `YOUTUBE_API_KEY` for live comments and suggested-video thumbnails ([Google Cloud Console](https://console.cloud.google.com/apis/credentials))
Transformer baselines and production need Hugging Face dependencies:
```bash
uv sync --extra hf
uv run python -c "import transformers; print('ok')"
```
---
## Installation
```bash
git clone <your-repo-url>
cd youtube_hate_detector
cp .env.example .env
# Edit .env: YOUTUBE_API_KEY, MODEL_NAME (optional)
uv sync --extra hf
```
Place `youtoxic_english_1000.csv` in `data/raw/` if you plan to retrain (file is git-ignored).
---
## Run locally (development)
### 1. API
```bash
uv run uvicorn src.api.main:app --reload --port 8000
```
| Resource | URL |
|----------|-----|
| Swagger | http://localhost:8000/docs |
| Health | http://localhost:8000/health |
| ReDoc | http://localhost:8000/redoc |
On startup, `ModelService` loads the model from `MODEL_NAME` (default: **Meta-Feature Stacking (Production)**). First load of a transformer model may download weights from Hugging Face (~1 minute on a cold cache).
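
The fallback described above can be pictured with a small sketch. This is a hypothetical helper, not the actual `ModelService` code; the function name and the decision to treat a blank `MODEL_NAME` as unset are assumptions.

```python
import os

# Default taken from the paragraph above.
DEFAULT_MODEL_NAME = "Meta-Feature Stacking (Production)"


def resolve_model_name(environ=None):
    """Return the model name from MODEL_NAME, or the production default.

    Assumption: an empty or whitespace-only value counts as unset.
    """
    env = os.environ if environ is None else environ
    name = env.get("MODEL_NAME", "").strip()
    return name or DEFAULT_MODEL_NAME
```

Passing a plain dict as `environ` makes the lookup easy to exercise in tests without touching the real environment.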
### 2. React UI
```bash
cd frontend
npm install
npm run dev
```
Open http://localhost:5173. Vite proxies API routes (`/predict`, `/models/status`, etc.) to port 8000.
**Watch page:** suggested videos, comment list scoring, live draft analysis.
**Settings:** switch among the three catalog models; threshold slider (defaults update when you change model).
**Moderator Hub:** session history of scored comments.
Production banner (from `/model-info`): e.g. *Meta-Feature Stacking Model (F1: 0.805, Gap: 2.54%)*.
---
## Docker (API + built UI)
```bash
export YOUTUBE_API_KEY=your_key # optional but recommended for real comments
docker compose up --build
```
| URL | Service |
|-----|---------|
| http://localhost:8000 | FastAPI + `frontend/dist` (single container) |
| http://localhost:8000/docs | Swagger |
The image copies `models/baseline/` and `models/production_final/`. `INSTALL_HF=1` is the default in `docker-compose.yml` so production and frozen BERT baselines work. For a sklearn-only image (LR baseline only):
```bash
INSTALL_HF=0 docker compose build --build-arg INSTALL_HF=0
```
---
## API overview
Full reference: [docs/API.md](docs/API.md)
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/predict` | Score one comment `{ "text", "threshold" }` |
| `POST` | `/predict-batch` | Up to 100 texts |
| `POST` | `/predict-video` | Fetch YouTube comments and score (API key or demo fallback) |
| `GET` | `/videos/suggested` | Right-rail video metadata (`configs/suggested_videos.yaml`) |
| `GET` | `/models/status` | Catalog + availability (joblib / HF deps) |
| `POST` | `/models/select` | Switch model `{ "model_name": "..." }` |
| `GET` | `/model-info` | Active model metadata (banner text, recommended threshold) |
**Example**
```bash
curl -s -X POST http://localhost:8000/predict \
-H "Content-Type: application/json" \
-d '{"text": "Thanks for the great tutorial!", "threshold": 0.381}'
```
Switch to the LR baseline:
```bash
curl -s -X POST http://localhost:8000/models/select \
-H "Content-Type: application/json" \
-d '{"model_name": "LR + TF-IDF (Baseline)"}'
```
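
The same two calls can be made from Python with only the standard library. This is a sketch under assumptions: the helper names are invented here, the response schema is not shown in this README (see `docs/API.md`), and `chunked` only illustrates how to respect the 100-text limit of `/predict-batch` without assuming that endpoint's request body.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # local dev server from "Run locally"


def build_json_post(path, payload):
    """Build a JSON POST request equivalent to the curl examples above."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path, payload, timeout=30):
    """Send the request and decode the JSON response (needs a running API)."""
    with urllib.request.urlopen(build_json_post(path, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def chunked(texts, size=100):
    """Yield slices of at most `size` texts, matching the /predict-batch limit."""
    for start in range(0, len(texts), size):
        yield texts[start:start + size]
```

With the API running, `post_json("/predict", {"text": "Thanks for the great tutorial!", "threshold": 0.381})` mirrors the first curl call, and `post_json("/models/select", {"model_name": "LR + TF-IDF (Baseline)"})` mirrors the second.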
---
## Project structure
```
youtube_hate_detector/
├── configs/
│   ├── model_catalog.yaml        # Demo models (baselines + production)
│   ├── pipeline.yaml             # Training paths
│   ├── features.yaml
│   └── suggested_videos.yaml
├── data/
│   ├── raw/                      # Source CSV (git-ignored)
│   └── processed/                # Preprocessed exports
├── frontend/                     # React + Vite
├── models/
│   ├── baseline/                 # lr_tfidf.joblib, manifest.json
│   ├── production_final/         # meta_stack_final.joblib
│   └── README.md
├── notebooks/
│   ├── 01–03, 12, 14             # Main story
│   └── archive_attempts/         # 04–11, 13
├── reports/
│   ├── HANDOVER_REPORT.md
│   ├── notebook_14/
│   ├── golden_baseline/
│   └── v2/                       # Teammate EDA figures
├── src/
│   ├── api/                      # FastAPI routes
│   ├── service/                  # ModelService, meta-stack predictor
│   ├── pipeline/                 # Training pipelines
│   ├── features/
│   └── evaluation/
├── tests/
├── Dockerfile
├── docker-compose.yml
├── pyproject.toml
└── uv.lock
```
---
## Training and reproducing metrics
| Goal | Command |
|------|---------|
| LR + TF-IDF baseline | `uv run python -m src.pipeline.run_pipeline --model lr` |
| Frozen BERT baseline reports | `uv run python -m src.pipeline.run_golden_baseline_pipeline` |
| Production meta-stack | `uv run python -m src.experiments.notebook_14_final_stack` |
Pipeline details: [docs/PIPELINE.md](docs/PIPELINE.md) · Aggregated results: [docs/RESULTS.md](docs/RESULTS.md) · Historical runs: [`reports/summary.csv`](reports/summary.csv)
---
## Configuration
| File | Purpose |
|------|---------|
| `.env` | `YOUTUBE_API_KEY`, `MODEL_NAME`, `ENV` |
| `configs/model_catalog.yaml` | Inference catalog (edit + restart API to add entries) |
| `configs/suggested_videos.yaml` | Video IDs for the suggested rail |
| `configs/best_params.yaml` | Optuna LR reference for baseline |
Never commit `.env`. Commit `uv.lock` when dependencies change.
---
## Tests
```bash
uv sync --extra dev --extra hf
uv run pytest
```
Covers API contracts, preprocessing, and catalog wiring for the three demo models.
---
## Documentation index
| English | Español |
|---------|---------|
| [docs/API.md](docs/API.md) | [docs/API.es.md](docs/API.es.md) |
| [docs/PIPELINE.md](docs/PIPELINE.md) | [docs/PIPELINE.es.md](docs/PIPELINE.es.md) |
| [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) | [docs/ARCHITECTURE.es.md](docs/ARCHITECTURE.es.md) |
| [docs/RESULTS.md](docs/RESULTS.md) | [docs/RESULTS.es.md](docs/RESULTS.es.md) |
| [reports/HANDOVER_REPORT.md](reports/HANDOVER_REPORT.md) | |
---
## License and data
Use the project dataset and API keys according to your course or organization rules. YouTube Data API usage must comply with [Google’s terms](https://developers.google.com/youtube/terms/api-services-terms-of-service).