| # YouTube Toxic Comment Detector (youtube_hate_detector) |
|
|
|
|
| **Español:** [README.es.md](README.es.md) |
|
|
| Automated **Safe vs Toxic** classification for YouTube-style comments. Production stack: **FastAPI** (REST) + **React** (YouTube Watch UI). Default model: **Logistic Regression + TF-IDF** (`models/final_model.joblib`). |
|
|
| --- |
|
|
| ## Clone and layout |
|
|
| ```bash |
| git clone <your-repo-url> |
| cd youtube_hate_detector # use this folder name locally (team convention) |
| ``` |
|
|
| ``` |
| youtube_hate_detector/ |
| ├── configs/            # pipeline, features, model_catalog, suggested_videos |
| ├── frontend/           # React SPA (Vite) |
| ├── models/             # final_model.joblib, experiments/ |
| ├── src/ |
| │   ├── api/            # FastAPI routes |
| │   └── service/        # ModelService (inference) |
| ├── pyproject.toml      # uv dependencies |
| ├── uv.lock |
| └── docker-compose.yml |
| ``` |
|
|
| --- |
|
|
| ## How to use FastAPI |
|
|
| The API loads `ModelService` once at startup and serves JSON only (the React app is the UI). |
|
|
| ```bash |
| cp .env.example .env |
| uv sync # baseline (LR model only) |
| uv sync --extra hf # required for DistilBERT / toxic-bert / Fine-tuned HF models |
| uv run uvicorn src.api.main:app --reload --port 8000 |
| ``` |
|
|
| To verify that the Hugging Face dependencies are installed, run `uv run python -c "import transformers; print('ok')"`. |
|
|
| | Resource | URL | |
| |----------|-----| |
| | Swagger | http://localhost:8000/docs | |
| | Health | http://localhost:8000/health | |
|
|
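When scripting against the API, it can help to wait for `/health` before sending requests. The following is a minimal sketch using only the Python standard library. It assumes the endpoint answers with HTTP 200 once the service is ready, which is not stated in this README; the function names, attempt count, and delay are illustrative.

```python
import time
import urllib.error
import urllib.request

HEALTH_URL = "http://localhost:8000/health"  # from the table above


def is_healthy(url: str = HEALTH_URL, timeout: float = 2.0) -> bool:
    """Return True if the health endpoint answers with HTTP 200 (assumed readiness signal)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all count as "not ready".
        return False


def wait_until_healthy(check=is_healthy, attempts: int = 10, delay: float = 1.0) -> bool:
    """Call `check` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_until_healthy()` after starting `uvicorn` returns `True` as soon as one check succeeds, or `False` once all attempts are used up.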
| **Main endpoints** |
|
|
| | Method | Path | Description | |
| |--------|------|-------------| |
| | `POST` | `/predict` | Score one comment `{ "text", "threshold" }` | |
| | `POST` | `/predict-video` | Fetch YouTube comments + score `{ "url", "max_comments", "threshold" }` | |
| | `GET` | `/videos/suggested` | Metadata for right-rail videos (from `configs/suggested_videos.yaml`) | |
| | `GET` | `/models` | Available models | |
| | `GET` | `/models/status` | Per-model availability (HF deps, local weights) | |
| | `PUT` | `/model/{name}` | Switch active model (warmup-validated) | |
|
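The two `POST` endpoints in the table take JSON bodies with the listed fields. Below is a minimal sketch of building those requests with the Python standard library, assuming the API is running on `http://localhost:8000`. The helper names and the default values for `threshold` and `max_comments` are illustrative choices, not values taken from the repo, and the response schema is not described here, so `send` simply decodes whatever JSON comes back.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: local dev server


def build_json_post(path: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request carrying `payload` as a JSON body (hypothetical helper)."""
    return urllib.request.Request(
        url=f"{base_url}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_request(text: str, threshold: float = 0.5) -> urllib.request.Request:
    """Request for POST /predict; the 0.5 default is an illustrative value."""
    return build_json_post("/predict", {"text": text, "threshold": threshold})


def predict_video_request(
    url: str, max_comments: int = 50, threshold: float = 0.5
) -> urllib.request.Request:
    """Request for POST /predict-video; the defaults are illustrative values."""
    return build_json_post(
        "/predict-video",
        {"url": url, "max_comments": max_comments, "threshold": threshold},
    )


def send(request: urllib.request.Request, timeout: float = 30.0):
    """Send the request and decode the JSON response (schema not assumed)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API running, `send(predict_request("thanks for the great video"))` returns the decoded response for a single comment.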
|
| Set `YOUTUBE_API_KEY` in `.env` to fetch real comments and suggested-video thumbnails. |
|
|
| **Switch models without changing the UI:** edit [`configs/model_catalog.yaml`](configs/model_catalog.yaml), then restart the API or use Settings in the app. |
|
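The active model can also be switched over HTTP with the `PUT /model/{name}` endpoint listed above. The sketch below builds that request, plus a `GET /models/status` request for checking availability first, using the Python standard library. It assumes the `PUT` needs no request body and that valid names are the ones reported by `/models`; neither is confirmed in this README, and the helper names are illustrative.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: local dev server


def model_status_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Request for GET /models/status (per-model availability)."""
    return urllib.request.Request(f"{base_url}/models/status", method="GET")


def switch_model_request(name: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Request for PUT /model/{name}; assumes no request body is required."""
    # Percent-encode the name so spaces or slashes cannot alter the URL path.
    safe_name = urllib.parse.quote(name, safe="")
    return urllib.request.Request(f"{base_url}/model/{safe_name}", method="PUT")


def send(request: urllib.request.Request, timeout: float = 60.0):
    """Send the request and decode the JSON response (schema not assumed)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A typical flow would be to call `send(model_status_request())`, pick a model that is reported as available, and then call `send(switch_model_request(name))` with that name.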
|
| --- |
|
|
| ## React UI (local dev) |
|
|
| ```bash |
| # Terminal 1: API |
| uv run uvicorn src.api.main:app --reload --port 8000 |
| |
| # Terminal 2: frontend (proxies API) |
| cd frontend && npm install && npm run dev |
| ``` |
|
|
| Open http://localhost:5173 to see the Watch page: a staged demo player, real suggested videos (click one to load its comments), and an English UI. |
|
|
| --- |
|
|
| ## Docker |
|
|
| ```bash |
| export YOUTUBE_API_KEY=your_key # optional but recommended |
| docker compose up --build # LR model only (default) |
| |
| # Hugging Face models (transformers + torch; larger image): |
| INSTALL_HF=1 docker compose build --build-arg INSTALL_HF=1 |
| INSTALL_HF=1 docker compose up |
| ``` |
|
|
| | URL | Service | |
| |-----|---------| |
| | http://localhost:8000 | API + built React SPA | |
| | http://localhost:8000/docs | Swagger | |
|
|
| Container: `youtube_hate_detector-app`. |
|
|
| --- |
|
|
| ## Training (unchanged) |
|
|
| ```bash |
| uv run python -m src.pipeline.run_pipeline --model lr |
| ``` |
|
|
| See [docs/PIPELINE.md](docs/PIPELINE.md). |
|
|
| --- |
|
|
| ## Configuration |
|
|
| | File | Purpose | |
| |------|---------| |
| | `.env` | Secrets and runtime settings (`YOUTUBE_API_KEY`, `MODEL_NAME`) | |
| | `configs/model_catalog.yaml` | Inference models for API/UI | |
| | `configs/suggested_videos.yaml` | YouTube IDs for the suggested rail | |
| | `configs/pipeline.yaml` | Training data paths | |
|
|
| --- |
|
|
| ## Tests |
|
|
| ```bash |
| uv sync --extra dev --extra hf |
| uv run pytest |
| ``` |
|
|
| --- |
|
|
| ## Briefing vs team stack |
|
|
| | Topic | Briefing | This repo | |
| |-------|----------|-----------| |
| | UI | Streamlit | **React** | |
| | API | FastAPI | **FastAPI** | |
| | Package manager | varies | **`uv`** | |
|
|
| Legacy Streamlit (`src/app/`) has been removed. |
|
|