# YouTube Toxic Comment Detector (youtube_hate_detector)

[![Python](https://img.shields.io/badge/python-3.12-blue.svg)](https://www.python.org/downloads/)
[![FastAPI](https://img.shields.io/badge/FastAPI-0.136-009688.svg)](https://fastapi.tiangolo.com/)
[![React](https://img.shields.io/badge/React-UI-61DAFB.svg)](https://react.dev/)
[![Docker](https://img.shields.io/badge/docker-compose-2496ED.svg)](https://docs.docker.com/compose/)

**Español:** [README.es.md](README.es.md)

Automated **Safe vs Toxic** classification for YouTube-style comments. Production stack: **FastAPI** (REST) + **React** (YouTube Watch UI). Default model: **Logistic Regression + TF-IDF** (`models/final_model.joblib`).

---

## Clone and layout

```bash
git clone <repo-url>
cd youtube_hate_detector   # use this folder name locally (team convention)
```

```
youtube_hate_detector/
├── configs/            # pipeline, features, model_catalog, suggested_videos
├── frontend/           # React SPA (Vite)
├── models/             # final_model.joblib, experiments/
├── src/
│   ├── api/            # FastAPI routes
│   └── service/        # ModelService (inference)
├── pyproject.toml      # uv dependencies
├── uv.lock
└── docker-compose.yml
```

---

## How to use FastAPI

The API loads `ModelService` once at startup and serves JSON only (the React app is the UI).

```bash
cp .env.example .env
uv sync                  # baseline (LR model only)
uv sync --extra hf       # required for DistilBERT / toxic-bert / Fine-tuned HF models
uv run uvicorn src.api.main:app --reload --port 8000
```

Verify HF deps: `uv run python -c "import transformers; print('ok')"`.
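
Once the server is running, a single comment can be scored from Python. The sketch below uses only the standard library and targets `POST /predict` with the `{ "text", "threshold" }` body. The helper names, the `0.5` default, and treating `threshold` as a float are assumptions for illustration and are not part of this repo.

```python
import json
import urllib.request
from typing import Any

API_BASE = "http://localhost:8000"  # port taken from the uvicorn command above


def build_predict_request(
    text: str, threshold: float, base_url: str = API_BASE
) -> urllib.request.Request:
    """Build (but do not send) a POST /predict request with a JSON body."""
    body = json.dumps({"text": text, "threshold": threshold}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def score_comment(
    text: str, threshold: float = 0.5, base_url: str = API_BASE
) -> Any:
    """Send the request and return the decoded JSON response unchanged.

    The 0.5 default is an assumed value. No response fields are assumed
    either, so the parsed JSON is handed back as-is.
    """
    request = build_predict_request(text, threshold, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

With the API up, `print(score_comment("thanks for the video"))` prints whatever JSON the server returns; the Swagger page at http://localhost:8000/docs is the place to confirm the actual request and response schema.
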

| Resource | URL |
|----------|-----|
| Swagger | http://localhost:8000/docs |
| Health | http://localhost:8000/health |

**Main endpoints**

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/predict` | Score one comment `{ "text", "threshold" }` |
| `POST` | `/predict-video` | Fetch YouTube comments + score `{ "url", "max_comments", "threshold" }` |
| `GET` | `/videos/suggested` | Metadata for right-rail videos (from `configs/suggested_videos.yaml`) |
| `GET` | `/models` | Available models |
| `GET` | `/models/status` | Per-model availability (HF deps, local weights) |
| `PUT` | `/model/{name}` | Switch active model (warmup-validated) |

Set `YOUTUBE_API_KEY` in `.env` for real comments and suggested-video thumbnails.

**Change models without UI changes:** edit [`configs/model_catalog.yaml`](configs/model_catalog.yaml), then restart the API or use Settings in the app.

---

## React UI (local dev)

```bash
# Terminal 1: API
uv run uvicorn src.api.main:app --reload --port 8000

# Terminal 2: frontend (proxies API)
cd frontend && npm install && npm run dev
```

Open http://localhost:5173 to see the Watch page with a staged demo player, real suggested videos (click one to load its comments), and an English UI.

---

## Docker

```bash
export YOUTUBE_API_KEY=your_key   # optional but recommended
docker compose up --build         # LR model only (default)

# Hugging Face models (transformers + torch; larger image):
INSTALL_HF=1 docker compose build --build-arg INSTALL_HF=1
INSTALL_HF=1 docker compose up
```

| URL | Service |
|-----|---------|
| http://localhost:8000 | API + built React SPA |
| http://localhost:8000/docs | Swagger |

Container: `youtube_hate_detector-app`.

---

## Training (unchanged)

```bash
uv run python -m src.pipeline.run_pipeline --model lr
```

See [docs/PIPELINE.md](docs/PIPELINE.md).

---

## Configuration

| File | Purpose |
|------|---------|
| `.env` | Secrets (`YOUTUBE_API_KEY`, `MODEL_NAME`) |
| `configs/model_catalog.yaml` | Inference models for API/UI |
| `configs/suggested_videos.yaml` | YouTube IDs for the suggested rail |
| `configs/pipeline.yaml` | Training data paths |

---

## Tests

```bash
uv sync --extra dev --extra hf
uv run pytest
```

---

## Briefing vs team stack

| Topic | Briefing | This repo |
|-------|----------|-----------|
| UI | Streamlit | **React** |
| API | FastAPI | **FastAPI** |
| Package manager | varies | **`uv`** |

Legacy Streamlit (`src/app/`) has been removed.