
### Intelligent moderation for YouTube comments
**English** · [Español](README.es.md)










---
## Project description
**SignalMod** is an intelligent moderation assistant for YouTube comments. It automatically classifies each comment as **Safe** or **Toxic**, returns a probability between 0 and 1, and tags toxicity categories (insult, threat, identity hate, obscene content).

It is built around the team's **hybrid meta-feature stacking** model (frozen Toxic-BERT embeddings combined with metadata features and a regularised logistic regression), which reaches **F1 = 0.805** with a train-test gap of **2.54 pp** on the project's 200-sample test split.

The product ships as a FastAPI REST service plus a React SPA that mimics the YouTube Watch experience: when you pick a video, the API fetches the 50 newest comments via the YouTube Data API, scores them, and persists every prediction in Supabase so any visitor can see the full history.

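
The Safe/Toxic decision described above comes down to comparing the returned probability against a threshold. The sketch below illustrates that step only; `Verdict` and `to_verdict` are hypothetical names, and treating a score exactly at the threshold as toxic is an assumption. The default of 0.381 is the threshold listed for the default model in the model catalog. Calling `to_verdict(0.92)` yields a `Verdict` labelled `Toxic`, while `to_verdict(0.05)` yields `Safe`.

```python
from dataclasses import dataclass

# Threshold listed in the model catalog for the default (meta-stacking) model.
DEFAULT_THRESHOLD = 0.381


@dataclass(frozen=True)
class Verdict:
    """Hypothetical container for one scored comment."""
    label: str          # "Toxic" or "Safe"
    probability: float  # model score in [0, 1]


def to_verdict(probability: float, threshold: float = DEFAULT_THRESHOLD) -> Verdict:
    """Map a toxicity probability to a Safe/Toxic label."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability must be in [0, 1], got {probability}")
    # Assumption: a score at or above the threshold counts as toxic.
    label = "Toxic" if probability >= threshold else "Safe"
    return Verdict(label=label, probability=probability)


if __name__ == "__main__":
    for p in (0.05, 0.381, 0.92):
        print(to_verdict(p))
```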
---
## Tools and languages
### Languages
- **Python 3.12**: backend, ML pipelines, evaluation.
- **TypeScript + React 18**: frontend SPA.
- **SQL (PostgreSQL via Supabase)**: predictions persistence.
### Backend
- **FastAPI 0.136**: REST API, Pydantic schemas, lifespan model loading.
- **Uvicorn**: ASGI server with hot reload.
- **scikit-learn 1.8**: TF-IDF baseline + meta-learner Logistic Regression.
- **Optuna**: hyperparameter search for the TF-IDF baseline.
- **PyTorch 2.x + Transformers 5.9**: frozen `unitary/toxic-bert` for CLS embeddings.
- **spaCy + NLTK**: lemmatisation, stopwords, regex-based cleanup.
- **MLflow**: experiment tracking.
- **Supabase Python SDK**: predictions persistence with anonymous RLS policies.
- **google-api-python-client**: YouTube Data API v3 integration.
### Frontend
- **React 18 + Vite 5 + TypeScript**: SPA with hot module reload.
- **CSS modules**: YouTube-like dark theme.
### Tooling and ops
- **uv**: Python package and venv manager (`pyproject.toml` + `uv.lock`).
- **pnpm**: frontend package manager.
- **Docker + Docker Compose**: single-container deploy serving API + built SPA.
- **GNU Make**: `make dev`, `make install`, `make build`, `make docker`.
- **Render**: free-tier deploy via `render.yaml` blueprint.
- **Pytest**: unit tests for API contracts and preprocessing.
---
## Project architecture
```
Project_9_Equipo3/
├── configs/                        # YAML configs for pipelines and inference catalog
│   ├── pipeline.yaml               # Training data paths, target columns, CV folds
│   ├── features.yaml               # Preprocessing and TF-IDF settings
│   ├── model_catalog.yaml          # Inference catalog (3 swappable models)
│   ├── best_params.yaml            # Optuna winner for the LR baseline
│   ├── suggested_videos.yaml       # YouTube IDs shown in the Up-next rail
│   └── *_training.yaml             # Training profiles (golden baseline, expert, hybrid, …)
├── data/                           # Raw and processed datasets (git-ignored)
├── docs/                           # API.md, PIPELINE.md, ARCHITECTURE.md, DEPLOY.md
│   └── assets/signalmod_logo.png   # Brand assets
├── frontend/                       # React + Vite SPA
│   ├── public/signalmod_logo.png   # Logo served as static asset
│   └── src/
│       ├── api/                    # Typed HTTP client
│       ├── components/             # Layout, CommentRow, SuggestedRail, ModelBanner
│       ├── context/                # Global app state (active model, threshold)
│       ├── hooks/                  # useDebouncedPredict
│       ├── pages/                  # WatchPage, HubPage, SettingsPage
│       └── utils/                  # toxicityColor, randomUsername, relativeTime
├── models/
│   ├── baseline/lr_tfidf.joblib    # Optuna-tuned LR baseline
│   └── production_final/           # meta_stack_final.joblib (production artifact)
├── notebooks/
│   ├── 01-04                       # EDA, preprocessing, TF-IDF, baseline LR
│   ├── 12                          # Golden baseline (frozen Toxic-BERT)
│   ├── 14                          # Final meta-stacking (production artifact)
│   └── archive_attempts/           # Earlier experiments preserved for reproducibility
├── reports/                        # Metrics, plots, EDA figures, summary.csv
├── src/
│   ├── api/                        # FastAPI app
│   │   ├── main.py                 # Lifespan, CORS, static SPA mount
│   │   ├── routes/                 # health, models, predict (+ /predictions), videos
│   │   ├── schemas.py              # Pydantic request/response models
│   │   ├── services.py             # predict_single, to_predict_response
│   │   ├── state.py                # Shared app state
│   │   └── youtube.py              # YouTube Data API fetch + suggested metadata
│   ├── data/                       # Loader, dual loader for hybrid pipelines
│   ├── db/                         # Supabase client + save_prediction helpers
│   ├── evaluation/                 # Evaluator, threshold tuning, stable CV
│   ├── experiments/                # Notebook 13 / 14 script versions
│   ├── features/                   # text_preprocessor, vectorizer, metadata, augmentation
│   ├── models/                     # baseline (LR/RF/XGBoost), hybrid_ensemble, metadata_lr
│   ├── pipeline/                   # run_pipeline + per-strategy variants
│   ├── service/                    # ModelService, meta_stack_predictor, model_catalog
│   └── utils/                      # Logger
├── supabase/predictions_setup.sql  # SQL to create the predictions table + RLS policies
├── tests/                          # Pytest suite
├── Dockerfile                      # Multi-stage build (frontend + uv backend)
├── docker-compose.yml              # One-container deploy serving API + SPA
├── render.yaml                     # Render blueprint (web service + static site)
├── Procfile                        # Render process declaration
├── Makefile                        # make dev / install / build / docker / test
├── pyproject.toml + uv.lock        # Python dependencies pinned with uv
└── README.md / README.es.md        # English / Spanish documentation
```
### Data flow
```
        ┌────────────────────────────────────────────────┐
        │  React SPA (Vite)        http://localhost:5173 │
        │  Layout · Watch · Hub · Settings               │
        └───────────────────┬────────────────────────────┘
                            │  HTTP JSON (Vite proxy → :8000)
        ┌───────────────────▼────────────────────────────┐
        │  FastAPI                 http://localhost:8000 │
        │  /predict  /predict-batch  /predict-video      │
        │  /predictions  (GET, Supabase history)         │
        │  /models  /models/select  /model-info          │
        │  /videos/suggested  /health                    │
        └──────┬─────────────────────────────┬───────────┘
               │                             │
┌──────────────▼─────────────┐ ┌─────────────▼──────────────┐
│  ModelService              │ │  YouTube Data API v3       │
│  · local joblib            │ │  · video metadata          │
│  · hf_remote               │ │  · 50 newest comments      │
│  · meta_stack (production) │ │                            │
└──────┬─────────────────────┘ └────────────────────────────┘
       │
┌──────▼──────────────────────────────────────────────────┐
│  Supabase (PostgreSQL)                                  │
│  table: predictions(id, created_at, text, video_id,     │
│                     probability, is_toxic, labels, …)   │
│  RLS: anon insert + anon select                         │
└─────────────────────────────────────────────────────────┘
```
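
To make the last hop of the diagram concrete, the sketch below models one row of the `predictions` table as a plain dataclass and turns it into a JSON-ready payload. Only the column names come from the diagram. The Python types, the choice to let the database generate `id`, and the names `PredictionRow` and `to_payload` are assumptions for illustration, and the columns elided by `…` are left out rather than guessed.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


def _utc_now_iso() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class PredictionRow:
    """Hypothetical in-memory view of one `predictions` row.

    `id` is omitted on the assumption that the database generates it.
    """
    text: str
    probability: float
    is_toxic: bool
    labels: list[str] = field(default_factory=list)
    video_id: Optional[str] = None
    created_at: str = field(default_factory=_utc_now_iso)


def to_payload(row: PredictionRow) -> dict:
    """Convert a row into a plain dict that can be serialised as JSON."""
    return asdict(row)


if __name__ == "__main__":
    row = PredictionRow(
        text="great video!", probability=0.03, is_toxic=False, video_id="abc123"
    )
    print(json.dumps(to_payload(row), indent=2))
```

A dict of this shape is what an insert helper would hand to the database client; the project's actual helper lives in `src/db/` and may differ.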
### Model catalog (swappable from the UI)
| Model                            | Type        | F1 (test) | Train-test gap | Threshold | Latency | Default |
| -------------------------------- | ----------- | --------- | -------------- | --------- | ------- | ------- |
| **Meta-Feature Stacking** | Hybrid | **0.805** | **2.54 pp** | **0.381** | ~400 ms | **Yes** |
| Frozen Toxic-BERT | Transformer | 0.790 | 0.16 pp | 0.120 | ~400 ms | No |
| LR + TF-IDF (Optuna) | sklearn | 0.758 | 4.76 pp | 0.500 | < 50 ms | No |

The production model concatenates the frozen `[CLS]` embedding from `unitary/toxic-bert` (768-d) with hand-crafted metadata features (length, uppercase ratio, emoji density…), scales them with `StandardScaler`, and feeds them into a `LogisticRegression(C=0.001)` meta-learner.

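
The feature-building step can be sketched with the standard library alone. The example below computes the three metadata features named above and appends them to a 768-d embedding, which is the concatenation the meta-learner consumes. The exact feature definitions are assumptions: counting Unicode "Symbol, other" characters is only an approximation of emoji detection, and `metadata_features` and `build_feature_vector` are hypothetical names. The features elided by `…` are not reproduced, and scaling and the logistic regression are left out. With a 768-d embedding the resulting vector has 771 entries.

```python
import unicodedata

EMBEDDING_DIM = 768  # size of the [CLS] embedding stated above


def metadata_features(text: str) -> dict[str, float]:
    """Compute simple per-comment metadata features (assumed definitions)."""
    n_chars = len(text)
    if n_chars == 0:
        return {"length": 0.0, "uppercase_ratio": 0.0, "emoji_density": 0.0}
    letters = [c for c in text if c.isalpha()]
    n_upper = sum(1 for c in letters if c.isupper())
    # Approximation: treat Unicode "Symbol, other" (So) characters as emoji.
    n_emoji = sum(1 for c in text if unicodedata.category(c) == "So")
    return {
        "length": float(n_chars),
        "uppercase_ratio": n_upper / len(letters) if letters else 0.0,
        "emoji_density": n_emoji / n_chars,
    }


def build_feature_vector(cls_embedding: list[float], text: str) -> list[float]:
    """Concatenate an embedding with the metadata features of `text`."""
    if len(cls_embedding) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} values, got {len(cls_embedding)}")
    return list(cls_embedding) + list(metadata_features(text).values())


if __name__ == "__main__":
    fake_embedding = [0.0] * EMBEDDING_DIM  # stand-in for a real Toxic-BERT output
    vector = build_feature_vector(fake_embedding, "WHY would you post this")
    print(len(vector), vector[-3:])
```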
---
## Setup & run
### 1. Prerequisites
| Tool | macOS / Linux | Windows |
| ----------- | ----------------------------------- | --------------------------------------------------------- |
| **Python 3.12** | `brew install python@3.12` | [python.org/downloads](https://www.python.org/downloads/) (check *Add Python to PATH*) |
| **uv** | `curl -LsSf https://astral.sh/uv/install.sh \| sh` | `powershell -c "irm https://astral.sh/uv/install.ps1 \| iex"` |
| **Node.js 18+** | `brew install node` | [nodejs.org](https://nodejs.org/) (LTS) |
| **pnpm** | `npm i -g pnpm` | `npm i -g pnpm` |
| **Make** *(optional)* | already installed | `winget install GnuWin32.Make` (or use WSL) |
### 2. Clone & configure
```bash
git clone https://github.com/Bootcamp-IA-P6/Project_9_Equipo3.git
cd Project_9_Equipo3
cp .env.example .env
# Fill: YOUTUBE_API_KEY, SUPABASE_URL, SUPABASE_KEY
```
> **Windows PowerShell**: replace `cp` with `Copy-Item .env.example .env`.

Paste `supabase/predictions_setup.sql` into the Supabase SQL editor before the first run (creates the `predictions` table + RLS policies).
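
Before the first run it can help to confirm that the three settings from `.env` are actually visible to the process. The check below is a small standalone sketch, not part of the project: `missing_settings` is a hypothetical helper, and it reads the process environment only, so it assumes the values have already been exported or loaded from `.env` by whatever tool you use.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Variable names taken from the .env instructions above.
REQUIRED_VARS = ("YOUTUBE_API_KEY", "SUPABASE_URL", "SUPABASE_KEY")


def missing_settings(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required variables that are unset or blank."""
    source = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not source.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        raise SystemExit(f"Missing settings: {', '.join(missing)}")
    print("All required settings are present.")
```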
### 3. Run: three ways
#### Option A: With Makefile (recommended on macOS / Linux / WSL)
```bash
make install # uv sync + pnpm install
make dev # FastAPI :8000 + Vite :5173
```
| Command | What it does |
| ------------- | --------------------------------------------- |
| `make install`| Install Python + frontend deps |
| `make dev` | Start API and UI in parallel (Ctrl+C stops both) |
| `make api` | API only |
| `make ui` | UI only |
| `make build` | Build the SPA into `frontend/dist` |
| `make test` | Run Pytest |
| `make docker` | `docker compose up --build` |
| `make stop` | Kill anything on ports 8000 / 5173 |
| `make clean` | Remove `.venv`, `node_modules`, `dist` |
#### Option B: Manual (macOS / Linux)
Two terminals.

**Terminal 1: API**
```bash
uv sync
uv run uvicorn src.api.main:app --reload --port 8000
```
**Terminal 2: Frontend**
```bash
cd frontend
pnpm install
pnpm dev
```
#### Option C: Manual (Windows PowerShell)
Two terminals.

**Terminal 1: API**
```powershell
uv sync
uv run uvicorn src.api.main:app --reload --port 8000
```
**Terminal 2: Frontend**
```powershell
cd frontend
pnpm install
pnpm dev
```
> If `uv` is not recognised after install, close and reopen PowerShell so the new `PATH` is picked up.
### 4. Open the app
| URL | What you'll see |
| ------------------------------ | ---------------------------------------- |
| http://localhost:5173          | React SPA: Watch / Hub / Settings        |
| http://localhost:8000/docs | FastAPI Swagger UI |
| http://localhost:8000/health | Health check |
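
Once the API is up, a quick smoke test from Python might look like the sketch below. It uses only the standard library. The `/health` and `/predict` paths appear in this README, but the HTTP method and the JSON body shape (`{"text": ...}`) for `/predict` are assumptions, and `build_predict_request` and `check_health` are hypothetical helpers; `docs/API.md` and the Swagger UI remain the reference for the real contract.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"


def build_predict_request(text: str, base_url: str = API_BASE) -> urllib.request.Request:
    """Build (but do not send) a POST to /predict. Body shape is an assumption."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def check_health(base_url: str = API_BASE, timeout: float = 5.0) -> int:
    """Return the HTTP status code of GET /health."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    # Requires the API to be running locally.
    print("health status:", check_health())
    request = build_predict_request("example comment")
    with urllib.request.urlopen(request, timeout=30) as resp:
        print(json.loads(resp.read()))
```

Building the request separately from sending it keeps the payload easy to inspect and adjust if the schema in `docs/API.md` differs from the assumption made here.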
### 5. Docker (one container: API + built SPA)
Same commands on **macOS / Linux / Windows**:
```bash
# Normal: keeps images and volumes for fast rebuilds
docker compose up --build
# → http://localhost:8000 · Ctrl+C to stop · docker compose down
# Ephemeral demo: Ctrl+C tears down container + image + volumes
make docker-demo
# Manual full cleanup
make docker-clean
# (equivalent to: docker compose down --rmi local --volumes --remove-orphans)
```
---

More: see [docs/PIPELINE.md](docs/PIPELINE.md) for training, [docs/API.md](docs/API.md) for endpoints, [docs/DEPLOY.md](docs/DEPLOY.md) for Render deployment.

---
## Contributors