# SignalMod

### Intelligent moderation for YouTube comments

🌐 **English** · [Español](README.es.md)

![Python](https://img.shields.io/badge/Python-3.12-3776AB?logo=python&logoColor=white)
![FastAPI](https://img.shields.io/badge/FastAPI-0.136-009688?logo=fastapi&logoColor=white)
![React](https://img.shields.io/badge/React-18-61DAFB?logo=react&logoColor=black)
![Vite](https://img.shields.io/badge/Vite-5-646CFF?logo=vite&logoColor=white)
![PyTorch](https://img.shields.io/badge/PyTorch-2.x-EE4C2C?logo=pytorch&logoColor=white)
![Transformers](https://img.shields.io/badge/Transformers-5.9-FFD21E?logo=huggingface&logoColor=black)
![scikit-learn](https://img.shields.io/badge/scikit--learn-1.8-F7931E?logo=scikitlearn&logoColor=white)
![Supabase](https://img.shields.io/badge/Supabase-DB-3ECF8E?logo=supabase&logoColor=white)
![Docker](https://img.shields.io/badge/Docker-compose-2496ED?logo=docker&logoColor=white)
![Render](https://img.shields.io/badge/Deploy-Render-46E3B7?logo=render&logoColor=white)

---

## Project description

**SignalMod** is an intelligent moderation assistant for YouTube comments. It automatically classifies each comment as **Safe** or **Toxic**, returns a probability between 0 and 1, and tags toxicity categories (insult, threat, identity hate, obscene content).

It is built around the team's **hybrid meta-feature stacking** model (frozen Toxic-BERT embeddings combined with metadata features and a regularised logistic regression), reaching **F1 = 0.805** with a train–test gap of **2.54 pp** on the project's 200-sample test split.

The product ships as a FastAPI REST service plus a React SPA that mimics the YouTube Watch experience: pick a video, the API fetches the latest 50 comments via the YouTube Data API, scores them, and persists every prediction in Supabase so any visitor can see the full history.

---

## Tools and languages

### Languages

- **Python 3.12**: backend, ML pipelines, evaluation.
- **TypeScript + React 18**: frontend SPA.
- **SQL (PostgreSQL via Supabase)**: predictions persistence.

### Backend

- **FastAPI 0.136**: REST API, Pydantic schemas, lifespan model loading.
- **Uvicorn**: ASGI server with hot reload.
- **scikit-learn 1.8**: TF-IDF baseline + meta-learner Logistic Regression.
- **Optuna**: hyperparameter search for the TF-IDF baseline.
- **PyTorch 2.x + Transformers 5.9**: frozen `unitary/toxic-bert` for CLS embeddings.
- **spaCy + NLTK**: lemmatisation, stopwords, regex-based cleanup.
- **MLflow**: experiment tracking.
- **Supabase Python SDK**: predictions persistence with anonymous RLS policies.
- **google-api-python-client**: YouTube Data API v3 integration.

### Frontend

- **React 18 + Vite 5 + TypeScript**: SPA with hot module reload.
- **CSS modules**: YouTube-like dark theme.

### Tooling and ops

- **uv**: Python package and venv manager (`pyproject.toml` + `uv.lock`).
- **pnpm**: frontend package manager.
- **Docker + Docker Compose**: single-container deploy serving API + built SPA.
- **GNU Make**: `make dev`, `make install`, `make build`, `make docker`.
- **Render**: free-tier deploy via `render.yaml` blueprint.
- **Pytest**: unit tests for API contracts and preprocessing.

---

## Project architecture

```
Project_9_Equipo3/
├── configs/                        # YAML configs for pipelines and inference catalog
│   ├── pipeline.yaml               # Training data paths, target columns, CV folds
│   ├── features.yaml               # Preprocessing and TF-IDF settings
│   ├── model_catalog.yaml          # Inference catalog (3 swappable models)
│   ├── best_params.yaml            # Optuna winner for the LR baseline
│   ├── suggested_videos.yaml       # YouTube IDs shown in the Up-next rail
│   └── *_training.yaml             # Training profiles (golden baseline, expert, hybrid, …)
├── data/                           # Raw and processed datasets (git-ignored)
├── docs/                           # API.md, PIPELINE.md, ARCHITECTURE.md, DEPLOY.md
│   └── assets/signalmod_logo.png   # Brand assets
├── frontend/                       # React + Vite SPA
│   ├── public/signalmod_logo.png   # Logo served as static asset
│   └── src/
│       ├── api/                    # Typed HTTP client
│       ├── components/             # Layout, CommentRow, SuggestedRail, ModelBanner
│       ├── context/                # Global app state (active model, threshold)
│       ├── hooks/                  # useDebouncedPredict
│       ├── pages/                  # WatchPage, HubPage, SettingsPage
│       └── utils/                  # toxicityColor, randomUsername, relativeTime
├── models/
│   ├── baseline/lr_tfidf.joblib    # Optuna-tuned LR baseline
│   └── production_final/           # meta_stack_final.joblib (production artifact)
├── notebooks/
│   ├── 01–04                       # EDA, preprocessing, TF-IDF, baseline LR
│   ├── 12                          # Golden baseline (frozen Toxic-BERT)
│   ├── 14                          # Final meta-stacking (production artifact)
│   └── archive_attempts/           # Earlier experiments preserved for reproducibility
├── reports/                        # Metrics, plots, EDA figures, summary.csv
├── src/
│   ├── api/                        # FastAPI app
│   │   ├── main.py                 # Lifespan, CORS, static SPA mount
│   │   ├── routes/                 # health, models, predict (+ /predictions), videos
│   │   ├── schemas.py              # Pydantic request/response models
│   │   ├── services.py             # predict_single, to_predict_response
│   │   ├── state.py                # Shared app state
│   │   └── youtube.py              # YouTube Data API fetch + suggested metadata
│   ├── data/                       # Loader, dual loader for hybrid pipelines
│   ├── db/                         # Supabase client + save_prediction helpers
│   ├── evaluation/                 # Evaluator, threshold tuning, stable CV
│   ├── experiments/                # Notebook 13 / 14 script versions
│   ├── features/                   # text_preprocessor, vectorizer, metadata, augmentation
│   ├── models/                     # baseline (LR/RF/XGBoost), hybrid_ensemble, metadata_lr
│   ├── pipeline/                   # run_pipeline + per-strategy variants
│   ├── service/                    # ModelService, meta_stack_predictor, model_catalog
│   └── utils/                      # Logger
├── supabase/predictions_setup.sql  # SQL to create the predictions table + RLS policies
├── tests/                          # Pytest suite
├── Dockerfile                      # Multi-stage build (frontend + uv backend)
├── docker-compose.yml              # One-container deploy serving API + SPA
├── render.yaml                     # Render blueprint (web service + static site)
├── Procfile                        # Render process declaration
├── Makefile                        # make dev / install / build / docker / test
├── pyproject.toml + uv.lock        # Python dependencies pinned with uv
└── README.md / README.es.md        # English / Spanish documentation
```

### Data flow

```
┌────────────────────────────────────────────────┐
│  React SPA (Vite)        http://localhost:5173 │
│  Layout · Watch · Hub · Settings               │
└──────────────────┬─────────────────────────────┘
                   │  HTTP JSON (Vite proxy → :8000)
┌──────────────────▼─────────────────────────────┐
│  FastAPI                 http://localhost:8000 │
│  /predict  /predict-batch  /predict-video      │
│  /predictions  (GET, Supabase history)         │
│  /models  /models/select  /model-info          │
│  /videos/suggested  /health                    │
└──────┬─────────────────────────────┬───────────┘
       │                             │
┌──────▼──────────────────────┐ ┌────▼────────────────────────┐
│  ModelService               │ │  YouTube Data API v3        │
│  · local joblib             │ │  · video metadata           │
│  · hf_remote                │ │  · 50 newest comments       │
│  · meta_stack (production)  │ │                             │
└──────┬──────────────────────┘ └─────────────────────────────┘
       │
┌──────▼─────────────────────────────────────────────────┐
│  Supabase (PostgreSQL)                                 │
│  table: predictions(id, created_at, text, video_id,    │
│                     probability, is_toxic, labels, …)  │
│  RLS: anon insert + anon select                        │
└────────────────────────────────────────────────────────┘
```

### Model catalog (swappable from the UI)

| Model                     | Type        | F1 (test) | Train–test gap | Threshold | Latency | Default |
| ------------------------- | ----------- | --------- | -------------- | --------- | ------- | ------- |
| **Meta-Feature Stacking** | Hybrid      | **0.805** | **2.54 pp**    | **0.381** | ~400 ms | **Yes** |
| Frozen Toxic-BERT         | Transformer | 0.790     | 0.16 pp        | 0.120     | ~400 ms | No      |
| LR + TF-IDF (Optuna)      | sklearn     | 0.758     | 4.76 pp        | 0.500     | < 50 ms | No      |

The production model concatenates the frozen `[CLS]` embedding from `unitary/toxic-bert` (768-d) with hand-crafted metadata features (length, uppercase ratio, emoji density…), scales them with `StandardScaler`, and feeds them into a `LogisticRegression(C=0.001)` meta-learner.

---

## Setup & run

### 1. Prerequisites

| Tool | macOS / Linux | Windows |
| ---- | ------------- | ------- |
| **Python 3.12** | `brew install python@3.12` | [python.org/downloads](https://www.python.org/downloads/) (check *Add Python to PATH*) |
| **uv** | `curl -LsSf https://astral.sh/uv/install.sh \| sh` | `powershell -c "irm https://astral.sh/uv/install.ps1 \| iex"` |
| **Node.js 18+** | `brew install node` | [nodejs.org](https://nodejs.org/) (LTS) |
| **pnpm** | `npm i -g pnpm` | `npm i -g pnpm` |
| **Make** *(optional)* | already installed | `winget install GnuWin32.Make` (or use WSL) |

### 2. Clone & configure

```bash
git clone https://github.com/Bootcamp-IA-P6/Project_9_Equipo3.git
cd Project_9_Equipo3
cp .env.example .env
# Fill: YOUTUBE_API_KEY, SUPABASE_URL, SUPABASE_KEY
```

> **Windows PowerShell**: replace `cp` with `Copy-Item .env.example .env`.

Paste `supabase/predictions_setup.sql` into the Supabase SQL editor before the first run (creates the `predictions` table + RLS policies).

### 3. Run: three ways

#### Option A: With Makefile (recommended on macOS / Linux / WSL)

```bash
make install   # uv sync + pnpm install
make dev       # FastAPI :8000 + Vite :5173
```

| Command        | What it does                                     |
| -------------- | ------------------------------------------------ |
| `make install` | Install Python + frontend deps                   |
| `make dev`     | Start API and UI in parallel (Ctrl+C stops both) |
| `make api`     | API only                                         |
| `make ui`      | UI only                                          |
| `make build`   | Build the SPA into `frontend/dist`               |
| `make test`    | Run Pytest                                       |
| `make docker`  | `docker compose up --build`                      |
| `make stop`    | Kill anything on ports 8000 / 5173               |
| `make clean`   | Remove `.venv`, `node_modules`, `dist`           |

#### Option B: Manual (macOS / Linux)

Two terminals.

**Terminal 1: API**

```bash
uv sync
uv run uvicorn src.api.main:app --reload --port 8000
```

**Terminal 2: Frontend**

```bash
cd frontend
pnpm install
pnpm dev
```

#### Option C: Manual (Windows PowerShell)

Two terminals.

**Terminal 1: API**

```powershell
uv sync
uv run uvicorn src.api.main:app --reload --port 8000
```

**Terminal 2: Frontend**

```powershell
cd frontend
pnpm install
pnpm dev
```

> If `uv` is not recognised after install, close and reopen PowerShell so the new `PATH` is picked up.

### 4. Open the app

| URL                          | What you'll see                   |
| ---------------------------- | --------------------------------- |
| http://localhost:5173        | React SPA: Watch / Hub / Settings |
| http://localhost:8000/docs   | FastAPI Swagger UI                |
| http://localhost:8000/health | Health check                      |

### 5. Docker (one container: API + built SPA)

Same commands on **macOS / Linux / Windows**:

```bash
# Normal: keeps images and volumes for fast rebuilds
docker compose up --build
# → http://localhost:8000 · Ctrl+C to stop · docker compose down

# Ephemeral demo: Ctrl+C tears down container + image + volumes
make docker-demo

# Manual full cleanup
make docker-clean
# (equivalent to: docker compose down --rmi local --volumes --remove-orphans)
```

---

More: see [docs/PIPELINE.md](docs/PIPELINE.md) for training, [docs/API.md](docs/API.md) for endpoints, [docs/DEPLOY.md](docs/DEPLOY.md) for Render deployment.

---

## Contributors

| Name              | Role              |
| ----------------- | ----------------- |
| Andrés Torrez     | Backend Developer |
| Mirae Kang        | Scrum Master      |
| Jonathan Brasales | AI Developer      |
| Roberto Molero    | Product Owner     |


---

**SignalMod** · Bootcamp IA P6 · Team 3 · 2026