metadata
title: SignalMod
emoji: 🛡️
colorFrom: red
colorTo: yellow
sdk: docker
app_port: 8000
pinned: false
license: apache-2.0
short_description: Smart moderation for YouTube comments
SignalMod

Intelligent moderation for YouTube comments

🌐 English · Español

Python FastAPI React Vite PyTorch Transformers scikit-learn Supabase Docker Render


Project description

SignalMod is an intelligent moderation assistant for YouTube comments. It automatically classifies each comment as Safe or Toxic, returns a probability between 0 and 1, and tags toxicity categories (insult, threat, identity hate, obscene content).

It is built around the team's hybrid meta-feature stacking model (frozen Toxic-BERT embeddings combined with metadata features and a regularised logistic regression), reaching F1 = 0.805 with a train–test gap of 2.54 pp on the project's 200-sample test split.

The product ships as a FastAPI REST service plus a React SPA that mimics the YouTube Watch experience: pick a video, the API fetches the latest 50 comments via the YouTube Data API, scores them, and persists every prediction in Supabase so any visitor can see the full history.
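
The comment-fetching step can be sketched as follows. This is a minimal illustration, not the project's `src/api/youtube.py`: the helper name `extract_comment_texts` is hypothetical, and it only parses the response shape that the YouTube Data API v3 `commentThreads.list` endpoint returns. The commented-out call shows how such a response would typically be requested with google-api-python-client.

```python
from typing import Any

# Assumed way to obtain the response (requires google-api-python-client and an API key):
#   from googleapiclient.discovery import build
#   youtube = build("youtube", "v3", developerKey=API_KEY)
#   response = youtube.commentThreads().list(
#       part="snippet", videoId=VIDEO_ID, maxResults=50,
#       order="time", textFormat="plainText",
#   ).execute()


def extract_comment_texts(response: dict[str, Any], limit: int = 50) -> list[str]:
    """Return the top-level comment texts from a commentThreads.list response."""
    texts: list[str] = []
    for item in response.get("items", [])[:limit]:
        # Each thread wraps its top-level comment in a nested snippet.
        comment = item["snippet"]["topLevelComment"]["snippet"]
        texts.append(comment["textDisplay"])
    return texts


# Tiny hand-written response used only to show the expected shape.
sample_response = {
    "items": [
        {"snippet": {"topLevelComment": {"snippet": {"textDisplay": "Great video"}}}},
        {"snippet": {"topLevelComment": {"snippet": {"textDisplay": "First!"}}}},
    ]
}
comments = extract_comment_texts(sample_response)
```

Each extracted string would then be scored by the active model before being stored.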


Tools and languages

Languages

  • Python 3.12: backend, ML pipelines, evaluation.
  • TypeScript + React 18: frontend SPA.
  • SQL (PostgreSQL via Supabase): predictions persistence.

Backend

  • FastAPI 0.136: REST API, Pydantic schemas, lifespan model loading.
  • Uvicorn: ASGI server with hot reload.
  • scikit-learn 1.8: TF-IDF baseline + meta-learner Logistic Regression.
  • Optuna: hyperparameter search for the TF-IDF baseline.
  • PyTorch 2.x + Transformers 5.9: frozen unitary/toxic-bert for CLS embeddings.
  • spaCy + NLTK: lemmatisation, stopwords, regex-based cleanup.
  • MLflow: experiment tracking.
  • Supabase Python SDK: predictions persistence with anonymous RLS policies.
  • google-api-python-client: YouTube Data API v3 integration.

Frontend

  • React 18 + Vite 5 + TypeScript: SPA with hot module reload.
  • CSS modules: YouTube-like dark theme.

Tooling and ops

  • uv: Python package and venv manager (pyproject.toml + uv.lock).
  • pnpm: frontend package manager.
  • Docker + Docker Compose: single-container deploy serving API + built SPA.
  • GNU Make: make dev, make install, make build, make docker.
  • Render: free-tier deploy via render.yaml blueprint.
  • Pytest: unit tests for API contracts and preprocessing.

Project architecture

Project_9_Equipo3/
├── configs/                       # YAML configs for pipelines and inference catalog
│   ├── pipeline.yaml              # Training data paths, target columns, CV folds
│   ├── features.yaml              # Preprocessing and TF-IDF settings
│   ├── model_catalog.yaml         # Inference catalog (3 swappable models)
│   ├── best_params.yaml           # Optuna winner for the LR baseline
│   ├── suggested_videos.yaml      # YouTube IDs shown in the Up-next rail
│   └── *_training.yaml            # Training profiles (golden baseline, expert, hybrid, …)
├── data/                          # Raw and processed datasets (git-ignored)
├── docs/                          # API.md, PIPELINE.md, ARCHITECTURE.md, DEPLOY.md
│   └── assets/signalmod_logo.png  # Brand assets
├── frontend/                      # React + Vite SPA
│   ├── public/signalmod_logo.png  # Logo served as static asset
│   └── src/
│       ├── api/                   # Typed HTTP client
│       ├── components/            # Layout, CommentRow, SuggestedRail, ModelBanner
│       ├── context/               # Global app state (active model, threshold)
│       ├── hooks/                 # useDebouncedPredict
│       ├── pages/                 # WatchPage, HubPage, SettingsPage
│       └── utils/                 # toxicityColor, randomUsername, relativeTime
├── models/
│   ├── baseline/lr_tfidf.joblib   # Optuna-tuned LR baseline
│   └── production_final/          # meta_stack_final.joblib: production artifact
├── notebooks/
│   ├── 01–04                      # EDA, preprocessing, TF-IDF, baseline LR
│   ├── 12                         # Golden baseline (frozen Toxic-BERT)
│   ├── 14                         # Final meta-stacking: production artifact
│   └── archive_attempts/          # Earlier experiments preserved for reproducibility
├── reports/                       # Metrics, plots, EDA figures, summary.csv
├── src/
│   ├── api/                       # FastAPI app
│   │   ├── main.py                # Lifespan, CORS, static SPA mount
│   │   ├── routes/                # health, models, predict (+ /predictions), videos
│   │   ├── schemas.py             # Pydantic request/response models
│   │   ├── services.py            # predict_single, to_predict_response
│   │   ├── state.py               # Shared app state
│   │   └── youtube.py             # YouTube Data API fetch + suggested metadata
│   ├── data/                      # Loader, dual loader for hybrid pipelines
│   ├── db/                        # Supabase client + save_prediction helpers
│   ├── evaluation/                # Evaluator, threshold tuning, stable CV
│   ├── experiments/               # Notebook 13 / 14 script versions
│   ├── features/                  # text_preprocessor, vectorizer, metadata, augmentation
│   ├── models/                    # baseline (LR/RF/XGBoost), hybrid_ensemble, metadata_lr
│   ├── pipeline/                  # run_pipeline + per-strategy variants
│   ├── service/                   # ModelService, meta_stack_predictor, model_catalog
│   └── utils/                     # Logger
├── supabase/predictions_setup.sql # SQL to create the predictions table + RLS policies
├── tests/                         # Pytest suite
├── Dockerfile                     # Multi-stage build (frontend + uv backend)
├── docker-compose.yml             # One-container deploy serving API + SPA
├── render.yaml                    # Render blueprint (web service + static site)
├── Procfile                       # Render process declaration
├── Makefile                       # make dev / install / build / docker / test
├── pyproject.toml + uv.lock       # Python dependencies pinned with uv
└── README.md  /  README.es.md     # English / Spanish documentation

Data flow

                ┌────────────────────────────────────────────────┐
                │  React SPA (Vite)         http://localhost:5173│
                │  Layout · Watch · Hub · Settings               │
                └──────────────────┬─────────────────────────────┘
                                   │ HTTP JSON  (Vite proxy → :8000)
                ┌──────────────────▼─────────────────────────────┐
                │  FastAPI                  http://localhost:8000│
                │  /predict  /predict-batch  /predict-video      │
                │  /predictions (GET: Supabase history)          │
                │  /models  /models/select  /model-info          │
                │  /videos/suggested  /health                    │
                └──────┬─────────────────────────────┬───────────┘
                       │                             │
        ┌──────────────▼─────────────┐ ┌─────────────▼──────────────┐
        │  ModelService              │ │  YouTube Data API v3       │
        │  · local joblib            │ │  · video metadata          │
        │  · hf_remote               │ │  · 50 newest comments      │
        │  · meta_stack (production) │ │                            │
        └──────┬─────────────────────┘ └────────────────────────────┘
               │
        ┌──────▼──────────────────────────────────────────────────┐
        │  Supabase (PostgreSQL)                                  │
        │  table: predictions(id, created_at, text, video_id,     │
        │                     probability, is_toxic, labels, …)   │
        │  RLS: anon insert + anon select                         │
        └─────────────────────────────────────────────────────────┘
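
As a rough illustration of the persistence step, the sketch below builds an insert payload from the columns named in the diagram. The class `PredictionRow`, the helper `to_insert_payload`, and all field types are assumptions; `id` and `created_at` are assumed to be generated by the database, and the columns elided by `…` in the diagram are deliberately left out.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class PredictionRow:
    """Hypothetical in-memory shape of one row of the predictions table."""

    text: str
    video_id: str
    probability: float
    is_toxic: bool
    labels: list[str] = field(default_factory=list)  # assumed: list of category names


def to_insert_payload(row: PredictionRow) -> dict:
    """Validate the probability range and return a plain dict ready for insertion."""
    if not 0.0 <= row.probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    return asdict(row)


payload = to_insert_payload(
    PredictionRow(
        text="you are awful",
        video_id="VIDEO_ID",  # placeholder, not a real video
        probability=0.91,
        is_toxic=True,
        labels=["insult"],
    )
)
```

With the Supabase Python SDK, a payload like this would typically be sent with `client.table("predictions").insert(payload).execute()`; the project's own `save_prediction` helpers may differ.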

Model catalog (swappable from the UI)

| Model | Type | F1 (test) | Train–test gap | Threshold | Latency | Default |
| --- | --- | --- | --- | --- | --- | --- |
| Meta-Feature Stacking | Hybrid | 0.805 | 2.54 pp | 0.381 | ~400 ms | Yes |
| Frozen Toxic-BERT | Transformer | 0.790 | 0.16 pp | 0.120 | ~400 ms | No |
| LR + TF-IDF (Optuna) | sklearn | 0.758 | 4.76 pp | 0.500 | < 50 ms | No |
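
To make the Threshold column concrete, here is a small sketch of how a per-model decision threshold turns a probability into a Safe / Toxic label. The dictionary keys and the function name are hypothetical, and the use of `>=` at the boundary is an assumption; only the threshold values come from the catalog.

```python
# Thresholds taken from the model catalog; the keys are illustrative names.
THRESHOLDS = {
    "meta_stack": 0.381,
    "toxic_bert": 0.120,
    "lr_tfidf": 0.500,
}


def label_for(probability: float, model_key: str = "meta_stack") -> str:
    """Map a toxicity probability to a label using the model's own threshold."""
    return "Toxic" if probability >= THRESHOLDS[model_key] else "Safe"


# The same score can flip label depending on the active model.
score = 0.2
labels_by_model = {key: label_for(score, key) for key in THRESHOLDS}
```

This is why switching models in the UI can change the label of a comment even when the raw probabilities are similar.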

The production model concatenates the frozen [CLS] embedding from unitary/toxic-bert (768-d) with hand-crafted metadata features (length, uppercase ratio, emoji density…), scales them with StandardScaler, and feeds them into a LogisticRegression(C=0.001) meta-learner.
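
The stacking recipe described above can be sketched with scikit-learn. This is an illustration under stated assumptions, not the production training code: random vectors stand in for the 768-d Toxic-BERT [CLS] embeddings, the toy texts and labels are made up, and `metadata_features` uses a crude emoji proxy (code points at or above U+1F000).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def metadata_features(text: str) -> list[float]:
    """Length, uppercase ratio and emoji density for one comment (illustrative)."""
    n = max(len(text), 1)  # avoid division by zero on empty strings
    upper = sum(ch.isupper() for ch in text)
    emoji = sum(ord(ch) >= 0x1F000 for ch in text)  # rough emoji heuristic
    return [float(len(text)), upper / n, emoji / n]


def stack(embeddings: np.ndarray, texts: list[str]) -> np.ndarray:
    """Concatenate embedding columns with the hand-crafted metadata columns."""
    meta = np.array([metadata_features(t) for t in texts])
    return np.hstack([embeddings, meta])


rng = np.random.default_rng(0)
texts = ["You are AWFUL", "nice video 😀", "GREAT work", "so bad!!!"] * 5
labels = np.array([1, 0, 0, 1] * 5)  # toy labels, 1 = toxic
embeddings = rng.normal(size=(len(texts), 768))  # stand-in for [CLS] vectors

X = stack(embeddings, texts)
model = make_pipeline(StandardScaler(), LogisticRegression(C=0.001, max_iter=1000))
model.fit(X, labels)
proba = model.predict_proba(X)[:, 1]  # probability of the toxic class
```

The very small `C` means strong L2 regularisation, which is one plausible reason for the small train–test gap reported in the catalog.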


Setup & run

1. Prerequisites

| Tool | macOS / Linux | Windows |
| --- | --- | --- |
| Python 3.12 | `brew install python@3.12` | python.org/downloads (check Add Python to PATH) |
| uv | `curl -LsSf https://astral.sh/uv/install.sh \| sh` | `powershell -c "irm https://astral.sh/uv/install.ps1 \| iex"` |
| Node.js 18+ | `brew install node` | nodejs.org (LTS) |
| pnpm | `npm i -g pnpm` | `npm i -g pnpm` |
| Make (optional) | already installed | `winget install GnuWin32.Make` (or use WSL) |

2. Clone & configure

git clone https://github.com/Bootcamp-IA-P6/Project_9_Equipo3.git
cd Project_9_Equipo3

cp .env.example .env
# Fill: YOUTUBE_API_KEY, SUPABASE_URL, SUPABASE_KEY

Windows PowerShell: use Copy-Item .env.example .env instead of cp.

Paste supabase/predictions_setup.sql into the Supabase SQL editor before the first run (creates the predictions table + RLS policies).
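
A quick way to catch a half-filled `.env` before starting the API is a check like the one below. It is a sketch only: the helper name `missing_settings` is hypothetical, and it assumes the three variables have already been loaded into the process environment.

```python
import os
from collections.abc import Mapping

# Variable names taken from the .env instructions above.
REQUIRED_VARS = ("YOUTUBE_API_KEY", "SUPABASE_URL", "SUPABASE_KEY")


def missing_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


# Example with an explicit mapping instead of the real environment.
example_env = {"YOUTUBE_API_KEY": "abc", "SUPABASE_URL": " ", "SUPABASE_KEY": ""}
missing = missing_settings(example_env)
```

Calling `missing_settings()` with no argument inspects the real environment; an empty list means all three values are present.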

3. Run: three ways

Option A: With Makefile (recommended on macOS / Linux / WSL)

make install     # uv sync  +  pnpm install
make dev         # FastAPI :8000  +  Vite :5173

| Command | What it does |
| --- | --- |
| `make install` | Install Python + frontend deps |
| `make dev` | Start API and UI in parallel (Ctrl+C stops both) |
| `make api` | API only |
| `make ui` | UI only |
| `make build` | Build the SPA into frontend/dist |
| `make test` | Run Pytest |
| `make docker` | `docker compose up --build` |
| `make stop` | Kill anything on ports 8000 / 5173 |
| `make clean` | Remove .venv, node_modules, dist |

Option B: Manual (macOS / Linux)

Two terminals.

Terminal 1: API

uv sync
uv run uvicorn src.api.main:app --reload --port 8000

Terminal 2: Frontend

cd frontend
pnpm install
pnpm dev

Option C: Manual (Windows PowerShell)

Two terminals.

Terminal 1: API

uv sync
uv run uvicorn src.api.main:app --reload --port 8000

Terminal 2: Frontend

cd frontend
pnpm install
pnpm dev

If uv is not recognised after install, close and reopen PowerShell so the new PATH is picked up.

4. Open the app

| URL | What you'll see |
| --- | --- |
| http://localhost:5173 | React SPA: Watch / Hub / Settings |
| http://localhost:8000/docs | FastAPI Swagger UI |
| http://localhost:8000/health | Health check |
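
Once the API is up, a scripted smoke test of `/predict` might look like the sketch below, using only the standard library. The JSON body field `text` is an assumption based on the `text` column of the predictions table; check the Swagger UI at `/docs` for the actual request schema. Both helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # local API address from the table above


def build_predict_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for /predict with an assumed {"text": ...} body."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_json(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response (needs the API running)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


request = build_predict_request("this video is great")
# result = fetch_json(request)  # uncomment once the API is running locally
```

Building the request separately from sending it keeps the sketch usable offline and makes the assumed payload easy to adjust.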

5. Docker (one container: API + built SPA)

Same commands on macOS / Linux / Windows:

# Normal: keeps images and volumes for fast rebuilds
docker compose up --build
# → http://localhost:8000  ·  Ctrl+C to stop  ·  docker compose down

# Ephemeral demo: Ctrl+C tears down container + image + volumes
make docker-demo

# Manual full cleanup
make docker-clean
# (equivalent to: docker compose down --rmi local --volumes --remove-orphans)

More: see docs/PIPELINE.md for training, docs/API.md for endpoints, docs/DEPLOY.md for Render deployment.


Contributors

  • Andrés Torrez: Backend Developer
  • Mirae Kang: Scrum Master
  • Jonathan Brasales: AI Developer
  • Roberto Molero: Product Owner

SignalMod · Bootcamp IA P6 · Team 3 · 2026