title: SignalMod
emoji: 🛡️
colorFrom: red
colorTo: yellow
sdk: docker
app_port: 8000
pinned: false
license: apache-2.0
short_description: Smart moderation for YouTube comments
Project description
SignalMod is an intelligent moderation assistant for YouTube comments. It automatically classifies each comment as Safe or Toxic, returns a probability between 0 and 1, and tags toxicity categories (insult, threat, identity hate, obscene content).
It is built around the team's hybrid meta-feature stacking model (frozen Toxic-BERT embeddings combined with metadata features and a regularised logistic regression). That model reaches F1 = 0.805 with a train–test gap of 2.54 pp on the project's 200-sample test split.
The product ships as a FastAPI REST service plus a React SPA that mimics the YouTube Watch experience: pick a video, the API fetches the latest 50 comments via the YouTube Data API, scores them, and persists every prediction in Supabase so any visitor can see the full history.
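The sketch below shows roughly how the comment-fetching step could be built. It calls the public REST form of the YouTube Data API v3 `commentThreads.list` method. The backend itself lists `google-api-python-client` instead. The helper names `build_latest_comments_url` and `extract_texts` are hypothetical and do not come from `src/api/youtube.py`.

```python
from urllib.parse import urlencode

# Public REST endpoint of the YouTube Data API v3 commentThreads.list method.
COMMENT_THREADS_URL = "https://www.googleapis.com/youtube/v3/commentThreads"


def build_latest_comments_url(video_id: str, api_key: str, max_results: int = 50) -> str:
    """Return a request URL for the newest top-level comments of a video (hypothetical helper)."""
    if not 1 <= max_results <= 100:
        # commentThreads.list accepts maxResults between 1 and 100.
        raise ValueError("max_results must be between 1 and 100")
    params = {
        "part": "snippet",          # snippet carries the top-level comment
        "videoId": video_id,
        "maxResults": max_results,
        "order": "time",            # newest comments first
        "textFormat": "plainText",  # plain text instead of HTML
        "key": api_key,
    }
    return f"{COMMENT_THREADS_URL}?{urlencode(params)}"


def extract_texts(response: dict) -> list[str]:
    """Pull the comment texts out of a decoded commentThreads.list JSON response."""
    return [
        item["snippet"]["topLevelComment"]["snippet"]["textDisplay"]
        for item in response.get("items", [])
    ]
```

To use it, fetch the URL (for example with `urllib.request.urlopen`) and pass the decoded JSON to `extract_texts`. That yields up to 50 comment strings ready to be scored.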
Tools and languages
Languages
- Python 3.12: backend, ML pipelines, evaluation.
- TypeScript + React 18: frontend SPA.
- SQL (PostgreSQL via Supabase): predictions persistence.
Backend
- FastAPI 0.136: REST API, Pydantic schemas, lifespan model loading.
- Uvicorn: ASGI server with hot reload.
- scikit-learn 1.8: TF-IDF baseline + Logistic Regression meta-learner.
- Optuna: hyperparameter search for the TF-IDF baseline.
- PyTorch 2.x + Transformers 5.9: frozen `unitary/toxic-bert` for [CLS] embeddings.
- spaCy + NLTK: lemmatisation, stopwords, regex-based cleanup.
- MLflow: experiment tracking.
- Supabase Python SDK: predictions persistence with anonymous RLS policies.
- google-api-python-client: YouTube Data API v3 integration.
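Below is a stdlib-only sketch of the regex-based part of that cleanup. The rules are illustrative assumptions and are not the contents of `src/features/text_preprocessor`:

- drop URLs and @mentions;
- squash runs of a repeated character;
- collapse whitespace.

The spaCy lemmatisation and NLTK stopword steps are left out. `clean_comment` is a hypothetical name.

```python
import re

# Hypothetical cleanup rules; lemmatisation and stopword removal are not shown.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # three or more copies of the same character
SPACE_RE = re.compile(r"\s+")


def clean_comment(text: str) -> str:
    """Lower-case a comment and strip URLs, @mentions and character floods."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = REPEAT_RE.sub(r"\1\1", text)  # "soooo" -> "soo", "!!!" -> "!!"
    return SPACE_RE.sub(" ", text).strip()
```

Metadata features such as the uppercase ratio would have to be computed on the raw text before a lower-casing step like this one. That ordering is an assumption about the pipeline.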
Frontend
- React 18 + Vite 5 + TypeScript: SPA with hot module reload.
- CSS modules: YouTube-like dark theme.
Tooling and ops
- uv: Python package and venv manager (`pyproject.toml` + `uv.lock`).
- pnpm: frontend package manager.
- Docker + Docker Compose: single-container deploy serving the API and the built SPA.
- GNU Make: `make dev`, `make install`, `make build`, `make docker`.
- Render: free-tier deploy via the `render.yaml` blueprint.
- Pytest: unit tests for API contracts and preprocessing.
Project architecture
```
Project_9_Equipo3/
├── configs/                      # YAML configs for pipelines and inference catalog
│   ├── pipeline.yaml             # Training data paths, target columns, CV folds
│   ├── features.yaml             # Preprocessing and TF-IDF settings
│   ├── model_catalog.yaml        # Inference catalog (3 swappable models)
│   ├── best_params.yaml          # Optuna winner for the LR baseline
│   ├── suggested_videos.yaml     # YouTube IDs shown in the Up-next rail
│   └── *_training.yaml           # Training profiles (golden baseline, expert, hybrid, …)
├── data/                         # Raw and processed datasets (git-ignored)
├── docs/                         # API.md, PIPELINE.md, ARCHITECTURE.md, DEPLOY.md
│   └── assets/signalmod_logo.png # Brand assets
├── frontend/                     # React + Vite SPA
│   ├── public/signalmod_logo.png # Logo served as static asset
│   └── src/
│       ├── api/                  # Typed HTTP client
│       ├── components/           # Layout, CommentRow, SuggestedRail, ModelBanner
│       ├── context/              # Global app state (active model, threshold)
│       ├── hooks/                # useDebouncedPredict
│       ├── pages/                # WatchPage, HubPage, SettingsPage
│       └── utils/                # toxicityColor, randomUsername, relativeTime
├── models/
│   ├── baseline/lr_tfidf.joblib  # Optuna-tuned LR baseline
│   └── production_final/         # meta_stack_final.joblib (production artifact)
├── notebooks/
│   ├── 01–04                     # EDA, preprocessing, TF-IDF, baseline LR
│   ├── 12                        # Golden baseline (frozen Toxic-BERT)
│   ├── 14                        # Final meta-stacking → production artifact
│   └── archive_attempts/         # Earlier experiments preserved for reproducibility
├── reports/                      # Metrics, plots, EDA figures, summary.csv
├── src/
│   ├── api/                      # FastAPI app
│   │   ├── main.py               # Lifespan, CORS, static SPA mount
│   │   ├── routes/               # health, models, predict (+ /predictions), videos
│   │   ├── schemas.py            # Pydantic request/response models
│   │   ├── services.py           # predict_single, to_predict_response
│   │   ├── state.py              # Shared app state
│   │   └── youtube.py            # YouTube Data API fetch + suggested metadata
│   ├── data/                     # Loader, dual loader for hybrid pipelines
│   ├── db/                       # Supabase client + save_prediction helpers
│   ├── evaluation/               # Evaluator, threshold tuning, stable CV
│   ├── experiments/              # Notebook 13 / 14 script versions
│   ├── features/                 # text_preprocessor, vectorizer, metadata, augmentation
│   ├── models/                   # baseline (LR/RF/XGBoost), hybrid_ensemble, metadata_lr
│   ├── pipeline/                 # run_pipeline + per-strategy variants
│   ├── service/                  # ModelService, meta_stack_predictor, model_catalog
│   └── utils/                    # Logger
├── supabase/predictions_setup.sql # SQL to create the predictions table + RLS policies
├── tests/                        # Pytest suite
├── Dockerfile                    # Multi-stage build (frontend + uv backend)
├── docker-compose.yml            # One-container deploy serving API + SPA
├── render.yaml                   # Render blueprint (web service + static site)
├── Procfile                      # Render process declaration
├── Makefile                      # make dev / install / build / docker / test
├── pyproject.toml + uv.lock      # Python dependencies pinned with uv
└── README.md / README.es.md      # English / Spanish documentation
```
Data flow
```
┌────────────────────────────────────────────────┐
│ React SPA (Vite)         http://localhost:5173 │
│ Layout · Watch · Hub · Settings                │
└──────────────────┬─────────────────────────────┘
                   │ HTTP JSON (Vite proxy → :8000)
┌──────────────────▼─────────────────────────────┐
│ FastAPI                  http://localhost:8000 │
│ /predict  /predict-batch  /predict-video       │
│ /predictions (GET → Supabase history)          │
│ /models  /models/select  /model-info           │
│ /videos/suggested  /health                     │
└──────┬─────────────────────────────┬───────────┘
       │                             │
┌──────▼─────────────────────┐ ┌─────▼──────────────────────┐
│ ModelService               │ │ YouTube Data API v3        │
│ · local joblib             │ │ · video metadata           │
│ · hf_remote                │ │ · 50 newest comments       │
│ · meta_stack (production)  │ │                            │
└──────┬─────────────────────┘ └────────────────────────────┘
       │
┌──────▼──────────────────────────────────────────────────┐
│ Supabase (PostgreSQL)                                   │
│ table: predictions(id, created_at, text, video_id,      │
│        probability, is_toxic, labels, …)                │
│ RLS: anon insert + anon select                          │
└─────────────────────────────────────────────────────────┘
```
Model catalog (swappable from the UI)
| Model | Type | F1 (test) | Train–test gap | Threshold | Latency | Default |
|---|---|---|---|---|---|---|
| Meta-Feature Stacking | Hybrid | 0.805 | 2.54 pp | 0.381 | ~400 ms | Yes |
| Frozen Toxic-BERT | Transformer | 0.790 | 0.16 pp | 0.120 | ~400 ms | No |
| LR + TF-IDF (Optuna) | sklearn | 0.758 | 4.76 pp | 0.500 | < 50 ms | No |
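Each model has its own decision threshold, so the same probability can get a different label depending on which model is active. The sketch below copies F1, gap and threshold from the catalog table. Three parts are assumptions:

- The dictionary keys are hypothetical identifiers.
- The threshold is treated as inclusive (`>=`).
- The gap is read as train F1 minus test F1.

```python
from typing import NamedTuple


class CatalogEntry(NamedTuple):
    f1_test: float
    gap_pp: float     # assumed: train F1 minus test F1, in percentage points
    threshold: float  # decision threshold on the toxicity probability


# Values from the model catalog table; the keys are hypothetical identifiers.
CATALOG = {
    "meta_stack": CatalogEntry(0.805, 2.54, 0.381),
    "toxic_bert": CatalogEntry(0.790, 0.16, 0.120),
    "lr_tfidf": CatalogEntry(0.758, 4.76, 0.500),
}
DEFAULT_MODEL = "meta_stack"


def classify(probability: float, model: str = DEFAULT_MODEL) -> str:
    """Map a probability to Safe/Toxic using the selected model's threshold."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return "Toxic" if probability >= CATALOG[model].threshold else "Safe"


def implied_train_f1(entry: CatalogEntry) -> float:
    """Recover the train F1 implied by the test F1 and the gap in pp."""
    return entry.f1_test + entry.gap_pp / 100
```

Take a probability of 0.2 as an example. It is Safe under the default meta-stacking model (threshold 0.381) but Toxic under frozen Toxic-BERT (threshold 0.120). Raw probabilities are therefore not directly comparable across models.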
The production model concatenates the frozen [CLS] embedding from `unitary/toxic-bert` (768-d) with hand-crafted metadata features (length, uppercase ratio, emoji density, …). It scales the combined vector with `StandardScaler` and feeds it into a `LogisticRegression(C=0.001)` meta-learner.
Setup & run
1. Prerequisites
| Tool | macOS / Linux | Windows |
|---|---|---|
| Python 3.12 | `brew install python@3.12` | python.org/downloads (check *Add Python to PATH*) |
| uv | `curl -LsSf https://astral.sh/uv/install.sh \| sh` | `powershell -c "irm https://astral.sh/uv/install.ps1 \| iex"` |
| Node.js 18+ | `brew install node` | nodejs.org (LTS) |
| pnpm | `npm i -g pnpm` | `npm i -g pnpm` |
| Make (optional) | already installed | `winget install GnuWin32.Make` (or use WSL) |
2. Clone & configure
```bash
git clone https://github.com/Bootcamp-IA-P6/Project_9_Equipo3.git
cd Project_9_Equipo3
cp .env.example .env
# Fill: YOUTUBE_API_KEY, SUPABASE_URL, SUPABASE_KEY
```
On Windows PowerShell, replace the `cp` command with `Copy-Item .env.example .env`.
Before the first run, paste the contents of `supabase/predictions_setup.sql` into the Supabase SQL editor and run it. This creates the `predictions` table and its RLS policies.
3. Run: three ways
Option A: with Makefile (recommended on macOS / Linux / WSL)
```bash
make install   # uv sync + pnpm install
make dev       # FastAPI :8000 + Vite :5173
```
| Command | What it does |
|---|---|
| `make install` | Install Python + frontend deps |
| `make dev` | Start API and UI in parallel (Ctrl+C stops both) |
| `make api` | API only |
| `make ui` | UI only |
| `make build` | Build the SPA into `frontend/dist` |
| `make test` | Run Pytest |
| `make docker` | `docker compose up --build` |
| `make stop` | Kill anything on ports 8000 / 5173 |
| `make clean` | Remove `.venv`, `node_modules`, `dist` |
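A leftover process on port 8000 or 5173 blocks `make dev`, and `make stop` frees those ports. A small Python check can report which of them is busy before you start. `port_in_use` and `busy_dev_ports` are hypothetical helpers and not Makefile targets. The check tries every address `localhost` resolves to, because Vite may listen on IPv6 `::1` rather than `127.0.0.1`.

```python
import socket

DEV_PORTS = (8000, 5173)  # FastAPI and Vite dev servers started by `make dev`


def port_in_use(port: int, host: str = "localhost") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        candidates = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return False  # host name could not be resolved
    for family, socktype, proto, _canonname, sockaddr in candidates:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(0.5)
                if sock.connect_ex(sockaddr) == 0:
                    return True
        except OSError:
            continue  # e.g. IPv6 unavailable in this environment
    return False


def busy_dev_ports() -> list[int]:
    """List which of the dev ports are already taken."""
    return [port for port in DEV_PORTS if port_in_use(port)]
```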
Option B: manual (macOS / Linux)
Two terminals.
Terminal 1: API
```bash
uv sync
uv run uvicorn src.api.main:app --reload --port 8000
```
Terminal 2: Frontend
```bash
cd frontend
pnpm install
pnpm dev
```
Option C: manual (Windows PowerShell)
Two terminals.
Terminal 1: API
```powershell
uv sync
uv run uvicorn src.api.main:app --reload --port 8000
```
Terminal 2: Frontend
```powershell
cd frontend
pnpm install
pnpm dev
```
If `uv` is not recognised after installation, close and reopen PowerShell so that the updated `PATH` is picked up.
4. Open the app
| URL | What you'll see |
|---|---|
| http://localhost:5173 | React SPA β Watch / Hub / Settings |
| http://localhost:8000/docs | FastAPI Swagger UI |
| http://localhost:8000/health | Health check |
5. Docker (one container: API + built SPA)
Same commands on macOS / Linux / Windows:
```bash
# Normal: keeps images and volumes for fast rebuilds
docker compose up --build
# → http://localhost:8000 · Ctrl+C to stop · docker compose down

# Ephemeral demo: Ctrl+C tears down container + image + volumes
make docker-demo

# Manual full cleanup
make docker-clean
# (equivalent to: docker compose down --rmi local --volumes --remove-orphans)
```
More: see docs/PIPELINE.md for training, docs/API.md for endpoints, docs/DEPLOY.md for Render deployment.
Contributors
| Contributor | Role |
|---|---|
| Andrés Torrez | Backend Developer |
| Mirae Kang | Scrum Master |
| Jonathan Brasales | AI Developer |
| Roberto Molero | Product Owner |
SignalMod · Bootcamp IA P6 · Team 3 · 2026