# Model results and comparison

**Demo catalog:** [`configs/model_catalog.yaml`](../configs/model_catalog.yaml) · Baseline metrics: [`models/baseline/manifest.json`](../models/baseline/manifest.json)

| Model | F1 (test, weighted) | Train–test gap | Default in UI |
|-------|---------------------|----------------|---------------|
| LR + TF-IDF (Baseline) | 0.758 | 4.76 pp | No |
| Frozen Toxic-BERT (Baseline) | 0.790 | 0.16 pp | No |
| **Meta-Feature Stacking (Production)** | **0.805** | **2.54 pp** | **Yes** |

**Handover:** [`reports/HANDOVER_REPORT.md`](../reports/HANDOVER_REPORT.md) · **Production JSON:** [`reports/notebook_14/final_result.json`](../reports/notebook_14/final_result.json) · **Golden baseline:** [`reports/golden_baseline/`](../reports/golden_baseline/)

## Baselines

**LR + TF-IDF:** Notebooks 01–03, artifact `models/baseline/lr_tfidf.joblib`, tuning in [`configs/best_params.yaml`](../configs/best_params.yaml).

**Frozen Toxic-BERT:** Notebook 12, `unitary/toxic-bert` inference-only; see golden baseline reports and `manifest.json` → `frozen_toxic_bert`.

## Production

```bash
uv run python -m src.experiments.notebook_14_final_stack
```

Requires `uv sync --extra hf`.

## Other experiments

Historical table: [`reports/summary.csv`](../reports/summary.csv). RF/XGBoost pipelines and `reports/v2/` figures are teammate or archived work and are not in the demo model catalog.
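
A note on reading the Train–test gap column in the comparison table: the sketch below shows one plausible way such a gap could be computed, assuming it is the train F1 minus the test F1, expressed in percentage points (pp). That definition, the helper name `gap_pp`, and the input values are assumptions for illustration; they are not taken from the project's code or reports.

```python
def gap_pp(train_f1: float, test_f1: float) -> float:
    """Train-test gap in percentage points.

    Assumed definition (not confirmed by the project): train F1 minus
    test F1, scaled from a 0-1 fraction to percentage points.
    """
    return (train_f1 - test_f1) * 100.0


# Illustrative values only; these are NOT results from this repository.
example_gap = gap_pp(train_f1=0.90, test_f1=0.85)
print(f"{example_gap:.2f} pp")  # prints "5.00 pp"
```

Under this reading, a smaller gap means train and test scores are closer together, which is why the column is worth checking alongside test F1 when comparing models.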