# Model results and comparison

**Demo catalog:** [`configs/model_catalog.yaml`](../configs/model_catalog.yaml) · **Baseline metrics:** [`models/baseline/manifest.json`](../models/baseline/manifest.json)

| Model | F1 (test, weighted) | Train–test gap | Default in UI |
|-------|---------------------|----------------|---------------|
| LR + TF-IDF (Baseline) | 0.758 | 4.76 pp | No |
| Frozen Toxic-BERT (Baseline) | 0.790 | 0.16 pp | No |
| **Meta-Feature Stacking (Production)** | **0.805** | **2.54 pp** | **Yes** |
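
The table reports test-set weighted F1 and a train–test gap in percentage points (pp). Below is a minimal pure-Python sketch of both quantities. It assumes the gap is train F1 minus test F1, which this page does not define. `weighted_f1` and `train_test_gap_pp` are hypothetical helpers; scikit-learn's `f1_score(y_true, y_pred, average="weighted")` computes the same weighted F1.

```python
from collections import Counter
from typing import Hashable, Sequence


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Per-class F1 averaged with weights equal to each class's true support."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    support = Counter(y_true)     # true instances per class
    predicted = Counter(y_pred)   # predicted instances per class
    total = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        if tp == 0:
            continue  # F1 is 0 for this class, so it adds nothing to the sum
        precision = tp / predicted[label]
        recall = tp / n_true
        total += n_true * 2 * precision * recall / (precision + recall)
    # Classes that appear only in y_pred have zero support, hence weight 0.
    return total / len(y_true)


def train_test_gap_pp(f1_train: float, f1_test: float) -> float:
    """Train-minus-test F1 in percentage points (assumed definition)."""
    return round((f1_train - f1_test) * 100, 2)


print(weighted_f1([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 0, 1]))
print(train_test_gap_pp(0.90, 0.85))
```

A small gap, as for the frozen model, means train and test F1 are close. It says nothing about absolute quality, which is why the table lists both columns.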

**Handover:** [`reports/HANDOVER_REPORT.md`](../reports/HANDOVER_REPORT.md) · **Production JSON:** [`reports/notebook_14/final_result.json`](../reports/notebook_14/final_result.json) · **Golden baseline:** [`reports/golden_baseline/`](../reports/golden_baseline/)

## Baselines

**LR + TF-IDF:** Notebooks 01–03. Artifact: `models/baseline/lr_tfidf.joblib`; tuned hyperparameters: [`configs/best_params.yaml`](../configs/best_params.yaml).
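
As an illustration of what a baseline of this kind looks like, here is a minimal sketch. It assumes the artifact is a scikit-learn `Pipeline` saved with `joblib` (the `.joblib` extension suggests this, but it is not confirmed here). `build_lr_tfidf` and its defaults (`C=1.0`, word 1–2-grams) are hypothetical placeholders, not the tuned values in `configs/best_params.yaml`.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def build_lr_tfidf(C: float = 1.0, ngram_range: tuple = (1, 2)) -> Pipeline:
    """Hypothetical TF-IDF + logistic-regression pipeline.

    C and ngram_range are placeholder defaults, not the tuned values
    stored in configs/best_params.yaml.
    """
    return Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=ngram_range)),  # sparse TF-IDF features
        ("clf", LogisticRegression(C=C, max_iter=1000)),      # linear classifier on top
    ])


# Fit on toy data, persist with joblib, then reload, as a .joblib artifact would be.
texts = ["you are awful", "have a nice day", "awful awful person", "nice work"]
labels = [1, 0, 1, 0]
model = build_lr_tfidf().fit(texts, labels)

path = os.path.join(tempfile.mkdtemp(), "lr_tfidf.joblib")
joblib.dump(model, path)
reloaded = joblib.load(path)
print(reloaded.predict(["awful comment", "nice day"]))
```

If the real artifact is stored this way, `joblib.load("models/baseline/lr_tfidf.joblib")` would return a fitted object with the same `predict` interface. Bundling the vectorizer and classifier in one `Pipeline` keeps preprocessing and model in a single file.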

**Frozen Toxic-BERT:** Notebook 12. Uses `unitary/toxic-bert` for inference only (no fine-tuning). For results, see the golden baseline reports and the `frozen_toxic_bert` entry in `manifest.json`.
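
To read the `frozen_toxic_bert` entry programmatically, a minimal sketch follows. It assumes `manifest.json` is a top-level JSON object keyed by model name, which this page does not confirm. `load_manifest_entry` is a hypothetical helper, and it returns the entry unchanged without assuming its fields.

```python
import json
from pathlib import Path
from typing import Any, Union


def load_manifest_entry(manifest_path: Union[str, Path], model_key: str) -> Any:
    """Return the entry stored under `model_key` in a JSON manifest.

    Assumes a top-level JSON object keyed by model name; the structure of
    each entry is returned as-is.
    """
    manifest = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    if not isinstance(manifest, dict):
        raise TypeError(f"expected a JSON object in {manifest_path}")
    if model_key not in manifest:
        raise KeyError(f"{model_key!r} not found; available keys: {sorted(manifest)}")
    return manifest[model_key]
```

Usage, under the same assumption: `load_manifest_entry("models/baseline/manifest.json", "frozen_toxic_bert")`. Listing the available keys in the error makes a wrong key name quick to diagnose.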

## Production

The production Meta-Feature Stacking run needs the optional `hf` extra. Install it, then run the final stacking experiment:

```bash
uv sync --extra hf
uv run python -m src.experiments.notebook_14_final_stack
```

## Other experiments

Historical results table: [`reports/summary.csv`](../reports/summary.csv). The RF/XGBoost pipelines and the figures under `reports/v2/` are teammate or archived work and are not part of the demo model catalog.