Mirae Kang
# Model results and comparison
**Demo catalog:** [`configs/model_catalog.yaml`](../configs/model_catalog.yaml) · **Baseline metrics:** [`models/baseline/manifest.json`](../models/baseline/manifest.json)
| Model | F1 (test, weighted) | Train–test gap | Default in UI |
|-------|---------------------|----------------|---------------|
| LR + TF-IDF (Baseline) | 0.758 | 4.76 pp | No |
| Frozen Toxic-BERT (Baseline) | 0.790 | 0.16 pp | No |
| **Meta-Feature Stacking (Production)** | **0.805** | **2.54 pp** | **Yes** |

**Handover:** [`reports/HANDOVER_REPORT.md`](../reports/HANDOVER_REPORT.md) · **Production JSON:** [`reports/notebook_14/final_result.json`](../reports/notebook_14/final_result.json) · **Golden baseline:** [`reports/golden_baseline/`](../reports/golden_baseline/)
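
Differences between F1 scores are easiest to compare in percentage points (pp), where 1 pp equals 0.01 F1. The snippet below is a minimal sketch built on a hypothetical helper, `f1_diff_pp`, which is not part of the repository. It computes the production stack's margin over each baseline from the test F1 values in the table.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): difference between two F1 scores
# in percentage points, i.e. (a - b) * 100, rounded to two decimals.
f1_diff_pp() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", (a - b) * 100 }'
}

# Production stack vs. each baseline, using the test F1 values from the table
f1_diff_pp 0.805 0.758   # vs. LR + TF-IDF      -> 4.70
f1_diff_pp 0.805 0.790   # vs. Frozen Toxic-BERT -> 1.50
```

If the *Train–test gap* column is defined as train F1 minus test F1 (an assumption, not stated here), the same helper reproduces it when passed the train score first.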
## Baselines
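**LR + TF-IDF**: logistic regression on TF-IDF features, built in Notebooks 01–03. Artifact: `models/baseline/lr_tfidf.joblib`; tuned hyperparameters are in [`configs/best_params.yaml`](../configs/best_params.yaml).

**Frozen Toxic-BERT**: Notebook 12, using `unitary/toxic-bert` for inference only (no fine-tuning). See the golden baseline reports and the `frozen_toxic_bert` entry in [`models/baseline/manifest.json`](../models/baseline/manifest.json).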
## Production
```bash
# One-time setup: install the optional `hf` dependencies
uv sync --extra hf
# Run the Notebook 14 final stacking pipeline
uv run python -m src.experiments.notebook_14_final_stack
```
## Other experiments
Historical results table: [`reports/summary.csv`](../reports/summary.csv). The Random Forest and XGBoost pipelines and the figures under `reports/v2/` are teammates' or archived work and are not part of the demo model catalog.