
Model results and comparison

Demo catalog: configs/model_catalog.yaml · Baseline metrics: models/baseline/manifest.json

| Model | F1 (test, weighted) | Train–test gap | Default in UI |
| --- | --- | --- | --- |
| LR + TF-IDF (Baseline) | 0.758 | 4.76 pp | No |
| Frozen Toxic-BERT (Baseline) | 0.790 | 0.16 pp | No |
| Meta-Feature Stacking (Production) | 0.805 | 2.54 pp | Yes |
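
To make the two metric columns concrete, here is a minimal sketch of how a weighted F1 and a train–test gap in percentage points can be computed. It uses the same definition of weighted F1 as scikit-learn's `f1_score(..., average="weighted")`: per-class F1 averaged with class support as weights. Two things are assumptions rather than facts from this repository: that the gap is train F1 minus test F1 on that same metric, and the helper names `weighted_f1` and `gap_pp`, which are hypothetical. The labels are toy values, not project data.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)
    total = 0.0
    for label, n_true in support.items():
        # True positives and predicted count for this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += f1 * n_true  # weight each class by its support
    return total / len(y_true)


def gap_pp(train_f1, test_f1):
    """Train-test gap in percentage points (assumed: train minus test)."""
    return (train_f1 - test_f1) * 100


# Toy labels for illustration only; not project data.
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
toy_f1 = weighted_f1(y_true, y_pred)
```

Under this reading, a model with a train F1 of 0.80 and a test F1 of 0.75 would have a gap of about 5 pp. A smaller gap suggests less overfitting, but says nothing on its own about how good the test score is.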

Handover: reports/HANDOVER_REPORT.md · Production JSON: reports/notebook_14/final_result.json · Golden baseline: reports/golden_baseline/

Baselines

LR + TF-IDF: Notebooks 01–03; artifact at models/baseline/lr_tfidf.joblib; tuning in configs/best_params.yaml.

Frozen Toxic-BERT: Notebook 12; unitary/toxic-bert, inference only. See the golden baseline reports and the frozen_toxic_bert entry in manifest.json.

Production

uv run python -m src.experiments.notebook_14_final_stack

Requires the hf extra; install it first with uv sync --extra hf.

Other experiments

Historical table: reports/summary.csv. The RF/XGBoost pipelines and the figures in reports/v2/ are teammate or archived work and are not in the demo model catalog.