SignalMod / docs /RESULTS.md

Mirae Kang

feat: implement new models and improve UI, #23

46cc63a 5 days ago

	# Model results and comparison

	Demo catalog: [`configs/model_catalog.yaml`](../configs/model_catalog.yaml) · Baseline metrics: [`models/baseline/manifest.json`](../models/baseline/manifest.json)

	| Model | F1 (test, weighted) | Train–test gap | Default in UI |
	|-------|---------------------|----------------|---------------|
	| LR + TF-IDF (Baseline) | 0.758 | 4.76 pp | No |
	| Frozen Toxic-BERT (Baseline) | 0.790 | 0.16 pp | No |
	| Meta-Feature Stacking (Production) | 0.805 | 2.54 pp | Yes |
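
To make the comparison concrete, the sketch below computes how far the production model's weighted test F1 sits above each baseline, in percentage points. The scores are copied from the table; the helper name `gain_over_baselines_pp` is hypothetical and not part of the repository.

```python
# Weighted test F1 values copied from the results table.
TEST_F1 = {
    "LR + TF-IDF (Baseline)": 0.758,
    "Frozen Toxic-BERT (Baseline)": 0.790,
    "Meta-Feature Stacking (Production)": 0.805,
}
PRODUCTION = "Meta-Feature Stacking (Production)"


def gain_over_baselines_pp(scores, production):
    """Return the production model's F1 gain over every other model, in pp."""
    prod_f1 = scores[production]
    return {
        name: round((prod_f1 - f1) * 100, 1)
        for name, f1 in scores.items()
        if name != production
    }


gains = gain_over_baselines_pp(TEST_F1, PRODUCTION)
for name, gain in gains.items():
    print(f"{name}: +{gain} pp")
# LR + TF-IDF (Baseline): +4.7 pp
# Frozen Toxic-BERT (Baseline): +1.5 pp
```

The stacked model's margin over Frozen Toxic-BERT is small (1.5 pp), so the train–test gap column is worth reading alongside the F1 column.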

	Handover: [`reports/HANDOVER_REPORT.md`](../reports/HANDOVER_REPORT.md) · Production JSON: [`reports/notebook_14/final_result.json`](../reports/notebook_14/final_result.json) · Golden baseline: [`reports/golden_baseline/`](../reports/golden_baseline/)

	## Baselines

	LR + TF-IDF — Notebooks 01–03, artifact `models/baseline/lr_tfidf.joblib`, tuning in [`configs/best_params.yaml`](../configs/best_params.yaml).

	Frozen Toxic-BERT — Notebook 12, `unitary/toxic-bert` inference-only; see golden baseline reports and `manifest.json` → `frozen_toxic_bert`.
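
As a rough sketch of the `manifest.json` → `frozen_toxic_bert` lookup, the helper below reads a JSON manifest and returns one entry. It assumes the manifest is a JSON object with `frozen_toxic_bert` as a top-level key; the actual schema is not shown in this document, and `load_manifest_entry` is a hypothetical name.

```python
import json
from pathlib import Path


def load_manifest_entry(manifest_path, key):
    """Return the value stored under `key` in a JSON manifest.

    Assumes the manifest's top level is a JSON object (dict).
    """
    with Path(manifest_path).open(encoding="utf-8") as fh:
        manifest = json.load(fh)
    if key not in manifest:
        # List the keys that do exist so a wrong guess is easy to correct.
        raise KeyError(f"{key!r} not found; available keys: {sorted(manifest)}")
    return manifest[key]
```

From the repository root, `load_manifest_entry("models/baseline/manifest.json", "frozen_toxic_bert")` would return the stored entry, if the key sits at the top level as assumed.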

	## Production

	```bash
	uv run python -m src.experiments.notebook_14_final_stack
	```

	Run `uv sync --extra hf` first to install the `hf` extra.

	## Other experiments

	Historical table: [`reports/summary.csv`](../reports/summary.csv). The RF/XGBoost pipelines and the `reports/v2/` figures are teammate or archived work and are not part of the demo model catalog.