SignalMod / docs /PIPELINE.md

Mirae Kang

feat: implement new models and improve UI, #23

46cc63a 5 days ago

Training pipeline

Entry point: src/pipeline/run_pipeline.py

Command

python -m src.pipeline.run_pipeline --model lr

| Flag | Choices | Default |
| --- | --- | --- |
| `--model` | `lr`, `rf`, `xgboost` | `lr` |

Run from the repository root so configs/ and data/raw/ resolve correctly.
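
For orientation, the flag table above can be expressed as an argparse parser. This is a minimal sketch, not the contents of run_pipeline.py: the function name `build_parser` and the help text are assumptions, and only the flag name, its choices, and its default come from the table.

```python
import argparse

# Choices and default mirror the flag table; everything else is illustrative.
MODEL_CHOICES = ("lr", "rf", "xgboost")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the single documented --model flag."""
    parser = argparse.ArgumentParser(prog="run_pipeline")
    parser.add_argument(
        "--model",
        choices=MODEL_CHOICES,
        default="lr",
        help="Classifier to train (hypothetical help text)",
    )
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--model", "rf"])
print(args.model)
```

With argparse, passing a value outside the listed choices exits with a usage error before any training code runs.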

Phases

1. Load data: load_raw_data() reads configs/pipeline.yaml → data/raw/youtoxic_english_1000.csv
2. Split: stratified train/test (test_size, random_state in YAML)
3. Preprocess: TextPreprocessor (lowercase, regex cleanup, spaCy lemmas, NLTK stopwords)
4. Train: build_model(model_type) fits TF-IDF + classifier pipeline
5. Cross-validation: 5-fold stratified CV, F1 weighted + ROC-AUC
6. Evaluate: Evaluator.evaluate_and_report() on test set
7. Save: models/experiments/{model}/{model}_pipeline_{timestamp}.joblib
8. MLflow: metrics and sklearn pipeline under mlruns/
9. Reports: append row to reports/summary.csv; PNGs in reports/pipeline/{model}/
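
To illustrate what the stratified split phase guarantees, here is a dependency-free sketch. It is not the project's code, which most likely delegates to scikit-learn; the function name `stratified_split` and the `random_state` value of 42 are assumptions, while `test_size=0.2` matches the example key in configs/pipeline.yaml.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, random_state=42):
    """Return (train_idx, test_idx) so each class keeps roughly the same share."""
    rng = random.Random(random_state)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Take the same fraction of every class for the test set.
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels: 60 non-toxic (0) and 40 toxic (1) rows.
labels = [0] * 60 + [1] * 40
train_idx, test_idx = stratified_split(labels, test_size=0.2, random_state=42)
print(len(train_idx), len(test_idx))
```

Because the sampling happens per class, the toxic share in the test set matches the toxic share in the full data, which keeps F1-toxic comparable between cross-validation and the held-out test set.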

Configuration

| File | Keys (examples) |
| --- | --- |
| `configs/pipeline.yaml` | `target_binary: IsToxic`, `test_size: 0.2`, `cv_folds: 5` |
| `configs/features.yaml` | TF-IDF `max_features`, `ngram_range`, preprocessing flags |
| `configs/models.yaml` | LR `C`, RF `n_estimators`, etc. |
| `configs/best_params.yaml` | Optuna winner for LR (overrides defaults when training LR) |
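
The override rule for `configs/best_params.yaml` can be pictured as a dictionary merge in which tuned values win over defaults. This is a sketch under assumptions: the function name `resolve_lr_params` and every parameter value are hypothetical, and the real loader may merge differently.

```python
from typing import Optional


def resolve_lr_params(defaults: dict, best_params: Optional[dict]) -> dict:
    """Return LR hyperparameters, letting tuned values override the defaults."""
    merged = dict(defaults)  # copy so the defaults stay untouched
    if best_params:
        merged.update(best_params)
    return merged


# Hypothetical values standing in for models.yaml and best_params.yaml.
lr_defaults = {"C": 1.0, "max_iter": 1000}
optuna_best = {"C": 0.5}

params = resolve_lr_params(lr_defaults, optuna_best)
print(params)
```

Keys that are absent from the tuned set fall back to the defaults, so a partial best_params file remains valid.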

Outputs

| Path | Content |
| --- | --- |
| `reports/summary.csv` | All runs (model comparison table) |
| `reports/pipeline/lr/cm_lr.png` | Confusion matrix |
| `reports/pipeline/lr/roc_lr.png` | ROC curve |
| `reports/pipeline/lr/errors_lr.csv` | False positives / negatives |
| `reports/pipeline/lr/exp_*.json` | Full metrics per run |
| `models/experiments/lr/*.joblib` | Serialized pipeline |

Evaluator API

src/evaluation/evaluator.py:

from src.evaluation.evaluator import Evaluator

evaluator = Evaluator(output_dir="reports/pipeline/lr")
metrics = evaluator.evaluate_and_report(
    model, X_test, y_test, model_name="LR",
    X_train=X_train, y_train=y_train, cv_results=cv_results,
    summary_path="reports/summary.csv",
)

Metrics include: f1_weighted, f1_toxic, roc_auc, fp, fn, cv_test_gap_pp, train_test_gap_pp, plus paths to plots.
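
The two gap metrics are presumably differences between scores expressed in percentage points. The sketch below shows one plausible definition; the helper name `gap_pp`, the rounding to two decimals, and the choice of score being compared are assumptions, not the Evaluator's confirmed behavior.

```python
def gap_pp(reference_score: float, test_score: float) -> float:
    """Difference between two scores in [0, 1], expressed in percentage points."""
    return round((reference_score - test_score) * 100, 2)


# Hypothetical scores: a positive gap means the reference beats the test set.
train_test_gap_pp = gap_pp(0.84, 0.80)
cv_test_gap_pp = gap_pp(0.79, 0.80)
print(train_test_gap_pp, cv_test_gap_pp)
```

Under this reading, a large positive train/test gap signals overfitting, while a CV/test gap near zero suggests the cross-validation estimate was reliable.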

Stable training (DistilBERT + LR ensemble)

Entry point: src/pipeline/run_stable_pipeline.py

Implements partial DistilBERT freezing, toxic-only back-translation with cosine dedup, gap-aware early stopping, regularized head (dropout 0.5, label smoothing 0.1), and soft-voting with TF-IDF LR (C=0.01).

uv sync --extra hf --extra train
uv run python -m src.pipeline.run_stable_pipeline
uv run python -m src.pipeline.run_stable_pipeline --skip-augmentation   # no network BT
uv run python -m src.pipeline.run_stable_pipeline --bert-only           # DistilBERT only

Config: configs/stable_training.yaml. Outputs under models/stable_distilbert/, models/stable_lr_tfidf.joblib, reports/stable/.
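
Gap-aware early stopping, as named above, can be read as ordinary patience-based stopping plus a hard limit on the train/validation gap. The following is a sketch of that idea only: the class name, the `patience=2` and `max_gap_pp=5.0` values, and the toy history are all hypothetical, and the real settings live in configs/stable_training.yaml.

```python
class GapAwareEarlyStopping:
    """Stop when validation stalls or the train/validation gap grows too large."""

    def __init__(self, patience: int, max_gap_pp: float) -> None:
        self.patience = patience
        self.max_gap_pp = max_gap_pp
        self.best_val = float("-inf")
        self.epochs_without_improvement = 0

    def should_stop(self, train_score: float, val_score: float) -> bool:
        # Gap in percentage points between training and validation scores.
        gap = (train_score - val_score) * 100
        if gap > self.max_gap_pp:
            return True  # overfitting guard
        if val_score > self.best_val:
            self.best_val = val_score
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Hypothetical (train_f1, val_f1) pairs per epoch.
history = [(0.70, 0.68), (0.78, 0.74), (0.86, 0.75), (0.90, 0.76)]
stopper = GapAwareEarlyStopping(patience=2, max_gap_pp=5.0)

stopped_at = None
for epoch, (train_f1, val_f1) in enumerate(history, start=1):
    if stopper.should_stop(train_f1, val_f1):
        stopped_at = epoch
        break
print(stopped_at)
```

In this toy run, training halts at the epoch where the gap first exceeds the limit, even though validation F1 is still improving.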

Phase 5: Expert adaptation (Toxic-BERT + hybrid)

Entry point: src/pipeline/run_expert_pipeline.py

unitary/toxic-bert with head-only fine-tune, TF-IDF LR at 250 features, validation threshold tuning on F1-toxic, hybrid 0.7 / 0.3, EN→DE→EN augmentation. Notebook: notebooks/11_expert_phase5_toxicbert.ipynb.

uv sync --extra hf --extra train
uv run python -m src.pipeline.run_expert_pipeline

Config: configs/expert_training.yaml. Outputs under models/expert_toxic_bert/, models/expert_lr_tfidf.joblib, reports/expert/.
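
The hybrid blend and the validation threshold tuning can be sketched together in plain Python. Several details here are assumptions: that the 0.7 weight belongs to Toxic-BERT and 0.3 to LR, the grid bounds and step, the helper names, and the toy probabilities.

```python
def f1_positive(y_true, y_pred):
    """F1 score for the positive (toxic) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def blend(bert_probs, lr_probs, bert_weight=0.7):
    """Weighted average of two models' toxic-class probabilities."""
    return [bert_weight * b + (1 - bert_weight) * l
            for b, l in zip(bert_probs, lr_probs)]


def tune_threshold(y_true, probs, grid):
    """Return the first threshold in the grid that maximizes F1-toxic."""
    best_t, best_f1 = grid[0], -1.0
    for t in grid:
        preds = [1 if p >= t else 0 for p in probs]
        score = f1_positive(y_true, preds)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1


# Hypothetical validation labels and per-model probabilities.
y_val = [1, 1, 1, 0, 0, 0]
bert_probs = [0.9, 0.6, 0.4, 0.5, 0.2, 0.1]
lr_probs = [0.8, 0.7, 0.5, 0.3, 0.4, 0.2]

hybrid = blend(bert_probs, lr_probs)
grid = [i / 100 for i in range(10, 91, 5)]
best_t, best_f1 = tune_threshold(y_val, hybrid, grid)
print(best_t, round(best_f1, 3))
```

Tuning on the validation split rather than the test split keeps the reported test metrics free of threshold leakage.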

Clean-Signal Dual-Input Hybrid

Entry point: src/pipeline/run_hybrid_clean_pipeline.py

- Toxic-BERT: raw Text (reuses models/expert_toxic_bert, threshold 0.33)
- LR: clean_text from data/processed/v2/comments_preprocessed.csv (generated via spaCy if missing) + metadata from comments_with_stats.csv
- Weights: based on validation F1 (LR share clamped to 0.15–0.45)

uv run python -m src.pipeline.run_hybrid_clean_pipeline
uv run python -m src.pipeline.run_hybrid_clean_pipeline --skip-augmentation

Config: configs/hybrid_clean_training.yaml. Reports: reports/hybrid_clean/.
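
One way to derive validation-F1-based weights with a clamped LR share is shown below. The proportional formula is an assumption, since the document only states that the weights come from validation F1 and that the LR share stays within 0.15–0.45; the function name and the sample F1 values are hypothetical.

```python
def lr_share_from_val_f1(f1_bert: float, f1_lr: float,
                         lo: float = 0.15, hi: float = 0.45) -> float:
    """LR ensemble weight proportional to its validation F1, clamped to [lo, hi]."""
    total = f1_bert + f1_lr
    raw = f1_lr / total if total > 0 else lo
    return min(hi, max(lo, raw))


# Hypothetical validation F1 scores.
strong_lr = lr_share_from_val_f1(f1_bert=0.80, f1_lr=0.70)  # raw share above the cap
weak_lr = lr_share_from_val_f1(f1_bert=0.80, f1_lr=0.10)    # raw share below the floor
mid_lr = lr_share_from_val_f1(f1_bert=0.80, f1_lr=0.40)     # raw share within bounds
print(strong_lr, weak_lr, round(mid_lr, 3))
```

The clamp guarantees that LR always contributes something but never outweighs Toxic-BERT, whatever the validation scores turn out to be.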

Performance Push (Final Squeeze)

Entry point: src/pipeline/run_performance_push_pipeline.py

Full Toxic-BERT unfreeze (lr=5e-6, 20 epochs, early stop patience 4 on val_f1_weighted), test-time augmentation (original + back-translated average), LR anchor 300 features / 0.05 ensemble weight, threshold grid 0.30–0.70, gap defense 4.8 pp.

uv run python -m src.pipeline.run_performance_push_pipeline

Config: configs/performance_push_training.yaml. Reports: reports/performance_push/.
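
Test-time augmentation followed by the small LR anchor weight can be sketched as two averaging steps. The assumptions here are that the two views are averaged with equal weight, that the ensemble weights sum to 1 (so Toxic-BERT gets 0.95), and all helper names and probabilities.

```python
def tta_average(probs_original, probs_back_translated):
    """Average predictions for the original text and its back-translation."""
    return [(a + b) / 2 for a, b in zip(probs_original, probs_back_translated)]


def add_lr_anchor(bert_probs, lr_probs, lr_weight=0.05):
    """Blend BERT probabilities with a small LR contribution."""
    return [(1 - lr_weight) * b + lr_weight * l
            for b, l in zip(bert_probs, lr_probs)]


# Hypothetical toxic-class probabilities for two comments.
probs_original = [0.8, 0.2]
probs_back_translated = [0.6, 0.4]
lr_probs = [0.9, 0.1]

tta_probs = tta_average(probs_original, probs_back_translated)
final_probs = add_lr_anchor(tta_probs, lr_probs)
print([round(p, 3) for p in final_probs])
```

Averaging across paraphrases reduces the variance of a single prediction, which is why it pairs naturally with a threshold search over a fixed grid.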

Stealth Learning (0.80 push)

Entry point: src/pipeline/run_stealth_learning_pipeline.py

Last 2 Toxic-BERT layers (lr=7e-6) + head (2e-5), training gap limit 5.5%, patience 5, SWA over last 5 epochs, threshold step 0.005, LR anchor 250 features / 0.05 weight, TTA on test.

uv run python -m src.pipeline.run_stealth_learning_pipeline

Config: configs/stealth_learning_training.yaml. Reports: reports/stealth_learning/.
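
SWA over the last 5 epochs amounts to averaging the model weights saved at the end of those epochs. The sketch below uses plain dictionaries of floats in place of real tensors; the function name and the toy checkpoints are hypothetical. In PyTorch, `torch.optim.swa_utils.AveragedModel` serves the same purpose.

```python
def average_last_checkpoints(checkpoints, last_n=5):
    """Uniformly average parameter values over the last `last_n` checkpoints."""
    recent = checkpoints[-last_n:]
    averaged = {}
    for name in recent[0]:
        # zip(*...) walks the same parameter position across all checkpoints.
        columns = zip(*(ckpt[name] for ckpt in recent))
        averaged[name] = [sum(values) / len(recent) for values in columns]
    return averaged


# Toy checkpoints for epochs 1..7, each holding one two-element parameter.
checkpoints = [{"w": [float(epoch), 2.0 * epoch]} for epoch in range(1, 8)]

swa_weights = average_last_checkpoints(checkpoints, last_n=5)
print(swa_weights)
```

Only epochs 3 through 7 contribute here, so the early, noisier checkpoints do not pull the averaged weights away from the final region of training.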

Golden Baseline Strategy (Briefing gap + F1 0.80)

Entry point: src/pipeline/run_golden_baseline_pipeline.py · Notebook: notebooks/12_golden_baseline_strategy.ipynb

1. Golden Baseline: frozen pretrained Toxic-BERT (no training; gap <1%)
2. Performance Squeeze: last 2 layers + R-Drop, lr=5e-6, 15 epochs, gap ≤4.9%
3. Hybrid Safety Net: BERT + LR (C=0.001, 200 features)

uv run python -m src.pipeline.run_golden_baseline_pipeline

Config: configs/golden_baseline_training.yaml. Reports: reports/golden_baseline/.
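
R-Drop, used in the Performance Squeeze step, runs each input through the model twice with dropout active and penalizes disagreement between the two output distributions. The sketch below computes that loss for a single example from two probability vectors; `alpha=1.0` and the sample probabilities are hypothetical, and the pipeline's actual weighting is not stated in this document.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with strictly positive q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def r_drop_loss(p1, p2, target, alpha=1.0):
    """Cross-entropy on both passes plus a symmetric KL consistency term."""
    nll = -math.log(p1[target]) - math.log(p2[target])
    sym_kl = 0.5 * (kl_divergence(p1, p2) + kl_divergence(p2, p1))
    return nll + alpha * sym_kl


# Two hypothetical softmax outputs for the same comment (class 1 = toxic).
pass_one = [0.2, 0.8]
pass_two = [0.3, 0.7]

loss = r_drop_loss(pass_one, pass_two, target=1)
consistent_loss = r_drop_loss(pass_one, pass_one, target=1)
print(round(loss, 4), round(consistent_loss, 4))
```

When both passes agree, the KL term vanishes and the loss reduces to the summed cross-entropy, so the regularizer only acts when dropout changes the prediction.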

Hyper-Optimization Sprints (Notebook 13)

Entry point: src/experiments/notebook_13_sprints.py · Notebook: notebooks/13_hyper_optimization_sprints.ipynb

Four CV sprints (multi-pivot aug, TTA, meta stacking, ultra-fine threshold) on the Golden Baseline foundation. Artifacts: models/notebook_13/. Reports: reports/notebook_13/.

uv run python -m src.experiments.notebook_13_sprints

Final Meta Stacking (Notebook 14)

Entry point: src/experiments/notebook_14_final_stack.py · Notebook: notebooks/14_final_meta_stacking.ipynb

Single 80/20 split, Exp3 meta stacking, C=0.001, test threshold grid (step 0.001). Report: reports/notebook_14/final_result.json.

uv run python -m src.experiments.notebook_14_final_stack

Production model (inference)

Demo inference (API / UI):

| Model | Path / weights |
| --- | --- |
| Meta-Feature Stacking (Production) | `models/production_final/meta_stack_final.joblib` |
| LR + TF-IDF (Baseline) | `models/baseline/lr_tfidf.joblib` |
| Frozen Toxic-BERT (Baseline) | Hub `unitary/toxic-bert` (metrics in `models/baseline/manifest.json`) |

Catalog: configs/model_catalog.yaml.

The other pipelines described above (stable, expert, etc.) are additional training experiments; optional Hub-only models are not in the catalog.

Handover script: reports/HANDOVER_REPORT.md.