ModernBERT Disfluency Detection — Exp B (Mixed 80/20)

Fine-tuned from answerdotai/ModernBERT-base on mixed data (80% synthetic / 20% real) from FluencyBank Timestamped.

Dataset

  • Config: mixed_8020 from arielcerdap/disfluency-fluencybank
  • Train: 13,713 segments (80% synthetic / 20% real)
  • Val/Test: identical to Exp A for direct comparison

Labels

O · FP (filled pause) · RP (repetition) · RV (revision) · PW (partial word)
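
To show how these tags are used downstream, here is a minimal sketch of a label mapping and a helper that keeps only fluent tokens. It assumes the class ids follow the order in which the labels are listed above; the real mapping lives in the model's config and may differ. The helper names `decode` and `remove_disfluencies` are hypothetical.

```python
# Assumed label order; check the model's config for the actual id mapping.
LABELS = ["O", "FP", "RP", "RV", "PW"]
id2label = dict(enumerate(LABELS))
label2id = {label: i for i, label in id2label.items()}


def decode(pred_ids):
    """Convert predicted class ids into tag strings."""
    return [id2label[i] for i in pred_ids]


def remove_disfluencies(tokens, tags):
    """Keep only the tokens tagged O (fluent speech)."""
    return [tok for tok, tag in zip(tokens, tags) if tag == "O"]


tokens = ["um", "I", "I", "want", "coffee"]
tags = decode([1, 2, 0, 0, 0])  # FP, RP, O, O, O
print(remove_disfluencies(tokens, tags))  # ['I', 'want', 'coffee']
```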

Test Results

Label  Precision  Recall  F1      Support
-----  ---------  ------  ------  -------
O      0.9710     0.9590  0.9650  3704
FP     0.9832     1.0000  0.9915  176
RP     0.6414     0.7299  0.6828  174
RV     0.1748     0.2907  0.2183  86
PW     0.9744     0.8155  0.8879  233

Macro F1 (4 disfluency classes): 0.6951
Binary F1: 0.8006
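
As a sanity check, the macro F1 can be reproduced from the per-class scores reported above: it is the unweighted mean of the F1 values for FP, RP, RV, and PW, with O excluded. The sketch below also recomputes one per-class F1 from its precision and recall. Binary F1 is not recomputed, since that would need token-level predictions rather than the per-class summary.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Per-class F1 scores copied from the test results.
f1 = {"O": 0.9650, "FP": 0.9915, "RP": 0.6828, "RV": 0.2183, "PW": 0.8879}

# Macro F1 over the four disfluency classes only (O is excluded).
disfluency_labels = ["FP", "RP", "RV", "PW"]
macro_f1 = sum(f1[label] for label in disfluency_labels) / len(disfluency_labels)

print(round(macro_f1, 4))                   # 0.6951
print(round(f1_score(0.1748, 0.2907), 4))   # 0.2183 (RV)
```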

Hyperparameters

  • learning_rate: 5e-05
  • epochs: 15
  • warmup_steps: 963
  • weight_decay: 0.1
  • focal_loss_gamma: 3.0 (adaptive)
  • class_weights: O=1.0, FP=3.0, RP=6.0, RV=20.0, PW=5.0
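
To illustrate how the class weights and focal term interact, here is a minimal single-token sketch in plain Python. It assumes the standard class-weighted focal loss, FL = -w_c * (1 - p_t)^gamma * log(p_t), with a fixed gamma of 3.0. How the "adaptive" gamma is adjusted is not described here, so that part is not modeled, and the function names are hypothetical. Actual training code would more likely use a tensor library.

```python
import math

LABELS = ("O", "FP", "RP", "RV", "PW")
CLASS_WEIGHTS = {"O": 1.0, "FP": 3.0, "RP": 6.0, "RV": 20.0, "PW": 5.0}
GAMMA = 3.0  # fixed here; the "adaptive" variant is not modeled


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=GAMMA):
    """Class-weighted focal loss for one token (assumed standard form)."""
    p_t = softmax(logits)[LABELS.index(target)]
    # (1 - p_t)^gamma shrinks the loss for tokens the model already gets right.
    return -CLASS_WEIGHTS[target] * (1.0 - p_t) ** gamma * math.log(p_t)


uniform = [0.0] * len(LABELS)
# With equal predicted probability, an RV token costs 20x more than an O token.
print(focal_loss(uniform, "RV") / focal_loss(uniform, "O"))
```

With gamma set to 0 the focal factor disappears and the function reduces to class-weighted cross-entropy, which is a convenient way to check an implementation.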

Model size: 0.1B params (Safetensors, tensor type F32)

Dataset used to train arielcerdap/modernbert-disfluency-expB-mixed: arielcerdap/disfluency-fluencybank