Dysarthric Speech Severity Level Classifier

A regression probe trained on top of Whisper-large-v3 encoder features for estimating the severity level of dysarthric speech.

Score scale: 1.0 (most severe dysarthria) to 7.0 (typical speech)

GitHub: JaesungBae/DA-DSQA

Model Description

This model uses a three-stage training pipeline:

Pseudo-labeling — A baseline probe generates pseudo-labels for unlabeled data
Contrastive pre-training — Weakly-supervised contrastive learning with typical speech augmentation
Fine-tuning — Regression probe fine-tuned with the pre-trained projector

Architecture: Whisper-large-v3 encoder (frozen) → LayerNorm → 2-layer MLP (proj_dim=320) → Statistics Pooling (mean+std) → Linear → Score

For details, see our paper:

Something from Nothing: Data Augmentation for Robust Severity Level Estimation of Dysarthric Speech [arXiv]

Available Checkpoints

This repository contains 9 checkpoints trained with different contrastive losses:

Checkpoint	Contrastive Loss	τ
`proposed_L_coarse_tau0.1`	Proposed (L_coarse)	0.1
`proposed_L_coarse_tau1.0`	Proposed (L_coarse)	1.0
`proposed_L_coarse_tau10.0`	Proposed (L_coarse)	10.0
`proposed_L_coarse_tau50.0`	Proposed (L_coarse)	50.0
`proposed_L_coarse_tau100.0` (default)	Proposed (L_coarse)	100.0
`proposed_L_cont_tau0.1`	Proposed (L_cont)	0.1
`proposed_L_dis_tau1.0`	Proposed (L_dis)	1.0
`rank-n-contrast_tau100.0`	Rank-N-Contrast	100.0
`simclr_tau0.1`	SimCLR	0.1

Setup

1. Create conda environment

conda create -n da-dsqa python=3.10 -y
conda activate da-dsqa

2. Install PyTorch with CUDA

conda install pytorch torchaudio -c pytorch -y

For a GPU build with a specific CUDA version, see pytorch.org for the appropriate command.

3. Install remaining dependencies

pip install -r requirements.txt

Note: Silero VAD is loaded automatically at runtime via torch.hub — no separate installation needed.

Runtime Dependencies

This model loads openai/whisper-large-v3 (~6GB) and Silero VAD at initialization time. Ensure sufficient memory is available.

Usage

With the custom pipeline

from huggingface_hub import snapshot_download

# Download the model
model_dir = snapshot_download("jaesungbae/da-dsqa")

# Load pipeline (defaults to proposed_L_coarse_tau100.0)
from pipeline import PreTrainedPipeline
pipe = PreTrainedPipeline(model_dir)

# Run inference
result = pipe("/path/to/audio.wav")
print(result)
# {"severity_score": 4.25, "raw_score": 4.2483, "model_name": "proposed_L_coarse_tau100.0"}

Select a specific checkpoint

# Option 1: specify at initialization
pipe = PreTrainedPipeline(model_dir, model_name="simclr_tau0.1")

# Option 2: switch at runtime (Whisper & VAD stay loaded)
pipe.switch_model("rank-n-contrast_tau100.0")
result = pipe("/path/to/audio.wav")

# Option 3: override per call
result = pipe("/path/to/audio.wav", model_name="proposed_L_dis_tau1.0")

Batch inference

results = pipe.batch_inference([
    "/path/to/audio1.wav",
    "/path/to/audio2.wav",
    "/path/to/audio3.wav",
])
for r in results:
    print(f"{r['file']}: {r['severity_score']}")

List available checkpoints

print(pipe.list_models())
# ['proposed_L_coarse_tau0.1', 'proposed_L_coarse_tau1.0', ...]

Compare all checkpoints on a single file

for name in pipe.list_models():
    result = pipe("/path/to/audio.wav", model_name=name)
    print(f"{name}: {result['severity_score']}")

Standalone inference

Clone the full repository and run:

python inference.py \
    --wav /path/to/audio.wav \
    --checkpoint ./checkpoints/stage3/proposed_L_coarse_tau100.0/average

Citation

@misc{bae2026something,
  title         = {Something from Nothing: Data Augmentation for Robust Severity Level Estimation of Dysarthric Speech},
  author        = {Jaesung Bae and Xiuwen Zheng and Minje Kim and Chang D. Yoo and Mark Hasegawa-Johnson},
  year          = {2026},
  eprint        = {2603.15988},
  archivePrefix = {arXiv},
  primaryClass  = {eess.AS},
  url           = {https://arxiv.org/abs/2603.15988}
}

Downloads last month: 20

Space using jaesungbae/da-dsqa 1

Paper for jaesungbae/da-dsqa

Something from Nothing: Data Augmentation for Robust Severity Level Estimation of Dysarthric Speech

Paper • 2603.15988 • Published Mar 16