SDiaReward-7B

SDiaReward-7B is a reward model for spoken dialogue, built on Qwen2.5-Omni-7B with a pooling layer and a linear scoring head (QwenOmniThinkerReward). Given a multi-turn conversation containing interleaved speech and text, it produces a scalar reward that reflects two qualities:

- Modality-awareness: prosody, emotion, and acoustic naturalness (real human speech vs. synthetic TTS).
- Colloquialness: spontaneous spoken style vs. scripted written style.

📄 Paper: Modeling and Benchmarking Spoken Dialogue Rewards with Modality and Colloquialness (ACL 2026 Main Conference), arXiv:2603.14889
💻 Code: https://github.com/MM-Speech/SDiaReward
📚 Training data (gated): SDiaReward dataset.
🧩 Smaller variant: SDiaReward-3B.

Evaluation

On the ESDR-Bench validation set:

Model	Accuracy	Eval loss	Margin
SDiaReward-7B	0.971	0.358	1.167
SDiaReward-3B	0.917	0.419	0.869
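
The card does not define the Accuracy and Margin columns. A minimal sketch, assuming the usual definitions for pairwise reward models: accuracy is the fraction of preference pairs in which the preferred (chosen) episode scores higher than the rejected one, and margin is the mean score difference between them. The function name and the example scores are hypothetical and are not outputs of SDiaReward.

```python
def pairwise_metrics(chosen_scores, rejected_scores):
    """Return (accuracy, mean_margin) for paired reward scores.

    Assumed definitions (not confirmed by the model card):
      accuracy    = share of pairs where chosen > rejected
      mean_margin = average of (chosen - rejected)
    """
    if len(chosen_scores) != len(rejected_scores) or not chosen_scores:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [c - r for c, r in zip(chosen_scores, rejected_scores)]
    accuracy = sum(d > 0 for d in diffs) / len(diffs)
    mean_margin = sum(diffs) / len(diffs)
    return accuracy, mean_margin


# Hypothetical scores for four preference pairs (not real model outputs).
chosen = [1.5, 0.5, 2.0, -0.5]
rejected = [0.5, 1.0, 0.0, -1.5]
accuracy, mean_margin = pairwise_metrics(chosen, rejected)
print(accuracy, mean_margin)
```

Under these definitions, a higher margin means the model separates preferred and rejected episodes by a wider score gap, which is a different property from how often it ranks them correctly.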

Usage

This checkpoint uses a custom reward architecture (QwenOmniThinkerReward: a Qwen2.5-Omni backbone with a pooling layer and a scalar reward head). The custom modeling code is not bundled in this repository, so the weights cannot be loaded with a plain AutoModel/pipeline call. To load the model and score conversations, follow the loading and inference instructions in the official code repository:

👉 https://github.com/MM-Speech/SDiaReward (checkpoint id: MYJOKERML/SDiaReward-7B)
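
To make the "pooling layer plus scalar reward head" idea concrete, here is a toy sketch in pure Python. It is not the QwenOmniThinkerReward implementation: it assumes plain masked mean pooling over token hidden states followed by a single linear projection, and the function names, weights, and inputs are all hypothetical. The checkpoint's actual pooling may differ, so use the official repository for real inference.

```python
def masked_mean_pool(hidden_states, attention_mask):
    """Average the token vectors whose mask is 1, ignoring padding positions."""
    kept = [vec for vec, keep in zip(hidden_states, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def linear_reward_head(pooled, weight, bias=0.0):
    """Project a pooled vector to one scalar reward (dot product plus bias)."""
    return sum(w * x for w, x in zip(weight, pooled)) + bias


# Toy input: 3 tokens with hidden size 2; the last token is padding.
hidden_states = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
attention_mask = [1, 1, 0]

pooled = masked_mean_pool(hidden_states, attention_mask)
reward = linear_reward_head(pooled, weight=[0.5, -1.0], bias=0.25)
print(pooled, reward)
```

The mask matters because padded positions would otherwise pull the pooled vector, and therefore the reward, toward arbitrary values.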

Training

Trained with TRL's reward trainer on the SDiaReward preference dataset (11,630 episode-level preference pairs, ~200 hours of paired speech). Backbone: Qwen2.5-Omni-7B; pooling variant: mean_center; 2 epochs.
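
For readers unfamiliar with preference-pair training, the sketch below shows the pairwise (Bradley-Terry style) objective commonly used for reward models: the negative log-sigmoid of the score difference between the chosen and rejected episodes. This is an illustration under that assumption. The card does not state whether extra terms or non-default settings were used, and the function name is hypothetical.

```python
import math


def pairwise_reward_loss(chosen_score, rejected_score):
    """Compute -log(sigmoid(chosen - rejected)) for one preference pair.

    Written as softplus(-(chosen - rejected)) in a numerically stable form,
    so large score gaps do not overflow math.exp.
    """
    diff = chosen_score - rejected_score
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


# The loss shrinks as the chosen episode outscores the rejected one.
loss_tied = pairwise_reward_loss(0.0, 0.0)     # equals log(2)
loss_correct = pairwise_reward_loss(2.0, 0.0)  # smaller than loss_tied
loss_wrong = pairwise_reward_loss(0.0, 2.0)    # larger than loss_tied
print(loss_tied, loss_correct, loss_wrong)
```

Averaging this quantity over all pairs in a batch gives the scalar that is minimized during training.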

License & intended use

Released under Apache-2.0 for research use. The reward signal is intended for evaluating and improving spoken-dialogue systems; it is not a safety classifier.

Citation

@article{lu2026modeling,
  title={Modeling and benchmarking spoken dialogue rewards with modality and colloquialness},
  author={Lu, Jingyu and Wang, Yuhan and Zhuo, Fan and Cheng, Xize and Pan, Changhao and Pu, Xueyi and Chen, Yifu and Wen, Chenyuhao and Liang, Tianle and Zhao, Zhou},
  journal={arXiv preprint arXiv:2603.14889},
  year={2026}
}

Checkpoint format: Safetensors. Model size: 9B params. Tensor type: BF16.

