SDiaReward-7B

SDiaReward-7B is a reward model for spoken dialogue, built on Qwen2.5-Omni-7B with a pooling layer and a linear scoring head (QwenOmniThinkerReward). Given a multi-turn conversation containing interleaved speech and text, it produces a scalar reward that reflects two qualities:

  • Modality-awareness: prosody, emotion, and acoustic naturalness (real human speech vs. synthetic TTS).

  • Colloquialness: spontaneous spoken style vs. scripted written style.

  • 📄 Paper: Modeling and Benchmarking Spoken Dialogue Rewards with Modality and Colloquialness (ACL 2026 Main Conference), arXiv:2603.14889

  • 💻 Code: https://github.com/MM-Speech/SDiaReward

  • 📚 Training data (gated): SDiaReward dataset.

  • 🧩 Smaller variant: SDiaReward-3B.

Evaluation

On the ESDR-Bench validation set:

Model           Accuracy   Eval loss   Margin
SDiaReward-7B   0.971      0.358       1.167
SDiaReward-3B   0.917      0.419       0.869
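
The card does not define the Accuracy and Margin columns. A common reading for pairwise reward models, assumed here, is that accuracy is the fraction of pairs in which the chosen response scores higher than the rejected one, and margin is the mean score difference (chosen minus rejected). The sketch below computes both under that assumption. The function name `pairwise_metrics` and the scores are hypothetical and are not taken from ESDR-Bench.

```python
from typing import Sequence, Tuple


def pairwise_metrics(
    chosen: Sequence[float], rejected: Sequence[float]
) -> Tuple[float, float]:
    """Return (accuracy, mean_margin) over aligned preference pairs.

    Assumed definitions (not confirmed by the model card):
      accuracy    = share of pairs with chosen score > rejected score
      mean_margin = average of (chosen score - rejected score)
    """
    if len(chosen) != len(rejected) or not chosen:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [c - r for c, r in zip(chosen, rejected)]
    accuracy = sum(d > 0 for d in diffs) / len(diffs)
    mean_margin = sum(diffs) / len(diffs)
    return accuracy, mean_margin


# Hypothetical reward scores for four preference pairs.
chosen_scores = [1.5, 0.5, 2.0, -0.5]
rejected_scores = [0.5, 1.0, 0.0, -1.5]
accuracy, mean_margin = pairwise_metrics(chosen_scores, rejected_scores)
print(f"accuracy={accuracy:.3f} margin={mean_margin:.3f}")
```

Under this reading, a higher margin means the model separates preferred from dispreferred episodes more widely, which may explain why the 7B model leads on both columns.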

Usage

This checkpoint uses a custom reward architecture (QwenOmniThinkerReward: a Qwen2.5-Omni backbone with a pooling layer and a scalar reward head). The custom modeling code is not bundled in this repository, so the weights cannot be loaded with a plain AutoModel/pipeline call. To load the model and score conversations, follow the loading and inference instructions in the official code repository:

👉 https://github.com/MM-Speech/SDiaReward (checkpoint id: MYJOKERML/SDiaReward-7B)
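
For intuition only, the toy sketch below shows how a pooling layer followed by a linear head can turn a sequence of backbone hidden states into one scalar. It is not the repository's QwenOmniThinkerReward code: the function names, the plain-list representation, and the use of a masked mean are all assumptions made for illustration. Real loading and scoring should follow the official repository.

```python
from typing import List, Sequence


def masked_mean_pool(
    hidden_states: Sequence[Sequence[float]], attention_mask: Sequence[int]
) -> List[float]:
    """Average the hidden vectors at positions where the mask is 1."""
    kept = [h for h, m in zip(hidden_states, attention_mask) if m]
    if not kept:
        raise ValueError("attention mask selects no positions")
    dim = len(kept[0])
    return [sum(h[i] for h in kept) / len(kept) for i in range(dim)]


def scalar_reward(
    pooled: Sequence[float], weight: Sequence[float], bias: float = 0.0
) -> float:
    """Linear scoring head: dot(weight, pooled) + bias."""
    if len(pooled) != len(weight):
        raise ValueError("weight and pooled vector must have the same size")
    return sum(w * p for w, p in zip(weight, pooled)) + bias


# Toy input: 3 positions, hidden size 2; the last position is padding
# and is excluded from the average by the mask.
hidden = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = masked_mean_pool(hidden, mask)
reward = scalar_reward(pooled, weight=[0.5, -1.0], bias=0.25)
print(pooled, reward)
```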

Training

Trained with TRL's reward trainer on the SDiaReward preference dataset (11,630 episode-level preference pairs, ~200 hours of paired speech). Backbone: Qwen2.5-Omni-7B; pooling: mean + center; 2 epochs.
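
As background, TRL's reward trainer by default optimizes a pairwise logistic (Bradley-Terry) objective, which is the negative log-sigmoid of the chosen score minus the rejected score. Whether the authors kept that default is an assumption. The dependency-free sketch below computes the objective on hypothetical scores. The helper names are illustrative and are not part of TRL.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def pairwise_reward_loss(
    chosen: Sequence[float], rejected: Sequence[float]
) -> float:
    """Mean of -log(sigmoid(r_chosen - r_rejected)) over a batch of pairs."""
    if len(chosen) != len(rejected) or not chosen:
        raise ValueError("need two non-empty score lists of equal length")
    losses = [-log_sigmoid(c - r) for c, r in zip(chosen, rejected)]
    return sum(losses) / len(losses)


# Hypothetical scores: the loss shrinks as the chosen episode is
# scored further above the rejected one.
tied_loss = pairwise_reward_loss([0.0], [0.0])       # equals log(2)
separated_loss = pairwise_reward_loss([2.0], [0.0])
print(tied_loss, separated_loss)
```

Because the loss depends only on score differences, the absolute scale of the reward carries no meaning on its own. Scores are best compared between candidates for the same dialogue context.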

License & intended use

Released under Apache-2.0 for research use. The reward signal is intended for evaluating and improving spoken-dialogue systems; it is not a safety classifier.

Citation

@article{lu2026modeling,
  title={Modeling and benchmarking spoken dialogue rewards with modality and colloquialness},
  author={Lu, Jingyu and Wang, Yuhan and Zhuo, Fan and Cheng, Xize and Pan, Changhao and Pu, Xueyi and Chen, Yifu and Wen, Chenyuhao and Liang, Tianle and Zhao, Zhou},
  journal={arXiv preprint arXiv:2603.14889},
  year={2026}
}