Assessing speech quality metrics for evaluation of neural audio codecs under clean speech conditions
Abstract
An analysis of 45 objective speech-quality metrics reveals that neural-based metrics such as scoreq and utmos show the strongest correlation with subjective listening scores across 17 codec conditions.
Objective speech-quality metrics are widely used to assess codec performance. However, for neural codecs, it is often unclear which metrics provide reliable quality estimates. To address this, we evaluated 45 objective metrics by correlating their scores with subjective listening scores for clean speech across 17 codec conditions. Neural-based metrics such as scoreq and utmos achieved the highest Pearson correlations with subjective scores. Further analysis across different subjective quality ranges revealed that non-intrusive metrics tend to saturate at high subjective quality levels.
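The correlation step described in the abstract can be illustrated with a minimal sketch. It assumes one mean score per codec condition for each metric and for the listening test, and it ranks metrics by the absolute value of Pearson's r so that distance-like metrics (lower is better) are not penalized for a negative sign. The function names `pearson` and `rank_metrics`, the metric names, and all numbers are hypothetical placeholders, not values or code from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def rank_metrics(metric_scores, subjective_scores):
    """Return (metric name, r) pairs sorted by |r|, strongest first.

    metric_scores maps a metric name to its per-condition scores, in the
    same condition order as subjective_scores.
    """
    correlations = {
        name: pearson(scores, subjective_scores)
        for name, scores in metric_scores.items()
    }
    return sorted(correlations.items(), key=lambda kv: abs(kv[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical per-condition means for five codec conditions.
    subjective = [1.5, 2.5, 3.0, 4.0, 4.5]
    metrics = {
        "metric_a": [1.4, 2.6, 3.1, 3.9, 4.4],  # tracks the listening scores
        "metric_b": [3.0, 2.0, 4.0, 2.5, 3.5],  # only loosely related
    }
    for name, r in rank_metrics(metrics, subjective):
        print(f"{name}: r = {r:.3f}")
```

Repeating the same computation on a subset of conditions, for example only those with high subjective scores, is one way to look for the saturation effect the abstract mentions for non-intrusive metrics.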