---
license: mit
---

# SCOREQ-PyTorch

## About

This is an unofficial `fairseq`-free implementation of the SCOREQ Speech Quality Assessment system proposed in [SCOREQ: Speech Quality Assessment with Contrastive Regression](https://arxiv.org/abs/2410.06675).

The [original implementation](https://github.com/alessandroragano/scoreq) provides a `fairseq`-based PyTorch model and an ONNX variant. In practice, the `fairseq` dependency can be difficult to install with recent Python, PyTorch, and dependency versions. The ONNX variant avoids `fairseq`, but it can be less convenient for PyTorch-based research workflows and may be difficult to run with GPU acceleration on `ARM/aarch64` systems.

A [recent study from ICASSP 2026](https://arxiv.org/abs/2509.24457) highlights the high correlation of SCOREQ with subjective listening scores for neural codecs. Modern neural audio codec and TTS research therefore benefits from an easy-to-install SCOREQ implementation.

We provide a `fairseq`-free implementation written directly in `PyTorch` that matches the [original system](https://github.com/alessandroragano/scoreq) using converted weights and reimplemented modules.

We also provide a `TorchScript` variant that can be loaded with only PyTorch, without installing this package.

The PyTorch and TorchScript versions are validated against the original implementation and produce matching scores.

> [!NOTE]
> In contrast to the original implementation, we support batched audio assessment. However, we recommend running SCOREQ with **batch size 1** to avoid metric shifts caused by padding. Batching can be used for faster evaluation when small padding-related score differences are acceptable.

## Model Types

As in the [original system](https://github.com/alessandroragano/scoreq), we support four SCOREQ variants: two data domains combined with two modes.

Data domain (what kind of audio is evaluated):

- `natural`: used for audio that originates from genuine human speech (Audio Codecs, VoIP, Telephony, Speech Enhancement, Audio Restoration).
- `synthetic`: used for audio that was synthesized by a machine (Text-to-Speech (TTS), Voice Conversion (VC), Generative Speech Models).

Mode (whether there is a reference audio to compare with):

- `nr`: no-reference mode. Assesses the quality of audio, **the higher the better**, without relying on any reference.
- `ref`: reference mode. Calculates the distance between the test and reference audio embeddings, **the lower the better**.

We refer the user to the [original repository](https://github.com/alessandroragano/scoreq) and [paper](https://arxiv.org/abs/2410.06675) for more details on model types.
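
Because the two modes point in opposite directions, it is easy to rank systems the wrong way round when switching between them. The snippet below is a minimal sketch of a ranking helper that encodes the direction stated above. The names `HIGHER_IS_BETTER` and `best_index` are hypothetical and not part of this package, and the numbers are made up for illustration.

```python
from typing import Sequence

# Score direction per mode, as described in the list above:
# "nr" scores are higher-is-better, "ref" distances are lower-is-better.
HIGHER_IS_BETTER = {"nr": True, "ref": False}


def best_index(scores: Sequence[float], mode: str) -> int:
    """Return the index of the best entry for the given SCOREQ mode."""
    if mode not in HIGHER_IS_BETTER:
        raise ValueError(f"unknown mode: {mode!r} (expected 'nr' or 'ref')")
    if not scores:
        raise ValueError("scores must not be empty")
    pick = max if HIGHER_IS_BETTER[mode] else min
    return pick(range(len(scores)), key=lambda i: scores[i])


# Made-up values purely for illustration, not real SCOREQ outputs.
example_scores = [3.1, 4.2, 2.7]
print(best_index(example_scores, "nr"))   # index of the highest score
print(best_index(example_scores, "ref"))  # index of the lowest distance
```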

## Usage

You can install the repo as a package:

```bash
pip install scoreq-pytorch
```

Or from source:

```bash
git clone https://github.com/Blinorot/scoreq-pytorch.git
cd scoreq-pytorch
pip install -e .
```

The code requires:

| Package         | Version |
| --------------- | ------- |
| Python          | >=3.9   |
| PyTorch         | >=2.2.0 |
| HuggingFace Hub | >=0.20  |

The TorchScript checkpoint was scripted with `PyTorch 2.5.1`. We have tested that it also works with `PyTorch 2.2.0`, but `PyTorch >=2.5.1` is recommended for the TorchScript variant.
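
If you want to check an environment against these versions before loading the TorchScript checkpoint, something along the following lines may help. This is a sketch only: `parse_version` and `torchscript_support` are hypothetical helpers, not part of this package, and the three labels are our own shorthand for the versions named above.

```python
import re

RECOMMENDED_TORCH = (2, 5, 1)  # version the TorchScript checkpoint was scripted with
MINIMUM_TORCH = (2, 2, 0)      # oldest version listed in the requirements table


def parse_version(version: str) -> tuple:
    """Extract (major, minor, patch) from strings such as '2.5.1+cu121'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"cannot parse version string: {version!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def torchscript_support(version: str) -> str:
    """Classify a PyTorch version string against the versions named above."""
    parsed = parse_version(version)
    if parsed >= RECOMMENDED_TORCH:
        return "recommended"
    if parsed >= MINIMUM_TORCH:
        return "meets minimum"
    return "below minimum"


print(torchscript_support("2.5.1+cu121"))
print(torchscript_support("2.2.0"))
```

In a real environment you would pass `torch.__version__` instead of a literal string.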

Then, you can run the model as follows:

```python
import torchaudio
from scoreq_pytorch import SCOREQScoreTorch

device = "cpu" # set to "cuda" to use on GPU
data_domain = "natural" # or "synthetic"
mode = "nr" # or "ref"
scoreq = SCOREQScoreTorch(
    data_domain=data_domain,
    mode=mode,
    device=device,
)  # already in eval mode

# load an audio file, e.g. using torchaudio
test_audio_path = ... # path to an audio file
test_wav, sr = torchaudio.load(test_audio_path)

# convert to MONO 16 kHz
TARGET_SR = 16000
if test_wav.shape[0] != 1:
    test_wav = test_wav[0:1]
if sr != TARGET_SR:
    test_wav = torchaudio.functional.resample(test_wav, orig_freq=sr, new_freq=TARGET_SR)
# put on device
test_wav = test_wav.to(device)

# for "ref" mode, you need a reference audio
# same loading and pre-processing procedure
if mode == "ref":
    ref_wav = ...
else:
    ref_wav = None

# calculate the score
# accepts T, 1xT, Bx1xT
scoreq_score = scoreq.score(test_wav, ref_wav) # tensor of shape (batch_size,)
```

You can replace `SCOREQScoreTorch` with `SCOREQScoreScripted` to use the `TorchScript` variant instead. On first use, the package downloads converted SCOREQ weights from [Hugging Face Hub](https://huggingface.co/Blinorot/SCOREQ-PyTorch) and caches them locally using the Hugging Face cache.

For `TorchScript`, you can skip installing this package and load the checkpoint directly:

```python
import torch
import torchaudio
import wget  # third-party package: pip install wget

data_domain = "natural" # or "synthetic"
mode = "nr" # or "ref"

# download scripted checkpoint, e.g. using wget
checkpoint_url = f"https://huggingface.co/Blinorot/SCOREQ-PyTorch/resolve/main/scoreq_{data_domain}_{mode}_scripted.pt"
checkpoint_path = ... # path to saved checkpoint
wget.download(checkpoint_url, checkpoint_path)

# load directly with torch.jit
device = "cpu" # set to "cuda" to use on GPU
scoreq = torch.jit.load(checkpoint_path, map_location=device)
scoreq.eval()

# load an audio file, e.g. using torchaudio
test_audio_path = ... # path to an audio file
test_wav, sr = torchaudio.load(test_audio_path)

# convert to MONO 16 kHz
TARGET_SR = 16000
if test_wav.shape[0] != 1:
    test_wav = test_wav[0:1]
if sr != TARGET_SR:
    test_wav = torchaudio.functional.resample(test_wav, orig_freq=sr, new_freq=TARGET_SR)
# put on device
test_wav = test_wav.to(device)

# for "ref" mode, you need a reference audio
# same loading and pre-processing procedure
if mode == "ref":
    ref_wav = ...
else:
    ref_wav = None

# calculate the score
# accepts T, 1xT, Bx1xT
with torch.no_grad():
    scoreq_score = scoreq(test_wav, ref_wav) # tensor of shape (batch_size,)
```
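
The checkpoint URL in the example above follows the pattern `scoreq_{data_domain}_{mode}_scripted.pt`. As a sketch, the helper below builds that URL for any of the four variants and downloads it with the standard library instead of `wget`. The names `checkpoint_url` and `download_checkpoint` are hypothetical and not part of this package.

```python
import urllib.request

BASE_URL = "https://huggingface.co/Blinorot/SCOREQ-PyTorch/resolve/main"
DATA_DOMAINS = ("natural", "synthetic")
MODES = ("nr", "ref")


def checkpoint_url(data_domain: str, mode: str) -> str:
    """Build the scripted-checkpoint URL using the naming pattern shown above."""
    if data_domain not in DATA_DOMAINS:
        raise ValueError(f"data_domain must be one of {DATA_DOMAINS}, got {data_domain!r}")
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    return f"{BASE_URL}/scoreq_{data_domain}_{mode}_scripted.pt"


def download_checkpoint(data_domain: str, mode: str, checkpoint_path: str) -> str:
    """Download one scripted checkpoint to checkpoint_path and return that path."""
    urllib.request.urlretrieve(checkpoint_url(data_domain, mode), checkpoint_path)
    return checkpoint_path


# List the URLs of all four variants without downloading anything.
for domain in DATA_DOMAINS:
    for mode in MODES:
        print(checkpoint_url(domain, mode))
```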

### Notes

The model expects audio sampled at **16 kHz**.

Accepted tensor shapes:

| Shape       | Meaning                                     |
| ----------- | ------------------------------------------- |
| `(T,)`      | single mono waveform                        |
| `(1, T)`    | single mono waveform with channel dimension |
| `(B, 1, T)` | batch of mono waveforms                     |

The input should be a floating-point PyTorch tensor. Stereo audio should be converted to mono before scoring. `scoreq.score(test_wav)` returns a tensor of shape `(batch_size,)`, where each value is a predicted quality score.
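
To catch stereo or otherwise unexpected inputs before scoring, you could validate shapes yourself. The sketch below maps each accepted shape to its `(B, 1, T)` equivalent and rejects everything else. `to_batched_shape` is a hypothetical helper, and it is an assumption that you want this check outside the model; the README does not describe how the package handles shapes internally.

```python
from typing import Tuple


def to_batched_shape(shape: Tuple[int, ...]) -> Tuple[int, int, int]:
    """Map an accepted input shape to the equivalent (B, 1, T) shape."""
    if len(shape) == 1:                    # (T,)
        return (1, 1, shape[0])
    if len(shape) == 2 and shape[0] == 1:  # (1, T)
        return (1, 1, shape[1])
    if len(shape) == 3 and shape[1] == 1:  # (B, 1, T)
        return (shape[0], 1, shape[2])
    raise ValueError(
        f"unsupported shape {shape}: expected (T,), (1, T) or (B, 1, T); "
        "convert stereo audio to mono first"
    )


print(to_batched_shape((16000,)))
print(to_batched_shape((4, 1, 16000)))

# With a real tensor, the result can be passed to reshape, e.g.:
# wav = wav.reshape(to_batched_shape(tuple(wav.shape)))
```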

In reference (`ref`) mode, a reference audio `ref_wav` must be provided: `scoreq.score(test_wav, ref_wav)`.

Note that `score()` and `forward()` return the same values. The only difference is that `score()` is decorated with `torch.no_grad()` for convenient inference. Since the raw TorchScript module exposes `forward()`, it is called directly as `scoreq(test_wav, ref_wav)` rather than through the package wrapper's `scoreq.score(test_wav, ref_wav)`.

**Batch size 1 is recommended to avoid padding-related score shifts.**
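
One way to follow this recommendation for a whole test set is to loop over files and score each waveform on its own. The sketch below assumes a callable with the same `(test_wav, ref_wav)` signature as `scoreq.score` that returns a one-element result per call. `score_one_by_one` is a hypothetical helper, not part of this package.

```python
from typing import Any, Callable, List, Optional, Sequence


def score_one_by_one(
    score_fn: Callable[[Any, Any], Any],
    test_wavs: Sequence[Any],
    ref_wavs: Optional[Sequence[Any]] = None,
) -> List[float]:
    """Score each waveform separately (batch size 1), so no padding is needed."""
    if ref_wavs is not None and len(ref_wavs) != len(test_wavs):
        raise ValueError("test_wavs and ref_wavs must have the same length")
    results = []
    for i, test_wav in enumerate(test_wavs):
        ref_wav = None if ref_wavs is None else ref_wavs[i]
        out = score_fn(test_wav, ref_wav)  # expected to hold a single score
        results.append(float(out[0]))
    return results


# Stand-in scorer for demonstration only; it returns the input length, not a quality score.
def fake_score(test_wav, ref_wav=None):
    return [len(test_wav)]


print(score_one_by_one(fake_score, [[0.0] * 3, [0.0] * 5]))
```

With the package installed, `score_fn` would be `scoreq.score` and each element of `test_wavs` a pre-processed 16 kHz mono tensor.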

API classes:

| Class                 | Description                                     |
| --------------------- | ----------------------------------------------- |
| `SCOREQScoreTorch`    | PyTorch implementation using converted weights. |
| `SCOREQScoreScripted` | Wrapper around the TorchScript checkpoint.      |

## Citation

If you use this package, please cite the original SCOREQ paper:

```bibtex
@article{ragano2024scoreq,
  title={SCOREQ: Speech quality assessment with contrastive regression},
  author={Ragano, Alessandro and Skoglund, Jan and Hines, Andrew},
  journal={Advances in Neural Information Processing Systems},
  volume={37},
  pages={105702--105729},
  year={2024}
}
```