---
license: mit
---

# Model Card for SHEET Models

This model card describes the models implemented in the [SHEET](https://github.com/unilight/sheet) toolkit, trained on the MOS-Bench training sets and benchmarked on the MOS-Bench test sets.

The task is subjective speech quality assessment (SSQA), which aims to predict the perceptual quality score of speech.

## Model Details

- **Developed by:** Wen-Chin Huang
- **Model type:** SSL-MOS or AlignNet
- **License:** MIT
- **Repository:** [SHEET](https://github.com/unilight/sheet)
- **Paper:** [[SHEET](https://arxiv.org/abs/2505.15061)] [[MOS-Bench (arXiv; 2024)](https://arxiv.org/abs/2411.03715)]
- **Demo:** https://huggingface.co/spaces/unilight/sheet-demo

## Uses

Please refer to the [README in the sheet repo](https://github.com/unilight/sheet/tree/main/egs/bvcc) for more details.

## Bias, Risks, and Limitations

The models are not yet reliable enough to replace subjective listening tests in scientific papers. They can, however, be used for rough comparisons between systems across heterogeneous conditions.

## How to Get Started with the Model

Please refer to the [README in the sheet repo](https://github.com/unilight/sheet/tree/main/egs/bvcc) for more details.

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

Please refer to the [`egs` folder in the sheet repo](https://github.com/unilight/sheet/tree/main/egs/bvcc) for more details.

#### Metrics

Commonly used metrics for SSQA are mean squared error (MSE), linear correlation coefficient (LCC), Spearman rank correlation coefficient (SRCC), and Kendall's tau (KTAU). A code snippet for calculating them can be found here: https://gist.github.com/unilight/883726c94640cca1f4d4068e29c3d20f

Please refer to the [MOS-Bench (arXiv; 2024)](https://arxiv.org/abs/2411.03715) paper for details.

### Results

Please refer to the [MOS-Bench (arXiv; 2024)](https://arxiv.org/abs/2411.03715) paper for details.


## Citation

**BibTeX:**

```
@inproceedings{sheet,
  title     = {{SHEET: A Multi-purpose Open-source Speech Human Evaluation Estimation Toolkit}},
  author    = {Wen-Chin Huang and Erica Cooper and Tomoki Toda},
  year      = {2025},
  booktitle = {{Proc. Interspeech}},
  pages     = {2355--2359},
}


@article{huang2024,
  title         = {{MOS-Bench: Benchmarking Generalization Abilities of Subjective Speech Quality Assessment Models}},
  author        = {Wen-Chin Huang and Erica Cooper and Tomoki Toda},
  year          = {2024},
  eprint        = {2411.03715},
  archivePrefix = {arXiv},
  primaryClass  = {cs.SD},
  url           = {https://arxiv.org/abs/2411.03715},
}
```

## Model Card Contact

Wen-Chin Huang  
Nagoya University  
Email: wen.chinhuang@g.sp.m.is.nagoya-u.ac.jp  
GitHub: unilight