|
|
--- |
|
|
license: mit |
|
|
--- |
|
|
|
|
|
# Model Card for SHEET Models |
|
|
|
|
|
This model card describes the models implemented in the [SHEET](https://github.com/unilight/sheet) toolkit, trained on the MOS-Bench training sets and benchmarked on the MOS-Bench test sets.
|
|
|
|
|
The task is subjective speech quality assessment (SSQA), which aims to predict the perceived quality of a speech sample as a score, such as a mean opinion score (MOS).
|
|
|
|
|
## Model Details |
|
|
|
|
|
- **Developed by:** Wen-Chin Huang |
|
|
- **Model type:** SSL-MOS or AlignNet |
|
|
- **License:** MIT |
|
|
- **Repository:** [SHEET](https://github.com/unilight/sheet) |
|
|
- **Paper:** [[SHEET](https://arxiv.org/abs/2505.15061)] [[MOS-Bench (arXiv; 2024)](https://arxiv.org/abs/2411.03715)] |
|
|
- **Demo:** https://huggingface.co/spaces/unilight/sheet-demo
|
|
|
|
|
## Uses |
|
|
|
|
|
Please refer to the [README in the sheet repo](https://github.com/unilight/sheet/tree/main/egs/bvcc) for more details. |
|
|
|
|
|
## Bias, Risks, and Limitations |
|
|
|
|
|
The models are not yet reliable enough to replace subjective listening tests in scientific papers. They can, however, be used to compare systems across heterogeneous datasets.
|
|
|
|
|
## How to Get Started with the Model |
|
|
|
|
|
Please refer to the [README in the sheet repo](https://github.com/unilight/sheet/tree/main/egs/bvcc) for more details. |
|
|
|
|
|
## Evaluation |
|
|
|
|
|
|
|
|
|
|
### Testing Data, Factors & Metrics |
|
|
|
|
|
#### Testing Data |
|
|
|
|
|
Please refer to the [`egs` folder in the sheet repo](https://github.com/unilight/sheet/tree/main/egs) for more details.
|
|
|
|
|
#### Metrics |
|
|
|
|
|
Commonly used metrics for SSQA are mean squared error (MSE), linear correlation coefficient (LCC), Spearman rank correlation coefficient (SRCC), and Kendall's tau (KTAU). A code snippet for calculating them can be found here: https://gist.github.com/unilight/883726c94640cca1f4d4068e29c3d20f
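As an illustrative sketch (not the SHEET implementation; the linked gist is the reference), the four metrics can be computed with plain NumPy, assuming two arrays of listener and predicted scores with no tied values:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between true and predicted scores (lower is better)."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def lcc(y_true, y_pred):
    """Linear (Pearson) correlation coefficient."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

def srcc(y_true, y_pred):
    """Spearman rank correlation: Pearson correlation of the ranks (assumes no ties)."""
    rank = lambda a: np.argsort(np.argsort(a))
    return float(np.corrcoef(rank(y_true), rank(y_pred))[0, 1])

def ktau(y_true, y_pred):
    """Kendall's tau: (concordant - discordant) pairs over all pairs (assumes no ties)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = len(y_true)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            s += np.sign(y_true[i] - y_true[j]) * np.sign(y_pred[i] - y_pred[j])
    return float(s / (n * (n - 1) / 2))

# Hypothetical listener MOS and system predictions
true_mos = [1.0, 2.0, 3.0, 4.0, 5.0]
pred_mos = [1.1, 2.0, 2.9, 4.2, 5.1]
print(mse(true_mos, pred_mos), lcc(true_mos, pred_mos),
      srcc(true_mos, pred_mos), ktau(true_mos, pred_mos))
```

Note that LCC measures how linearly the predictions track the true scores, while SRCC and KTAU measure only whether the ranking of systems is preserved, which is why the latter two are often emphasized when comparing systems.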
|
|
|
|
|
Please refer to the [MOS-Bench (arXiv; 2024)](https://arxiv.org/abs/2411.03715) paper for details. |
|
|
|
|
|
### Results |
|
|
|
|
|
Please refer to the [MOS-Bench (arXiv; 2024)](https://arxiv.org/abs/2411.03715) paper for details. |
|
|
|
|
|
|
|
|
## Citation |
|
|
|
|
|
**BibTeX:** |
|
|
|
|
|
```bibtex
|
|
@inproceedings{sheet, |
|
|
title = {{SHEET: A Multi-purpose Open-source Speech Human Evaluation Estimation Toolkit}}, |
|
|
author = {Wen-Chin Huang and Erica Cooper and Tomoki Toda}, |
|
|
year = {2025}, |
|
|
booktitle = {{Proc. Interspeech}}, |
|
|
pages = {2355--2359}, |
|
|
} |
|
|
|
|
|
|
|
|
@article{huang2024, |
|
|
title={MOS-Bench: Benchmarking Generalization Abilities of Subjective Speech Quality Assessment Models}, |
|
|
author={Wen-Chin Huang and Erica Cooper and Tomoki Toda}, |
|
|
year={2024}, |
|
|
eprint={2411.03715}, |
|
|
archivePrefix={arXiv}, |
|
|
primaryClass={cs.SD}, |
|
|
url={https://arxiv.org/abs/2411.03715}, |
|
|
} |
|
|
``` |
|
|
|
|
|
## Model Card Contact |
|
|
|
|
|
Wen-Chin Huang |
|
|
Nagoya University |
|
|
Email: wen.chinhuang@g.sp.m.is.nagoya-u.ac.jp |
|
|
GitHub: unilight |