---
license: mit
---
# Model Card for SHEET Models
This model card describes the models implemented in the [SHEET](https://github.com/unilight/sheet) toolkit trained using the training sets in MOS-Bench and benchmarked using the test sets in MOS-Bench.
The task is subjective speech quality assessment (SSQA), which aims to predict the perceptual quality score of speech.
## Model Details
- **Developed by:** Wen-Chin Huang
- **Model type:** SSL-MOS or AlignNet
- **License:** MIT
- **Repository:** [SHEET](https://github.com/unilight/sheet)
- **Paper:** [[SHEET](https://arxiv.org/abs/2505.15061)] [[MOS-Bench (arXiv; 2024)](https://arxiv.org/abs/2411.03715)]
- **Demo:** https://huggingface.co/spaces/unilight/sheet-demo
## Uses
Please refer to the [README in the sheet repo](https://github.com/unilight/sheet/tree/main/egs/bvcc) for more details.
## Bias, Risks, and Limitations
The models are not yet reliable enough to replace subjective listening tests in scientific papers. They can, however, be used to compare systems across heterogeneous conditions.
## How to Get Started with the Model
Please refer to the [README in the sheet repo](https://github.com/unilight/sheet/tree/main/egs/bvcc) for more details.
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
### Testing Data, Factors & Metrics
#### Testing Data
Please refer to the [`egs` folder in the sheet repo](https://github.com/unilight/sheet/tree/main/egs/bvcc) for more details.
#### Metrics
Commonly used metrics for SSQA are mean squared error (MSE), linear correlation coefficient (LCC), Spearman rank correlation coefficient (SRCC), and Kendall's tau (KTAU). A code snippet for calculating them can be found here: https://gist.github.com/unilight/883726c94640cca1f4d4068e29c3d20f
Please refer to the [MOS-Bench (arXiv; 2024)](https://arxiv.org/abs/2411.03715) paper for details.
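As a minimal sketch of how these four metrics are typically computed (this is an illustrative example using `scipy.stats`, not the exact code from the linked gist; the function name `ssqa_metrics` is our own):

```python
# Illustrative computation of the standard SSQA evaluation metrics
# (MSE, LCC, SRCC, KTAU) from paired predicted and ground-truth scores.
import numpy as np
from scipy.stats import pearsonr, spearmanr, kendalltau


def ssqa_metrics(pred, true):
    """Return MSE, LCC, SRCC, and KTAU between predicted and true MOS."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    mse = float(np.mean((pred - true) ** 2))          # mean squared error
    lcc = float(pearsonr(pred, true)[0])              # linear (Pearson) correlation
    srcc = float(spearmanr(pred, true)[0])            # Spearman rank correlation
    ktau = float(kendalltau(pred, true)[0])           # Kendall's tau
    return {"MSE": mse, "LCC": lcc, "SRCC": srcc, "KTAU": ktau}
```

Note that correlations are often reported at both the utterance level and the system level (scores averaged per system); see the MOS-Bench paper for the exact protocol.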
### Results
Please refer to the [MOS-Bench (arXiv; 2024)](https://arxiv.org/abs/2411.03715) paper for details.
## Citation
**BibTeX:**
```
@inproceedings{sheet,
title = {{SHEET: A Multi-purpose Open-source Speech Human Evaluation Estimation Toolkit}},
author = {Wen-Chin Huang and Erica Cooper and Tomoki Toda},
year = {2025},
booktitle = {{Proc. Interspeech}},
pages = {2355--2359},
}
@article{huang2024,
title={MOS-Bench: Benchmarking Generalization Abilities of Subjective Speech Quality Assessment Models},
author={Wen-Chin Huang and Erica Cooper and Tomoki Toda},
year={2024},
eprint={2411.03715},
archivePrefix={arXiv},
primaryClass={cs.SD},
url={https://arxiv.org/abs/2411.03715},
}
```
## Model Card Contact
Wen-Chin Huang
Nagoya University
Email: wen.chinhuang@g.sp.m.is.nagoya-u.ac.jp
GitHub: unilight