Docs: Update README to include benchmark results
README.md
CHANGED
@@ -85,6 +85,18 @@ python -m moss_tts_delay.llama_cpp \
 
 For full setup instructions (including building the C bridge, configuration options, and installation profiles), see the [llama.cpp Backend documentation](https://github.com/OpenMOSS/MOSS-TTS/blob/main/moss_tts_delay/llama_cpp/README.md).
 
+### Quantization Benchmark
+
+Quantization quality evaluated on [Seed-TTS-eval](https://github.com/BytedanceSpeech/seed-tts-eval) zero-shot benchmark. Baseline is the original HuggingFace model; GGUF variants use the llama.cpp backend with TensorRT audio tokenizer.
+
+| Quantization | EN WER (%) ↓ | EN SIM (%) ↑ | ZH CER (%) ↓ | ZH SIM (%) ↑ |
+|---|---:|---:|---:|---:|
+| Baseline (HuggingFace) | 1.79 | 71.46 | 1.32 | 77.05 |
+| Q8_0 | 3.21 | 68.61 | 1.56 | 76.03 |
+| Q6_K | 3.11 | 68.77 | 1.44 | 76.06 |
+| Q5_K_M | 2.95 | 68.55 | 1.50 | 75.96 |
+| Q4_K_M | 2.83 | 68.15 | 1.58 | 75.71 |
+
 ### Main Repositories
 
 | Repository | Description |
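
The benchmark table added in this hunk is easier to compare as differences against the baseline row. The following is a minimal sketch in Python, not part of the MOSS-TTS repository: the names `METRICS`, `BASELINE`, `QUANTIZED`, and `deltas_vs_baseline` are illustrative assumptions, and the numbers are copied verbatim from the table.

```python
# Scores copied from the benchmark table in the hunk above.
# Column order: EN WER (%), EN SIM (%), ZH CER (%), ZH SIM (%).
# All names here are hypothetical helpers for illustration only.
METRICS = ("en_wer", "en_sim", "zh_cer", "zh_sim")
BASELINE = (1.79, 71.46, 1.32, 77.05)
QUANTIZED = {
    "Q8_0": (3.21, 68.61, 1.56, 76.03),
    "Q6_K": (3.11, 68.77, 1.44, 76.06),
    "Q5_K_M": (2.95, 68.55, 1.50, 75.96),
    "Q4_K_M": (2.83, 68.15, 1.58, 75.71),
}


def deltas_vs_baseline(scores, baseline=BASELINE):
    """Return quantized-minus-baseline differences in percentage points."""
    return {
        name: round(quantized - base, 2)
        for name, quantized, base in zip(METRICS, scores, baseline)
    }


# Positive WER/CER deltas mean more recognition errors than the baseline;
# negative SIM deltas mean lower speaker similarity than the baseline.
DELTAS = {variant: deltas_vs_baseline(scores) for variant, scores in QUANTIZED.items()}

if __name__ == "__main__":
    for variant, delta in DELTAS.items():
        print(variant, delta)
```

Differences are reported in absolute percentage points rather than relative change, since the baseline error rates are small and relative figures would exaggerate the gaps.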