OpenMOSS-Team/MOSS-TTS-GGUF

@@ -85,6 +85,18 @@ python -m moss_tts_delay.llama_cpp \
 For full setup instructions (including building the C bridge, configuration options, and installation profiles), see the [llama.cpp Backend documentation](https://github.com/OpenMOSS/MOSS-TTS/blob/main/moss_tts_delay/llama_cpp/README.md).
+### Quantization Benchmark
+Quantization quality was evaluated on the [Seed-TTS-eval](https://github.com/BytedanceSpeech/seed-tts-eval) zero-shot benchmark. The baseline is the original HuggingFace model; the GGUF variants use the llama.cpp backend with the TensorRT audio tokenizer. Lower is better for WER and CER (↓); higher is better for SIM (↑).
+| Quantization | EN WER (%) ↓ | EN SIM (%) ↑ | ZH CER (%) ↓ | ZH SIM (%) ↑ |
+|---|---:|---:|---:|---:|
+| Baseline (HuggingFace) | 1.79 | 71.46 | 1.32 | 77.05 |
+| Q8_0 | 3.21 | 68.61 | 1.56 | 76.03 |
+| Q6_K | 3.11 | 68.77 | 1.44 | 76.06 |
+| Q5_K_M | 2.95 | 68.55 | 1.50 | 75.96 |
+| Q4_K_M | 2.83 | 68.15 | 1.58 | 75.71 |
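
To read the table as changes relative to the baseline, the differences can be computed directly from the reported scores. The following is a minimal sketch and not part of the MOSS-TTS tooling: the helper name `deltas_vs_baseline` and the `SCORES` layout are hypothetical, and the numbers are copied verbatim from the table above.

```python
# Scores copied from the benchmark table, all in percent.
# Column order: EN WER, EN SIM, ZH CER, ZH SIM.
METRICS = ("EN WER", "EN SIM", "ZH CER", "ZH SIM")
BASELINE = "Baseline (HuggingFace)"
SCORES = {
    BASELINE: (1.79, 71.46, 1.32, 77.05),
    "Q8_0": (3.21, 68.61, 1.56, 76.03),
    "Q6_K": (3.11, 68.77, 1.44, 76.06),
    "Q5_K_M": (2.95, 68.55, 1.50, 75.96),
    "Q4_K_M": (2.83, 68.15, 1.58, 75.71),
}


def deltas_vs_baseline(scores, baseline=BASELINE):
    """Return per-metric differences (variant minus baseline) in percentage points."""
    base = scores[baseline]
    return {
        name: tuple(round(value - ref, 2) for value, ref in zip(values, base))
        for name, values in scores.items()
        if name != baseline
    }


if __name__ == "__main__":
    # Positive WER/CER deltas and negative SIM deltas both indicate degradation.
    for name, delta in deltas_vs_baseline(SCORES).items():
        cells = ", ".join(f"{m}: {d:+.2f}" for m, d in zip(METRICS, delta))
        print(f"{name}: {cells}")
```

The deltas are absolute differences in percentage points, not relative changes, so they are comparable across the four columns only in sign and rough magnitude.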
 ### Main Repositories
 | Repository | Description |