MOSS-TTS-Realtime GGUF (codec only)
GGUF conversion of the MOSS-Audio-Tokenizer (full, 1.6B mono 24 kHz) used by OpenMOSS-Team/MOSS-TTS-Realtime, runnable via codec.cpp.
⚠️ The LLM-part is not converted yet; see Status.
Files
Codec-part (MOSS-Audio-Tokenizer, moss_audio arch, 16 RVQ codebooks × 1024, 24 kHz mono, ~1.6B params)
codec[-<quant>].gguf
| File | Size |
|---|---|
| codec-f32.gguf | 6770 MB |
| codec-f16.gguf | 3387 MB |
| codec-q8_0.gguf | 1802 MB |
| codec-q5_k_m.gguf | 1170 MB |
| codec-q4_k_m.gguf | 959 MB |
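To compare the variants, the sizes listed above can be put into a small lookup. The following is a minimal sketch, not part of this repo: the helper names `ratio_to_f32` and `pick_largest_fitting` are hypothetical, and the budget only accounts for file size on disk, not for runtime memory use.

```python
from typing import Optional

# File sizes in MB, copied from the table in this README.
CODEC_SIZES_MB = {
    "codec-f32.gguf": 6770,
    "codec-f16.gguf": 3387,
    "codec-q8_0.gguf": 1802,
    "codec-q5_k_m.gguf": 1170,
    "codec-q4_k_m.gguf": 959,
}


def ratio_to_f32(filename: str) -> float:
    """Return the file's size as a fraction of the f32 file's size."""
    return CODEC_SIZES_MB[filename] / CODEC_SIZES_MB["codec-f32.gguf"]


def pick_largest_fitting(budget_mb: float) -> Optional[str]:
    """Return the largest listed file that fits within budget_mb, or None."""
    fitting = [(size, name) for name, size in CODEC_SIZES_MB.items() if size <= budget_mb]
    if not fitting:
        return None
    return max(fitting)[1]


if __name__ == "__main__":
    for name in CODEC_SIZES_MB:
        print(f"{name}: {ratio_to_f32(name):.1%} of f32")
    print("Largest file under 2000 MB:", pick_largest_fitting(2000))
```

Picking the largest file that fits is only one possible policy; it assumes a larger file means higher fidelity, which this README does not measure.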
Status
Codec-part only. The MOSS-TTS-Realtime LLM-part isn't shipped here: it would need a dedicated moss_tts_realtime arch in llama.cpp. The Qwen3 backbone (28 layers, 2048 hidden, vocab 151936) is by itself a standard qwen3 arch, but the model produces 16 RVQ codebooks per audio frame via a 4-layer local transformer (MossTTSRealtimeLocalTransformer) wired onto the Qwen3 trunk. That multi-codebook output isn't expressible in any existing llama.cpp arch and won't be glued together outside the model graph: splitting the global trunk into llama.cpp and the local transformer into application code defeats the point of having a unified inference graph.
If/when llama.cpp gains a moss_tts_realtime arch (or an upstream multi-RVQ-head abstraction), the LLM-part can land in this repo too. Until then, only the codec is here.
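The shape mismatch described in this section can be illustrated with a toy sketch. It is not the upstream implementation: `trunk_step` and `local_decode` are hypothetical stand-ins that return pseudo-random values, and only the constants (16 codebooks, 1024 entries, 2048 hidden) come from this README. The point is that each generation step yields a tuple of 16 code indices rather than the single token id a standard LLM step produces.

```python
import random
from typing import List

NUM_CODEBOOKS = 16    # RVQ codebooks per audio frame (from this README)
CODEBOOK_SIZE = 1024  # entries per codebook (from this README)
HIDDEN_SIZE = 2048    # Qwen3 trunk hidden size (from this README)


def trunk_step(frame_index: int) -> List[float]:
    """Hypothetical stand-in for the global Qwen3 trunk: one hidden vector per frame."""
    rng = random.Random(frame_index)
    return [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN_SIZE)]


def local_decode(hidden: List[float]) -> List[int]:
    """Hypothetical stand-in for the local transformer: one hidden vector in, 16 codes out.

    A real model would compute each code from the hidden state; here the codes
    are pseudo-random and only the output shape is meaningful.
    """
    rng = random.Random(sum(hidden))
    return [rng.randrange(CODEBOOK_SIZE) for _ in range(NUM_CODEBOOKS)]


def generate(num_frames: int) -> List[List[int]]:
    """Produce num_frames audio frames, each a list of 16 codebook indices."""
    return [local_decode(trunk_step(i)) for i in range(num_frames)]


if __name__ == "__main__":
    for i, frame in enumerate(generate(3)):
        print(f"frame {i}: {len(frame)} codes, first four = {frame[:4]}")
```

In this sketch the two stages are ordinary Python functions; the section above argues that in a real deployment both stages belong in a single inference graph rather than being split between llama.cpp and application code.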
Notes
- Source weights: OpenMOSS-Team/MOSS-Audio-Tokenizer
- Full upstream Python pipeline: OpenMOSS-Team/MOSS-TTS-Realtime