MOSS-TTS-Realtime GGUF (codec only)

GGUF conversion of the MOSS-Audio-Tokenizer (the full 1.6B-parameter, mono, 24 kHz audio codec) used by OpenMOSS-Team/MOSS-TTS-Realtime, runnable via codec.cpp.

โš ๏ธ The LLM-part is not converted yet โ€” see Status.

Files

Codec part (MOSS-Audio-Tokenizer, moss_audio arch, 16 RVQ codebooks × 1024 entries, 24 kHz mono, ~1.6B params)

codec[-<quant>].gguf

| File | Size |
|---|---|
| codec-f32.gguf | 6770 MB |
| codec-f16.gguf | 3387 MB |
| codec-q8_0.gguf | 1802 MB |
| codec-q5_k_m.gguf | 1170 MB |
| codec-q4_k_m.gguf | 959 MB |
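For readers unfamiliar with residual vector quantization, here is a minimal numpy sketch of a 16-codebook × 1024-entry RVQ scheme like the one above. The vector dimension and the random, untrained codebooks are purely illustrative; only the codebook count and size follow the card.

```python
# Illustrative RVQ: 16 codebooks of 1024 entries each, greedy encoding.
# DIM is a toy value; the codebooks here are random, not the model's.
import numpy as np

rng = np.random.default_rng(0)
N_CODEBOOKS, CODEBOOK_SIZE, DIM = 16, 1024, 8

codebooks = rng.standard_normal((N_CODEBOOKS, CODEBOOK_SIZE, DIM))

def rvq_encode(x):
    """Quantize x stage by stage: each codebook quantizes the residual
    left over by the previous stages."""
    residual = x.copy()
    codes = []
    for cb in codebooks:
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        codes.append(idx)
        residual -= cb[idx]
    return codes

def rvq_decode(codes):
    """Reconstruct by summing the selected vector from each codebook."""
    return sum(codebooks[i][c] for i, c in enumerate(codes))

x = rng.standard_normal(DIM)
codes = rvq_encode(x)      # 16 integers, each in [0, 1024)
x_hat = rvq_decode(codes)  # with trained codebooks, x_hat would approximate x
```

Each audio frame is thus represented by 16 small integers rather than a raw vector, which is what makes the codec's token stream LLM-friendly.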

Status

Codec part only. The MOSS-TTS-Realtime LLM part isn't shipped here; it would need a dedicated moss_tts_realtime arch in llama.cpp. The Qwen3 backbone (28 layers, 2048 hidden size, 151936 vocab) is by itself a standard qwen3 arch, but the model produces 16 RVQ codebooks per audio frame via a 4-layer local transformer (MossTTSRealtimeLocalTransformer) wired onto the Qwen3 trunk. That multi-codebook output isn't expressible in any existing llama.cpp arch, and it shouldn't be glued together outside the model graph: splitting the global trunk into llama.cpp and the local transformer into application code defeats the point of a unified inference graph.
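As a rough illustration of the two-level structure described above, the following hypothetical numpy sketch shows a global trunk emitting one hidden state per audio frame, which a small local head expands into 16 codebook indices. The weights are random stand-ins and the local transformer's attention layers are elided; the hidden size is shrunk for illustration (the real trunk uses 2048). Only the shapes of the inputs and outputs follow the card.

```python
# Hypothetical two-level decode loop: trunk hidden state -> 16 RVQ codes.
# Random stand-in weights; a real local transformer would also attend
# over the codes already emitted for this frame.
import numpy as np

HIDDEN, N_CODEBOOKS, CODEBOOK_SIZE = 64, 16, 1024  # real model: HIDDEN=2048
rng = np.random.default_rng(0)

# One output projection per codebook (stand-in for learned heads).
out_proj = rng.standard_normal((N_CODEBOOKS, HIDDEN, CODEBOOK_SIZE)) * 0.01

def local_decode(trunk_hidden):
    """Greedily emit one code per codebook for a single audio frame."""
    codes = []
    for k in range(N_CODEBOOKS):
        logits = trunk_hidden @ out_proj[k]   # shape (CODEBOOK_SIZE,)
        codes.append(int(np.argmax(logits)))
    return codes

frame_hidden = rng.standard_normal(HIDDEN)    # one trunk step = one frame
frame_codes = local_decode(frame_hidden)      # 16 indices in [0, 1024)
```

The point of the sketch is the coupling: every trunk step must run the local head before the next frame can be decoded, which is why splitting the two halves across llama.cpp and application code would be awkward.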

If/when llama.cpp gains a moss_tts_realtime arch (or an upstream multi-RVQ-head abstraction), the LLM part can land in this repo too. Until then, only the codec is here.
