MOSS-TTS-Realtime GGUF (codec only)

GGUF conversion of the MOSS-Audio-Tokenizer (full model, ~1.6B parameters, mono, 24 kHz) used by OpenMOSS-Team/MOSS-TTS-Realtime, runnable via codec.cpp.

⚠️ The LLM-part is not converted yet — see Status.

Files

Codec-part (MOSS-Audio-Tokenizer, `moss_audio` arch, 16 RVQ codebooks × 1024, 24 kHz mono, ~1.6B params)
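As a rough illustration of what "16 RVQ codebooks × 1024" means for the token stream, the sketch below computes how many bits one audio frame occupies when stored as codebook indices. The frame rate is not stated in this card, so `bitrate_bps` takes it as a caller-supplied parameter; both function names are hypothetical helpers, not part of codec.cpp.

```python
NUM_CODEBOOKS = 16    # RVQ codebooks per frame, from the description above
CODEBOOK_SIZE = 1024  # entries per codebook, from the description above


def bits_per_frame(num_codebooks: int = NUM_CODEBOOKS,
                   codebook_size: int = CODEBOOK_SIZE) -> int:
    """Bits needed to store one audio frame as RVQ indices."""
    # Smallest number of bits that can address every codebook entry.
    bits_per_index = (codebook_size - 1).bit_length()
    return num_codebooks * bits_per_index


def bitrate_bps(frame_rate_hz: float) -> float:
    """Token bitrate in bits per second for a caller-supplied frame rate.

    The frame rate is an assumption of the caller; this card does not state it.
    """
    return bits_per_frame() * frame_rate_hz


if __name__ == "__main__":
    print(bits_per_frame())  # 16 indices of 10 bits each
```

Each index into a 1024-entry codebook needs 10 bits, so a frame of 16 indices takes 160 bits before any entropy coding.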

codec[-<quant>].gguf
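A minimal sketch of fetching one of these files with `huggingface_hub`, assuming the repository id `hans00/MOSS-TTS-Realtime-GGUF` and that the files sit at the repository root (both are assumptions to verify against the repo's file listing). `codec_filename` and `download_codec` are hypothetical helper names; only the quantizations listed in this section are accepted.

```python
KNOWN_QUANTS = ("f32", "f16", "q8_0", "q5_k_m", "q4_k_m")
REPO_ID = "hans00/MOSS-TTS-Realtime-GGUF"  # assumed repository id


def codec_filename(quant: str) -> str:
    """Build the GGUF filename for one of the listed quantizations."""
    if quant not in KNOWN_QUANTS:
        raise ValueError(f"unknown quant {quant!r}; expected one of {KNOWN_QUANTS}")
    return f"codec-{quant}.gguf"


def download_codec(quant: str = "q8_0") -> str:
    """Download the codec file and return its local cache path."""
    # Imported lazily so the filename helper works without the dependency.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=codec_filename(quant))


if __name__ == "__main__":
    print(codec_filename("q4_k_m"))
```

Calling `download_codec("q4_k_m")` needs network access and the `huggingface_hub` package; the returned path can then be passed to whatever loader you use.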

File	Size
`codec-f32.gguf`	6770 MB
`codec-f16.gguf`	3387 MB
`codec-q8_0.gguf`	1802 MB
`codec-q5_k_m.gguf`	1170 MB
`codec-q4_k_m.gguf`	959 MB
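The sizes above give a quick estimate of the effective bits per weight of each quantization. The sketch below assumes the f32 file is almost entirely 4-byte weights, so metadata and any tensors kept at higher precision are ignored; the result is a ratio of file sizes and therefore does not depend on whether "MB" means 10^6 or 2^20 bytes. The function name is a hypothetical helper.

```python
# File sizes in MB, copied from the table above.
SIZES_MB = {
    "f32": 6770,
    "f16": 3387,
    "q8_0": 1802,
    "q5_k_m": 1170,
    "q4_k_m": 959,
}


def effective_bits_per_weight(quant: str) -> float:
    """Estimate bits per weight relative to the 32-bit file.

    Assumes the f32 file consists almost entirely of 32-bit weights.
    """
    return 32.0 * SIZES_MB[quant] / SIZES_MB["f32"]


if __name__ == "__main__":
    for name in SIZES_MB:
        print(f"{name:8s} {effective_bits_per_weight(name):5.2f} bits/weight")
```

The q8_0 estimate comes out slightly above 8 because Q8_0 stores a 16-bit scale alongside every block of 32 eight-bit weights, which is 8.5 bits per weight.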

Status

Codec-part only. The MOSS-TTS-Realtime LLM-part isn't shipped here because it would need a dedicated moss_tts_realtime arch in llama.cpp. The Qwen3 backbone (28 layers, 2048 hidden, vocab 151936) is by itself a standard qwen3 arch, but the model produces 16 RVQ codebooks per audio frame via a 4-layer local transformer (MossTTSRealtimeLocalTransformer) wired onto the Qwen3 trunk. That multi-codebook output isn't expressible in any existing llama.cpp arch. Gluing the two halves together outside the model graph isn't planned either: running the global trunk in llama.cpp and the local transformer in application code defeats the point of having a unified inference graph.
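To make the shape of the problem concrete, here is a toy sketch of the decoding loop described above: one trunk step per audio frame, followed by 16 local steps that each emit one codebook index. Everything in it is a stand-in. `toy_trunk` and `toy_local_step` are hypothetical placeholders that return pseudo-random values, and feeding the previously emitted indices into each local step is an assumption about how the local transformer is conditioned, not something this card states.

```python
import random
from typing import List

NUM_CODEBOOKS = 16    # codebooks per audio frame, from the text above
CODEBOOK_SIZE = 1024  # entries per codebook

Hidden = List[float]


def toy_trunk(frame_index: int, hidden_size: int = 8) -> Hidden:
    """Stand-in for the Qwen3 trunk: one hidden vector per audio frame."""
    rng = random.Random(frame_index)
    return [rng.random() for _ in range(hidden_size)]


def toy_local_step(hidden: Hidden, previous_codes: List[int]) -> int:
    """Stand-in for one local-transformer step: pick the next codebook index."""
    seed = int(sum(hidden) * 1_000_000) + 31 * len(previous_codes) + sum(previous_codes)
    return random.Random(seed).randrange(CODEBOOK_SIZE)


def decode_frame(hidden: Hidden) -> List[int]:
    """Emit all 16 codebook indices for one frame from one trunk hidden state."""
    codes: List[int] = []
    for _ in range(NUM_CODEBOOKS):
        # Assumed conditioning: each step sees the indices emitted so far.
        codes.append(toy_local_step(hidden, codes))
    return codes


def decode(num_frames: int) -> List[List[int]]:
    """One trunk step per frame, then a 16-step inner loop per frame."""
    return [decode_frame(toy_trunk(t)) for t in range(num_frames)]


if __name__ == "__main__":
    for frame in decode(2):
        print(frame)
```

The point of the sketch is the nesting: a standard llama.cpp arch yields one token per trunk step, whereas here each trunk step has to drive an inner loop that produces 16 indices.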

If/when llama.cpp gains a moss_tts_realtime arch (or an upstream multi-RVQ-head abstraction), the LLM-part can land in this repo too. Until then, only the codec is here.

Notes

Source weights: OpenMOSS-Team/MOSS-Audio-Tokenizer
Full upstream Python pipeline: OpenMOSS-Team/MOSS-TTS-Realtime
