MOSS-TTS-v1.5 GGUF (openmoss-ggml format)
GGUF conversion of OpenMOSS-Team/MOSS-TTS-v1.5 for the pwilkin/openmoss moss-tts-server (a C++/GGML MOSS-TTS runtime that links libllama).
These are not standard llama.cpp GGUFs and vanilla llama.cpp will not run them. The model ships as two files: a plain Qwen3 backbone GGUF that libllama loads directly, plus an .extras.gguf sidecar (audio embedding tables, LM heads, codec encoder/RVQ/decoder, and the moss.* KV namespace) that only the moss-tts-server reader picks up.
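To check which file carries what, the metadata keys and tensor names of either file can be listed with the `gguf` Python package (gguf-py, maintained in the llama.cpp repository). This is a minimal sketch that assumes `gguf` is installed (`pip install gguf`). The helper names `moss_keys` and `inspect_gguf` are hypothetical, and the card does not document the individual `moss.*` key names.

```python
from __future__ import annotations

from typing import Iterable


def moss_keys(names: Iterable[str]) -> list[str]:
    """Keep only the moss.* metadata keys, sorted."""
    return sorted(n for n in names if n.startswith("moss."))


def inspect_gguf(path: str) -> tuple[list[str], list[str]]:
    """Return (moss.* keys, tensor names) for one GGUF file."""
    # Imported lazily so moss_keys() works without gguf-py installed.
    from gguf import GGUFReader  # gguf-py: pip install gguf

    reader = GGUFReader(path)
    return moss_keys(reader.fields.keys()), [t.name for t in reader.tensors]
```

For example, `keys, tensors = inspect_gguf("moss-tts-v1.5-q8_0.extras.gguf")`. As described above, the `moss.*` keys should appear in the sidecar and not in the plain Qwen3 backbone.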
Files
| File | Size | Role |
|---|---|---|
| moss-tts-v1.5-q8_0.gguf | 8.7 GB | Qwen3-8B backbone, Q8_0 (token-embedding table kept f16) |
| moss-tts-v1.5-q8_0.extras.gguf | 3.9 GB | Audio embeddings, LM heads, codec, moss.* KV (not quantised) |
Both files are required and must sit in the same directory. The server derives the sidecar name from the backbone by replacing .gguf with .extras.gguf, so keep the pair named X.gguf + X.extras.gguf.
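The pairing rule can be checked before starting the server. The sketch below mirrors the X.gguf to X.extras.gguf naming in plain Python and reports any missing half of the pair. It assumes that the server replaces only the trailing `.gguf` extension. The helper names `sidecar_path` and `missing_files` are hypothetical and not part of moss-tts-server.

```python
from __future__ import annotations

from pathlib import Path


def sidecar_path(backbone: str | Path) -> Path:
    """Map X.gguf to X.extras.gguf in the same directory."""
    p = Path(backbone)
    if not p.name.endswith(".gguf"):
        raise ValueError(f"backbone must end in .gguf: {p}")
    # Assumption: only the trailing ".gguf" is replaced.
    return p.with_name(p.name[: -len(".gguf")] + ".extras.gguf")


def missing_files(backbone: str | Path) -> list[str]:
    """Return whichever files of the backbone/sidecar pair are absent."""
    pair = (Path(backbone), sidecar_path(backbone))
    return [str(p) for p in pair if not p.is_file()]
```

An empty result from `missing_files("moss-tts-v1.5-q8_0.gguf")` means both files are present and named so that the server can find the sidecar.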
Usage
```bash
moss-tts-server \
  --model moss-tts-v1.5-q8_0.gguf \
  --host 127.0.0.1 --port 8080 \
  --aux-cpu --no-webui
```
--aux-cpu keeps the codec on CPU, which is required on Apple Silicon Metal builds.
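When launching the server from a script, the flag can be added only where the card says it is required. This sketch builds the argument list using only the flags shown above. It assumes that "Apple Silicon Metal build" can be approximated by `platform.system() == "Darwin"` and `platform.machine() == "arm64"`, which does not actually detect how the binary was built. The `server_command` helper is hypothetical.

```python
from __future__ import annotations

import platform


def server_command(model: str, host: str = "127.0.0.1", port: int = 8080,
                   system: str | None = None,
                   machine: str | None = None) -> list[str]:
    """Build a moss-tts-server argv, adding --aux-cpu on Apple Silicon."""
    system = system or platform.system()
    machine = machine or platform.machine()
    cmd = ["moss-tts-server", "--model", model,
           "--host", host, "--port", str(port), "--no-webui"]
    # Assumption: Darwin + arm64 stands in for "Apple Silicon Metal build".
    if system == "Darwin" and machine == "arm64":
        cmd.append("--aux-cpu")
    return cmd
```

The result can be passed to `subprocess.run(server_command("moss-tts-v1.5-q8_0.gguf"), check=True)`, which blocks until the server exits. Passing `--aux-cpu` unconditionally, as in the command above, is also fine.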
How it was made
Built with the converter in the openmoss tree:
```bash
python scripts/convert_hf_to_gguf.py \
  --moss-tts OpenMOSS-Team/MOSS-TTS-v1.5 \
  --codec OpenMOSS-Team/MOSS-Audio-Tokenizer \
  --output moss-tts-v1.5.gguf \
  --llama-cpp-dir /path/to/llama.cpp \
  --backbone-dtype f16

# Quantise the backbone; the embedding table is indexed via ggml_get_rows
# and must stay f16.
llama-quantize --token-embedding-type f16 \
  moss-tts-v1.5.gguf moss-tts-v1.5-q8_0.gguf Q8_0
```
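The converter writes the sidecar (.extras.gguf) next to the backbone, and llama-quantize does not touch it. The server pairs files by name, so the sidecar must be named moss-tts-v1.5-q8_0.extras.gguf to match the quantised backbone.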
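Q8_0 stores weights in blocks of 32, with one fp16 scale per block. That is 34 bytes per 32 weights, or about 8.5 bits per weight. The plain-Python sketch below mirrors ggml's reference Q8_0 rounding (d = max|x| / 127, q = round(x / d), scale stored as fp16). It shows what the backbone quantisation does to each block and is illustrative only, not the code llama-quantize runs.

```python
import math
import struct

QK8_0 = 32                      # weights per Q8_0 block in ggml
BYTES_PER_BLOCK = 2 + QK8_0     # fp16 scale + 32 int8 codes = 34 bytes


def _to_fp16(x: float) -> float:
    """Round a float to IEEE half precision, as ggml stores the block scale."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_block_q8_0(xs: list[float]) -> tuple[float, list[int]]:
    """Quantise one block of 32 floats to (fp16 scale, int8 codes)."""
    if len(xs) != QK8_0:
        raise ValueError(f"expected {QK8_0} values, got {len(xs)}")
    amax = max(abs(x) for x in xs)
    d = amax / 127.0
    inv_d = 1.0 / d if d else 0.0
    # Round half away from zero, like C's roundf(); codes stay in [-127, 127].
    qs = [int(math.copysign(math.floor(abs(x * inv_d) + 0.5), x)) for x in xs]
    return _to_fp16(d), qs


def dequantize_block_q8_0(d: float, qs: list[int]) -> list[float]:
    """Reconstruct approximate weights from the scale and int8 codes."""
    return [d * q for q in qs]
```

The per-weight error is bounded by roughly half a quantisation step (d / 2), plus a small term from storing d in fp16.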
Model
MOSS-TTS-v1.5 is a weights-only fine-tune of MOSS-TTS 1.0: same moss_tts_delay architecture, Qwen3-8B language backbone, and MOSS-Audio-Tokenizer codec (24 kHz). It supports zero-shot voice cloning, long-form synthesis, token-level duration control, multilingual synthesis with language tags, and inline [pause X.Ys] control. See the base model card for the full feature walkthrough and input schema.
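The inline pause control is written into the input text. The sketch below formats and totals such tags. It assumes the literal form `[pause <seconds>s]` with one decimal place, inferred only from the `[pause X.Ys]` placeholder above. The base model card defines the actual grammar, and the helpers `pause_tag` and `total_pause` are hypothetical.

```python
import re

# Assumption: tags look like "[pause 0.5s]", inferred from "[pause X.Ys]".
PAUSE_TAG = re.compile(r"\[pause (\d+(?:\.\d+)?)s\]")


def pause_tag(seconds: float) -> str:
    """Format a pause as an inline tag with one decimal place."""
    if seconds < 0:
        raise ValueError("pause must be non-negative")
    return f"[pause {seconds:.1f}s]"


def total_pause(text: str) -> float:
    """Sum the durations of all pause tags found in `text`."""
    return sum(float(s) for s in PAUSE_TAG.findall(text))
```

For example, `"First sentence." + pause_tag(0.5) + " Second."` yields `First sentence.[pause 0.5s] Second.`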
License and attribution
Apache-2.0, the same licence as the source model. Derived from OpenMOSS-Team/MOSS-TTS-v1.5. The only change is format conversion to GGUF and Q8_0 quantisation of the backbone; no weights were retrained.
Repository: smcleod/MOSS-TTS-v1.5-GGUF (base model: OpenMOSS-Team/MOSS-TTS-v1.5).