MOSS-TTS-v1.5 GGUF (openmoss-ggml format)

GGUF conversion of OpenMOSS-Team/MOSS-TTS-v1.5 for the pwilkin/openmoss moss-tts-server (a C++/GGML MOSS-TTS runtime that links libllama).

These are not standard llama.cpp GGUFs and vanilla llama.cpp will not run them. The model ships as two files: a plain Qwen3 backbone GGUF that libllama loads directly, plus an .extras.gguf sidecar (audio embedding tables, LM heads, codec encoder/RVQ/decoder, and the moss.* KV namespace) that only the moss-tts-server reader picks up.

Files

| File | Size | Role |
|---|---|---|
| moss-tts-v1.5-q8_0.gguf | 8.7 GB | Qwen3-8B backbone, Q8_0 (token-embedding table kept f16) |
| moss-tts-v1.5-q8_0.extras.gguf | 3.9 GB | Audio embeddings, LM heads, codec, moss.* KV (not quantised) |

Both files are required and must sit in the same directory. The server derives the sidecar name from the backbone by replacing .gguf with .extras.gguf, so keep the pair named X.gguf + X.extras.gguf.
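
The naming rule can be sketched in a few lines of Python, which is handy for checking a download before starting the server. This is an illustration only, not the server's own code: the helper names sidecar_path and check_pair are hypothetical, and the sketch assumes the rule is a plain suffix replacement on the file name.

```python
from pathlib import Path


def sidecar_path(backbone: str) -> Path:
    """Map X.gguf to X.extras.gguf in the same directory (hypothetical helper)."""
    path = Path(backbone)
    if path.suffix != ".gguf":
        raise ValueError(f"expected a .gguf backbone, got {path.name!r}")
    if path.name.endswith(".extras.gguf"):
        raise ValueError("pass the backbone file, not the sidecar")
    # path.stem drops only the final ".gguf", so dots inside the name survive.
    return path.with_name(path.stem + ".extras.gguf")


def check_pair(backbone: str) -> list[str]:
    """Return the paths from the backbone/sidecar pair that are missing on disk."""
    candidates = [Path(backbone), sidecar_path(backbone)]
    return [str(p) for p in candidates if not p.is_file()]
```

An empty list from check_pair("moss-tts-v1.5-q8_0.gguf") means both files are present next to each other.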

Usage

moss-tts-server \
  --model moss-tts-v1.5-q8_0.gguf \
  --host 127.0.0.1 --port 8080 \
  --aux-cpu --no-webui

--aux-cpu keeps the codec on CPU, which is required on Apple Silicon Metal builds.

How it was made

Built with the converter in the openmoss tree:

python scripts/convert_hf_to_gguf.py \
  --moss-tts OpenMOSS-Team/MOSS-TTS-v1.5 \
  --codec    OpenMOSS-Team/MOSS-Audio-Tokenizer \
  --output   moss-tts-v1.5.gguf \
  --llama-cpp-dir /path/to/llama.cpp \
  --backbone-dtype f16

# Quantise the backbone; the embedding table is indexed via ggml_get_rows
# and must stay f16.
llama-quantize --token-embedding-type f16 \
  moss-tts-v1.5.gguf moss-tts-v1.5-q8_0.gguf Q8_0

The sidecar (.extras.gguf) is emitted alongside the backbone by the converter and is not quantised.
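
A quick sanity check on the two outputs is to read the fixed-size GGUF header of each file. The sketch below is not part of the openmoss tooling: parse_gguf_header and GGUFHeader are hypothetical names, and it assumes little-endian files at GGUF version 2 or later, where the tensor and metadata counts are 64-bit.

```python
import struct
from typing import NamedTuple


class GGUFHeader(NamedTuple):
    version: int
    tensor_count: int
    kv_count: int


def parse_gguf_header(raw: bytes) -> GGUFHeader:
    """Decode the first 24 bytes of a GGUF file (hypothetical helper)."""
    if len(raw) < 24 or raw[:4] != b"GGUF":
        raise ValueError("not a GGUF file (bad magic or truncated header)")
    # Layout after the magic: uint32 version, uint64 tensor count,
    # uint64 metadata key/value count, all little-endian.
    version, tensor_count, kv_count = struct.unpack("<IQQ", raw[4:24])
    if version < 2:
        raise ValueError("GGUF v1 uses 32-bit counts; not handled in this sketch")
    return GGUFHeader(version, tensor_count, kv_count)
```

To use it, open each file in binary mode and pass the result of f.read(24). Both the backbone and the sidecar should parse, each reporting its own tensor and metadata counts.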

Model

MOSS-TTS-v1.5 is a weights-only fine-tune of MOSS-TTS 1.0: same moss_tts_delay architecture, Qwen3-8B language backbone, and MOSS-Audio-Tokenizer codec (24 kHz). It supports zero-shot voice cloning, long-form synthesis, token-level duration control, multilingual synthesis with language tags, and inline [pause X.Ys] control. See the base model card for the full feature walkthrough and input schema.

License and attribution

Apache-2.0, the same licence as the source model. Derived from OpenMOSS-Team/MOSS-TTS-v1.5. The only changes are format conversion to GGUF and Q8_0 quantisation of the backbone; no weights were retrained.
