Holler 0.6B (6-bit)

A 6-bit quantized version of sentiuminc/holler-0.6b, with a smaller download (1.7 GB vs 2.3 GB), lower memory use, and faster inference. This is the variant ivi uses for real-time voice responses.

For full documentation, all voice samples, and the training pipeline, see the bf16 model card or the GitHub repo.

Compared to bf16

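Metric          bf16      6-bit (this)
Download        2.3 GB    1.7 GB
Metal RAM       ~2.4 GB   ~1.7 GB
RTF (16cb)      0.68      0.54
RTF (12cb)      n/a       0.47
TTFA (16cb)     ~200 ms   ~170 ms
TTFA (12cb)     n/a       ~147 ms
Quality         Best      Very close

RTF is the real-time factor (synthesis time divided by audio duration; lower is faster). TTFA is time to first audio. 16cb and 12cb mean 16 and 12 codebooks.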
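RTF is synthesis time divided by the duration of the audio produced. Multiplying it by a clip length therefore gives a rough wall-clock estimate. The short Python sketch below is illustrative only and is not part of the holler repo. It applies that calculation to the RTF figures in the table and ignores TTFA, model loading, and warm-up.

```python
# RTF (real-time factor) = wall-clock synthesis time / duration of audio produced.
# These values are the RTF figures from the comparison table.
RTF = {
    "bf16, 16cb": 0.68,
    "6-bit, 16cb": 0.54,
    "6-bit, 12cb": 0.47,
}


def synthesis_seconds(rtf: float, audio_seconds: float) -> float:
    """Estimated wall-clock time to synthesize `audio_seconds` of speech."""
    return rtf * audio_seconds


if __name__ == "__main__":
    for name, rtf in RTF.items():
        t = synthesis_seconds(rtf, 10.0)
        print(f"{name}: ~{t:.1f} s for 10 s of audio")
```

For a 10-second clip this gives roughly 6.8 s (bf16), 5.4 s (6-bit, 16 codebooks), and 4.7 s (6-bit, 12 codebooks).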

Codebook Comparison

Same text, same 6-bit model: 16 codebooks (best quality) vs 12 codebooks (fastest). Listen and decide.

[Audio samples: Kit, Dakota, Nora, Oliver, and Tessa, each with one 16-codebook and one 12-codebook rendering.]

Quick Start

CLI

git clone https://github.com/sentiuminc/holler.git && cd holler
./build.sh
./holler --6bit --text 'Hello world' --talk

Python Server

git clone https://github.com/sentiuminc/holler.git && cd holler
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

python3 inference/server.py -c sentiuminc/holler-0.6b-6bit
# → http://localhost:8100

HollerKit (Swift)

import HollerKit

let model = try await HollerModel.load(repo: "sentiuminc/holler-0.6b-6bit")
let audio = try await model.synthesize("Hello world", voice: "kit")

Stock mlx-audio

from mlx_audio.tts import load
model = load("sentiuminc/holler-0.6b-6bit")
audio = model.generate("Hello world", speaker="kit")
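The snippet above stops at `audio`. To write the result to disk, a stdlib-only helper like the hypothetical `save_wav` below can save it as a 16-bit PCM WAV file. This rests on two assumptions. First, `audio` must be convertible to a flat sequence of mono float samples in [-1.0, 1.0]; check the actual return type in mlx-audio. Second, the 24 kHz default sample rate is assumed; pass whatever rate the model reports.

```python
import struct
import wave
from typing import Iterable


def save_wav(samples: Iterable[float], path: str, sample_rate: int = 24_000) -> int:
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    Hypothetical helper. The 24 kHz default is an assumption, not a
    documented property of the model. Returns the number of frames written.
    """
    # Clip to [-1, 1] and scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(path, "wb") as f:
        f.setnchannels(1)  # mono
        f.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        f.setframerate(sample_rate)
        # "<" packs little-endian, which is what WAV expects.
        f.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return len(pcm)
```

If `audio` is a 1-D MLX or NumPy array, `save_wav(audio.tolist(), "hello.wav")` is one way to call it.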

Codebooks

Use 16 codebooks (the default) for best quality, or 12 for maximum speed (~18% faster, with negligible quality loss). The codebook count is configurable at inference time; it is not baked into the weights.
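Why can the codebook count change at inference time without touching the weights? One common design in neural audio codecs is residual vector quantization (RVQ). This is an assumption here and is not confirmed for Holler's speech tokenizer. In RVQ, each codebook encodes the error left over by the previous ones, so later codebooks carry progressively finer detail. Decoding from only the first k codebooks still gives a usable, coarser reconstruction, and generating fewer codes per frame means less work. The toy scalar sketch below illustrates only that idea. The function names and codebooks are hypothetical and unrelated to the model's actual tokenizer.

```python
from typing import List


def make_codebooks(levels: int) -> List[List[float]]:
    # Toy scalar codebooks: level i offers {-s, 0, +s} with s = 2**-i,
    # so each level refines the residual left by the earlier levels.
    return [[-(2.0 ** -i), 0.0, 2.0 ** -i] for i in range(levels)]


def rvq_encode(x: float, codebooks: List[List[float]]) -> List[int]:
    """Greedy residual VQ: at each level, pick the entry nearest the residual."""
    codes, residual = [], x
    for cb in codebooks:
        idx = min(range(len(cb)), key=lambda j: abs(residual - cb[j]))
        codes.append(idx)
        residual -= cb[idx]
    return codes


def rvq_decode(codes: List[int], codebooks: List[List[float]], k: int) -> float:
    """Reconstruct using only the first k codebooks (k <= len(codes))."""
    return sum(codebooks[i][codes[i]] for i in range(k))


if __name__ == "__main__":
    books = make_codebooks(16)
    x = 0.7137
    codes = rvq_encode(x, books)
    for k in (16, 12, 4):
        print(f"{k} codebooks: error {abs(x - rvq_decode(codes, books, k)):.2e}")
```

Because each toy codebook includes 0, adding a level can never increase the error. Dropping the last few levels therefore only loses the finest corrections.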

Quantization

6-bit affine quantization with group size 64, applied via mlx_audio.convert. The speech_tokenizer/ directory is shared with the bf16 variant and is not quantized.
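The pure-Python sketch below shows the textbook form of affine (min/max) group quantization with the settings named above: 6 bits and group size 64. Each group of 64 weights keeps a scale and a bias, and each weight is stored as an integer q in [0, 63] with w ≈ scale * q + bias. MLX's actual implementation may differ in details such as rounding, packing, and the storage dtype of the scales and biases. The function names here are illustrative.

```python
from typing import List, Tuple


def quantize_group(w: List[float], bits: int = 6) -> Tuple[List[int], float, float]:
    """Affine-quantize one group so that w ~= scale * q + bias, q in [0, 2**bits - 1]."""
    lo, hi = min(w), max(w)
    levels = 2 ** bits - 1  # 63 for 6-bit
    scale = (hi - lo) / levels if hi > lo else 1.0
    bias = lo
    q = [max(0, min(levels, round((x - bias) / scale))) for x in w]
    return q, scale, bias


def dequantize_group(q: List[int], scale: float, bias: float) -> List[float]:
    """Map integer codes back to approximate float weights."""
    return [scale * v + bias for v in q]


def quantize(weights: List[float], bits: int = 6, group_size: int = 64):
    """Split a flat weight row into groups of `group_size` and quantize each one."""
    return [
        quantize_group(weights[i:i + group_size], bits)
        for i in range(0, len(weights), group_size)
    ]


def bits_per_weight(bits: int = 6, group_size: int = 64, param_bits: int = 16) -> float:
    # Each group also stores one scale and one bias (assumed 16-bit each).
    return bits + 2 * param_bits / group_size
```

Within a group, the reconstruction error per weight is at most scale / 2. With 16-bit scales and biases, the storage cost is 6 + 2 * 16 / 64 = 6.5 bits per quantized weight, versus 16 bits for bf16.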

Attribution

Fine-tune of Qwen3-TTS by the Qwen team at Alibaba Cloud (Apache 2.0). All credit for the underlying architecture goes to them.
