Holler 0.6B (6-bit)

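6-bit quantized version of sentiuminc/holler-0.6b. Compared with bf16 it is a smaller download (1.7 GB vs 2.3 GB), uses less Metal RAM (~1.7 GB vs ~2.4 GB), and runs inference faster (RTF 0.54 vs 0.68 at 16 codebooks). This is the variant ivi uses for real-time voice responses.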

For full documentation, all voice samples, and the training pipeline, see the bf16 model card (sentiuminc/holler-0.6b) or the GitHub repo (https://github.com/sentiuminc/holler).

Compared to bf16

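Metric	bf16	6-bit (this model)
Download size	2.3 GB	1.7 GB
Metal RAM	~2.4 GB	~1.7 GB
RTF, 16 codebooks	0.68	0.54
RTF, 12 codebooks	not reported	0.47
TTFA, 16 codebooks	~200 ms	~170 ms
TTFA, 12 codebooks	not reported	~147 ms
Quality	Best	Very close

RTF is the real-time factor (lower is faster). TTFA is the time to first audio.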
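
To make the RTF rows concrete, here is a small sketch that turns them into wall-clock estimates. It assumes the usual definition of real-time factor (compute time divided by audio duration), which the card does not spell out. The function names are illustrative and not part of Holler.

```python
# RTF figures copied from the comparison table.
RTF = {
    ("bf16", 16): 0.68,
    ("6-bit", 16): 0.54,
    ("6-bit", 12): 0.47,
}


def synthesis_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimated compute time for `audio_seconds` of speech.

    Assumes RTF = compute time / audio duration.
    """
    return audio_seconds * rtf


def rtf_reduction(baseline: float, candidate: float) -> float:
    """Fractional drop in compute time per second of generated audio."""
    return 1.0 - candidate / baseline


for (variant, codebooks), rtf in RTF.items():
    seconds = synthesis_seconds(10.0, rtf)
    print(f"{variant}, {codebooks} codebooks: {seconds:.1f} s for 10 s of audio")

# 6-bit vs bf16 at 16 codebooks, then 12 vs 16 codebooks on the 6-bit model.
print(f"6-bit vs bf16: {rtf_reduction(0.68, 0.54):.1%} less compute time")
print(f"12 vs 16 codebooks: {rtf_reduction(0.54, 0.47):.1%} less compute time")
```

Any RTF below 1.0 means audio is generated faster than it plays back, which is the property real-time streaming depends on.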

Codebook Comparison

Same text, same 6-bit model — 16 codebooks (best quality) vs 12 codebooks (fastest). Listen and decide.

[Audio samples: Kit, Dakota, Nora, Oliver, Tessa, each at 16 codebooks and at 12 codebooks]

Quick Start

CLI

git clone https://github.com/sentiuminc/holler.git && cd holler
./build.sh
./holler --6bit --text 'Hello world' --talk

Python Server

git clone https://github.com/sentiuminc/holler.git && cd holler
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

python3 inference/server.py -c sentiuminc/holler-0.6b-6bit
# → http://localhost:8100

HollerKit (Swift)

import HollerKit

let model = try await HollerModel.load(repo: "sentiuminc/holler-0.6b-6bit")
let audio = try await model.synthesize("Hello world", voice: "kit")

Stock mlx-audio

from mlx_audio.tts import load
model = load("sentiuminc/holler-0.6b-6bit")
audio = model.generate("Hello world", speaker="kit")

Codebooks

Use 16 codebooks for best quality (default), or 12 for maximum speed (RTF 0.47 vs 0.54 on this model, about 13% less compute time per second of audio, with negligible quality loss). The codebook count is set at inference time; it is not baked into the weights.
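
The toy sketch below illustrates why dropping codebooks can cost little quality. It assumes the codebooks form a residual stack, as in many neural audio codecs, where each stage refines what the earlier stages left unexplained. The card does not state that Holler's tokenizer works this way, and none of these functions are Holler or mlx-audio APIs.

```python
# Toy scalar residual quantizer: illustrative only, not Holler's tokenizer.
NUM_CODEBOOKS = 16


def make_codebooks(count):
    """Stage k holds three scalar entries (-step, 0, +step); step halves per stage."""
    return [(-(0.5 ** (k + 1)), 0.0, 0.5 ** (k + 1)) for k in range(count)]


def encode(value, codebooks):
    """Greedily pick, at each stage, the entry closest to the remaining residual."""
    residual = value
    indices = []
    for entries in codebooks:
        best = min(range(len(entries)), key=lambda i: abs(residual - entries[i]))
        indices.append(best)
        residual -= entries[best]
    return indices


def decode(indices, codebooks, used):
    """Rebuild the value from the first `used` stages only."""
    return sum(codebooks[k][indices[k]] for k in range(used))


codebooks = make_codebooks(NUM_CODEBOOKS)
target = 0.3
indices = encode(target, codebooks)

error_16 = abs(target - decode(indices, codebooks, 16))
error_12 = abs(target - decode(indices, codebooks, 12))
print(f"error with 16 stages: {error_16:.2e}")
print(f"error with 12 stages: {error_12:.2e}")
```

In this toy setup the last four stages only correct a residual that is already small, so skipping them saves work at a modest cost in accuracy. Whether the same holds for a given voice is best judged by listening to the samples.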

Quantization

6-bit affine quantization with group size 64, applied via mlx_audio.convert. The speech_tokenizer/ directory is shared with the bf16 variant and is not quantized.
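
As a rough illustration of what "6-bit affine, group size 64" means, the sketch below quantizes a flat list of weights in pure Python. It is a generic version of the technique, not the mlx_audio.convert implementation. The helper names are hypothetical, and the 16-bit scale and bias per group used in the storage estimate is an assumption.

```python
import random

BITS = 6
GROUP_SIZE = 64
MAX_CODE = (1 << BITS) - 1  # 63, the largest 6-bit code


def quantize_group(weights):
    """Map one group of floats to integer codes plus a (scale, bias) pair."""
    low, high = min(weights), max(weights)
    scale = (high - low) / MAX_CODE or 1.0  # guard against constant groups
    codes = [round((w - low) / scale) for w in weights]
    return codes, scale, low


def dequantize_group(codes, scale, bias):
    """Affine reconstruction: weight is approximately code * scale + bias."""
    return [code * scale + bias for code in codes]


def quantize(weights, group_size=GROUP_SIZE):
    """Quantize each consecutive group of `group_size` weights independently."""
    return [
        quantize_group(weights[start:start + group_size])
        for start in range(0, len(weights), group_size)
    ]


def bits_per_weight(bits=BITS, group_size=GROUP_SIZE, meta_bits=16):
    """Effective storage cost, assuming one scale and one bias of `meta_bits` per group."""
    return bits + 2 * meta_bits / group_size


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(256)]
groups = quantize(weights)
restored = [w for group in groups for w in dequantize_group(*group)]
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"groups: {len(groups)}, max reconstruction error: {max_error:.2e}")
print(f"effective bits per weight: {bits_per_weight()}")
```

Each group keeps its own scale and bias, so an outlier in one group does not coarsen the step size for the rest of the tensor. Rounding to the nearest code bounds the per-weight error at half of that group's scale.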

Attribution

Holler is a fine-tune of Qwen3-TTS, which was developed by the Qwen team at Alibaba Cloud and released under Apache 2.0. All credit for the underlying architecture goes to them.
