# Pocket TTS – INT8 Quantized
INT8 channel-wise quantized version of kyutai/pocket-tts-without-voice-cloning for browser-based TTS inference via WebAssembly.
## Model Details
| | Original | INT8 |
|---|---|---|
| File | `tts_b6369a24.safetensors` | `model.safetensors` |
| Size | 225 MB | 132 MB |
| Dtype | BF16 | I8 + BF16 scales |
| Reduction | – | 41% |
## Quantization Format
Per-output-channel INT8 quantization with BF16 scale factors.
Each weight tensor `foo` is split into two tensors:

- `foo` (I8) – quantized weight values
- `foo_scale` (BF16) – one scale factor per output channel

Dequantization: `weight_bf16 = weight_i8 * scale_bf16`

See `quantize_config.json` for machine-readable metadata.
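To make the format concrete, here is a sketch of a symmetric absmax quantizer that produces this layout. The function name and the exact rounding/clipping choices are assumptions for illustration; the actual quantizer used to produce `model.safetensors` may differ, and NumPy's `float32` stands in for BF16 scales since NumPy has no bfloat16 type.

```python
import numpy as np

def quantize_per_output_channel(w: np.ndarray):
    """Symmetric absmax INT8 quantization, one scale per output channel (row).

    A sketch consistent with the foo / foo_scale format above, not the
    exact quantizer used to produce the checkpoint.
    """
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0  # shape (out, 1)
    scale = np.where(scale == 0, 1.0, scale)              # guard all-zero rows
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

w = np.random.randn(64, 128).astype(np.float32)
q, scale = quantize_per_output_channel(w)
w_hat = q.astype(np.float32) * scale                      # dequantize
# Round-trip error is at most half a quantization step per element.
assert np.abs(w - w_hat).max() <= 0.5 * scale.max() + 1e-6
```

The per-row scale keeps outliers in one output channel from degrading the precision of every other channel, which is why channel-wise schemes usually beat a single per-tensor scale.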
## Files
| File | Size | Description |
|---|---|---|
| `model.safetensors` | 132 MB | Quantized model weights |
| `tokenizer.model` | 58 KB | SentencePiece unigram tokenizer |
| `quantize_config.json` | <1 KB | Quantization parameters |
Voice embeddings are unchanged; use them from the original repo.
## Usage
This model is designed for use with tts-web, a browser-based TTS engine built with Candle (Rust ML framework) and WebAssembly. Dequantization happens at load time in memory.
### Dequantization (Python)
```python
import numpy as np
from safetensors.numpy import load_file

tensors = load_file("model.safetensors")

for name in list(tensors.keys()):
    if name.endswith("_scale"):
        continue
    scale_name = f"{name}_scale"
    if scale_name in tensors:
        # Dequantize: INT8 weights times per-output-channel scales.
        weight_i8 = tensors[name].astype(np.float32)
        scale = tensors[scale_name].astype(np.float32)
        # NumPy has no bfloat16, so float16 stands in for BF16 here.
        tensors[name] = (weight_i8 * scale).astype(np.float16)
        del tensors[scale_name]
```
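The loop above can be exercised end to end on a synthetic state dict. The tensor names (`layer.weight`, `layer.weight_scale`) are illustrative, and the scale is assumed to have shape `(out, 1)` so the multiply broadcasts one scale per output row:

```python
import numpy as np

# Synthetic state dict mimicking the foo / foo_scale convention.
w = np.random.randn(4, 8).astype(np.float32)
scale = np.abs(w).max(axis=1, keepdims=True) / 127.0  # (4, 1), one per output channel
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
tensors = {"layer.weight": q, "layer.weight_scale": scale}

# Same dequantization pass as above.
for name in list(tensors.keys()):
    if name.endswith("_scale"):
        continue
    scale_name = f"{name}_scale"
    if scale_name in tensors:
        deq = tensors[name].astype(np.float32) * tensors[scale_name].astype(np.float32)
        tensors[name] = deq
        del tensors[scale_name]

assert tensors["layer.weight"].shape == (4, 8)
assert "layer.weight_scale" not in tensors
# Reconstruction error is bounded by the quantization step size.
assert np.abs(tensors["layer.weight"] - w).max() <= scale.max() + 1e-6
```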
## Acknowledgments
Based on Kyutai's Pocket TTS, a 100M-parameter text-to-speech model.
## Disclaimer
This is an independent port by idle intelligence, not affiliated with or endorsed by Kyutai Labs.
## License
CC-BY-4.0 (same as the original model).