Pocket TTS: INT8 Quantized
INT8 channel-wise quantized version of kyutai/pocket-tts-without-voice-cloning for browser-based TTS inference via WebAssembly.
Model Details
| | Original | INT8 |
|---|---|---|
| File | tts_b6369a24.safetensors | model.safetensors |
| Size | 225 MB | 132 MB |
| Dtype | BF16 | I8 + BF16 scales |
| Reduction | n/a | 41% |
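As a quick check of the reduction figure, the arithmetic can be reproduced from the two sizes listed in the table. This is only a sketch; the variable names are illustrative and the sizes are the rounded values shown above.

```python
# Sizes in MB, taken from the table above (rounded values).
original_mb = 225
int8_mb = 132

# Fraction of the original file size that was saved.
reduction = (original_mb - int8_mb) / original_mb
print(f"{reduction:.1%}")
```

The result rounds to the 41% in the table. It is below the 50% that a pure BF16-to-INT8 conversion would give, since the file also stores the BF16 scale tensors and possibly some tensors that were left unquantized (this card does not say which).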
Quantization Format
Per-output-channel INT8 quantization with BF16 scale factors.
Each weight tensor foo is split into two tensors:
- foo (I8): quantized weight values
- foo_scale (BF16): one scale factor per output channel
Dequantization: weight_bf16 = weight_i8 * scale_bf16
See quantize_config.json for machine-readable metadata.
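To make the format concrete, here is a minimal NumPy sketch of a per-output-channel round trip. It rests on assumptions that this card does not confirm: symmetric absmax scaling (scale = max|w| / 127 per output channel), output channels along axis 0, and float32 scales standing in for BF16 because NumPy has no native bfloat16. The function names are hypothetical and this is not necessarily how the published weights were produced.

```python
import numpy as np

def quantize_per_output_channel(weight):
    """Quantize a 2-D float weight of shape (out_channels, in_features).

    Assumption: symmetric absmax scaling, scale = max|w| / 127 per row.
    """
    absmax = np.abs(weight).max(axis=1, keepdims=True)
    # Avoid dividing by zero for all-zero rows.
    scale = np.where(absmax > 0, absmax / 127.0, 1.0).astype(np.float32)
    weight_i8 = np.clip(np.round(weight / scale), -127, 127).astype(np.int8)
    return weight_i8, scale

def dequantize(weight_i8, scale):
    """Apply weight = weight_i8 * scale, one scale per output channel."""
    return weight_i8.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
w_i8, w_scale = quantize_per_output_channel(w)
w_restored = dequantize(w_i8, w_scale)

# Rounding error stays within half a quantization step in each row.
within_half_step = np.all(np.abs(w - w_restored) <= w_scale / 2 + 1e-6)
print(w_i8.dtype, w_scale.shape, bool(within_half_step))
```

Keeping one scale per output channel, rather than one per tensor, means a row with small weights is not forced onto the coarse grid needed by a row with large weights.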
Files
| File | Size | Description |
|---|---|---|
| model.safetensors | 132 MB | Quantized model weights |
| tokenizer.model | 58 KB | SentencePiece unigram tokenizer |
| quantize_config.json | <1 KB | Quantization parameters |
Voice embeddings are unchanged; use them from the original repo.
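To see which tensors in model.safetensors were quantized without loading any weights, one option is to read only the safetensors header: an unsigned 64-bit little-endian length followed by a JSON object that maps tensor names to dtype, shape, and data offsets. The sketch below uses only the standard library. The helper names are hypothetical, and the demo builds a tiny synthetic file in memory whose tensor names and shapes are made up for illustration.

```python
import io
import json
import struct

def read_safetensors_header(f):
    """Return the JSON header of a safetensors stream without reading weights."""
    # The file starts with an unsigned 64-bit little-endian header length.
    (header_len,) = struct.unpack("<Q", f.read(8))
    header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return header

def list_quantized_pairs(header):
    """Yield (name, weight dtype, scale dtype) for each foo / foo_scale pair."""
    for name, info in sorted(header.items()):
        scale = header.get(f"{name}_scale")
        if scale is not None:
            yield name, info["dtype"], scale["dtype"]

# Synthetic stand-in so the sketch runs without the real download.
demo_header = {
    "foo": {"dtype": "I8", "shape": [2, 3], "data_offsets": [0, 6]},
    "foo_scale": {"dtype": "BF16", "shape": [2], "data_offsets": [6, 10]},
}
raw = json.dumps(demo_header).encode("utf-8")
demo = io.BytesIO(struct.pack("<Q", len(raw)) + raw + bytes(10))

pairs = list(list_quantized_pairs(read_safetensors_header(demo)))
print(pairs)
```

For the real file, pass `open("model.safetensors", "rb")` instead of the in-memory stream. Tensors that have no matching `_scale` entry are the ones stored without quantization.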
Usage
This model is designed for use with tts-web, a browser-based TTS engine built with Candle (Rust ML framework) and WebAssembly. Dequantization happens at load time in memory.
Dequantization (Python)
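import torch
from safetensors.torch import load_file

# NumPy has no native bfloat16, so load through PyTorch, which can read
# the BF16 scale tensors directly.
tensors = load_file("model.safetensors")

for name in list(tensors.keys()):
    if name.endswith("_scale"):
        continue  # scale tensors are consumed together with their weight
    scale_name = f"{name}_scale"
    if scale_name in tensors:
        weight = tensors[name].to(torch.float32)
        scale = tensors.pop(scale_name).to(torch.float32)
        # Assumes output channels lie along dim 0: reshape the scale so it
        # broadcasts as one factor per output channel.
        scale = scale.reshape(-1, *([1] * (weight.dim() - 1)))
        tensors[name] = (weight * scale).to(torch.bfloat16)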
Acknowledgments
Based on Kyutai's Pocket TTS, a 100M-parameter text-to-speech model.
Disclaimer
This is an independent port by idle intelligence, not affiliated with or endorsed by Kyutai Labs.
License
CC-BY-4.0 (same as the original model).