Pocket TTS β€” INT8 Quantized

INT8 channel-wise quantized version of kyutai/pocket-tts-without-voice-cloning for browser-based TTS inference via WebAssembly.

Try the demo β†’

Model Details

|           | Original                  | INT8              |
|-----------|---------------------------|-------------------|
| File      | tts_b6369a24.safetensors  | model.safetensors |
| Size      | 225 MB                    | 132 MB            |
| Dtype     | BF16                      | I8 + BF16 scales  |
| Reduction | —                         | 41%               |

Quantization Format

Per-output-channel INT8 quantization with BF16 scale factors.

Each weight tensor foo is split into two tensors:

  • foo (I8) β€” quantized weight values
  • foo_scale (BF16) β€” one scale factor per output channel

Dequantization: weight_bf16 = weight_i8 * scale_bf16

See quantize_config.json for machine-readable metadata.
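The exact parameters live in quantize_config.json; as an illustration only, here is a minimal NumPy sketch of per-output-channel INT8 quantization, assuming a symmetric absmax scheme (one scale per row, stored broadcastable as shape `(out, 1)`) — the repo's actual scale computation and storage shape may differ:

```python
import numpy as np

def quantize_per_channel(w):
    # Symmetric absmax quantization: one scale per output channel (row).
    absmax = np.abs(w).max(axis=1, keepdims=True)      # shape (out, 1)
    scale = np.maximum(absmax, 1e-8) / 127.0           # avoid divide-by-zero
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize(q, scale):
    # weight = weight_i8 * scale, broadcast across each output channel
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
q, s = quantize_per_channel(w)
w_hat = dequantize(q, s)
# per-element reconstruction error is at most half a quantization step
assert np.all(np.abs(w - w_hat) <= s / 2 + 1e-6)
```

The round-trip error is bounded by `scale / 2` per element, which is why per-channel scales (rather than one scale per tensor) keep the error small for channels with small weights.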

Files

| File                 | Size   | Description                      |
|----------------------|--------|----------------------------------|
| model.safetensors    | 132 MB | Quantized model weights          |
| tokenizer.model      | 58 KB  | SentencePiece unigram tokenizer  |
| quantize_config.json | <1 KB  | Quantization parameters          |

Voice embeddings are unchanged β€” use them from the original repo.

Usage

This model is designed for use with tts-web, a browser-based TTS engine built with Candle (Rust ML framework) and WebAssembly. Dequantization happens at load time in memory.

Dequantization (Python)

```python
import numpy as np
from safetensors.numpy import load_file

tensors = load_file("model.safetensors")

# Merge each (foo, foo_scale) pair back into a single full-precision tensor.
# NumPy has no bfloat16, so float16 is used here as the closest stand-in.
for name in list(tensors.keys()):
    if name.endswith("_scale"):
        continue
    scale_name = f"{name}_scale"
    if scale_name in tensors:
        weight_i8 = tensors[name].astype(np.float32)
        scale = tensors[scale_name].astype(np.float32)
        tensors[name] = (weight_i8 * scale).astype(np.float16)
        del tensors[scale_name]
```

Acknowledgments

Based on Kyutai's Pocket TTS β€” a 100M parameter text-to-speech model.

Disclaimer

This is an independent port by idle intelligence, not affiliated with or endorsed by Kyutai Labs.

License

CC-BY-4.0 (same as the original model).
