ilnmtlbnm committed on
Commit 520862d · verified · 1 Parent(s): 74d4422

Upload folder using huggingface_hub

Files changed (2)
  1. README.md +14 -10
  2. pocket-tts-q8_0.gguf +2 -2
README.md CHANGED
@@ -28,31 +28,35 @@ Q8_0 quantized version of [kyutai/pocket-tts-without-voice-cloning](https://hugg
  | | Original | GGUF Q8_0 |
  |---|---|---|
  | **File** | `tts_b6369a24.safetensors` | `pocket-tts-q8_0.gguf` |
- | **Size** | 236 MB (BF16) | 178 MB |
  | **Format** | safetensors | GGUF |
- | **Reduction** | — | 24% |

  ## Quantization

  Per-block Q8_0 quantization (block size 32): 2-byte f16 scale + 32 int8 values per block.

- **64 tensors quantized** (82% of params) — all linear/projection weights in the transformer backbone, flow matching network, and mimi codec transformers.
-
- **149 tensors kept as F32** — norms, biases, embeddings, SEANet convolutions, quantizer, and resampling convolutions.

- Validation SQNR: >43 dB on all non-zero tensors.

- ## Runtime note

- Weights are **dequantized to F32 at load time** and matmuls run through candle's optimized `gemm` kernels. This is because candle's `QMatMul` for quantized tensors currently uses a naive triple loop that is ~1.7x slower than `gemm`'s SIMD-tiled F32 matmul on WASM. The GGUF Q8_0 format still saves 25% on download size vs F32 safetensors.

- Once candle ships an optimized quantized matmul kernel (tiled, cache-blocked), we can keep weights quantized at runtime for additional memory bandwidth savings on mobile.

  ## Files

  | File | Size | Description |
  |------|------|-------------|
- | `pocket-tts-q8_0.gguf` | 178 MB | Model weights (Q8_0 + F32) |
  | `tokenizer.model` | 58 KB | SentencePiece unigram tokenizer |

  Voice embeddings are unchanged — use them from the [original repo](https://huggingface.co/kyutai/pocket-tts-without-voice-cloning/tree/main/embeddings_v2).
 
  | | Original | GGUF Q8_0 |
  |---|---|---|
  | **File** | `tts_b6369a24.safetensors` | `pocket-tts-q8_0.gguf` |
+ | **Size** | 236 MB (BF16) | 128 MB |
  | **Format** | safetensors | GGUF |
+ | **Reduction** | — | 46% |
+
+ ## What's included
+
+ This GGUF contains the **TTS decoder pipeline only**: the transformer backbone, flow matching network, mimi decoder + decoder transformer, and the DummyQuantizer output projection.
+
+ The mimi **encoder** (SEANet encoder, encoder transformer, downsample conv) is **excluded** — TTS only needs the decoder path. This saves ~52 MB (28%) compared to a full-model GGUF.
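The arithmetic behind the new Size and Reduction rows can be sanity-checked: a Q8_0 block stores 34 bytes for 32 weights, i.e. 8.5 bits per weight versus 16 bits for BF16, and dropping the encoder accounts for the rest of the shrink. A quick check (pure arithmetic, only the sizes quoted in this README):

```python
# Q8_0 block layout: 2-byte f16 scale + 32 int8 values = 34 bytes per 32 weights.
BLOCK = 32
block_bytes = 2 + BLOCK                     # 34
bits_per_weight = block_bytes * 8 / BLOCK
assert bits_per_weight == 8.5               # vs 16 bits for BF16

# Per-tensor shrink for the quantized weights alone: 1 - 8.5/16.
q8_vs_bf16 = 1 - bits_per_weight / 16
print(f"Q8_0 vs BF16: {q8_vs_bf16:.1%} smaller")  # 46.9%

# Headline file sizes from the table: 236 MB BF16 safetensors -> 128 MB GGUF.
# (This combines quantization with dropping the mimi encoder and keeping some F32 tensors.)
reduction = 1 - 128 / 236
print(f"overall reduction: {reduction:.0%}")       # 46%
```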
 
  ## Quantization

  Per-block Q8_0 quantization (block size 32): 2-byte f16 scale + 32 int8 values per block.

+ **56 tensors quantized** — all linear/projection weights in the transformer backbone, flow matching network, and mimi decoder transformer.
+
+ **114 tensors kept as F32** — norms, biases, embeddings, SEANet decoder convolutions, quantizer, and resampling convolutions.
+
+ Validation SQNR: >40 dB on all tensors.
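The block format and the SQNR check above can be emulated in a few lines of NumPy. This is a reference-style sketch for intuition, not candle's actual kernel; `quantize_q8_0`, `dequantize_q8_0`, and `sqnr_db` are illustrative names:

```python
import numpy as np

BLOCK = 32  # Q8_0 block size

def quantize_q8_0(x: np.ndarray):
    """Quantize a 1-D f32 array (length divisible by 32) to Q8_0:
    per block, one f16 scale plus 32 int8 values."""
    blocks = x.reshape(-1, BLOCK)
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    scale = (amax / 127.0).astype(np.float16)   # the 2-byte scale per block
    d = scale.astype(np.float32)
    d[d == 0] = 1.0                             # guard all-zero blocks
    q = np.clip(np.rint(blocks / d), -127, 127).astype(np.int8)
    return scale, q

def dequantize_q8_0(scale, q):
    return (scale.astype(np.float32) * q).reshape(-1)

def sqnr_db(x, x_hat):
    noise = x - x_hat
    return 10 * np.log10((x ** 2).sum() / (noise ** 2).sum())

rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32)
s, q = quantize_q8_0(w)
print(f"SQNR: {sqnr_db(w, dequantize_q8_0(s, q)):.1f} dB")  # comfortably above 40 dB
```

For roughly Gaussian weights, 8-bit blocks of 32 land in the mid-40s dB, consistent with the >40 dB validation figure.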
 
+ ## Runtime
+
+ Weights stay quantized as Q8_0 at runtime. Matmuls use a [tiled WASM SIMD128 quantized matmul kernel](https://github.com/ilnmtlbnm/candle/tree/quantized-matmul-wasm-simd-opt) (fork of candle) — achieving ~2x realtime on desktop (M-series Mac, Chrome).
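For intuition about what the quantized matmul computes, here is a scalar NumPy emulation of one Q8_0 weight row applied to an f32 activation vector: each 32-wide block contributes `scale * dot(int8_weights, activations)`. The linked fork tiles this over rows and columns with WASM SIMD128; the sketch below (illustrative names, not candle's API) only shows the per-block structure:

```python
import numpy as np

BLOCK = 32

def q8_0_row_dot(scales, qweights, x):
    """Dot product of a Q8_0-encoded weight row with an f32 vector.
    scales: (n_blocks,) f16; qweights: (n_blocks, 32) int8; x: (n_blocks*32,) f32."""
    acc = np.float32(0.0)
    xb = x.reshape(-1, BLOCK)
    for b in range(qweights.shape[0]):
        # One scale multiply per block; the inner dot runs over raw int8 weights.
        acc += np.float32(scales[b]) * np.dot(qweights[b].astype(np.float32), xb[b])
    return acc

# Cross-check against dequantize-then-dot on random data.
rng = np.random.default_rng(1)
n = 256
scales = rng.uniform(0.01, 0.1, n // BLOCK).astype(np.float16)
qw = rng.integers(-127, 128, (n // BLOCK, BLOCK), dtype=np.int8)
x = rng.standard_normal(n).astype(np.float32)
dense = (scales.astype(np.float32)[:, None] * qw).reshape(-1)
assert np.allclose(q8_0_row_dot(scales, qw, x), np.dot(dense, x), rtol=1e-4, atol=1e-3)
```

Keeping the weights in this form at runtime means the kernel streams ~8.5 bits per weight from memory instead of 32, which is where the bandwidth savings on mobile come from.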
 
  ## Files

  | File | Size | Description |
  |------|------|-------------|
+ | `pocket-tts-q8_0.gguf` | 128 MB | Model weights (Q8_0 + F32, decoder only) |
  | `tokenizer.model` | 58 KB | SentencePiece unigram tokenizer |

  Voice embeddings are unchanged — use them from the [original repo](https://huggingface.co/kyutai/pocket-tts-without-voice-cloning/tree/main/embeddings_v2).
pocket-tts-q8_0.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:df46502b4a62abd4836ae10ca26ca9ef5de7215790d02b32f1e91555e6a5fe75
- size 186331968
+ oid sha256:6861029e5a99fd082ce95854721b1f4a5097189a625a5fafa133c84c399ba304
+ size 134356064