Upload folder using huggingface_hub
- README.md +14 -10
- pocket-tts-q8_0.gguf +2 -2
README.md CHANGED
@@ -28,31 +28,35 @@ Q8_0 quantized version of [kyutai/pocket-tts-without-voice-cloning](https://hugg
 |  | Original | GGUF Q8_0 |
 |---|---|---|
 | **File** | `tts_b6369a24.safetensors` | `pocket-tts-q8_0.gguf` |
-| **Size** | 236 MB (BF16) |
+| **Size** | 236 MB (BF16) | 128 MB |
 | **Format** | safetensors | GGUF |
-| **Reduction** | — |
+| **Reduction** | — | 46% |
+
+## What's included
+
+This GGUF contains the **TTS decoder pipeline only**: the transformer backbone, flow matching network, mimi decoder + decoder transformer, and the DummyQuantizer output projection.
+
+The mimi **encoder** (SEANet encoder, encoder transformer, downsample conv) is **excluded** — TTS only needs the decoder path. This saves ~52 MB (28%) compared to a full-model GGUF.
 
 ## Quantization
 
 Per-block Q8_0 quantization (block size 32): 2-byte f16 scale + 32 int8 values per block.
 
-**
-
-**149 tensors kept as F32** — norms, biases, embeddings, SEANet convolutions, quantizer, and resampling convolutions.
+**56 tensors quantized** — all linear/projection weights in the transformer backbone, flow matching network, and mimi decoder transformer.
+
+**114 tensors kept as F32** — norms, biases, embeddings, SEANet decoder convolutions, quantizer, and resampling convolutions.
+
+Validation SQNR: >40 dB on all tensors.
+
+## Runtime
+
+Weights stay quantized as Q8_0 at runtime. Matmuls use a [tiled WASM SIMD128 quantized matmul kernel](https://github.com/ilnmtlbnm/candle/tree/quantized-matmul-wasm-simd-opt) (fork of candle) — achieving ~2x realtime on desktop (M-series Mac, Chrome).
 
 ## Files
 
 | File | Size | Description |
 |------|------|-------------|
-| `pocket-tts-q8_0.gguf` |
+| `pocket-tts-q8_0.gguf` | 128 MB | Model weights (Q8_0 + F32, decoder only) |
 | `tokenizer.model` | 58 KB | SentencePiece unigram tokenizer |
 
 Voice embeddings are unchanged — use them from the [original repo](https://huggingface.co/kyutai/pocket-tts-without-voice-cloning/tree/main/embeddings_v2).
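The Q8_0 layout the README describes (per-block: a 2-byte f16 scale followed by 32 int8 values, 34 bytes total) and the SQNR validation metric can be sketched in Python. This is an illustrative round-trip, not ggml's actual implementation; the function names and the sine-wave stand-in for a weight block are mine.

```python
import math
import struct

BLOCK = 32  # Q8_0 block size, as stated in the README

def q8_0_quantize_block(values):
    """Pack one block of 32 floats as Q8_0: 2-byte f16 scale + 32 int8 values."""
    amax = max(abs(v) for v in values)
    scale = amax / 127.0 if amax > 0 else 0.0
    inv = 1.0 / scale if scale > 0 else 0.0
    quants = [max(-127, min(127, round(v * inv))) for v in values]
    return struct.pack("<e32b", scale, *quants)  # 34 bytes on disk

def q8_0_dequantize_block(blob):
    """Recover approximate floats from one packed 34-byte Q8_0 block."""
    scale, *quants = struct.unpack("<e32b", blob)
    return [q * scale for q in quants]

def sqnr_db(original, reconstructed):
    """Signal-to-quantization-noise ratio in dB (the validation metric above)."""
    sig = sum(x * x for x in original)
    err = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    return float("inf") if err == 0 else 10.0 * math.log10(sig / err)

vals = [math.sin(i / 3.0) for i in range(BLOCK)]  # stand-in for one weight block
blob = q8_0_quantize_block(vals)
print(len(blob))  # 34 bytes per block, matching the 2 + 32 layout
```

For a block of this shape, 8-bit quantization lands comfortably above the 40 dB SQNR floor the diff reports as its validation threshold.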
pocket-tts-q8_0.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:6861029e5a99fd082ce95854721b1f4a5097189a625a5fafa133c84c399ba304
+size 134356064
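The byte count in the new LFS pointer is consistent with the sizes quoted in the README diff. A quick arithmetic check (assuming the "MB" figures in the table are binary mebibytes):

```python
size_bytes = 134356064            # from the new LFS pointer above
size_mib = size_bytes / 2**20
print(round(size_mib))            # ~128, matching the table's "128 MB"

original_mb, quantized_mb = 236, 128   # table values
reduction = 100 * (1 - quantized_mb / original_mb)
print(round(reduction))           # ~46, matching the "46%" reduction row
```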