---
license: mit
pipeline_tag: audio-to-audio
tags:
- audio
- speech
- gguf
- tokenizer
- rvq
library_name: ggml
base_model: XiaomiMiMo/MiMo-Audio-Tokenizer
---

# MiMo Audio Tokenizer (encoder only) -- GGUF

GGUF conversion of the **encoder** from [`XiaomiMiMo/MiMo-Audio-Tokenizer`](https://huggingface.co/XiaomiMiMo/MiMo-Audio-Tokenizer) for use with **[CrispStrobe/CrispASR](https://github.com/CrispStrobe/CrispASR)**.

## Available variants

| File | Quant | Size | Notes |
|---|---|---|---|
| `mimo-tokenizer-q4_k.gguf` | Q4_K | 377 MB | Encoder + RVQ codebooks |
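
The size and parameter figures in this card can be cross-checked with simple arithmetic. The sketch below is a rough estimate, not a description of the actual checkpoint: it assumes a standard transformer layer (Q/K/V/output projections plus a feed-forward block 4× the hidden size, about 12·d² weights per layer), ignores biases, norms, the Conv1d stem and the RVQ codebooks, and treats MB as 10^6 bytes. The helper names are hypothetical.

```python
# Back-of-the-envelope size checks for the Q4_K file.
# Assumptions (not from the upstream model): standard encoder layers with
# Q/K/V/O projections and a 4x-wide FFN; biases, norms, the Conv1d stem and
# the RVQ codebooks are ignored; "MB" means 10^6 bytes.


def transformer_encoder_params(n_layers: int, d_model: int, ffn_mult: int = 4) -> int:
    """Approximate weight count of a stack of standard encoder layers."""
    attn = 4 * d_model * d_model              # Q, K, V and output projections
    ffn = 2 * d_model * (ffn_mult * d_model)  # up- and down-projection
    return n_layers * (attn + ffn)


def bits_per_weight(file_bytes: float, n_params: float) -> float:
    """Average storage cost per parameter implied by a file size."""
    return file_bytes * 8 / n_params


est = transformer_encoder_params(n_layers=32, d_model=1280)
print(f"estimated encoder weights: {est / 1e6:.0f}M")  # estimated encoder weights: 629M

bpw = bits_per_weight(377e6, 600e6)
print(f"effective bits per weight: {bpw:.2f}")          # effective bits per weight: 5.03
```

Under these assumptions the 32-layer, 1280-wide stack comes to roughly 629M weights, in line with the ~600M figure, and 377 MB over ~600M parameters works out to about 5 bits per weight. That is somewhat above the nominal 4.5 bits per weight of Q4_K, which would fit some tensors being stored at higher precision, but this card does not say which ones are.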

## Model details

- **Architecture:** Conv1d stem, then a 32-layer transformer encoder (hidden size 1280, 20 attention heads), then 20 RVQ codebooks
- **Parameters:** ~600M (encoder only; decoder/vocoder excluded)
- **Audio:** 24 kHz input; outputs RVQ tokens at 25 Hz (8 of the 20 codebook channels are used for ASR)
- **License:** MIT
- **Source:** [`XiaomiMiMo/MiMo-Audio-Tokenizer`](https://huggingface.co/XiaomiMiMo/MiMo-Audio-Tokenizer)
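
The stated rates fix how much audio each token frame covers: 24,000 / 25 = 960 samples, i.e. 40 ms per frame. A minimal sketch of that bookkeeping follows; the constant and function names are hypothetical, and rounding a trailing partial frame up to a full frame is an assumption (the real encoder may pad or trim clip edges differently).

```python
# Frame/token bookkeeping for the rates stated in this card:
# 24 kHz input, 25 RVQ frames per second, 20 codebooks, 8 channels used for ASR.
import math

SAMPLE_RATE_HZ = 24_000
FRAME_RATE_HZ = 25
N_CODEBOOKS = 20
ASR_CHANNELS = 8

SAMPLES_PER_FRAME = SAMPLE_RATE_HZ // FRAME_RATE_HZ  # 960 samples = 40 ms


def n_frames(n_samples: int) -> int:
    """Token frames for a clip; a trailing partial frame counts as a full one
    (assumption: the encoder pads the end of the clip)."""
    return math.ceil(n_samples / SAMPLES_PER_FRAME)


def asr_token_count(n_samples: int) -> int:
    """Discrete codes kept for ASR if 8 channels are taken from every frame."""
    return n_frames(n_samples) * ASR_CHANNELS


ten_seconds = 10 * SAMPLE_RATE_HZ
print(n_frames(ten_seconds))                # 250
print(asr_token_count(ten_seconds))         # 2000
print(n_frames(ten_seconds) * N_CODEBOOKS)  # 5000 codes across all 20 codebooks
```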

## Notes

- Only the encoder (waveform → RVQ tokens) is included; the decoder/vocoder that reconstructs audio from tokens (e.g. for TTS) is excluded.
- Used as the first stage of the MiMo-V2.5-ASR pipeline (tokenizer → LLM).
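
The encoder's last step, turning continuous frames into discrete RVQ tokens, follows the usual residual vector quantization pattern: each codebook quantizes whatever the previous codebooks left over. The sketch below is a generic illustration in Python with NumPy, not the MiMo code; the toy dimensions, random codebooks and function names are placeholders, and keeping the *first* 8 channels for ASR is an assumption (the card only says 8 channels are used).

```python
# Generic residual vector quantization (RVQ) encoding sketch.
# Not the MiMo implementation: dimensions, codebooks and names are illustrative.
import numpy as np


def rvq_encode(frames: np.ndarray, codebooks: list[np.ndarray]) -> np.ndarray:
    """Quantize encoder frames with a stack of codebooks.

    frames:    (T, D) continuous encoder outputs, one row per frame
    codebooks: list of (K_i, D) arrays, applied in order
    returns:   (n_codebooks, T) integer codes
    """
    residual = frames.astype(np.float64).copy()
    codes = []
    for cb in codebooks:
        # squared Euclidean distance from every residual row to every codeword
        dists = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        idx = dists.argmin(axis=1)
        codes.append(idx)
        # later codebooks only see what earlier ones failed to capture
        residual -= cb[idx]
    return np.stack(codes)


def rvq_decode(codes: np.ndarray, codebooks: list[np.ndarray]) -> np.ndarray:
    """Sum the selected codewords across codebooks to approximate the frames."""
    return sum(cb[c] for cb, c in zip(codebooks, codes))


rng = np.random.default_rng(0)
T, D, K, N = 50, 16, 64, 20  # 2 s of frames at 25 Hz; toy dims; 20 codebooks
frames = rng.normal(size=(T, D))
codebooks = [rng.normal(scale=0.5 ** i, size=(K, D)) for i in range(N)]

codes = rvq_encode(frames, codebooks)
asr_codes = codes[:8]  # assumption: ASR keeps the first 8 (coarsest) channels
print(codes.shape, asr_codes.shape)  # (20, 50) (8, 50)
```

Truncating to a prefix of channels works because of the residual structure: the leading codebooks carry the coarse content and later ones refine it, so `rvq_decode(asr_codes, codebooks[:8])` still yields a (coarser) approximation of the frames.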