---
license: mit
pipeline_tag: audio-to-audio
tags:
- audio
- speech
- gguf
- tokenizer
- rvq
library_name: ggml
base_model: XiaomiMiMo/MiMo-Audio-Tokenizer
---
# MiMo Audio Tokenizer (encoder only) -- GGUF
GGUF conversion of the **encoder** from [`XiaomiMiMo/MiMo-Audio-Tokenizer`](https://huggingface.co/XiaomiMiMo/MiMo-Audio-Tokenizer) for use with **[CrispStrobe/CrispASR](https://github.com/CrispStrobe/CrispASR)**.
## Available variants
| File | Quant | Size | Notes |
|---|---|---|---|
| `mimo-tokenizer-q4_k.gguf` | Q4_K | 377 MB | Encoder + RVQ codebooks |
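
A quick way to sanity-check a downloaded file is to read its GGUF header. The sketch below is not part of CrispASR; it assumes a little-endian GGUF file of version 2 or later, where the header is the 4-byte magic `GGUF`, a `uint32` version, and two `uint64` counts (tensors and metadata key/value pairs). The function names `parse_gguf_header` and `read_gguf_header` are illustrative.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4 (magic) + 4 (version) + 8 (tensor count) + 8 (KV count)


def parse_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size GGUF header from the first 24 bytes of a file.

    Assumes little-endian GGUF v2 or later (64-bit counts).
    """
    if len(data) < HEADER_SIZE:
        raise ValueError(f"need at least {HEADER_SIZE} bytes, got {len(data)}")
    if data[:4] != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file: magic is {data[:4]!r}")
    # "<" = little-endian without padding; I = uint32, Q = uint64
    version, n_tensors, n_kv = struct.unpack_from("<IQQ", data, 4)
    return {"version": version, "n_tensors": n_tensors, "n_kv": n_kv}


def read_gguf_header(path: str) -> dict:
    """Read and parse the header of a GGUF file on disk."""
    with open(path, "rb") as f:
        return parse_gguf_header(f.read(HEADER_SIZE))
```

For example, `read_gguf_header("mimo-tokenizer-q4_k.gguf")` should return a dictionary with the format version and the tensor and metadata counts; a `ValueError` suggests a truncated or wrong file.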
## Model details
- **Architecture:** 32-layer transformer encoder (1280d, 20 heads) + Conv1d stem + 20 RVQ codebooks
- **Parameters:** ~600M (encoder only, decoder/vocoder excluded)
- **Audio:** 24 kHz input; outputs RVQ tokens at 25 Hz (ASR uses 8 channels)
- **License:** MIT
- **Source:** [`XiaomiMiMo/MiMo-Audio-Tokenizer`](https://huggingface.co/XiaomiMiMo/MiMo-Audio-Tokenizer)
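
The audio figures above imply 24000 / 25 = 960 input samples per token frame. The following sketch estimates the token grid size for a clip from those numbers alone. It is a rough estimate under the assumption that the encoder emits one frame per full 960-sample window; the actual encoder may pad or round differently at clip boundaries. The name `estimate_token_shape` is illustrative.

```python
SAMPLE_RATE_HZ = 24_000  # input sample rate from the model details
FRAME_RATE_HZ = 25       # RVQ token frame rate from the model details
ASR_CHANNELS = 8         # RVQ channels used by the ASR stage

# Number of input samples covered by one token frame.
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ // FRAME_RATE_HZ


def estimate_token_shape(n_samples: int, channels: int = ASR_CHANNELS) -> tuple[int, int]:
    """Estimate (frames, channels) for a clip of n_samples at 24 kHz.

    Assumption: one frame per full window, with no padding of the tail.
    """
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    frames = n_samples // SAMPLES_PER_FRAME
    return frames, channels


# A 10-second clip: 240000 samples -> 250 frames x 8 channels.
frames, channels = estimate_token_shape(10 * SAMPLE_RATE_HZ)
print(frames, channels, frames * channels)
```

Under these assumptions, each second of audio contributes 25 frames, or 200 token IDs across the 8 ASR channels.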
## Notes
- Only the encoder is included (waveform → RVQ tokens). The decoder/vocoder (used for TTS reconstruction) is excluded.
- Used as the first stage of the MiMo-V2.5-ASR pipeline (tokenizer → LLM).