metadata
license: mit
pipeline_tag: audio-to-audio
tags:
- audio
- speech
- gguf
- tokenizer
- rvq
library_name: ggml
base_model: XiaomiMiMo/MiMo-Audio-Tokenizer
MiMo Audio Tokenizer (encoder only) -- GGUF
GGUF conversion of the encoder from XiaomiMiMo/MiMo-Audio-Tokenizer for use with CrispStrobe/CrispASR.
Available variants
| File | Quant | Size | Notes |
|---|---|---|---|
mimo-tokenizer-q4_k.gguf |
Q4_K | 377 MB | Encoder + RVQ codebooks |
Model details
- Architecture: 32-layer transformer encoder (1280d, 20 heads) + Conv1d stem + 20 RVQ codebooks
- Parameters: ~600M (encoder only, decoder/vocoder excluded)
- Audio: 24kHz input, outputs RVQ tokens at 25 Hz (8 channels used by ASR)
- License: MIT
- Source:
XiaomiMiMo/MiMo-Audio-Tokenizer
Notes
- Only the encoder is included (waveform → RVQ tokens). Decoder/vocoder (TTS reconstruction) excluded.
- Used as the first stage of MiMo-V2.5-ASR pipeline (tokenizer → LLM)