---
license: mit
pipeline_tag: audio-to-audio
tags:
- audio
- speech
- gguf
- tokenizer
- rvq
library_name: ggml
base_model: XiaomiMiMo/MiMo-Audio-Tokenizer
---

# MiMo Audio Tokenizer (encoder only) -- GGUF

GGUF conversion of the **encoder** from [`XiaomiMiMo/MiMo-Audio-Tokenizer`](https://huggingface.co/XiaomiMiMo/MiMo-Audio-Tokenizer) for use with **[CrispStrobe/CrispASR](https://github.com/CrispStrobe/CrispASR)**.

## Available variants

| File | Quant | Size | Notes |
|---|---|---|---|
| `mimo-tokenizer-q4_k.gguf` | Q4_K | 377 MB | Encoder + RVQ codebooks |

## Model details

- **Architecture:** 32-layer transformer encoder (1280d, 20 heads) + Conv1d stem + 20 RVQ codebooks
- **Parameters:** ~600M (encoder only; decoder/vocoder excluded)
- **Audio:** 24 kHz input; outputs RVQ tokens at 25 Hz (8 channels used by ASR)
- **License:** MIT
- **Source:** [`XiaomiMiMo/MiMo-Audio-Tokenizer`](https://huggingface.co/XiaomiMiMo/MiMo-Audio-Tokenizer)

## Notes

- Only the encoder is included (waveform → RVQ tokens). The decoder/vocoder (TTS reconstruction) is excluded.
- Used as the first stage of the MiMo-V2.5-ASR pipeline (tokenizer → LLM).
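
The audio figures under Model details (24 kHz input, 25 Hz token rate, 8 channels used by ASR) imply a fixed hop of 960 input samples per RVQ frame. The sketch below works through that arithmetic for a clip of a given length. The helper name `rvq_token_budget` is hypothetical, and the rounding rule (trailing partial hops are dropped) is an assumption; the actual encoder may pad the final frame instead.

```python
# Figures taken from the model card; names here are illustrative only.
SAMPLE_RATE_HZ = 24_000   # encoder input sample rate
TOKEN_RATE_HZ = 25        # RVQ frames emitted per second
ASR_CHANNELS = 8          # RVQ channels consumed by the ASR stage

# Input samples covered by one RVQ frame: 24000 / 25 = 960.
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ // TOKEN_RATE_HZ


def rvq_token_budget(num_samples: int) -> tuple[int, int]:
    """Return (frames, total ASR tokens) for a clip of num_samples samples.

    Assumption: one frame per full 960-sample hop, with any trailing
    partial hop dropped. The real encoder may pad instead.
    """
    frames = num_samples // SAMPLES_PER_FRAME
    return frames, frames * ASR_CHANNELS


if __name__ == "__main__":
    ten_seconds = 10 * SAMPLE_RATE_HZ
    frames, tokens = rvq_token_budget(ten_seconds)
    print(frames, tokens)  # 250 2000
```

Under these assumptions, a 10-second clip yields 250 frames, or 2000 tokens across the 8 channels passed to the LLM stage.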