metadata
license: mit
pipeline_tag: audio-to-audio
tags:
  - audio
  - speech
  - gguf
  - tokenizer
  - rvq
library_name: ggml
base_model: XiaomiMiMo/MiMo-Audio-Tokenizer

MiMo Audio Tokenizer (encoder only) -- GGUF

GGUF conversion of the encoder from XiaomiMiMo/MiMo-Audio-Tokenizer for use with CrispStrobe/CrispASR.

Available variants

| File | Quant | Size | Notes |
|---|---|---|---|
| mimo-tokenizer-q4_k.gguf | Q4_K | 377 MB | Encoder + RVQ codebooks |
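
A quick sanity check on a downloaded file is to read the fixed-size GGUF header. The sketch below uses only the Python standard library and assumes a little-endian GGUF file of version 2 or later (magic bytes, then a 32-bit version, then 64-bit tensor and metadata counts). The helper name `read_gguf_header` is illustrative and is not part of CrispASR or ggml.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4-byte magic + uint32 version + two uint64 counts


def read_gguf_header(path):
    """Return version and entry counts from a GGUF file header.

    Assumes little-endian layout and GGUF version >= 2
    (version 1 used 32-bit counts and is not handled here).
    """
    with open(path, "rb") as f:
        raw = f.read(HEADER_SIZE)
    if len(raw) < HEADER_SIZE or raw[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not look like a GGUF file")
    version, tensor_count, kv_count = struct.unpack("<IQQ", raw[4:HEADER_SIZE])
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }
```

Calling `read_gguf_header("mimo-tokenizer-q4_k.gguf")` on a complete download should return a small dictionary; a `ValueError` usually points to a truncated or mislabeled file.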

Model details

  • Architecture: 32-layer transformer encoder (hidden size 1280, 20 attention heads) + Conv1d stem + 20 RVQ codebooks
  • Parameters: ~600M (encoder only; decoder/vocoder excluded)
  • Audio: 24 kHz input; outputs RVQ tokens at 25 Hz (the ASR pipeline uses 8 channels)
  • License: MIT
  • Source: XiaomiMiMo/MiMo-Audio-Tokenizer
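
The sample rate and token rate above imply 24000 / 25 = 960 input samples per token frame. The following sketch estimates the token grid for a clip of a given length. It assumes the final partial frame is rounded up, which may not match the encoder's actual padding behavior, and the function name `estimate_token_shape` is illustrative only.

```python
import math

SAMPLE_RATE_HZ = 24_000  # input sample rate stated in the model details
TOKEN_RATE_HZ = 25       # RVQ token frames per second
ASR_CHANNELS = 8         # RVQ channels consumed by the ASR stage

# Number of input samples covered by one token frame.
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ // TOKEN_RATE_HZ  # 960


def estimate_token_shape(num_samples):
    """Estimate (frames, channels) for a clip of num_samples samples.

    Assumption: a trailing partial frame is rounded up; the real
    encoder may pad or truncate differently.
    """
    frames = math.ceil(num_samples / SAMPLES_PER_FRAME)
    return frames, ASR_CHANNELS
```

Under these assumptions, a 10-second clip (240000 samples) maps to 250 frames of 8 tokens each.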

Notes

  • Only the encoder is included (waveform → RVQ tokens). The decoder/vocoder (used for TTS reconstruction) is excluded.
  • Used as the first stage of the MiMo-V2.5-ASR pipeline (tokenizer → LLM).