cstr committed on
Commit fefd4c0 · verified · 1 Parent(s): f35f4c7

Add model card README

Files changed (1)
  1. README.md +35 -0
README.md ADDED
@@ -0,0 +1,35 @@
+ ---
+ license: mit
+ pipeline_tag: audio-to-audio
+ tags:
+ - audio
+ - speech
+ - gguf
+ - tokenizer
+ - rvq
+ library_name: ggml
+ base_model: XiaomiMiMo/MiMo-Audio-Tokenizer
+ ---
+
+ # MiMo Audio Tokenizer (encoder only) -- GGUF
+
+ GGUF conversion of the **encoder** from [`XiaomiMiMo/MiMo-Audio-Tokenizer`](https://huggingface.co/XiaomiMiMo/MiMo-Audio-Tokenizer) for use with **[CrispStrobe/CrispASR](https://github.com/CrispStrobe/CrispASR)**.
+
+ ## Available variants
+
+ | File | Quant | Size | Notes |
+ |---|---|---|---|
+ | `mimo-tokenizer-q4_k.gguf` | Q4_K | 377 MB | Encoder + RVQ codebooks |
+
+ ## Model details
+
+ - **Architecture:** 32-layer transformer encoder (1280d, 20 heads) + Conv1d stem + 20 RVQ codebooks
+ - **Parameters:** ~600M (encoder only; decoder/vocoder excluded)
+ - **Audio:** 24 kHz input, outputs RVQ tokens at 25 Hz (8 channels used by ASR)
+ - **License:** MIT
+ - **Source:** [`XiaomiMiMo/MiMo-Audio-Tokenizer`](https://huggingface.co/XiaomiMiMo/MiMo-Audio-Tokenizer)
+
+ ## Notes
+
+ - Only the encoder is included (waveform → RVQ tokens). The decoder/vocoder (TTS reconstruction) is excluded.
+ - Used as the first stage of the MiMo-V2.5-ASR pipeline (tokenizer → LLM).
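
The audio figures in the model details (24 kHz input, RVQ tokens at 25 Hz, 8 channels used by ASR) imply a fixed ratio between samples and tokens. The Python sketch below works through that arithmetic. It is not part of the committed README, and it makes two assumptions that the card does not confirm: that "8 channels" means 8 tokens per 25 Hz frame, and that a trailing partial frame is dropped rather than padded. The function name `token_budget` is hypothetical.

```python
# Figures taken from the model card.
SAMPLE_RATE_HZ = 24_000  # 24 kHz input
FRAME_RATE_HZ = 25       # RVQ tokens emitted at 25 Hz
ASR_CHANNELS = 8         # channels used by ASR (assumed: 8 tokens per frame)

# Each token frame covers this many input samples: 24000 / 25 = 960.
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ // FRAME_RATE_HZ


def token_budget(num_samples: int) -> tuple[int, int]:
    """Return (frames, asr_tokens) for a clip of num_samples at 24 kHz.

    Assumption: a trailing partial frame is dropped (floor division).
    The real encoder may pad instead, which would add one frame.
    """
    frames = num_samples // SAMPLES_PER_FRAME
    return frames, frames * ASR_CHANNELS


# A 10-second clip: 240000 samples -> 250 frames -> 2000 tokens.
frames, tokens = token_budget(10 * SAMPLE_RATE_HZ)
print(frames, tokens)  # 250 2000
```

Under these assumptions the LLM stage receives about 200 tokenizer outputs per second of audio, which is a rough guide when estimating context length for a given clip.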