FunAudioLLM/Fun-ASR-Nano-GGUF

@@ -21,6 +21,18 @@ pipeline_tag: automatic-speech-recognition
 GGUF build of **Fun-ASR-Nano** (SenseVoice SAN-M encoder + adaptor + **Qwen3-0.6B** LLM decoder) for the zero-Python, CPU/edge **[FunASR llama.cpp runtime](https://github.com/FunAudioLLM/Fun-ASR/tree/main/runtime/llama.cpp)** — the accuracy leader (LLM decoder), single C++ binary.
+## LLM quantization (pick by size vs accuracy)
+The Fun-ASR-Nano LLM decoder (Qwen3-0.6B) ships in three quantization tiers whose CER values lie within 0.1 percentage points of each other (micro-averaged CER over 184 files). Pair any of them with `funasr-encoder-f16.gguf` (470 MB).
+| LLM file | size | CER ↓ | speed | notes |
+|---|---|---|---|---|
+| `qwen3-0.6b-q4km.gguf` | **484 MB** | 8.35% | 6.1× | smallest |
+| `qwen3-0.6b-q5km.gguf` | 551 MB | **8.25%** | 5.7× | best accuracy |
+| `qwen3-0.6b-q8_0.gguf` | 805 MB | 8.30% | 6.0× | |
+Recommended: **q4_K_M** (smallest) or **q5_K_M** (best accuracy).
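The CER column is described as a micro-CER over 184 files. As a rough illustration of what that usually means, the sketch below pools character edit errors over all files and divides by the pooled reference length, and contrasts that with a per-file (macro) average. This is an assumption about the metric, not the evaluation script behind the table: the function names and sample strings are hypothetical, and any text normalization applied before scoring is not modeled.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between a reference and a hypothesis."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # reference char missing from hypothesis
                curr[j - 1] + 1,         # extra char in hypothesis
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            ))
        prev = curr
    return prev[-1]


def micro_cer(pairs):
    """Pooled CER: total edit errors divided by total reference characters."""
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    ref_chars = sum(len(ref) for ref, _ in pairs)
    return errors / ref_chars


def macro_cer(pairs):
    """Per-file CER averaged with equal weight per file (shown for contrast)."""
    return sum(edit_distance(ref, hyp) / len(ref) for ref, hyp in pairs) / len(pairs)


# Hypothetical (reference, hypothesis) pairs standing in for two test files.
pairs = [("hello world", "hello word"), ("hi", "ha")]
print(f"micro-CER: {micro_cer(pairs):.4f}")
print(f"macro-CER: {macro_cer(pairs):.4f}")
```

With pooling, long files dominate the score, so one short file with a high error rate shifts the result much less than it would under per-file averaging.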
 ## Get it running (no Python, no build)
 These are GGUF weights for the **[FunASR llama.cpp runtime](https://github.com/modelscope/FunASR/tree/main/runtime/llama.cpp)** — a whisper.cpp-style, single self-contained binary for CPU / edge. Grab a prebuilt binary, then fetch this model and run: