---
license: apache-2.0
language:
- zh
- en
library_name: gguf
tags:
- automatic-speech-recognition
- asr
- fun-asr
- funasr
- qwen3
- llama.cpp
- ggml
- cpu
- chinese
pipeline_tag: automatic-speech-recognition
---

# Fun-ASR-Nano · GGUF (FunASR llama.cpp runtime)
|
|
GGUF build of **Fun-ASR-Nano** (SenseVoice SAN-M encoder + adaptor + **Qwen3-0.6B** LLM decoder) for the zero-Python, CPU/edge **[FunASR llama.cpp runtime](https://github.com/FunAudioLLM/Fun-ASR/tree/main/runtime/llama.cpp)**. With its LLM decoder it is the accuracy leader among the runtime's models, and it runs from a single C++ binary.
|
|
| ## LLM quantization (pick by size vs accuracy) |
|
|
The Fun-ASR-Nano LLM (Qwen3-0.6B) ships in three quantization tiers whose CERs lie within 0.1 percentage points of each other (micro-CER over 184 files). Pair any of them with `funasr-encoder-f16.gguf` (470 MB).
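The tiers are compared on micro-averaged CER over the 184 test files. Assuming the usual definition of micro-CER (character errors pooled over all files before dividing, rather than averaging per-file rates), that is:

$$
\mathrm{CER}_{\mathrm{micro}} = \frac{\sum_{i=1}^{184} \left(S_i + D_i + I_i\right)}{\sum_{i=1}^{184} N_i}
$$

where $S_i$, $D_i$ and $I_i$ are the character substitutions, deletions and insertions in file $i$, and $N_i$ is the number of reference characters in that file. Under this definition longer files weigh more than short ones. The spread between the best and worst tier is $8.35\% - 8.25\% = 0.10$ percentage points.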
|
|
| LLM file | size | CER ↓ | speed | notes |
|---|---|---|---|---|
| `qwen3-0.6b-q4km.gguf` | **484 MB** | 8.35% | 6.1× | smallest |
| `qwen3-0.6b-q5km.gguf` | 551 MB | **8.25%** | 5.7× | best accuracy |
| `qwen3-0.6b-q8_0.gguf` | 805 MB | 8.30% | 6.0× | |
|
|
Recommended: **q4_K_M** (smallest) or **q5_K_M** (best accuracy).
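Choosing a tier mostly trades disk footprint for at most a tenth of a percentage point of CER. The sketch below is a hypothetical helper, not part of the runtime: it maps a tier name to the LLM file and adds the 470 MB encoder to show the total size of each pairing, using only the sizes and CERs from the table.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the runtime): pick an LLM GGUF by tier and
# report the combined encoder + LLM size, using the figures from the table above.
set -euo pipefail

ENCODER_MB=470  # funasr-encoder-f16.gguf

pick_llm() {
  # $1: tier (q4, q5 or q8). Prints "<llm-file> <encoder+LLM MB> <CER>".
  local file mb cer
  case "$1" in
    q4) file=qwen3-0.6b-q4km.gguf; mb=484; cer=8.35% ;;
    q5) file=qwen3-0.6b-q5km.gguf; mb=551; cer=8.25% ;;
    q8) file=qwen3-0.6b-q8_0.gguf; mb=805; cer=8.30% ;;
    *)  echo "unknown tier '$1' (expected q4, q5 or q8)" >&2; return 1 ;;
  esac
  echo "$file $((ENCODER_MB + mb)) $cer"
}

# Default to the smallest tier when no argument is given.
pick_llm "${1:-q4}"
```

Saved as a hypothetical `pick-llm.sh`, running `bash pick-llm.sh q5` prints `qwen3-0.6b-q5km.gguf 1021 8.25%`: the best-accuracy tier costs about 67 MB more on disk than q4_K_M.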
|
|
| ## Get it running (no Python, no build) |
|
|
These are GGUF weights for the **[FunASR llama.cpp runtime](https://github.com/modelscope/FunASR/tree/main/runtime/llama.cpp)**, a whisper.cpp-style, self-contained single binary for CPU and edge devices. Grab a prebuilt binary, then fetch this model and run it:
|
|
- **Prebuilt binaries (Linux / macOS / Windows): [GitHub Releases](https://github.com/modelscope/FunASR/releases)** (tag `runtime-llamacpp-v*`)
- **One-page quickstart & benchmarks: [funasr.com/llama-cpp](https://www.funasr.com/llama-cpp.html)**
|
|
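```bash
# Fetch the Fun-ASR-Nano GGUF files into ./gguf
bash download-funasr-model.sh nano ./gguf
# Transcribe: --enc = audio encoder, -m = LLM decoder, --vad = FSMN VAD model, -a = input audio
llama-funasr-cli --enc ./gguf/funasr-encoder-f16.gguf -m ./gguf/qwen3-0.6b-q8_0.gguf --vad ./gguf/fsmn-vad.gguf -a audio.wav
```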
|
|
| ## Files |
| file | size | notes |
|---|---|---|
| `funasr-encoder-f16.gguf` | 470 MB | audio encoder + adaptor (f16) |
| `qwen3-0.6b-q4km.gguf` | 484 MB | LLM decoder, smallest (Q4_K_M) |
| `qwen3-0.6b-q5km.gguf` | 551 MB | LLM decoder, best accuracy (Q5_K_M) |
| `qwen3-0.6b-q8_0.gguf` | 805 MB | LLM decoder (Q8_0) |

## Usage (needs both the encoder and the LLM GGUF)
```bash
llama-funasr-cli --enc funasr-encoder-f16.gguf -m qwen3-0.6b-q8_0.gguf -a audio.wav --vad fsmn-vad.gguf
```
On CPU, this pairing (Q8_0 LLM) reaches **8.30% CER** on the 184-clip Mandarin benchmark, versus 22–31% for whisper.cpp.
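Because the CLI needs both halves of the model, a missing file is the first thing to rule out when a run fails. A minimal preflight check, written as a hypothetical helper rather than anything the runtime provides, could look like this:

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper: verify every model file exists before calling
# llama-funasr-cli, which needs the encoder GGUF (--enc) and the LLM GGUF (-m).
set -euo pipefail

check_models() {
  # $@: model file paths. Reports each missing file on stderr; returns 1 if any are missing.
  local missing=0 f
  for f in "$@"; do
    if [[ ! -f "$f" ]]; then
      echo "missing model file: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_models funasr-encoder-f16.gguf qwen3-0.6b-q8_0.gguf fsmn-vad.gguf && llama-funasr-cli --enc funasr-encoder-f16.gguf -m qwen3-0.6b-q8_0.gguf -a audio.wav --vad fsmn-vad.gguf` only starts transcription once all three files are present, and lists every missing one otherwise.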

## Links
- 🧩 Runtime & build: **[Fun-ASR · runtime/llama.cpp](https://github.com/FunAudioLLM/Fun-ASR/tree/main/runtime/llama.cpp)** · ⭐ **Star [Fun-ASR](https://github.com/FunAudioLLM/Fun-ASR)!**
- Source model: [FunAudioLLM/Fun-ASR-Nano-2512](https://huggingface.co/FunAudioLLM/Fun-ASR-Nano-2512)