license: apache-2.0
language:
  - zh
  - en
library_name: gguf
tags:
  - automatic-speech-recognition
  - asr
  - fun-asr
  - funasr
  - qwen3
  - llama.cpp
  - ggml
  - cpu
  - chinese
pipeline_tag: automatic-speech-recognition

Fun-ASR-Nano · GGUF (FunASR llama.cpp runtime)

GGUF build of Fun-ASR-Nano (SenseVoice SAN-M encoder + adaptor + Qwen3-0.6B LLM decoder) for the FunASR llama.cpp runtime: a single C++ binary that runs on CPU and edge devices with no Python dependency. Thanks to its LLM decoder, this is the most accurate model option for the runtime.

LLM quantization (pick by size vs accuracy)

The Fun-ASR-Nano LLM (Qwen3-0.6B) ships in three quantization tiers whose CERs lie within 0.1 percentage points of each other (micro-averaged CER over 184 files). Pair any of them with `funasr-encoder-f16.gguf` (470 MB).

LLM file	size	CER ↓	speed
`qwen3-0.6b-q4km.gguf`	484 MB	8.35%	6.1×
`qwen3-0.6b-q5km.gguf`	551 MB	8.25%	5.7×
`qwen3-0.6b-q8_0.gguf`	805 MB	8.30%	6.0×
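The "184-file micro-CER" above pools character errors across all files before dividing, instead of averaging per-file CERs. Assuming the usual character-level edit-distance definition (this is the standard formula, not one taken from the benchmark's own scoring code), it reads:

$$\mathrm{CER}_{\text{micro}} = \frac{\sum_{i=1}^{184} \left(S_i + D_i + I_i\right)}{\sum_{i=1}^{184} N_i}$$

where $S_i$, $D_i$ and $I_i$ are the character substitutions, deletions and insertions in file $i$, and $N_i$ is the number of reference characters in that file. Under this definition longer clips carry more weight than short ones, and the spread between tiers is $8.35\% - 8.25\% = 0.10$ percentage points.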

Recommended: q4_K_M (smallest) or q5_K_M (lowest CER).
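To script the choice, a small bash helper can map a tier name to its LLM file and report the combined download size (LLM plus the 470 MB encoder). This is a sketch: the function name `funasr_llm_for_tier` and the tier keywords `q4km`, `q5km`, `q8` are hypothetical, and the sizes are copied from the table above.

```bash
# Hypothetical helper: map a quantization tier to its LLM GGUF file and
# print "<file> <total MB>", where total = encoder + LLM size from the table.
ENCODER_MB=470  # funasr-encoder-f16.gguf

funasr_llm_for_tier() {
  local tier="$1" file mb
  case "$tier" in
    q4km) file="qwen3-0.6b-q4km.gguf"; mb=484 ;;
    q5km) file="qwen3-0.6b-q5km.gguf"; mb=551 ;;
    q8)   file="qwen3-0.6b-q8_0.gguf"; mb=805 ;;
    *)
      echo "unknown tier: $tier (expected q4km, q5km or q8)" >&2
      return 1
      ;;
  esac
  # Two space-separated fields: file name, total size in MB
  echo "$file $((ENCODER_MB + mb))"
}
```

For example, `funasr_llm_for_tier q5km` prints `qwen3-0.6b-q5km.gguf 1021`, i.e. roughly 1 GB of weights in total for the lowest-CER tier.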

Get it running (no Python, no build)

These are GGUF weights for the FunASR llama.cpp runtime, a whisper.cpp-style, single self-contained binary for CPU and edge devices. Grab a prebuilt binary, fetch this model, and run:

Prebuilt binaries (Linux / macOS / Windows) → GitHub Releases (tag runtime-llamacpp-v*)
One-page quickstart & benchmarks → funasr.com/llama-cpp

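```bash
# Fetch the Fun-ASR-Nano GGUF files (encoder, LLM, VAD) into ./gguf
bash download-funasr-model.sh nano ./gguf
# Transcribe audio.wav with the f16 encoder, the Q8_0 LLM and the FSMN VAD
llama-funasr-cli --enc ./gguf/funasr-encoder-f16.gguf -m ./gguf/qwen3-0.6b-q8_0.gguf --vad ./gguf/fsmn-vad.gguf -a audio.wav
```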
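If you call the runtime from your own scripts, a thin bash wrapper can check that the model files are in place before invoking the CLI with the same flags as above. This is a minimal sketch: `run_funasr_nano` and its argument order (audio, model directory, LLM file) are hypothetical, and it assumes the download script has placed all three GGUF files in one directory.

```bash
# Hypothetical wrapper: run_funasr_nano AUDIO [MODEL_DIR] [LLM_FILE]
# Verifies the encoder, LLM and VAD files exist, then calls llama-funasr-cli
# with only the flags shown in the quickstart (--enc, -m, --vad, -a).
run_funasr_nano() {
  local audio="$1" model_dir="${2:-./gguf}" llm="${3:-qwen3-0.6b-q8_0.gguf}"
  local f
  for f in funasr-encoder-f16.gguf "$llm" fsmn-vad.gguf; do
    if [ ! -f "$model_dir/$f" ]; then
      echo "missing $model_dir/$f; run: bash download-funasr-model.sh nano $model_dir" >&2
      return 1
    fi
  done
  if [ ! -f "$audio" ]; then
    echo "missing audio file: $audio" >&2
    return 1
  fi
  llama-funasr-cli --enc "$model_dir/funasr-encoder-f16.gguf" \
    -m "$model_dir/$llm" \
    --vad "$model_dir/fsmn-vad.gguf" \
    -a "$audio"
}
```

To try a smaller tier, pass its file name as the third argument, e.g. `run_funasr_nano audio.wav ./gguf qwen3-0.6b-q4km.gguf`.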

Files

file	size	notes
`funasr-encoder-f16.gguf`	470 MB	audio encoder + adaptor (f16)
`qwen3-0.6b-q8_0.gguf`	805 MB	LLM decoder, largest (Q8_0)
`qwen3-0.6b-q5km.gguf`	551 MB	LLM decoder, lowest CER (Q5_K_M)
`qwen3-0.6b-q4km.gguf`	484 MB	LLM decoder, smallest (Q4_K_M)

Usage (needs both the encoder and the LLM gguf)

```bash
llama-funasr-cli --enc funasr-encoder-f16.gguf -m qwen3-0.6b-q8_0.gguf -a audio.wav --vad fsmn-vad.gguf
```
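To transcribe a whole folder, the command above can be wrapped in a loop. This is a sketch under a loud assumption: it supposes `llama-funasr-cli` prints the transcript to stdout, which this card does not state, so check the runtime's own documentation. The function name `transcribe_dir` is hypothetical, and like the command above it expects to be run from the directory holding the GGUF files.

```bash
# Hypothetical batch loop: transcribe_dir WAV_DIR [OUT_DIR]
# ASSUMPTION: llama-funasr-cli writes the transcript to stdout.
# Each foo.wav produces OUT_DIR/foo.txt; failures are reported on stderr.
transcribe_dir() {
  local wav_dir="$1" out_dir="${2:-$1}" wav
  mkdir -p "$out_dir"
  for wav in "$wav_dir"/*.wav; do
    [ -e "$wav" ] || continue   # no .wav files: the glob stays literal
    llama-funasr-cli --enc funasr-encoder-f16.gguf -m qwen3-0.6b-q8_0.gguf \
      -a "$wav" --vad fsmn-vad.gguf \
      > "$out_dir/$(basename "$wav" .wav).txt" \
      || echo "failed: $wav" >&2
  done
}
```

Loading the models once per file is simple but repeats start-up cost; for large batches it may be worth checking whether the runtime offers a multi-file mode.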

On CPU with the Q8_0 LLM: 8.30% CER on the 184-clip Mandarin benchmark (whisper.cpp: 22–31%).
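Assuming the whisper.cpp figures were measured on the same 184 clips (so the reference character count is identical), the CER ratio is also the ratio of character-error counts:

$$\frac{22\%}{8.30\%} \approx 2.65, \qquad \frac{31\%}{8.30\%} \approx 3.73$$

That is, whisper.cpp makes roughly 2.7 to 3.7 times as many character errors as Fun-ASR-Nano on this benchmark.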
