langgz committed on
Commit c1629cb · verified · 1 Parent(s): dcbddeb

docs: add LLM quantization tiers table (q4km/q5km/q8)

Files changed (1)
  1. README.md +12 -0
README.md CHANGED
@@ -21,6 +21,18 @@ pipeline_tag: automatic-speech-recognition
 
  GGUF build of **Fun-ASR-Nano** (SenseVoice SAN-M encoder + adaptor + **Qwen3-0.6B** LLM decoder) for the zero-Python, CPU/edge **[FunASR llama.cpp runtime](https://github.com/FunAudioLLM/Fun-ASR/tree/main/runtime/llama.cpp)** — the accuracy leader (LLM decoder), single C++ binary.
 
+ ## LLM quantization (pick by size vs accuracy)
+
+ The Fun-ASR-Nano LLM (Qwen3-0.6B) ships in three tiers, all within 0.1 percentage points of each other in CER (184-file micro-CER). Pair any of them with `funasr-encoder-f16.gguf` (470 MB).
+
+ | LLM file | size | CER ↓ | speed | notes |
+ |---|---|---|---|---|
+ | `qwen3-0.6b-q4km.gguf` | **484 MB** | 8.35% | 6.1× | smallest |
+ | `qwen3-0.6b-q5km.gguf` | 551 MB | **8.25%** | 5.7× | best accuracy |
+ | `qwen3-0.6b-q8_0.gguf` | 805 MB | 8.30% | 6.0× | |
+
+ Recommended: **q4_K_M** (smallest) or **q5_K_M** (best accuracy).
+
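The size-versus-accuracy choice described in the added section can be written as a small lookup. The Python sketch below is illustrative only and is not part of the FunASR runtime: `Tier`, `TIERS`, and `pick_tier` are hypothetical names, the numbers are copied from the table in this change, and the budget is assumed to mean on-disk size of the LLM file plus the 470 MB encoder.

```python
from dataclasses import dataclass

# Encoder size as stated in the README text (funasr-encoder-f16.gguf).
ENCODER_MB = 470


@dataclass(frozen=True)
class Tier:
    filename: str
    size_mb: int      # "size" column
    cer_pct: float    # "CER" column, lower is better
    speed_x: float    # "speed" column, copied as-is (its baseline is not stated)


# Values copied from the quantization table; nothing here is measured.
TIERS = [
    Tier("qwen3-0.6b-q4km.gguf", 484, 8.35, 6.1),
    Tier("qwen3-0.6b-q5km.gguf", 551, 8.25, 5.7),
    Tier("qwen3-0.6b-q8_0.gguf", 805, 8.30, 6.0),
]


def pick_tier(budget_mb):
    """Return the lowest-CER tier whose LLM + encoder size fits, else None."""
    fitting = [t for t in TIERS if t.size_mb + ENCODER_MB <= budget_mb]
    if not fitting:
        return None
    # Break CER ties by preferring the smaller file.
    return min(fitting, key=lambda t: (t.cer_pct, t.size_mb))


if __name__ == "__main__":
    for budget in (900, 1000, 1100, 1300):
        tier = pick_tier(budget)
        name = tier.filename if tier else "none fits"
        print(f"{budget} MB budget -> {name}")
```

With these figures the combined sizes are 954 MB, 1021 MB, and 1275 MB, and the q8_0 file is never selected because q5_K_M is both smaller and lower in CER, which matches the recommendation to use q4_K_M or q5_K_M.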
  ## Get it running (no Python, no build)
 
  These are GGUF weights for the **[FunASR llama.cpp runtime](https://github.com/modelscope/FunASR/tree/main/runtime/llama.cpp)** — a whisper.cpp-style, single self-contained binary for CPU / edge. Grab a prebuilt binary, then fetch this model and run: