docs: add LLM quantization tiers table (q4km/q5km/q8)
README.md
@@ -21,6 +21,18 @@ pipeline_tag: automatic-speech-recognition

 GGUF build of **Fun-ASR-Nano** (SenseVoice SAN-M encoder + adaptor + **Qwen3-0.6B** LLM decoder) for the zero-Python, CPU/edge **[FunASR llama.cpp runtime](https://github.com/FunAudioLLM/Fun-ASR/tree/main/runtime/llama.cpp)**: the accuracy leader (LLM decoder), single C++ binary.

+## LLM quantization (pick by size vs accuracy)
+
+The Fun-ASR-Nano LLM (Qwen3-0.6B) ships in three quantization tiers. Their CERs differ by at most 0.1 percentage points (micro-averaged CER over 184 files). Pair any of them with `funasr-encoder-f16.gguf` (470 MB).
+
+| LLM file | size | CER ↓ | speed | notes |
+|---|---|---|---|---|
+| `qwen3-0.6b-q4km.gguf` | **484 MB** | 8.35% | 6.1× | smallest |
+| `qwen3-0.6b-q5km.gguf` | 551 MB | **8.25%** | 5.7× | best accuracy |
+| `qwen3-0.6b-q8_0.gguf` | 805 MB | 8.30% | 6.0× | |
+
+Recommended: **q4_K_M** (smallest) or **q5_K_M** (best accuracy).
+
 ## Get it running (no Python, no build)

 These are GGUF weights for the **[FunASR llama.cpp runtime](https://github.com/modelscope/FunASR/tree/main/runtime/llama.cpp)**, a whisper.cpp-style, single self-contained binary for CPU / edge. Grab a prebuilt binary, then fetch this model and run:
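
The size-versus-accuracy trade-off in the added table can be worked through with a little arithmetic. The sketch below is illustrative only and is not part of the commit: it copies the sizes and CER values from the table, adds the 470 MB encoder to each LLM file, and picks the lowest-CER tier that fits a disk budget. The names `total_mb` and `pick_tier` are hypothetical helpers, and the budgets are made-up examples.

```python
# Sizes (MB) and CER (%) are copied from the quantization table in the diff.
# total_mb and pick_tier are hypothetical helper names used only for illustration.
ENCODER_MB = 470  # funasr-encoder-f16.gguf, required alongside any LLM tier

TIERS = {
    "qwen3-0.6b-q4km.gguf": {"size_mb": 484, "cer": 8.35},
    "qwen3-0.6b-q5km.gguf": {"size_mb": 551, "cer": 8.25},
    "qwen3-0.6b-q8_0.gguf": {"size_mb": 805, "cer": 8.30},
}


def total_mb(llm_file):
    """Combined size of one LLM tier plus the shared encoder, in MB."""
    return TIERS[llm_file]["size_mb"] + ENCODER_MB


def pick_tier(budget_mb):
    """Return the lowest-CER LLM file whose total size fits the budget, or None."""
    fitting = [name for name in TIERS if total_mb(name) <= budget_mb]
    if not fitting:
        return None
    return min(fitting, key=lambda name: TIERS[name]["cer"])


if __name__ == "__main__":
    for name in TIERS:
        print(f"{name}: {total_mb(name)} MB total, CER {TIERS[name]['cer']:.2f}%")
    print("Pick for a 1000 MB budget:", pick_tier(1000))
```

With these numbers the q4_K_M tier totals 954 MB including the encoder, so it is the only tier that fits a 1000 MB budget; with no practical size limit the same rule selects q5_K_M, matching the recommendation in the README.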