---
license: apache-2.0
language:
- zh
- en
library_name: gguf
tags:
- automatic-speech-recognition
- asr
- sensevoice
- funasr
- llama.cpp
- ggml
- cpu
- chinese
pipeline_tag: automatic-speech-recognition
---
# SenseVoiceSmall · GGUF (FunASR llama.cpp runtime)
GGUF build of **[SenseVoiceSmall](https://huggingface.co/FunAudioLLM/SenseVoiceSmall)** (SAN-M encoder + CTC) for the zero-Python, CPU/edge **[FunASR llama.cpp runtime](https://github.com/FunAudioLLM/SenseVoice/tree/main/runtime/llama.cpp)** — multilingual ASR with language / emotion / event tags, **~20× real-time on CPU**.
## Get it running (no Python, no build)
These are GGUF weights for the **[FunASR llama.cpp runtime](https://github.com/modelscope/FunASR/tree/main/runtime/llama.cpp)** — a whisper.cpp-style, single self-contained binary for CPU / edge. Grab a prebuilt binary, then fetch this model and run:
- **Prebuilt binaries (Linux / macOS / Windows) → [GitHub Releases](https://github.com/modelscope/FunASR/releases)** (tag `runtime-llamacpp-v*`)
- **One-page quickstart & benchmarks → [funasr.com/llama-cpp](https://www.funasr.com/llama-cpp.html)**
```bash
bash download-funasr-model.sh sensevoice ./gguf
llama-funasr-sensevoice -m ./gguf/sensevoice-small-q8.gguf --vad ./gguf/fsmn-vad.gguf -a audio.wav
# → 欢迎大家来体验达摩院推出的语音识别模型
#   (English: "Welcome, everyone, to try the speech recognition model released by DAMO Academy")
```
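Before the first run it can help to confirm that the binary is on `PATH` and that the GGUF files landed where the command expects them. The following is a minimal sketch, assuming the file names used in the command above; `check_funasr_setup` is a hypothetical helper and is not part of the runtime.

```bash
# Hypothetical preflight helper (not shipped with the runtime).
# Usage: check_funasr_setup <binary-name> <file>...
# Returns 0 when the binary is on PATH and every file exists and is non-empty.
check_funasr_setup() {
  local bin="$1"
  shift
  local missing=0 f
  if ! command -v "$bin" >/dev/null 2>&1; then
    echo "missing binary on PATH: $bin" >&2
    missing=1
  fi
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty file: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: `check_funasr_setup llama-funasr-sensevoice ./gguf/sensevoice-small-q8.gguf ./gguf/fsmn-vad.gguf && echo ready`.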
## Files
| file | size | notes |
|---|---|---|
| `sensevoice-small-f16.gguf` | 470 MB | f16 matmul weights |
| `sensevoice-small-q8.gguf` | ~235 MB | **recommended**: about half the size of f16, same accuracy |
| `sensevoice-small.gguf` | 936 MB | f32 reference |
## Usage
The binary prints the **transcription text** directly, with no Python detokenization step. Pass `--ids` to get raw token ids, or `--keep-tags` to keep the language/emotion tags in the output.
```bash
# Fetch the VAD model too (for long audio):
#   huggingface-cli download FunAudioLLM/fsmn-vad-GGUF
llama-funasr-sensevoice -m sensevoice-small-f16.gguf -a audio.wav --vad fsmn-vad.gguf
```
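To transcribe a whole directory, the single-file command above can be wrapped in a loop. This is a sketch under two assumptions: that the binary writes only the transcript to stdout, as described above, and that extra flags such as `--keep-tags` or `--ids` can be appended to the same invocation. `transcribe_dir` and the `FUNASR_BIN` variable are hypothetical names introduced for illustration.

```bash
# Hypothetical batch wrapper (not part of the runtime).
# FUNASR_BIN can be overridden, e.g. to point at a binary outside PATH.
FUNASR_BIN="${FUNASR_BIN:-llama-funasr-sensevoice}"

# Usage: transcribe_dir <model.gguf> <vad.gguf> <dir> [extra flags, e.g. --keep-tags]
transcribe_dir() {
  local model="$1" vad="$2" dir="$3"
  shift 3
  local wav
  for wav in "$dir"/*.wav; do
    [ -e "$wav" ] || continue            # skip when the glob matches nothing
    printf '%s\t' "$(basename "$wav")"   # prefix each transcript with its file name
    "$FUNASR_BIN" -m "$model" -a "$wav" --vad "$vad" "$@"
  done
}
```

For example, `transcribe_dir sensevoice-small-f16.gguf fsmn-vad.gguf ./clips --keep-tags > transcripts.tsv` would write one tab-separated line per clip, provided each transcript is printed on a single line.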
On CPU (8 threads) this reaches **8.01 % CER** on the 184-clip Mandarin benchmark, compared with 22–31 % for whisper.cpp. See the [benchmark](https://github.com/FunAudioLLM/SenseVoice/blob/main/runtime/llama.cpp/BENCHMARKS.md).
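For readers unfamiliar with the metric, character error rate is conventionally computed from the character-level edit distance between the recognized text and the reference transcript. The formula below is that conventional definition; this card does not state how the benchmark scores its output, so treat it as an assumption rather than a description of the benchmark script.

$$
\mathrm{CER} = \frac{S + D + I}{N}
$$

Here $S$, $D$, and $I$ are the numbers of substituted, deleted, and inserted characters, and $N$ is the number of characters in the reference. Lower is better.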
## Links
- 🧩 Runtime & build: **[SenseVoice · runtime/llama.cpp](https://github.com/FunAudioLLM/SenseVoice/tree/main/runtime/llama.cpp)** — ⭐ **Star [SenseVoice](https://github.com/FunAudioLLM/SenseVoice)!**
- Source model: [FunAudioLLM/SenseVoiceSmall](https://huggingface.co/FunAudioLLM/SenseVoiceSmall)