	---
	license: apache-2.0
	language:
	- zh
	- en
	library_name: gguf
	tags:
	- automatic-speech-recognition
	- asr
	- sensevoice
	- funasr
	- llama.cpp
	- ggml
	- cpu
	- chinese
	pipeline_tag: automatic-speech-recognition
	---

	# SenseVoiceSmall · GGUF (FunASR llama.cpp runtime)

	GGUF build of [SenseVoiceSmall](https://huggingface.co/FunAudioLLM/SenseVoiceSmall) (SAN-M encoder + CTC) for the zero-Python, CPU/edge [FunASR llama.cpp runtime](https://github.com/FunAudioLLM/SenseVoice/tree/main/runtime/llama.cpp). It provides multilingual ASR with language, emotion, and event tags, at roughly 20× real-time speed on CPU.

	## Get it running (no Python, no build)

	These are GGUF weights for the [FunASR llama.cpp runtime](https://github.com/modelscope/FunASR/tree/main/runtime/llama.cpp) — a whisper.cpp-style, single self-contained binary for CPU / edge. Grab a prebuilt binary, then fetch this model and run:

	- Prebuilt binaries (Linux / macOS / Windows) → [GitHub Releases](https://github.com/modelscope/FunASR/releases) (tag `runtime-llamacpp-v*`)
	- One-page quickstart & benchmarks → [funasr.com/llama-cpp](https://www.funasr.com/llama-cpp.html)

	```bash
	bash download-funasr-model.sh sensevoice ./gguf
	llama-funasr-sensevoice -m ./gguf/sensevoice-small-q8.gguf --vad ./gguf/fsmn-vad.gguf -a audio.wav
	# → 欢迎大家来体验达摩院推出的语音识别模型
	#   (English: "Welcome, everyone, to try the speech recognition model released by DAMO Academy")
	```

	## Files
	| file | size | notes |
	|---|---|---|
	| `sensevoice-small-f16.gguf` | 470 MB | recommended (f16 matmul weights) |
	| `sensevoice-small-q8.gguf` | ~235 MB | recommended: half the size of f16, same accuracy |
	| `sensevoice-small.gguf` | 936 MB | f32 reference |

	## Usage
	The binary prints the transcription text directly (no Python detokenization step). Pass `--ids` to print raw token ids, or `--keep-tags` to keep the language/emotion tags in the output.
	```bash
	# For long audio, also fetch the VAD model first, e.g.:
	#   huggingface-cli download FunAudioLLM/fsmn-vad-GGUF
	llama-funasr-sensevoice -m sensevoice-small-f16.gguf -a audio.wav --vad fsmn-vad.gguf
	```
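
	To transcribe a whole folder, the same command can be wrapped in a loop. The following is a minimal sketch, not part of the runtime: `transcribe_dir`, `FUNASR_BIN`, `MODEL`, and `VAD` are hypothetical names, and it assumes the transcript is written to standard output. It uses only the `-m`, `-a`, and `--vad` flags shown above.

	```bash
	#!/usr/bin/env bash
	# Sketch: transcribe every .wav file in a directory, one .txt per input file.
	# FUNASR_BIN, MODEL and VAD are hypothetical variable names; adjust the paths.
	set -euo pipefail

	FUNASR_BIN="${FUNASR_BIN:-llama-funasr-sensevoice}"
	MODEL="${MODEL:-sensevoice-small-f16.gguf}"
	VAD="${VAD:-fsmn-vad.gguf}"

	transcribe_dir() {
	  local in_dir="$1" out_dir="$2" wav base
	  mkdir -p "$out_dir"
	  for wav in "$in_dir"/*.wav; do
	    # If nothing matches, the glob stays literal; skip it.
	    [ -e "$wav" ] || continue
	    base="$(basename "$wav" .wav)"
	    # Assumption: the transcript goes to stdout, so it can be redirected.
	    "$FUNASR_BIN" -m "$MODEL" -a "$wav" --vad "$VAD" > "$out_dir/$base.txt"
	  done
	}

	# Run only when an input directory is given on the command line.
	if [ "$#" -ge 1 ]; then
	  transcribe_dir "$1" "${2:-./transcripts}"
	fi
	```

	Saved under a name of your choice (for example `transcribe-dir.sh`), it could be run as `bash transcribe-dir.sh ./recordings ./transcripts`.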
	On CPU (8 threads), this model reaches 8.01% CER (character error rate) on the 184-clip Mandarin benchmark, compared with 22–31% for whisper.cpp. See the [benchmark](https://github.com/FunAudioLLM/SenseVoice/blob/main/runtime/llama.cpp/BENCHMARKS.md).
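
	For readers unfamiliar with the metric: CER (character error rate) is conventionally the character-level edit distance between the recognized text and the reference transcript, divided by the reference length. The formula below is that standard definition, assumed here rather than taken from the benchmark page, and the numbers in the worked line are hypothetical.

	```latex
	% S, D, I: substituted, deleted, and inserted characters
	% N: number of characters in the reference transcript
	\mathrm{CER} = \frac{S + D + I}{N}

	% Hypothetical example: N = 200, S = 10, D = 4, I = 2
	\mathrm{CER} = \frac{10 + 4 + 2}{200} = 0.08 = 8\%
	```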

	## Links
	- 🧩 Runtime & build: [SenseVoice · runtime/llama.cpp](https://github.com/FunAudioLLM/SenseVoice/tree/main/runtime/llama.cpp) — ⭐ Star [SenseVoice](https://github.com/FunAudioLLM/SenseVoice)!
	- Source model: [FunAudioLLM/SenseVoiceSmall](https://huggingface.co/FunAudioLLM/SenseVoiceSmall)