license: apache-2.0
language:
  - zh
  - en
library_name: gguf
tags:
  - automatic-speech-recognition
  - asr
  - sensevoice
  - funasr
  - llama.cpp
  - ggml
  - cpu
  - chinese
pipeline_tag: automatic-speech-recognition

SenseVoiceSmall · GGUF (FunASR llama.cpp runtime)

GGUF build of SenseVoiceSmall (SAN-M encoder + CTC) for the FunASR llama.cpp runtime, which needs no Python and targets CPU and edge devices. It provides multilingual ASR with language, emotion, and event tags at roughly 20× real-time on CPU.

Get it running (no Python, no build)

These are GGUF weights for the FunASR llama.cpp runtime, a single self-contained binary in the style of whisper.cpp for CPU and edge devices. Grab a prebuilt binary from the first link below, then fetch this model and run it with the commands that follow.

Prebuilt binaries (Linux / macOS / Windows) → GitHub Releases (tag runtime-llamacpp-v*)
One-page quickstart & benchmarks → funasr.com/llama-cpp

bash download-funasr-model.sh sensevoice ./gguf
llama-funasr-sensevoice -m ./gguf/sensevoice-small-q8.gguf --vad ./gguf/fsmn-vad.gguf -a audio.wav
# → 欢迎大家来体验达摩院推出的语音识别模型
#   (English: "Welcome, everyone, to try the speech recognition model released by DAMO Academy")
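
Before running the quickstart command above, it can help to confirm that both GGUF files it references are actually in place. The following is a minimal sketch, assuming the download script writes `sensevoice-small-q8.gguf` and `fsmn-vad.gguf` into the target directory (as the command's paths suggest); the function name `check_gguf_files` is hypothetical and not part of the runtime.

```shell
#!/usr/bin/env bash
# Sketch: confirm the two GGUF files the quickstart command expects are present.
# File names and the ./gguf default come from the quickstart command;
# check_gguf_files is a hypothetical helper, not part of the runtime.
check_gguf_files() {
  local dir="${1:-./gguf}" missing=0 f
  for f in sensevoice-small-q8.gguf fsmn-vad.gguf; do
    # -s is true only when the file exists and is non-empty
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (not executed here):
#   check_gguf_files ./gguf && llama-funasr-sensevoice \
#     -m ./gguf/sensevoice-small-q8.gguf --vad ./gguf/fsmn-vad.gguf -a audio.wav
```

The function returns non-zero if either file is absent or empty, so it can gate the transcription command with `&&`.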

Files

file	size	notes
`sensevoice-small-f16.gguf`	470 MB	recommended (f16 matmul weights)
`sensevoice-small-q8.gguf`	~235 MB	recommended; about half the size of f16, same accuracy
`sensevoice-small.gguf`	936 MB	f32 reference
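
The sizes in the table can double as a rough check for truncated downloads. The sketch below compares a file's size with the expected figure; it assumes the table uses decimal megabytes (1 MB = 1,000,000 bytes), and both the helper name `size_ok` and the 5% tolerance are hypothetical choices rather than anything the runtime provides.

```shell
#!/usr/bin/env bash
# Sketch: flag a download whose size is far from the figure in the table.
# Assumes decimal megabytes (1 MB = 1,000,000 bytes); size_ok and the
# 5% tolerance are hypothetical choices.
size_ok() {
  local file="$1" expected_mb="$2" bytes
  bytes=$(wc -c < "$file") || return 2   # unreadable or missing file
  awk -v b="$bytes" -v e="$expected_mb" 'BEGIN {
    mb = b / 1000000
    diff = (mb > e) ? mb - e : e - mb
    if (diff <= e * 0.05) exit 0
    exit 1
  }'
}

# Example (not executed here):
#   size_ok ./gguf/sensevoice-small-q8.gguf 235 || echo "size looks wrong" >&2
```

Because the q8 size is listed only approximately, a mismatch here is a prompt to re-download, not proof of corruption.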

Usage

The binary prints the transcription text directly, with no Python detokenization step. Pass --ids for raw ids, or --keep-tags to keep the language/emotion tags in the output.

# Get the VAD model too (for long audio): huggingface-cli download FunAudioLLM/fsmn-vad-GGUF
llama-funasr-sensevoice -m sensevoice-small-f16.gguf -a audio.wav --vad fsmn-vad.gguf
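
For more than one clip, the same command can be wrapped in a loop. This is a minimal sketch that uses only the `-m`, `-a`, and `--vad` flags shown above and assumes the transcription goes to standard output. The names `FUNASR_BIN`, `MODEL`, `VAD`, and `transcribe_dir` are hypothetical conveniences, not options of the runtime.

```shell
#!/usr/bin/env bash
# Sketch: transcribe every .wav file in a directory, one .txt file per clip.
# Only the -m, -a and --vad flags from the usage line are used.
# FUNASR_BIN, MODEL, VAD and transcribe_dir are hypothetical names, and
# capturing stdout assumes the binary prints the transcript there.
FUNASR_BIN="${FUNASR_BIN:-llama-funasr-sensevoice}"
MODEL="${MODEL:-sensevoice-small-f16.gguf}"
VAD="${VAD:-fsmn-vad.gguf}"

transcribe_dir() {
  local in_dir="$1" out_dir="$2" wav name
  mkdir -p "$out_dir" || return 1
  for wav in "$in_dir"/*.wav; do
    [ -e "$wav" ] || continue            # the glob matched nothing
    name=$(basename "$wav" .wav)
    "$FUNASR_BIN" -m "$MODEL" -a "$wav" --vad "$VAD" > "$out_dir/$name.txt" \
      || echo "failed: $wav" >&2         # report and move on to the next clip
  done
}

# Example (not executed here):
#   transcribe_dir ./audio ./transcripts
```

Failures are reported per file and do not stop the loop, which suits large batches where a few clips may be unreadable.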

On CPU (8 threads), this reaches 8.01% CER on the 184-clip Mandarin benchmark, compared with 22–31% for whisper.cpp. See the benchmark.
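
For readers unfamiliar with the metric, character error rate (CER) is conventionally the character-level edit distance between the transcript and the reference, divided by the reference length. The formula below is that standard definition. The source does not say how this benchmark normalizes text or counts errors, and the numbers in the worked example are purely illustrative, not benchmark data.

```latex
% S = substituted, D = deleted, I = inserted characters;
% N = number of characters in the reference transcript.
\mathrm{CER} = \frac{S + D + I}{N}
\qquad\text{e.g.}\qquad
\frac{50 + 20 + 10}{1000} = 0.08 = 8\%
```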
