Paraformer-zh · GGUF (FunASR llama.cpp runtime)

GGUF build of FunASR's Paraformer-zh (SAN-M encoder + CIF predictor + SAN-M decoder, non-autoregressive) for the FunASR llama.cpp runtime, which runs on CPU and edge devices with no Python dependency. It provides fast Mandarin ASR at ~21× real-time on CPU.
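
The CIF (continuous integrate-and-fire) predictor is what makes the model non-autoregressive. It integrates a per-frame weight over the encoder output and "fires" one token-level vector each time the running sum reaches a threshold. The number of fired vectors fixes the output length, so the decoder can predict every token in a single pass. Below is a toy, pure-Python sketch of the general CIF technique. It is not taken from the runtime's source. The helper name `cif`, the threshold of 1.0 and the sample values are illustrative assumptions.

```python
from typing import List, Sequence

def cif(frames: Sequence[Sequence[float]], alphas: Sequence[float],
        threshold: float = 1.0) -> List[List[float]]:
    """Hypothetical CIF sketch: integrate per-frame weights `alphas` and emit one
    weighted sum of encoder frames each time the running weight hits `threshold`.
    Assumes every alpha lies in [0, 1], so one frame fires at most once."""
    dim = len(frames[0])
    fired: List[List[float]] = []
    acc_weight = 0.0
    acc_vec = [0.0] * dim
    for h, a in zip(frames, alphas):
        if acc_weight + a < threshold:
            # Not enough weight yet: absorb the whole frame into the current token.
            acc_weight += a
            acc_vec = [v + a * x for v, x in zip(acc_vec, h)]
        else:
            # Split this frame: part of its weight completes the current token...
            used = threshold - acc_weight
            fired.append([v + used * x for v, x in zip(acc_vec, h)])
            # ...and the leftover weight starts integrating the next token.
            acc_weight = a - used
            acc_vec = [acc_weight * x for x in h]
    return fired
```

For example, four one-dimensional frames with weights of 0.5 each sum to 2.0. They therefore collapse into two token vectors, `[[1.5], [3.5]]` for frames `[[1.0], [2.0], [3.0], [4.0]]`.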

Files

File                  Size    Notes
paraformer-f16.gguf   435 MB  recommended (f16 matmul weights)
paraformer.gguf       863 MB  f32 reference

Usage

The binary prints the transcription text directly, so no Python detokenization step is needed. Pass --ids to print the raw token IDs instead.

llama-funasr-paraformer -m paraformer-f16.gguf -a audio.wav --vad fsmn-vad.gguf
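
The card does not say which audio formats `-a` accepts. The sketch below assumes the runtime, like upstream Paraformer-zh, expects 16 kHz, mono, 16-bit PCM WAV, and uses Python's standard `wave` module to check a file before transcription. `check_wav` is a hypothetical helper, not part of the runtime.

```python
import wave

def check_wav(path: str, rate: int = 16000) -> float:
    """Hypothetical pre-flight check (assumes the model wants 16 kHz mono 16-bit PCM).
    Returns the clip length in seconds, or raises ValueError listing every mismatch."""
    with wave.open(path, "rb") as w:
        channels, width, fr = w.getnchannels(), w.getsampwidth(), w.getframerate()
        seconds = w.getnframes() / fr
    problems = []
    if channels != 1:
        problems.append(f"{channels} channels (want 1)")
    if width != 2:
        problems.append(f"{8 * width}-bit samples (want 16)")
    if fr != rate:
        problems.append(f"{fr} Hz (want {rate})")
    if problems:
        raise ValueError(f"{path}: " + ", ".join(problems))
    return seconds
```

If the check fails, `ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le audio.wav` converts a file to that format. The returned duration also puts the speed figure in concrete terms: at ~21× real-time, a 60 s clip should take roughly 60 / 21 ≈ 2.9 s.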

On CPU (8 threads), the model reaches a 9.85 % character error rate (CER) on the 184-clip Mandarin benchmark, versus 22–31 % for whisper.cpp.
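
CER is normally the character-level edit distance (substitutions + deletions + insertions) divided by the number of reference characters. The card does not say how the benchmark normalizes text or whether it pools errors across the 184 clips or averages per clip. The sketch below therefore assumes that whitespace is dropped and that errors are pooled. `char_edits` and `corpus_cer` are hypothetical names.

```python
from typing import Iterable, Tuple

def char_edits(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted per character."""
    # prev[j] holds the distance between the previous ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,            # deletion
                         cur[j - 1] + 1,         # insertion
                         prev[j - 1] + (r != h))  # substitution (or match)
        prev = cur
    return prev[-1]

def corpus_cer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Pooled CER over (reference, hypothesis) pairs; whitespace removal is an assumption."""
    edits = chars = 0
    for ref, hyp in pairs:
        ref, hyp = "".join(ref.split()), "".join(hyp.split())
        edits += char_edits(ref, hyp)
        chars += len(ref)
    return edits / chars if chars else 0.0
```

For example, one wrong character in the six-character reference 今天天气很好 gives a CER of 1/6 ≈ 16.7 %.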


GGUF
Model size: 0.2B params
Architecture: paraformer

16-bit
