	---
	license: other
	license_name: model-license
	license_link: https://github.com/modelscope/FunASR/blob/main/MODEL_LICENSE
	base_model:
	- FunAudioLLM/SenseVoiceSmall
	library_name: sherpa-onnx
	pipeline_tag: automatic-speech-recognition
	language:
	- zh
	- yue
	- en
	- ja
	- ko
	tags:
	- sensevoice
	- sherpa-onnx
	- onnx
	- int8
	- speech-recognition
	---

	# SenseVoiceSmall ONNX INT8 for sherpa-onnx

	This repository contains a sherpa-onnx compatible ONNX INT8 export of
	[`FunAudioLLM/SenseVoiceSmall`](https://huggingface.co/FunAudioLLM/SenseVoiceSmall).

	It is intended for local or embedded inference with sherpa-onnx on ONNX Runtime. The model supports
	Mandarin, Cantonese, English, Japanese, and Korean, with automatic language detection, optional
	inverse text normalization (ITN), and the SenseVoice CTC output format.

	## Attribution

	Base model and upstream project:

	- Base model: https://huggingface.co/FunAudioLLM/SenseVoiceSmall
	- Upstream code: https://github.com/FunAudioLLM/SenseVoice
	- Upstream license: https://github.com/modelscope/FunASR/blob/main/MODEL_LICENSE

	This is a derivative export and is not an official FunAudioLLM release.

	## Files

	- `model.int8.onnx` - sherpa-onnx compatible INT8 ONNX model
	- `tokens.txt` - token table generated from the upstream SentencePiece model
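As a quick sanity check on `tokens.txt`, the sketch below loads the file into an id-to-token table. It assumes the usual sherpa-onnx layout of one `<token> <id>` pair per line, separated by whitespace. `load_tokens` is an illustrative helper, not part of sherpa-onnx.

```python
def load_tokens(path):
    """Return {id: token} from a tokens file with one '<token> <id>' pair per line.

    The file layout is an assumption; adjust the parsing if your file differs.
    """
    id2token = {}
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.rstrip("\n")
            if not line.strip():
                continue  # skip blank lines
            # Split on the last whitespace run so the token text is kept verbatim.
            parts = line.rsplit(maxsplit=1)
            if len(parts) != 2:
                raise ValueError(f"line {line_no}: expected '<token> <id>', got {line!r}")
            token, idx = parts
            id2token[int(idx)] = token
    return id2token
```

If the layout assumption holds, `len(load_tokens("tokens.txt"))` would be expected to match the `vocab_size` recorded in the model metadata.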

	## Model Metadata

	The ONNX model includes sherpa-onnx runtime metadata, including:

	- `model_type=sense_voice_ctc`
	- `lfr_window_size=7`
	- `lfr_window_shift=6`
	- CMVN statistics: `neg_mean`, `inv_stddev`
	- language IDs for `auto`, `zh`, `en`, `yue`, `ja`, `ko`, `nospeech`
	- text normalization IDs for `with_itn` and `without_itn`
	- `vocab_size=25055`
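The metadata listed above can be read back with ONNX Runtime, which exposes custom model metadata as a string-to-string map. The following is a minimal sketch: `read_metadata` and `metadata_mismatches` are hypothetical helper names, and the expected values are copied from the list above (ONNX stores metadata values as strings, so numbers are compared as strings).

```python
# Values taken from the metadata list in this README; all stored as strings.
EXPECTED = {
    "model_type": "sense_voice_ctc",
    "lfr_window_size": "7",
    "lfr_window_shift": "6",
    "vocab_size": "25055",
}


def read_metadata(model_path):
    """Return the custom metadata of an ONNX model as a plain dict."""
    import onnxruntime as ort  # lazy import: only needed when a model file is read

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    return dict(session.get_modelmeta().custom_metadata_map)


def metadata_mismatches(meta, expected=EXPECTED):
    """Return {key: (found, expected)} for every key that is missing or differs."""
    return {k: (meta.get(k), v) for k, v in expected.items() if meta.get(k) != v}
```

For example, `metadata_mismatches(read_metadata("model.int8.onnx"))` should return an empty dict if the file matches the values documented here.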

	## Usage

	Install sherpa-onnx by following the official documentation for your platform. For Python, the package is available from PyPI:

	```bash
	pip install sherpa-onnx
	```

	Example Python usage:

	```python
	import sherpa_onnx

	recognizer = sherpa_onnx.OfflineRecognizer.from_sense_voice(
	    model="model.int8.onnx",
	    tokens="tokens.txt",
	    num_threads=4,
	    use_itn=True,
	    debug=False,
	)
	```

	The snippet above only constructs the recognizer. Audio loading, resampling, and decoding are left to your application. SenseVoice expects 16 kHz audio.
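To round out the usage example, here is a hedged sketch of decoding a single file. It assumes a 16-bit PCM WAV input read with the standard-library `wave` module plus NumPy; `read_wav_mono_float32` and `transcribe` are illustrative names, and the `language="auto"` argument is an assumption about how you want to configure the recognizer. Resampling is not handled here: the helper simply returns the file's own sample rate.

```python
import wave

import numpy as np


def read_wav_mono_float32(path):
    """Read a 16-bit PCM WAV file and return (samples, sample_rate).

    Samples are float32 in [-1, 1); multi-channel audio is averaged to mono.
    """
    with wave.open(path, "rb") as f:
        if f.getsampwidth() != 2:
            raise ValueError("only 16-bit PCM WAV is handled in this sketch")
        sample_rate = f.getframerate()
        channels = f.getnchannels()
        pcm = np.frombuffer(f.readframes(f.getnframes()), dtype="<i2")
    samples = pcm.astype(np.float32) / 32768.0
    if channels > 1:
        samples = samples.reshape(-1, channels).mean(axis=1)
    return samples, sample_rate


def transcribe(wav_path, model="model.int8.onnx", tokens="tokens.txt"):
    """Decode one WAV file and return the recognized text."""
    import sherpa_onnx  # lazy import so the WAV helper works without sherpa-onnx

    recognizer = sherpa_onnx.OfflineRecognizer.from_sense_voice(
        model=model,
        tokens=tokens,
        num_threads=4,
        language="auto",  # assumption: let the model detect the language
        use_itn=True,
        debug=False,
    )
    samples, sample_rate = read_wav_mono_float32(wav_path)
    stream = recognizer.create_stream()
    stream.accept_waveform(sample_rate, samples)
    recognizer.decode_stream(stream)
    return stream.result.text
```

Calling `transcribe("speech.wav")` with a hypothetical 16 kHz recording would return the transcript as a string. For other formats or sample rates, swap in the audio library your application already uses.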

	## Reproduction

	This artifact was generated with OpenASR Model Factory:

	```powershell
	openasr-model-factory quantize-sensevoice `
	    --input-dir downloads/FunAudioLLM/SenseVoiceSmall `
	    --output-dir outputs/sensevoice-small-onnx
	```

	The export follows the sherpa-onnx SenseVoice layout:

	- ONNX inputs: `x`, `x_length`, `language`, `text_norm`
	- ONNX output: `logits`
	- Dynamic 8-bit quantization of `MatMul` weights, stored as `QUInt8` (unsigned 8-bit)
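The internals of OpenASR Model Factory are not shown in this README. As a rough illustration of the quantization step described above, a comparable result can be sketched with ONNX Runtime's `quantize_dynamic`, restricted to `MatMul` and using `QUInt8` weights. The helper names and the `model.onnx` to `model.int8.onnx` naming convention below are assumptions made for the example.

```python
from pathlib import Path


def int8_output_path(model_path):
    """Map e.g. 'model.onnx' to 'model.int8.onnx' in the same directory."""
    p = Path(model_path)
    return p.with_name(p.stem + ".int8" + p.suffix)


def quantize_matmul_dynamic(model_path):
    """Dynamically quantize MatMul weights of a float ONNX model to QUInt8."""
    # Lazy import: onnxruntime is only needed when quantization actually runs.
    from onnxruntime.quantization import QuantType, quantize_dynamic

    out = int8_output_path(model_path)
    quantize_dynamic(
        model_input=str(model_path),
        model_output=str(out),
        op_types_to_quantize=["MatMul"],  # leave all other operators in float
        weight_type=QuantType.QUInt8,
    )
    return out
```

This sketch covers only quantization. Exporting the PyTorch model to ONNX and writing the sherpa-onnx metadata are separate steps that it does not attempt.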

	## Limitations

	- INT8 quantization may change recognition output compared with the original PyTorch model.
	- Validate accuracy and latency in your target environment before production use.
	- This artifact inherits upstream model limitations and license requirements.
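One way to quantify the first two points is to compare transcripts from the INT8 model against reference text (or against the original model's output) with an edit-distance error rate. The sketch below is a plain Levenshtein implementation; `edit_distance` and `error_rate` are illustrative helpers, and the choice of token unit (words or characters) is left to you.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = cur
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance divided by the reference length (WER or CER)."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

Passing `text.split()` gives a word error rate, which is common for English. Passing `list(text)` gives a character error rate, which is often preferred for Chinese, Japanese, and Korean. Latency can be measured separately by timing the decode call on representative audio in the target environment.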