---
license: other
license_name: model-license
license_link: https://github.com/modelscope/FunASR/blob/main/MODEL_LICENSE
base_model:
  - FunAudioLLM/SenseVoiceSmall
library_name: sherpa-onnx
pipeline_tag: automatic-speech-recognition
language:
  - zh
  - yue
  - en
  - ja
  - ko
tags:
  - sensevoice
  - sherpa-onnx
  - onnx
  - int8
  - speech-recognition
---

# SenseVoiceSmall ONNX INT8 for sherpa-onnx

This repository contains a sherpa-onnx compatible ONNX INT8 export of
[`FunAudioLLM/SenseVoiceSmall`](https://huggingface.co/FunAudioLLM/SenseVoiceSmall).

It is intended for local or embedded ONNX Runtime inference with sherpa-onnx. The model recognizes
Mandarin, Cantonese, English, Japanese, and Korean, supports automatic language detection and
optional inverse text normalization (ITN), and produces the SenseVoice CTC output format.

## Attribution

Base model and upstream project:

- Base model: https://huggingface.co/FunAudioLLM/SenseVoiceSmall
- Upstream code: https://github.com/FunAudioLLM/SenseVoice
- Upstream license: https://github.com/modelscope/FunASR/blob/main/MODEL_LICENSE

This is a derivative export and is not an official FunAudioLLM release.

## Files

- `model.int8.onnx` - sherpa-onnx compatible INT8 ONNX model
- `tokens.txt` - token table generated from the upstream SentencePiece model

## Model Metadata

The ONNX model embeds sherpa-onnx runtime metadata, including:

- `model_type=sense_voice_ctc`
- `lfr_window_size=7`
- `lfr_window_shift=6`
- CMVN statistics: `neg_mean`, `inv_stddev`
- language IDs for `auto`, `zh`, `en`, `yue`, `ja`, `ko`, `nospeech`
- text normalization IDs for `with_itn` and `without_itn`
- `vocab_size=25055`
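
The `lfr_window_size`, `lfr_window_shift`, and CMVN entries describe feature preprocessing that happens before the network runs. The following is a minimal pure-Python sketch of what these values mean, not the sherpa-onnx implementation. It assumes that low frame rate (LFR) stacking concatenates 7 consecutive frames and advances by 6, that CMVN is applied as `(x + neg_mean) * inv_stddev`, and that no edge padding is used (the actual frontend may pad). The function names and toy values are illustrative only.

```python
def apply_lfr(frames, window_size=7, window_shift=6):
    """Stack `window_size` consecutive frames, advancing by `window_shift`."""
    stacked = []
    start = 0
    while start + window_size <= len(frames):
        window = frames[start:start + window_size]
        # Concatenate the frames in the window into one long vector.
        stacked.append([value for frame in window for value in frame])
        start += window_shift
    return stacked


def apply_cmvn(frames, neg_mean, inv_stddev):
    """Normalize every dimension as (x + neg_mean) * inv_stddev."""
    return [
        [(x + m) * s for x, m, s in zip(frame, neg_mean, inv_stddev)]
        for frame in frames
    ]


# Toy input: 20 frames of 2-dimensional features (made-up values).
features = [[float(t), float(t) + 0.5] for t in range(20)]
lfr = apply_lfr(features)
print(len(lfr), len(lfr[0]))  # 3 14

# Hypothetical statistics; the real ones are stored in the model metadata.
neg_mean = [-1.0] * len(lfr[0])
inv_stddev = [0.5] * len(lfr[0])
normalized = apply_cmvn(lfr, neg_mean, inv_stddev)
print(normalized[0][:2])  # [-0.5, -0.25]
```

With a window of 7 and a shift of 6, the output has roughly one sixth as many frames as the input, and each output frame is 7 times wider.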

## Usage

Install sherpa-onnx following the official documentation for your platform:

```bash
pip install sherpa-onnx
```

Example Python usage:

```python
import sherpa_onnx

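# Build an offline (non-streaming) recognizer from the files in this repository.
recognizer = sherpa_onnx.OfflineRecognizer.from_sense_voice(
    model="model.int8.onnx",
    tokens="tokens.txt",
    num_threads=4,  # tune for your CPU
    use_itn=True,  # enable inverse text normalization
    debug=False,
)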
```

The snippet above only constructs the recognizer. Audio loading and resampling are left to your application; SenseVoice expects 16 kHz audio.
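
One possible way to feed audio to the recognizer is sketched below. It assumes a 16-bit PCM WAV file that is already sampled at 16 kHz and rejects anything else instead of resampling. `load_wav_16k_mono` and `transcribe` are hypothetical helper names, not part of sherpa-onnx. NumPy is imported lazily and is only needed for the decoding step.

```python
import array
import sys
import wave


def load_wav_16k_mono(path):
    """Read a 16-bit PCM WAV file and return (samples, sample_rate).

    Samples are floats in [-1, 1). Only the first channel is kept.
    """
    with wave.open(path, "rb") as f:
        if f.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        sample_rate = f.getframerate()
        channels = f.getnchannels()
        pcm = array.array("h", f.readframes(f.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    if sample_rate != 16000:
        raise ValueError(f"expected 16 kHz audio, got {sample_rate} Hz")
    # Interleaved channels: take every `channels`-th sample, starting at 0.
    return [s / 32768.0 for s in pcm[::channels]], sample_rate


def transcribe(recognizer, path):
    """Decode one file with an existing sherpa-onnx OfflineRecognizer."""
    import numpy as np  # lazy import; only required when decoding

    samples, sample_rate = load_wav_16k_mono(path)
    stream = recognizer.create_stream()
    stream.accept_waveform(sample_rate, np.asarray(samples, dtype=np.float32))
    recognizer.decode_stream(stream)
    return stream.result.text
```

With the recognizer from the example above, `print(transcribe(recognizer, "test.wav"))` would print the recognized text, where `test.wav` is a placeholder path.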

## Reproduction

This artifact was generated with OpenASR Model Factory:

```powershell
openasr-model-factory quantize-sensevoice `
  --input-dir downloads/FunAudioLLM/SenseVoiceSmall `
  --output-dir outputs/sensevoice-small-onnx
```

The export follows the sherpa-onnx SenseVoice layout:

- ONNX inputs: `x`, `x_length`, `language`, `text_norm`
- ONNX output: `logits`
- Dynamic 8-bit quantization of `MatMul` weights, stored as unsigned 8-bit integers (`QUInt8`)
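
sherpa-onnx turns the `logits` output into text internally, so none of the following is needed for normal use. As an illustration of what a CTC output means, here is a minimal greedy-decoding sketch: take the best token per frame, merge consecutive repeats, and drop blanks. The blank ID of 0, the `token id` line format, and the toy vocabulary are assumptions for this example, not values read from the model.

```python
def ctc_greedy_decode(logits, blank_id=0):
    """Return token IDs: per-frame argmax, merge repeats, drop blanks."""
    ids = []
    previous = None
    for frame in logits:
        best = max(range(len(frame)), key=frame.__getitem__)
        if best != previous and best != blank_id:
            ids.append(best)
        previous = best
    return ids


def load_tokens(lines):
    """Parse `token id` lines into an id -> token mapping."""
    table = {}
    for line in lines:
        token, index = line.rstrip("\n").rsplit(" ", 1)
        table[int(index)] = token
    return table


def one_hot(index, size=4):
    """Toy logits frame whose largest value sits at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Hypothetical 4-entry vocabulary; the real table is in tokens.txt.
tokens = load_tokens(["<blank> 0", "▁hi 1", "▁there 2", "! 3"])
logits = [one_hot(i) for i in [1, 1, 0, 2, 2, 0, 3]]

ids = ctc_greedy_decode(logits)
# "▁" marks a word boundary in SentencePiece-style tokens.
text = "".join(tokens[i] for i in ids).replace("▁", " ").strip()
print(ids, text)  # [1, 2, 3] hi there!
```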

## Limitations

- INT8 quantization may change recognition output compared with the original PyTorch model.
- Validate accuracy and latency in your target environment before production use.
- This artifact inherits upstream model limitations and license requirements.
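
To see why 8-bit quantization can change outputs, consider this generic sketch of asymmetric unsigned 8-bit quantization. It illustrates the rounding error involved and is not the exact scheme used by the export tool or ONNX Runtime. The function names and weight values are made up for the example.

```python
def quantize_uint8(values):
    """Asymmetric 8-bit quantization: returns (codes, scale, zero_point)."""
    # Extend the range to include 0 so that 0.0 is represented exactly.
    low = min(min(values), 0.0)
    high = max(max(values), 0.0)
    scale = (high - low) / 255.0
    if scale == 0.0:
        scale = 1.0  # all-zero input; any positive scale works
    zero_point = round(-low / scale)
    codes = [min(255, max(0, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize(codes, scale, zero_point):
    """Map 8-bit codes back to approximate float values."""
    return [(c - zero_point) * scale for c in codes]


weights = [-0.42, 0.0, 0.137, 0.9]  # made-up weights
codes, scale, zero_point = quantize_uint8(weights)
restored = dequantize(codes, scale, zero_point)

for original, approx in zip(weights, restored):
    print(f"{original:+.4f} -> {approx:+.4f} (error {abs(original - approx):.5f})")
```

Each weight is recovered only to within about one quantization step (`scale`), and these small differences can accumulate across layers.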