---
license: other
license_name: model-license
license_link: https://github.com/modelscope/FunASR/blob/main/MODEL_LICENSE
base_model:
- FunAudioLLM/SenseVoiceSmall
library_name: sherpa-onnx
pipeline_tag: automatic-speech-recognition
language:
- zh
- yue
- en
- ja
- ko
tags:
- sensevoice
- sherpa-onnx
- onnx
- int8
- speech-recognition
---

# SenseVoiceSmall ONNX INT8 for sherpa-onnx

This repository contains a sherpa-onnx compatible ONNX INT8 export of [`FunAudioLLM/SenseVoiceSmall`](https://huggingface.co/FunAudioLLM/SenseVoiceSmall).

It is intended for local or embedded ONNX Runtime inference with sherpa-onnx. The model supports Mandarin, Cantonese, English, Japanese, Korean, auto language detection, inverse text normalization options, and the SenseVoice CTC output format.

## Attribution

Base model and upstream project:

- Base model: https://huggingface.co/FunAudioLLM/SenseVoiceSmall
- Upstream code: https://github.com/FunAudioLLM/SenseVoice
- Upstream license: https://github.com/modelscope/FunASR/blob/main/MODEL_LICENSE

This is a derivative export and is not an official FunAudioLLM release.

## Files

- `model.int8.onnx` - sherpa-onnx compatible INT8 ONNX model
- `tokens.txt` - token table generated from the upstream SentencePiece model

## Model Metadata

The ONNX model includes sherpa-onnx runtime metadata, including:

- `model_type=sense_voice_ctc`
- `lfr_window_size=7`
- `lfr_window_shift=6`
- CMVN statistics: `neg_mean`, `inv_stddev`
- language IDs for `auto`, `zh`, `en`, `yue`, `ja`, `ko`, `nospeech`
- text normalization IDs for `with_itn` and `without_itn`
- `vocab_size=25055`

## Usage

Install sherpa-onnx following the official documentation for your platform:

```bash
pip install sherpa-onnx
```

Example Python usage:

```python
import sherpa_onnx

recognizer = sherpa_onnx.OfflineRecognizer.from_sense_voice(
    model="model.int8.onnx",
    tokens="tokens.txt",
    num_threads=4,
    use_itn=True,
    debug=False,
)
```

Please adapt audio loading and resampling to your application.
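
As a minimal sketch of the audio handling this refers to, the helpers below read a 16-bit PCM WAV file with Python's standard library, resample it to 16 kHz by linear interpolation, and pass it to an already constructed recognizer. The names `read_wave_mono`, `resample_linear`, and `transcribe` are illustrative and not part of sherpa-onnx; the calls `create_stream`, `accept_waveform`, `decode_stream`, and `stream.result.text` follow the sherpa-onnx Python examples. The resampler applies no anti-aliasing filter, so treat it as a placeholder for a proper one.

```python
import array
import sys
import wave


def read_wave_mono(path):
    """Read a 16-bit PCM WAV file; return (samples in [-1, 1), sample_rate)."""
    with wave.open(path, "rb") as f:
        if f.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        channels = f.getnchannels()
        sample_rate = f.getframerate()
        pcm = array.array("h", f.readframes(f.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    # Keep only the first channel of the interleaved frames, then scale.
    return [s / 32768.0 for s in pcm[::channels]], sample_rate


def resample_linear(samples, src_rate, dst_rate=16000):
    """Naive linear-interpolation resampler (demo quality, no low-pass filter)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate      # fractional index into the input
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def transcribe(recognizer, path):
    """Decode one WAV file with an existing sherpa_onnx.OfflineRecognizer."""
    import numpy as np  # imported lazily; only needed when decoding

    samples, rate = read_wave_mono(path)
    samples = resample_linear(samples, rate, 16000)
    stream = recognizer.create_stream()
    stream.accept_waveform(16000, np.asarray(samples, dtype=np.float32))
    recognizer.decode_stream(stream)
    return stream.result.text
```

With the `recognizer` from the example above, `transcribe(recognizer, "audio.wav")` would return the recognized text, where `audio.wav` is a placeholder path. The `wave` module only handles uncompressed PCM; other formats need a dedicated audio library.
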
SenseVoice expects 16 kHz audio.

## Reproduction

This artifact was generated with OpenASR Model Factory:

```powershell
openasr-model-factory quantize-sensevoice `
  --input-dir downloads/FunAudioLLM/SenseVoiceSmall `
  --output-dir outputs/sensevoice-small-onnx
```

The export follows the sherpa-onnx SenseVoice layout:

- ONNX inputs: `x`, `x_length`, `language`, `text_norm`
- ONNX output: `logits`
- Dynamic INT8 quantization for `MatMul` weights with `QUInt8`

## Limitations

- INT8 quantization may change recognition output compared with the original PyTorch model.
- Validate accuracy and latency in your target environment before production use.
- This artifact inherits upstream model limitations and license requirements.
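
The validation suggested above could start from something like the following sketch, which computes a character error rate against a reference transcript and a real-time factor for a decode call. Both helper names (`char_error_rate`, `real_time_factor`) are hypothetical and not part of sherpa-onnx or the upstream project, and character-level scoring is only one possible metric.

```python
import time


def char_error_rate(reference, hypothesis):
    """Levenshtein distance between the two strings divided by len(reference)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        cur = [i]
        for j, h in enumerate(hypothesis, start=1):
            cur.append(min(
                prev[j] + 1,               # delete a reference character
                cur[j - 1] + 1,            # insert a hypothesis character
                prev[j - 1] + (r != h),    # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1] / len(reference)


def real_time_factor(decode, audio_seconds, runs=3):
    """Best wall-clock time of `decode()` over `runs` calls, per second of audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        decode()
        best = min(best, time.perf_counter() - start)
    return best / audio_seconds
```

Here `decode` is any zero-argument callable that runs one full recognition pass. A real-time factor below 1.0 means decoding runs faster than the audio plays, and comparing error rates between the INT8 export and the original model on the same recordings gives a rough measure of the quantization cost.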