---
license: other
license_name: model-license
license_link: https://github.com/modelscope/FunASR/blob/main/MODEL_LICENSE
base_model:
- FunAudioLLM/SenseVoiceSmall
library_name: sherpa-onnx
pipeline_tag: automatic-speech-recognition
language:
- zh
- yue
- en
- ja
- ko
tags:
- sensevoice
- sherpa-onnx
- onnx
- int8
- speech-recognition
---

# SenseVoiceSmall ONNX INT8 for sherpa-onnx

This repository contains a sherpa-onnx compatible ONNX INT8 export of
[`FunAudioLLM/SenseVoiceSmall`](https://huggingface.co/FunAudioLLM/SenseVoiceSmall).

It is intended for local or embedded ONNX Runtime inference with sherpa-onnx. The model supports
Mandarin, Cantonese, English, Japanese, and Korean, with automatic language detection and optional
inverse text normalization (ITN), and it produces the SenseVoice CTC output format.

## Attribution

Base model and upstream project:

- Base model: https://huggingface.co/FunAudioLLM/SenseVoiceSmall
- Upstream code: https://github.com/FunAudioLLM/SenseVoice
- Upstream license: https://github.com/modelscope/FunASR/blob/main/MODEL_LICENSE

This is a derivative export and is not an official FunAudioLLM release.

## Files

- `model.int8.onnx` - sherpa-onnx compatible INT8 ONNX model
- `tokens.txt` - token table generated from the upstream SentencePiece model
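
For reference, sherpa-onnx token tables are plain text with one `token id` pair per line. The sketch below assumes `tokens.txt` follows that layout; `load_tokens` is a hypothetical helper written for illustration and is not part of sherpa-onnx.

```python
def load_tokens(path="tokens.txt"):
    """Return a dict mapping token ID (int) to token string.

    Assumes each non-empty line looks like "<token> <id>".
    """
    id_to_token = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line.strip():
                continue  # skip blank lines
            # Split at the last space so a token that itself contains
            # spaces is kept intact.
            token, _, token_id = line.rpartition(" ")
            id_to_token[int(token_id)] = token
    return id_to_token
```

One possible sanity check is to compare the number of entries returned by `load_tokens` with the `vocab_size` value stored in the model metadata.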

## Model Metadata

The ONNX model embeds sherpa-onnx runtime metadata, including:

- `model_type=sense_voice_ctc`
- `lfr_window_size=7`
- `lfr_window_shift=6`
- CMVN statistics: `neg_mean`, `inv_stddev`
- language IDs for `auto`, `zh`, `en`, `yue`, `ja`, `ko`, `nospeech`
- text normalization IDs for `with_itn` and `without_itn`
- `vocab_size=25055`
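
To make the role of `lfr_window_size`, `lfr_window_shift`, `neg_mean`, and `inv_stddev` more concrete, here is a minimal sketch of low-frame-rate (LFR) stacking followed by CMVN. It assumes frames are stacked without padding and that normalization is `(x + neg_mean) * inv_stddev`. `apply_lfr` and `apply_cmvn` are hypothetical helper names, and the actual sherpa-onnx runtime may differ in details such as padding.

```python
def apply_lfr(frames, window_size=7, window_shift=6):
    """Concatenate `window_size` consecutive frames, advancing `window_shift` frames per step."""
    if len(frames) < window_size:
        return []
    num_out = (len(frames) - window_size) // window_shift + 1
    stacked_frames = []
    for i in range(num_out):
        start = i * window_shift
        stacked = []
        for frame in frames[start:start + window_size]:
            stacked.extend(frame)
        stacked_frames.append(stacked)
    return stacked_frames


def apply_cmvn(frames, neg_mean, inv_stddev):
    """Normalize every feature dimension: (x + neg_mean) * inv_stddev."""
    return [
        [(x + m) * s for x, m, s in zip(frame, neg_mean, inv_stddev)]
        for frame in frames
    ]
```

Under these assumptions, 80-dimensional filterbank frames (the default `feature_dim` in the sherpa-onnx Python API) become 560-dimensional vectors after stacking 7 frames, and the frame rate drops by a factor of 6.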

## Usage

Install sherpa-onnx following the official documentation for your platform, for example with pip:

```bash
pip install sherpa-onnx
```

Example Python usage:

```python
import sherpa_onnx

# Build an offline (non-streaming) recognizer from the files in this repository.
recognizer = sherpa_onnx.OfflineRecognizer.from_sense_voice(
    model="model.int8.onnx",
    tokens="tokens.txt",
    num_threads=4,
    use_itn=True,  # apply inverse text normalization to the output
    debug=False,
)
```

Adapt audio loading and resampling to your application. SenseVoice expects 16 kHz audio.
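
As one way to feed audio to a recognizer, the sketch below reads a 16-bit PCM WAV file with the standard library and decodes it. `load_wav_16bit` and `transcribe` are hypothetical helpers; the `create_stream`, `accept_waveform`, `decode_stream`, and `result.text` calls follow the sherpa-onnx offline Python examples. Rejecting anything other than 16 kHz is a choice made for this sketch, and no resampling is attempted.

```python
import struct
import wave


def load_wav_16bit(path):
    """Read a 16-bit PCM WAV file; return (samples in [-1, 1), sample_rate)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        sample_rate = wav.getframerate()
        num_channels = wav.getnchannels()
        raw = wav.readframes(wav.getnframes())
    ints = struct.unpack("<%dh" % (len(raw) // 2), raw)
    # Samples are interleaved by channel; keep only the first channel.
    mono = ints[::num_channels]
    return [s / 32768.0 for s in mono], sample_rate


def transcribe(recognizer, samples, sample_rate):
    """Decode one utterance with an already constructed offline recognizer."""
    if sample_rate != 16000:
        raise ValueError("resample to 16 kHz first, got %d Hz" % sample_rate)
    stream = recognizer.create_stream()
    stream.accept_waveform(sample_rate, samples)
    recognizer.decode_stream(stream)
    return stream.result.text
```

With a recognizer built as shown earlier, `samples, sample_rate = load_wav_16bit("audio.wav")` followed by `print(transcribe(recognizer, samples, sample_rate))` would print the recognized text, where `audio.wav` is a placeholder path. The official examples pass a float32 NumPy array to `accept_waveform`; convert the list if your installed version requires it.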

## Reproduction

This artifact was generated with OpenASR Model Factory:

```powershell
openasr-model-factory quantize-sensevoice `
  --input-dir downloads/FunAudioLLM/SenseVoiceSmall `
  --output-dir outputs/sensevoice-small-onnx
```

The export follows the sherpa-onnx SenseVoice layout:

- ONNX inputs: `x`, `x_length`, `language`, `text_norm`
- ONNX output: `logits`
- Dynamic INT8 quantization for `MatMul` weights with `QUInt8`
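
In dynamic quantization, weights are quantized ahead of time, while activation ranges are computed at inference time. The sketch below illustrates the general arithmetic of asymmetric unsigned 8-bit quantization as it could apply to a weight vector. It is not the code used by the export tool or by ONNX Runtime, and `quantize_uint8` and `dequantize_uint8` are hypothetical names.

```python
def quantize_uint8(values):
    """Map floats to integers in [0, 255] using a scale and a zero point."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    rmin = min(0.0, min(values))
    rmax = max(0.0, max(values))
    scale = (rmax - rmin) / 255.0
    if scale == 0.0:
        scale = 1.0  # all-zero input: any positive scale works
    zero_point = round(-rmin / scale)
    quantized = [
        min(255, max(0, round(v / scale) + zero_point)) for v in values
    ]
    return quantized, scale, zero_point


def dequantize_uint8(quantized, scale, zero_point):
    """Recover approximate float values from the quantized integers."""
    return [(q - zero_point) * scale for q in quantized]
```

The round trip through `quantize_uint8` and `dequantize_uint8` is lossy in general, which is the source of the small output differences noted under Limitations.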

## Limitations

- INT8 quantization may change recognition output compared with the original PyTorch model.
- Validate accuracy and latency in your target environment before production use.
- This artifact inherits upstream model limitations and license requirements.