| | --- |
| | license: bsd-3-clause |
| | pipeline_tag: automatic-speech-recognition |
| | --- |
| | |
| | # whisper.axera |
| |
|
| | - [English](https://huggingface.co/AXERA-TECH/Whisper/blob/main/README_EN.md) |
| | - [中文](https://huggingface.co/AXERA-TECH/Whisper/blob/main/README.md) |
| |
|
| | OpenAI Whisper on Axera Platform |
| |
|
| | ## Overview |
| |
|
| | This project provides an optimized implementation of OpenAI's Whisper speech recognition model for Axera AI processors (AX650N/AX630C). It supports both C++ and Python interfaces for efficient on-device speech-to-text conversion. |
| |
|
| | ## Features |
| |
|
| | - **Dual Language Support**: Both C++ and Python APIs available |
| | - **Multiple Model Sizes**: Support for tiny, base, small, and turbo model variants |
| | - **Multi-language Recognition**: Tested with English, Chinese, Japanese, and Korean |
| | - **Optimized Performance**: Specially optimized for Axera NPU acceleration |
| | - **Easy Deployment**: Pre-built packages and cross-compilation support |
| |
|
| | ## Update |
| |
|
| | - 2026/01/14: The model architecture is now cleaner, using `encoder` and `decoder` models instead of `decoder_main` and `decoder_loop`. Exporting models from Hugging Face is also supported. |
| |
|
| | ## Supported Platforms |
| |
|
| | - ✅ AX650N |
| | - ✅ AX630C |
| |
|
| | ## Pre-trained Models |
| |
|
| | Download pre-compiled models from: |
| | - [Baidu Cloud](https://pan.baidu.com/s/1tOHVMZCin0A68T5HmKRJyg?pwd=axyz) |
| | - [Hugging Face](https://huggingface.co/AXERA-TECH/Whisper) |
| |
|
| | For custom model conversion, please refer to [Model Conversion Guide](./model_convert/README_EN.md). |
| |
|
| | ## Model Conversion |
| |
|
| | Currently supported model scales: |
| | - tiny |
| | - base |
| | - small |
| | - medium |
| | - turbo |
| |
|
| | Tested languages: |
| | - English |
| | - Chinese |
| | - Japanese |
| | - Korean |
| | - Malaysian |
| |
|
| | For other languages or custom model sizes, please refer to the [Model Conversion Guide](./model_convert/README_EN.md). |
| |
|
| | ## Deployment on Target Devices |
| |
|
| | ### Prerequisites |
| | - AX650N/AX630C devices with Ubuntu 22.04 pre-installed |
| | - Internet connection for `apt install` and `pip install` |
| | - Verified hardware platforms: |
| |   - [MaixIV M4nDock (AX650N)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html) |
| |   - [M.2 Accelerator Card (AX650N)](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html) |
| |   - [Axera Pi 2 (AX630C)](https://axera-pi-2-docs-cn.readthedocs.io/zh-cn/latest/index.html) |
| |   - [Module-LLM (AX630C)](https://docs.m5stack.com/zh_CN/module/Module-LLM) |
| |   - [LLM630 Compute Kit (AX630C)](https://docs.m5stack.com/zh_CN/core/LLM630%20Compute%20Kit) |
| |
|
| | ## Programming Language Support |
| |
|
| | ### Python |
| |
|
| | Tested with Python 3.12. We recommend using [Miniconda](https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh) for environment management. |
| |
|
| | #### Installation |
| |
|
| | ```bash |
| | cd python |
| | pip3 install -r requirements.txt |
| | ``` |
| |
|
| | #### pyaxengine |
| |
|
| | Install the NPU Python API (pyaxengine) from: https://github.com/AXERA-TECH/pyaxengine |
| |
|
| | #### Usage |
| |
|
| | ##### Command Line Interface |
| |
|
| | ``` |
| | cd python |
| | (whisper) root@ax650:/mnt/data/HF/Whisper/python# python whisper_cli.py -w ../demo.wav -t tiny |
| | [INFO] Available providers: ['AxEngineExecutionProvider'] |
| | {'wav': '../demo.wav', 'model_type': 'tiny', 'model_path': '../models-ax650', 'language': 'zh', 'task': 'transcribe'} |
| | [INFO] Using provider: AxEngineExecutionProvider |
| | [INFO] Chip type: ChipType.MC50 |
| | [INFO] VNPU type: VNPUType.DISABLED |
| | [INFO] Engine version: 2.12.0s |
| | [INFO] Model type: 2 (triple core) |
| | [INFO] Compiler version: 5.0 76f70fdc |
| | [INFO] Using provider: AxEngineExecutionProvider |
| | [INFO] Model type: 2 (triple core) |
| | [INFO] Compiler version: 5.0 76f70fdc |
| | ASR result: |
| | 擅职出现交易几乎停止的情况 |
| | RTF: 0.10313174677896837 |
| | |
| | ``` |
| |
|
| | Command line arguments: |
| | | Argument | Description | Default | |
| | | --- | --- | --- | |
| | | --wav/-w | Input audio file | - | |
| | | --model_type/-t | Model type: tiny/base/small | - | |
| | | --model_path/-p | Model directory | ../models | |
| | | --language/-l | Recognition language | zh | |
| |
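The arguments in the table map naturally onto an `argparse` definition. The following is a minimal sketch, not the actual source of `whisper_cli.py`: the flag names and defaults are taken from the table and the example log, while the `build_parser` name, the help strings, and the choice to mark `--wav` and `--model_type` as required are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser that mirrors the documented arguments."""
    parser = argparse.ArgumentParser(description="Whisper CLI (illustrative sketch)")
    # Assumption: arguments without a documented default are treated as required.
    parser.add_argument("--wav", "-w", required=True, help="Input audio file")
    parser.add_argument("--model_type", "-t", required=True,
                        help="Model type, e.g. tiny/base/small")
    # Defaults below are the ones listed in the table.
    parser.add_argument("--model_path", "-p", default="../models", help="Model directory")
    parser.add_argument("--language", "-l", default="zh", help="Recognition language")
    return parser


# Parse the same arguments used in the example invocation.
args = build_parser().parse_args(["-w", "../demo.wav", "-t", "tiny"])
print(vars(args))
```

With a parser of this shape, omitting `-l` falls back to `zh`, which matches the `'language': 'zh'` entry in the example log.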
|
| |
|
| | ##### Server Mode |
| |
|
| | ``` |
| | (whisper) root@ax650:/mnt/data/HF/Whisper/python# python whisper_svr.py |
| | [INFO] Available providers: ['AxEngineExecutionProvider'] |
| | Server started at http://0.0.0.0:8000 |
| | |
| | ``` |
| |
|
| | Test the server: |
| | ``` |
| | python test_svr.py |
| | ``` |
| |
|
| |
|
| | <h3 id="CPP">CPP</h3> |
| |
|
| | #### Usage on Target Device |
| | ``` |
| | cd cpp/ax650 |
| | ./whisper_cli -w ../../demo.wav -t tiny |
| | ``` |
| |
|
| | or |
| |
|
| | ``` |
| | cd cpp/ax650 |
| | ./whisper_cli --model_type small -w ../../demo.wav |
| | ``` |
| |
|
| | Example Output: |
| |
|
| | ``` |
| | (whisper) root@ax650:/mnt/data/HF/Whisper/cpp/ax650# ./whisper_cli -w ../../demo.wav -t tiny |
| | wav_file: ../../demo.wav |
| | model_path: ../../models-ax650 |
| | model_type: tiny |
| | language: zh |
| | Init whisper success, take 0.3540seconds |
| | Result: 甚至出现交易几乎停止的情况 |
| | RTF: 0.0968 |
| | |
| | ``` |
| |
|
| | #### Server Mode |
| |
|
| | ``` |
| | cd cpp/ax650 |
| | (whisper) root@ax650:/mnt/data/HF/Whisper/cpp/ax650# ./whisper_svr -t tiny |
| | port: 8080 |
| | model_path: ../../models-ax650 |
| | model_type: tiny |
| | language: zh |
| | [I][ main][ 60]: Initializing server... |
| | [I][ main][ 65]: Init server success |
| | [I][ start][ 32]: Start server at port 8080, POST binary stream to IP:8080/asr |
| | |
| | ``` |
| |
|
| | #### Client test using curl |
| |
|
| | ``` |
| | ffmpeg -i demo.wav -f f32le -c:a pcm_f32le - 2>/dev/null | \ |
| | curl -X POST 10.126.33.192:8080/asr \ |
| | -H "Content-Type: application/octet-stream" \ |
| | --data-binary @- |
| | ``` |
| |
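The same request can be sent from Python without `ffmpeg` or `curl`. This is a minimal sketch under stated assumptions: it only handles 16-bit PCM WAV input, it keeps the file's sample rate and channel layout unchanged (as the `ffmpeg` command above does), and it returns the response body as raw text because the response format is not documented here. The function names are hypothetical.

```python
import struct
import urllib.request
import wave


def wav_to_f32le(path: str) -> bytes:
    """Read a 16-bit PCM WAV file and return little-endian float32 samples."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        raw = wf.readframes(wf.getnframes())
    count = len(raw) // 2
    samples = struct.unpack(f"<{count}h", raw)
    # Scale int16 values into the [-1.0, 1.0) float range.
    return struct.pack(f"<{count}f", *(s / 32768.0 for s in samples))


def build_asr_request(url: str, pcm: bytes) -> urllib.request.Request:
    """Build a POST request carrying the raw sample bytes as the body."""
    return urllib.request.Request(
        url,
        data=pcm,
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )


def post_wav(path: str, url: str, timeout: float = 30.0) -> str:
    """Send a WAV file to the server and return the raw response text."""
    request = build_asr_request(url, wav_to_f32le(path))
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `post_wav("demo.wav", "http://10.126.33.192:8080/asr")` would then mirror the curl command, with the address replaced by that of your own device.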
|
| | ## Performance Benchmarks |
| |
|
| | ### Latency |
| |
|
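| | RTF (Real-Time Factor) is the processing time divided by the audio duration. Lower is faster, and a value below 1 means faster than real time. |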
| |
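As a rough illustration of how such a value can be obtained, the sketch below times a transcription call and divides the elapsed time by the clip length. The helper names and the `transcribe` callable are hypothetical, and whether this project's own scripts include model loading in the timed region is not stated here.

```python
import time
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(transcribe, wav_path: str) -> float:
    """Time one call to `transcribe(wav_path)` and convert it to an RTF."""
    start = time.perf_counter()
    transcribe(wav_path)  # hypothetical callable that runs the model
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, wav_duration_seconds(wav_path))
```

For example, an RTF of 0.1 means that a 10-second clip takes about 1 second to transcribe.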
|
| | C++: |
| |
|
| | | Models | AX650N | AX630C | |
| | | ------------- | ------ | ------ | |
| | | Whisper-Tiny | 0.08 | | |
| | | Whisper-Base | 0.11 | 0.35 | |
| | | Whisper-Small | 0.24 | | |
| | | Whisper-Turbo | 0.48 | | |
| |
|
| | Python: |
| |
|
| | | Models | AX650N | AX630C | |
| | | ------------- | ------ | ------ | |
| | | Whisper-Tiny | 0.12 | | |
| | | Whisper-Base | 0.16 | 0.35 | |
| | | Whisper-Small | 0.50 | | |
| | | Whisper-Turbo | 0.60 | | |
| |
|
| | ### Word Error Rate (tested on the AIShell dataset) |
| |
|
| | | Models | AX650N | AX630C | |
| | | ------------- | ------ | ------ | |
| | | Whisper-Tiny | 0.24 | | |
| | | Whisper-Base | 0.18 | | |
| | | Whisper-Small | 0.11 | | |
| | | Whisper-Turbo | 0.06 | | |
| |
|
| | To reproduce WER test results: |
| |
|
| | Download dataset: |
| | ``` |
| | cd model_convert |
| | bash download_dataset.sh |
| | ``` |
| |
|
| | Run test script: |
| | ``` |
| | cd python |
| | conda activate whisper |
| | python test_wer.py -d aishell --gt_path ../model_convert/datasets/ground_truth.txt --model_type tiny |
| | |
| | ``` |
| |
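For readers unfamiliar with the metric, an error rate of this kind is the edit distance between the recognized text and the reference, divided by the reference length. The sketch below is illustrative only and is not the code in `test_wer.py`: it assumes per-character tokenization, which is common for Chinese, and the function names are hypothetical.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance over reference length, tokenized per character (assumption)."""
    ref = list(reference.replace(" ", ""))
    hyp = list(hypothesis.replace(" ", ""))
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# The two demo outputs shown earlier, with the C++ result treated as the
# reference purely for illustration.
print(error_rate("甚至出现交易几乎停止的情况", "擅职出现交易几乎停止的情况"))
```

The two strings differ in 2 of 13 characters, so the sketch reports roughly 0.15 for this single utterance.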
|
| | ### Memory Usage |
| |
|
| | * CMM is the physical memory used by Axera modules such as VDEC (video decoder), VENC (video encoder), and the NPU. |
| |
|
| | Python: |
| |
|
| | | Models | CMM (MB) | OS (MB) | |
| | | ------------- | ------ | ------ | |
| | | Whisper-Tiny | 332 | 512 | |
| | | Whisper-Base | 533 | 644 | |
| | | Whisper-Small | 1106 | 906 | |
| | | Whisper-Turbo | 2065 | 2084 | |
| |
|
| | C++: |
| |
|
| | | Models | CMM (MB) | OS (MB) | |
| | | ------------- | ------ | ------ | |
| | | Whisper-Tiny | 332 | 31 | |
| | | Whisper-Base | 533 | 54 | |
| | | Whisper-Small | 1106 | 146 | |
| | | Whisper-Turbo | 2065 | 86 | |
| |
|
| |
|
| | ## Technical Discussion |
| |
|
| | - GitHub Issues |
| | - Tencent QQ Group: 139953715 |
| |
|