license: bsd-3-clause
pipeline_tag: automatic-speech-recognition
whisper.axera
OpenAI Whisper on Axera Platform
Overview
This project provides an optimized implementation of OpenAI's Whisper speech recognition model for Axera AI processors (AX650N/AX630C). It supports both C++ and Python interfaces for efficient on-device speech-to-text conversion.
Features
- Dual Language Support: Both C++ and Python APIs available
- Multiple Model Sizes: Support for tiny, base, small, and turbo model variants
- Multi-language Recognition: Tested with English, Chinese, Japanese, and Korean
- Optimized Performance: Specially optimized for Axera NPU acceleration
- Easy Deployment: Pre-built packages and cross-compilation support
Update
- 2026/01/14: Cleaner model architecture (a single encoder and decoder instead of decoder_main and decoder_loop), plus support for exporting models from Hugging Face.
Supported Platforms
- ✅ AX650N
- ✅ AX630C
Pre-trained Models
Download pre-compiled models from:
For custom model conversion, please refer to Model Conversion Guide.
Model Conversion
Currently supported model scales:
- tiny
- base
- small
- medium
- turbo
Tested languages:
- English
- Chinese
- Japanese
- Korean
- Malaysian
For other languages or custom model sizes, please refer to the Model Conversion Guide.
Deployment on Target Devices
Prerequisites
- AX650N/AX630C devices with Ubuntu 22.04 pre-installed
- Internet connection for `apt install` and `pip install`
- Verified hardware platforms:
Programming Language Support
Python
Tested with Python 3.12. We recommend using Miniconda for environment management.
Installation
cd python
pip3 install -r requirements.txt
pyaxengine
Install the NPU Python API from: https://github.com/AXERA-TECH/pyaxengine
Usage
Command Line Interface
cd python
(whisper) root@ax650:/mnt/data/HF/Whisper/python# python whisper_cli.py -w ../demo.wav -t tiny
[INFO] Available providers: ['AxEngineExecutionProvider']
{'wav': '../demo.wav', 'model_type': 'tiny', 'model_path': '../models-ax650', 'language': 'zh', 'task': 'transcribe'}
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.12.0s
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.0 76f70fdc
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.0 76f70fdc
ASR result:
擅职出现交易几乎停止的情况
RTF: 0.10313174677896837
Command line arguments:
| Argument | Description | Default |
|---|---|---|
| --wav/-w | Input audio file | - |
| --model_type/-t | Model type: tiny/base/small | - |
| --model_path/-p | Model directory | ../models |
| --language/-l | Recognition language | zh |
Server Mode
(whisper) root@ax650:/mnt/data/HF/Whisper/python# python whisper_svr.py
[INFO] Available providers: ['AxEngineExecutionProvider']
Server started at http://0.0.0.0:8000
Test the server:
python test_svr.py
CPP
Usage on Target Device
cd cpp/ax650
./whisper_cli -w ../demo.wav -t tiny
or
cd cpp/ax650
./whisper_cli --model_type small -w ../demo.wav
Example Output:
(whisper) root@ax650:/mnt/data/HF/Whisper/cpp/ax650# ./whisper_cli -w ../../demo.wav -t tiny
wav_file: ../../demo.wav
model_path: ../../models-ax650
model_type: tiny
language: zh
Init whisper success, take 0.3540seconds
Result: 甚至出现交易几乎停止的情况
RTF: 0.0968
Server Mode
cd cpp/ax650
(whisper) root@ax650:/mnt/data/HF/Whisper/cpp/ax650# ./whisper_svr -t tiny
port: 8080
model_path: ../../models-ax650
model_type: tiny
language: zh
[I][ main][ 60]: Initializing server...
[I][ main][ 65]: Init server success
[I][ start][ 32]: Start server at port 8080, POST binary stream to IP:8080/asr
Client test using curl:
ffmpeg -i demo.wav -f f32le -c:a pcm_f32le - 2>/dev/null | \
curl -X POST 10.126.33.192:8080/asr \
-H "Content-Type: application/octet-stream" \
--data-binary @-
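The same request can be made from Python without ffmpeg. This is an illustrative sketch, not part of the project: it assumes the server expects raw mono float32 little-endian PCM POSTed to `/asr` (as in the example above) and that the input WAV is 16-bit PCM; the IP address is the placeholder from the curl example.

```python
# Hypothetical client mirroring the ffmpeg | curl pipeline above.
# Assumes a 16-bit PCM mono WAV input and a server accepting raw
# float32 LE samples on /asr; adjust SERVER_URL for your device.
import array
import urllib.request
import wave

SERVER_URL = "http://10.126.33.192:8080/asr"  # placeholder IP from the example

def wav_to_f32le(path: str) -> bytes:
    """Decode a 16-bit PCM WAV file into raw float32 little-endian samples."""
    with wave.open(path, "rb") as wf:
        assert wf.getsampwidth() == 2, "expects 16-bit PCM input"
        frames = wf.readframes(wf.getnframes())
    int16 = array.array("h")
    int16.frombytes(frames)
    # Normalize to [-1.0, 1.0), like ffmpeg's conversion to pcm_f32le.
    f32 = array.array("f", (s / 32768.0 for s in int16))
    return f32.tobytes()

def transcribe(path: str) -> str:
    """POST the raw samples and return the server's ASR result as text."""
    req = urllib.request.Request(
        SERVER_URL,
        data=wav_to_f32le(path),
        headers={"Content-Type": "application/octet-stream"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

# Usage: text = transcribe("demo.wav")
```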
Performance Benchmarks
Latency
RTF: Real-Time Factor
CPP:
| Models | AX650N | AX630C |
|---|---|---|
| Whisper-Tiny | 0.08 | |
| Whisper-Base | 0.11 | 0.35 |
| Whisper-Small | 0.24 | |
| Whisper-Turbo | 0.48 | |
Python:
| Models | AX650N | AX630C |
|---|---|---|
| Whisper-Tiny | 0.12 | |
| Whisper-Base | 0.16 | 0.35 |
| Whisper-Small | 0.50 | |
| Whisper-Turbo | 0.60 | |
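As a sanity check on the tables above: RTF is simply processing time divided by audio duration, so values below 1.0 mean faster-than-real-time transcription. A minimal sketch:

```python
# RTF (Real-Time Factor): processing time divided by audio duration.
# RTF < 1.0 means the model transcribes faster than real time.
def rtf(processing_seconds: float, audio_seconds: float) -> float:
    return processing_seconds / audio_seconds

# A 12.5 s clip transcribed in 1.0 s gives RTF 0.08 -- roughly
# Whisper-Tiny's C++ figure on AX650N in the table above.
print(rtf(1.0, 12.5))  # 0.08
```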
Word Error Rate (tested on the AISHELL dataset)
| Models | AX650N | AX630C |
|---|---|---|
| Whisper-Tiny | 0.24 | |
| Whisper-Base | 0.18 | |
| Whisper-Small | 0.11 | |
| Whisper-Turbo | 0.06 | |
To reproduce WER test results:
Download dataset:
cd model_convert
bash download_dataset.sh
Run test script:
cd python
conda activate whisper
python test_wer.py -d aishell --gt_path ../model_convert/datasets/ground_truth.txt --model_type tiny
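For reference, the metric test_wer.py reports can be sketched as follows. This is an illustrative reimplementation, not the project's actual code: WER is the word-level edit distance between reference and hypothesis, divided by the reference length (for Chinese on AISHELL, it is conventionally computed per character).

```python
# Illustrative WER: edit distance (substitutions, insertions, deletions,
# each cost 1) between token sequences, divided by reference length.
def wer(reference: list[str], hypothesis: list[str]) -> float:
    # Classic dynamic-programming (Levenshtein) edit distance,
    # keeping only the previous row to save memory.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, 1):
        cur = [i]
        for j, hyp_tok in enumerate(hypothesis, 1):
            cost = 0 if ref_tok == hyp_tok else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1] / len(reference)

# Per-character scoring, as is conventional for Chinese:
print(wer(list("甚至出现"), list("擅职出现")))  # 0.5: 2 substitutions / 4 chars
```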
MEM Usage
- CMM: physical memory reserved for Axera hardware modules such as the VDEC (video decoder), VENC (video encoder), and NPU.
Python:
| Models | CMM(MB) | OS(MB) |
|---|---|---|
| Whisper-Tiny | 332 | 512 |
| Whisper-Base | 533 | 644 |
| Whisper-Small | 1106 | 906 |
| Whisper-Turbo | 2065 | 2084 |
C++:
| Models | CMM(MB) | OS(MB) |
|---|---|---|
| Whisper-Tiny | 332 | 31 |
| Whisper-Base | 533 | 54 |
| Whisper-Small | 1106 | 146 |
| Whisper-Turbo | 2065 | 86 |
Technical Discussion
- GitHub Issues
- Tencent QQ Group: 139953715