---
license: bsd-3-clause
pipeline_tag: automatic-speech-recognition
---

# whisper.axera

- [English](https://huggingface.co/AXERA-TECH/Whisper/blob/main/README_EN.md)
- [中文](https://huggingface.co/AXERA-TECH/Whisper/blob/main/README.md)

OpenAI Whisper on the Axera platform.

## Overview

This project provides an optimized implementation of OpenAI's Whisper speech recognition model for Axera AI processors (AX650N/AX630C). It offers both C++ and Python interfaces for efficient on-device speech-to-text conversion.

## Features

- **Dual API Support**: Both C++ and Python APIs are available
- **Multiple Model Sizes**: Supports the tiny, base, small, and turbo model variants
- **Multi-language Recognition**: Tested with English, Chinese, Japanese, and Korean
- **Optimized Performance**: Optimized for acceleration on the Axera NPU
- **Easy Deployment**: Pre-built packages and cross-compilation support

## Update

- 2026/01/14: The model architecture is now cleaner, with a separate encoder and decoder replacing `decoder_main` and `decoder_loop`. Models can now be exported from Hugging Face.

## Supported Platforms

- ✅ AX650N
- ✅ AX630C

## Pre-trained Models

Download the pre-compiled models from:

- [Baidu Cloud](https://pan.baidu.com/s/1tOHVMZCin0A68T5HmKRJyg?pwd=axyz)
- [Huggingface](https://huggingface.co/AXERA-TECH/Whisper)

For custom model conversion, refer to the [Model Conversion Guide](./model_convert/README_EN.md).

## Model Conversion

Currently supported model sizes:

- tiny
- base
- small
- medium
- turbo

Tested languages:

- English
- Chinese
- Japanese
- Korean
- Malay

For other languages or custom model sizes, refer to the [Model Conversion Guide](./model_convert/README_EN.md).
## Deployment on Target Devices

### Prerequisites

- AX650N/AX630C devices with Ubuntu 22.04 pre-installed
- Internet connection for `apt install` and `pip install`
- Verified hardware platforms:
  - [MaixIV M4nDock (AX650N)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
  - [M.2 Accelerator Card (AX650N)](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)
  - [Axera Pi 2 (AX630C)](https://axera-pi-2-docs-cn.readthedocs.io/zh-cn/latest/index.html)
  - [Module-LLM (AX630C)](https://docs.m5stack.com/zh_CN/module/Module-LLM)
  - [LLM630 Compute Kit (AX630C)](https://docs.m5stack.com/zh_CN/core/LLM630%20Compute%20Kit)

## Programming Language Support

### Python

Tested with Python 3.12. We recommend [Miniconda](https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh) for environment management.

#### Installation

```bash
cd python
pip3 install -r requirements.txt
```

#### pyaxengine

Install the NPU Python API from https://github.com/AXERA-TECH/pyaxengine.

#### Usage

##### Command Line Interface

```
cd python
(whisper) root@ax650:/mnt/data/HF/Whisper/python# python whisper_cli.py -w ../demo.wav -t tiny
[INFO] Available providers:  ['AxEngineExecutionProvider']
{'wav': '../demo.wav', 'model_type': 'tiny', 'model_path': '../models-ax650', 'language': 'zh', 'task': 'transcribe'}
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.12.0s
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.0 76f70fdc
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.0 76f70fdc
ASR result:
擅职出现交易几乎停止的情况
RTF: 0.10313174677896837
```

Command line arguments:

| Argument | Description | Default |
| --- | --- | --- |
| --wav/-w | Input audio file | - |
| --model_type/-t | Model type: tiny/base/small | - |
| --model_path/-p | Model directory | ../models |
| --language/-l | Recognition language | zh |
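The CLI reports the transcript after `ASR result:` and the real-time factor on an `RTF:` line. RTF is processing time divided by audio duration, so the value of about 0.10 above means the clip was transcribed roughly ten times faster than real time. The sketch below shows one way a script could drive `whisper_cli.py` and pull those two values out. It assumes the output format of the sample run (including the transcript possibly appearing on the line after `ASR result:`) and uses only the flags from the table, with `-w` taken from the example command. The helpers `build_cli_command`, `parse_cli_output`, `wav_duration_seconds`, and `run_cli` are hypothetical and not part of this repository.

```python
import subprocess
import sys
import wave
from typing import List, Optional, Tuple


def build_cli_command(wav: str, model_type: str, model_path: Optional[str] = None,
                      language: Optional[str] = None) -> List[str]:
    """Assemble a whisper_cli.py invocation using only the documented flags."""
    cmd = [sys.executable, "whisper_cli.py", "-w", wav, "-t", model_type]
    if model_path is not None:
        cmd += ["-p", model_path]
    if language is not None:
        cmd += ["-l", language]
    return cmd


def parse_cli_output(log: str) -> Tuple[Optional[str], Optional[float]]:
    """Extract (transcript, RTF) from the CLI log; format assumed from the sample run."""
    text, rtf = None, None
    lines = log.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("ASR result:"):
            text = line[len("ASR result:"):].strip()
            # In the sample run the transcript sits on the following line.
            if not text and i + 1 < len(lines):
                text = lines[i + 1].strip()
        elif line.startswith("RTF:"):
            rtf = float(line[len("RTF:"):].strip())
    return text, rtf


def wav_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file in seconds (the denominator of RTF)."""
    with wave.open(path, "rb") as f:
        return f.getnframes() / f.getframerate()


def run_cli(wav: str, model_type: str, **kwargs) -> Tuple[Optional[str], Optional[float]]:
    """Run whisper_cli.py from the python/ directory and return (transcript, RTF)."""
    proc = subprocess.run(build_cli_command(wav, model_type, **kwargs),
                          capture_output=True, text=True, check=True)
    # Scan both streams, since which one carries the result lines is not documented.
    return parse_cli_output(proc.stdout + "\n" + proc.stderr)
```

For example, calling `run_cli("../demo.wav", "tiny")` from the `python/` directory would return the transcript and RTF. Multiplying the RTF by `wav_duration_seconds("../demo.wav")` gives the approximate processing time in seconds.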
##### Server Mode

```
(whisper) root@ax650:/mnt/data/HF/Whisper/python# python whisper_svr.py
[INFO] Available providers:  ['AxEngineExecutionProvider']
Server started at http://0.0.0.0:8000
```

Test the server:

```
python test_svr.py
```
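The server reports that it is listening on port 8000. If a script launches `whisper_svr.py` and then runs `test_svr.py`, it may help to wait until that port accepts connections first. The following is a minimal standard-library sketch under that assumption. `wait_for_port` is a hypothetical helper: it only checks TCP reachability and does not exercise the server's HTTP routes, which are not documented here.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 8000, timeout: float = 30.0,
                  interval: float = 0.2) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # The server binds 0.0.0.0 (all interfaces); clients connect via a concrete
            # address such as the loopback interface.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline expires.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For instance, start `whisper_svr.py` with `subprocess.Popen`, call `wait_for_port()`, and only run `python test_svr.py` once it returns `True`.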