---
license: bsd-3-clause
pipeline_tag: automatic-speech-recognition
---

whisper.axera

OpenAI Whisper on Axera Platform

Overview

This project provides an optimized implementation of OpenAI's Whisper speech recognition model for Axera AI processors (AX650N/AX630C). It supports both C++ and Python interfaces for efficient on-device speech-to-text conversion.

Features

  • Dual Language Support: Both C++ and Python APIs available
  • Multiple Model Sizes: Support for tiny, base, small, and turbo model variants
  • Multi-language Recognition: Tested with English, Chinese, Japanese, and Korean
  • Optimized Performance: Specially optimized for Axera NPU acceleration
  • Easy Deployment: Pre-built packages and cross-compilation support

Update

  • 2026/01/14: Cleaner model architecture (a single encoder and decoder instead of decoder_main and decoder_loop); added support for exporting models from Hugging Face.

Supported Platforms

  • ✅ AX650N
  • ✅ AX630C

Pre-trained Models

Download pre-compiled models from:

For custom model conversion, please refer to Model Conversion Guide.

Model Conversion

Currently supported model scales:

  • tiny
  • base
  • small
  • medium
  • turbo

Tested languages:

  • English
  • Chinese
  • Japanese
  • Korean
  • Malay

For other languages or custom model sizes, please refer to the Model Conversion Guide.

Deployment on Target Devices

Prerequisites

Programming Language Support

Python

Tested with Python 3.12. We recommend using Miniconda for environment management.

Installation

cd python
pip3 install -r requirements.txt

pyaxengine

Install NPU Python API from: https://github.com/AXERA-TECH/pyaxengine

Usage

Command Line Interface
cd python  
(whisper) root@ax650:/mnt/data/HF/Whisper/python# python whisper_cli.py -w ../demo.wav -t tiny
[INFO] Available providers:  ['AxEngineExecutionProvider']
{'wav': '../demo.wav', 'model_type': 'tiny', 'model_path': '../models-ax650', 'language': 'zh', 'task': 'transcribe'}
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.12.0s
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.0 76f70fdc
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.0 76f70fdc
ASR result:
擅职出现交易几乎停止的情况
RTF: 0.10313174677896837

Command line arguments:

| Argument          | Description                 | Default     |
|-------------------|-----------------------------|-------------|
| --wav             | Input audio file            | -           |
| --model_type / -t | Model type: tiny/base/small | -           |
| --model_path / -p | Model directory             | ../models   |
| --language / -l   | Recognition language        | zh          |
Server Mode
(whisper) root@ax650:/mnt/data/HF/Whisper/python# python whisper_svr.py
[INFO] Available providers:  ['AxEngineExecutionProvider']
Server started at http://0.0.0.0:8000

Test the server:

python test_svr.py

CPP

Usage on Target Device

cd cpp/ax650
./whisper_cli -w ../demo.wav -t tiny

cd cpp/ax650
./whisper_cli --model_type small -w ../demo.wav

Example Output:

(whisper) root@ax650:/mnt/data/HF/Whisper/cpp/ax650# ./whisper_cli -w ../../demo.wav -t tiny
wav_file: ../../demo.wav
model_path: ../../models-ax650
model_type: tiny
language: zh
Init whisper success, take 0.3540seconds
Result: 甚至出现交易几乎停止的情况
RTF: 0.0968

Server Mode

cd cpp/ax650
(whisper) root@ax650:/mnt/data/HF/Whisper/cpp/ax650# ./whisper_svr -t tiny
port: 8080
model_path: ../../models-ax650
model_type: tiny
language: zh
[I][                            main][  60]: Initializing server...
[I][                            main][  65]: Init server success
[I][                           start][  32]: Start server at port 8080, POST binary stream to IP:8080/asr

Client test using curl:

ffmpeg -i demo.wav -f f32le -c:a pcm_f32le - 2>/dev/null | \
curl -X POST 10.126.33.192:8080/asr \
  -H "Content-Type: application/octet-stream" \
  --data-binary @-
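The same request can be made from Python with only the standard library. This is a sketch assuming a 16-bit mono PCM WAV input; the IP address is the placeholder from the curl example above, and the helper names are illustrative rather than part of this repo.

```python
# Python equivalent of the ffmpeg | curl pipeline above: decode a 16-bit
# PCM WAV to raw little-endian float32 samples and POST them to /asr.
import struct
import urllib.request
import wave


def wav_to_f32le(path: str) -> bytes:
    """Decode a 16-bit PCM WAV file into raw little-endian float32 bytes."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM input")
        frames = w.readframes(w.getnframes())
    samples = struct.unpack(f"<{len(frames) // 2}h", frames)
    # Scale int16 samples into [-1.0, 1.0) and pack as f32le.
    return struct.pack(f"<{len(samples)}f", *(s / 32768.0 for s in samples))


def post_asr(pcm: bytes, url: str = "http://10.126.33.192:8080/asr") -> str:
    """POST the raw float stream to the whisper_svr /asr endpoint."""
    req = urllib.request.Request(
        url, data=pcm, headers={"Content-Type": "application/octet-stream"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```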

Performance Benchmarks

Latency

RTF: Real-Time Factor (processing time divided by audio duration; lower is better)
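An RTF below 1.0 means the model transcribes faster than real time. A minimal sketch of how the figure can be computed for a WAV input (helper names are illustrative, not from this repo):

```python
# RTF = processing time / audio duration. Below 1.0 means faster than
# real time; e.g. 0.5 s of inference on 5 s of audio gives RTF 0.1.
import wave


def audio_duration_seconds(path: str) -> float:
    """Duration of a WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / float(w.getframerate())


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    return processing_seconds / audio_seconds
```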

CPP:

| Models        | AX650N | AX630C |
|---------------|--------|--------|
| Whisper-Tiny  | 0.08   | -      |
| Whisper-Base  | 0.11   | 0.35   |
| Whisper-Small | 0.24   | -      |
| Whisper-Turbo | 0.48   | -      |

Python:

| Models        | AX650N | AX630C |
|---------------|--------|--------|
| Whisper-Tiny  | 0.12   | -      |
| Whisper-Base  | 0.16   | 0.35   |
| Whisper-Small | 0.50   | -      |
| Whisper-Turbo | 0.60   | -      |

Word Error Rate (tested on the AIShell dataset)

| Models        | AX650N | AX630C |
|---------------|--------|--------|
| Whisper-Tiny  | 0.24   | -      |
| Whisper-Base  | 0.18   | -      |
| Whisper-Small | 0.11   | -      |
| Whisper-Turbo | 0.06   | -      |

To reproduce WER test results:

Download dataset:

cd model_convert
bash download_dataset.sh

Run test script:

cd python
conda activate whisper
python test_wer.py -d aishell --gt_path ../model_convert/datasets/ground_truth.txt --model_type tiny
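For reference, WER is the token-level Levenshtein edit distance between the hypothesis and the ground truth, divided by the reference length; for Mandarin sets like AIShell it is commonly computed over individual characters (CER). A minimal sketch, not the implementation in test_wer.py:

```python
# Minimal word/character error rate: Levenshtein edit distance over
# tokens divided by the reference length. For Mandarin (e.g. AIShell)
# the tokens are usually individual characters.
def wer(ref_tokens, hyp_tokens):
    m, n = len(ref_tokens), len(hyp_tokens)
    row = list(range(n + 1))  # distances for the previous reference prefix
    for i in range(1, m + 1):
        prev_diag, row[0] = row[0], i
        for j in range(1, n + 1):
            cur = row[j]
            row[j] = min(
                row[j] + 1,      # deletion
                row[j - 1] + 1,  # insertion
                prev_diag + (ref_tokens[i - 1] != hyp_tokens[j - 1]),  # sub
            )
            prev_diag = cur
    return row[n] / max(m, 1)
```

For example, comparing the two sample outputs earlier in this README, "擅职出现交易几乎停止的情况" vs. "甚至出现交易几乎停止的情况", gives two character substitutions over thirteen reference characters.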

MEM Usage

  • CMM: physical memory used by Axera hardware modules such as the VDEC (video decoder), VENC (video encoder), and NPU.

Python:

| Models        | CMM (MB) | OS (MB) |
|---------------|----------|---------|
| Whisper-Tiny  | 332      | 512     |
| Whisper-Base  | 533      | 644     |
| Whisper-Small | 1106     | 906     |
| Whisper-Turbo | 2065     | 2084    |

C++:

| Models        | CMM (MB) | OS (MB) |
|---------------|----------|---------|
| Whisper-Tiny  | 332      | 31      |
| Whisper-Base  | 533      | 54      |
| Whisper-Small | 1106     | 146     |
| Whisper-Turbo | 2065     | 86      |

Technical Discussion

  • GitHub Issues
  • Tencent QQ Group: 139953715