---
license: bsd-3-clause
pipeline_tag: automatic-speech-recognition
---
# Whisper
OpenAI Whisper on Axera
- Both C++ and Python are supported
- Precompiled model downloads:
  - [Huggingface](https://huggingface.co/AXERA-TECH/Whisper)
- To convert the models yourself, see [Model Conversion](https://github.com/ml-inory/whisper.axera/blob/main/model_convert/README.md)
## Supported Platforms
- [x] AX650N
- [x] AX630C
## Model Conversion
[Model Conversion](https://github.com/ml-inory/whisper.axera/blob/main/model_convert/README.md)
## On-Device Deployment
- AX650N- and AX630C-based devices come with Ubuntu 22.04 preinstalled
- Connect the device to the Internet and make sure commands such as `apt install` and `pip install` work
- Verified devices:
  - [AXera-Pi Pro (AX650N)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
  - [M.2 Accelerator card (AX650N)](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)
  - [AXera-Pi 2 (AX630C)](https://axera-pi-2-docs-cn.readthedocs.io/zh-cn/latest/index.html)
  - [Module-LLM (AX630C)](https://docs.m5stack.com/zh_CN/module/Module-LLM)
  - [LLM630 Compute Kit (AX630C)](https://docs.m5stack.com/zh_CN/core/LLM630%20Compute%20Kit)
- Supported programming languages:
  - [Python](#Python)
  - [C++](#CPP)
<h3 id="Python">Python</h3>
#### Requirements
We recommend installing Miniconda on the board to manage virtual environments:
```
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm ~/miniconda3/miniconda.sh
source ~/miniconda3/bin/activate
conda init --all
```
Install the Whisper dependencies:
```
cd python
conda create -n whisper python=3.12
conda activate whisper
pip3 install -r requirements.txt
```
#### Install pyaxengine
Install the NPU Python API by following https://github.com/AXERA-TECH/pyaxengine.
It has been tested with 0.1.3rc2, which can be installed with
```
pip install https://github.com/AXERA-TECH/pyaxengine/releases/download/0.1.3.rc2/axengine-0.1.3-py3-none-any.whl
```
You can also change the version number to the release you want to use.
#### Run
After logging in to the board, run:
```
cd python
conda activate whisper
python3 main.py --model_type small --model_path ../models-ax650 --wav ../demo.wav --language zh
```
Output:
```
root@ax650:/mnt/qtang/whisper.axera/python# python3 main.py --wav ../demo.wav --model_type small --model_path ../models/ --language zh
[INFO] Available providers: ['AxEngineExecutionProvider']
wav: ../demo.wav
model_type: small
model_path: ../models/
language: zh
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.10.1s
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 3.2-patch1 117f5fd4
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 3.2-patch1 117f5fd4
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 3.2-patch1 117f5fd4
Load models take 2322.563409805298ms
Preprocess wav take 6971.68493270874ms
Run encoder take 211.52877807617188ms
Run decoder_main take 79.00094985961914ms
First token: 17556
Run decoder_loop take 101.91774368286133ms
Iter 0 Token: 20844
Run decoder_loop take 60.30416488647461ms
Iter 1 Token: 7781
Run decoder_loop take 60.22000312805176ms
Iter 2 Token: 20204
Run decoder_loop take 60.23716926574707ms
Iter 3 Token: 28455
Run decoder_loop take 60.214996337890625ms
Iter 4 Token: 31962
Run decoder_loop take 60.17565727233887ms
Iter 5 Token: 6336
Run decoder_loop take 60.94002723693848ms
Iter 6 Token: 254
Run decoder_loop take 60.71639060974121ms
Iter 7 Token: 2930
Run decoder_loop take 60.225725173950195ms
Iter 8 Token: 236
Run decoder_loop take 60.167789459228516ms
Iter 9 Token: 36135
Run decoder_loop take 60.29987335205078ms
Iter 10 Token: 15868
Run decoder_loop take 61.163902282714844ms
Iter 11 Token: 252
Run decoder_loop take 60.273170471191406ms
Iter 12 Token: 1546
Run decoder_loop take 60.23144721984863ms
Iter 13 Token: 46514
Run decoder_loop take 60.31966209411621ms
Iter 14 Token: 50257
Result: 甚至出现交易几乎停滞的情况
```
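The log above reflects the decoding flow. The encoder runs once, `decoder_main` produces the first token, and `decoder_loop` is then called once per token until token 50257 appears. The following is a minimal greedy-loop sketch of that flow, not the project's actual code. `decode_step` is a hypothetical callable that stands in for one `decoder_loop` run. Token 50257 is assumed to be end-of-text, as in the multilingual Whisper vocabulary, because the log stops on it.

```python
from collections.abc import Callable

# Assumption: 50257 is <|endoftext|> (the log above ends on this token).
EOT_TOKEN = 50257


def greedy_decode(first_token: int,
                  decode_step: Callable[[list[int]], int],
                  max_new_tokens: int = 224) -> list[int]:
    """Collect tokens until EOT or an arbitrary safety cap is reached.

    decode_step is a hypothetical stand-in for one decoder_loop call: it
    receives the tokens generated so far and returns the next token id.
    """
    tokens = [first_token]
    for _ in range(max_new_tokens):
        if tokens[-1] == EOT_TOKEN:
            break
        tokens.append(decode_step(tokens))
    # Drop the trailing EOT so that only text tokens remain.
    if tokens and tokens[-1] == EOT_TOKEN:
        tokens.pop()
    return tokens


if __name__ == "__main__":
    # Replay the token ids printed in the log above.
    logged = [20844, 7781, 20204, 28455, 31962, 6336, 254, 2930, 236,
              36135, 15868, 252, 1546, 46514, 50257]
    replay = iter(logged)
    print(greedy_decode(17556, lambda _tokens: next(replay)))
```

If you replay the logged ids through the loop, it returns the first token plus the 14 text tokens before EOT. The tokenizer then turns these into the final `Result` string.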
Command-line arguments:
| Argument | Description | Default |
| --- | --- | --- |
| --wav | Input audio file | |
| --model_type/-t | Model type: tiny/base/small | |
| --model_path/-p | Directory containing the models | ../models |
| --language/-l | Recognition language | zh |
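If you write your own wrapper, the table maps directly onto an `argparse` definition. This is a hypothetical sketch and not the actual parser in `main.py`. The flag names and defaults come from the table. `build_parser` is an illustrative name, and `--wav`/`--model_type` have no defaults because the table leaves those cells empty.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the argument table (not the real main.py)."""
    parser = argparse.ArgumentParser(description="Whisper on Axera (sketch)")
    parser.add_argument("--wav", help="input audio file")
    parser.add_argument("--model_type", "-t",
                        help="model type, e.g. tiny/base/small")
    parser.add_argument("--model_path", "-p", default="../models",
                        help="directory containing the models")
    parser.add_argument("--language", "-l", default="zh",
                        help="recognition language")
    return parser

# Usage: args = build_parser().parse_args(["--wav", "../demo.wav", "-t", "small"])
```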
<h3 id="CPP">CPP</h3>
#### Run
On an AX650N device, run:
```
cd cpp
./whisper -w ../demo.wav
```
Or:
```
cd cpp
./whisper --model_type small --model_path ../models -w ../demo.wav
```
Output:
```
root@ax650:/mnt/qtang/whisper.axera/cpp# ./install/whisper --wav ../demo.wav --model_type small --model_path ../models/ --language zh
wav_file: ../demo.wav
model_path: ../models/
model_type: small
language: zh
Encoder run take 188.30 ms
First token: 17556 take 81.88ms
Next Token: 20844 take 29.64ms
Next Token: 7781 take 29.70ms
Next Token: 20204 take 29.64ms
Next Token: 28455 take 29.65ms
Next Token: 31962 take 29.61ms
Next Token: 6336 take 29.67ms
Next Token: 254 take 29.63ms
Next Token: 2930 take 29.61ms
Next Token: 236 take 29.56ms
Next Token: 36135 take 29.64ms
Next Token: 15868 take 29.71ms
Next Token: 252 take 29.51ms
Next Token: 1546 take 29.63ms
Next Token: 46514 take 29.51ms
Next Token: 50257 take 29.69ms
All take 801.13 ms
Result: 甚至出现交易几乎停滞的情况
```
### Server
```
cd cpp
./whisper_srv --model_type tiny --model_path ../models-ax650 --language zh --port 8080
```
### Client
Test from the command line with curl (replace the IP address and port with your own):
```
ffmpeg -i demo.wav -f f32le -c:a pcm_f32le - 2>/dev/null | \
curl -X POST 10.126.33.192:8080/asr \
-H "Content-Type: application/octet-stream" \
--data-binary @-
```
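For scripted testing, you can approximate the curl pipeline in Python using only the standard library. This sketch assumes a 16-bit PCM WAV input. Like the ffmpeg command, it does not resample or downmix; it only converts samples to little-endian float32. `wav_to_f32le` and `post_asr` are hypothetical helper names. The response format of `/asr` is not documented here, so the sketch returns the body as text.

```python
import array
import sys
import urllib.request
import wave


def wav_to_f32le(path: str) -> bytes:
    """Convert a 16-bit PCM WAV file to raw little-endian float32 samples.

    Sample rate and channel layout are kept as-is; int16 values are scaled
    by 1/32768 into [-1.0, 1.0).
    """
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV")
        frames = wf.readframes(wf.getnframes())
    samples = array.array("h", frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    floats = array.array("f", (s / 32768.0 for s in samples))
    if sys.byteorder == "big":
        floats.byteswap()  # emit little-endian float32 (f32le)
    return floats.tobytes()


def post_asr(url: str, payload: bytes, timeout: float = 60.0) -> str:
    """POST raw f32le audio to the /asr endpoint and return the response body."""
    req = urllib.request.Request(
        url, data=payload, method="POST",
        headers={"Content-Type": "application/octet-stream"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")

# Usage (replace the IP address and port with your own):
#   print(post_asr("http://10.126.33.192:8080/asr", wav_to_f32le("demo.wav")))
```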
## Model Performance
### Latency
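RTF (Real-Time Factor) is processing time divided by audio duration; lower is better, and values below 1 are faster than real time.
C++: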
| Models | AX650N | AX630C |
| ------------- | ------ | ------ |
| Whisper-Tiny | 0.08 | |
| Whisper-Base | 0.11 | 0.35 |
| Whisper-Small | 0.24 | |
| Whisper-Turbo | 0.48 | |
Python:
| Models | AX650N | AX630C |
| ------------- | ------ | ------ |
| Whisper-Tiny | 0.12 | |
| Whisper-Base | 0.16 | 0.35 |
| Whisper-Small | 0.50 | |
| Whisper-Turbo | 0.60 | |
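The RTF definition reduces to a single division. The sketch below reads the audio duration from the WAV header. Exactly which stages the figures above include (model loading, preprocessing, encoder, decoder) is not stated, so time whatever span you want to compare yourself. `wav_duration_seconds` and `real_time_factor` are illustrative names.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Audio duration in seconds, computed from the WAV header."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration (below 1.0 is faster than real time)."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds

# Usage sketch: measure processing time with time.perf_counter() around your
# pipeline, then call real_time_factor(elapsed, wav_duration_seconds("demo.wav")).
```

Read the other way, an RTF of 0.24 (Whisper-Small, C++, AX650N) means a 10 s clip takes about 2.4 s to process.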
### Word Error Rate (tested on the AIShell dataset)
| Models | AX650N | AX630C |
| ------------- | ------ | ------ |
| Whisper-Tiny | 0.24 | |
| Whisper-Base | 0.18 | |
| Whisper-Small | 0.11 | |
| Whisper-Turbo | 0.06 | |
To reproduce the results, follow these steps.
Extract the dataset:
```
unzip datasets.zip
```
Run the test script:
```
cd python
conda activate whisper
python test_wer.py -d aishell --gt_path ../datasets/ground_truth.txt --model_type tiny
```
### Memory Usage
* CMM stands for the physical memory used by Axera modules such as VDEC (video decoder), VENC (video encoder) and the NPU.
Python:
| Models | CMM(MB)| OS(MB) |
| ------------- | ------ | ------ |
| Whisper-Tiny | 332 | 512 |
| Whisper-Base | 533 | 644 |
| Whisper-Small | 1106 | 906 |
| Whisper-Turbo | 2065 | 2084 |
C++:
| Models | CMM(MB)| OS(MB) |
| ------------- | ------ | ------ |
| Whisper-Tiny | 332 | 31 |
| Whisper-Base | 533 | 54 |
| Whisper-Small | 1106 | 146 |
| Whisper-Turbo | 2065 | 86 |
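Assuming CMM and OS memory are separate pools that add up (the tables do not state this), you can tally each model's combined footprint from the figures above. This helps when choosing between boards or between the Python and C++ runtimes. The dictionaries copy the table values, and `total_mb` is an illustrative name.

```python
# (CMM MB, OS MB) copied from the two tables above.
PYTHON_MEM = {"Whisper-Tiny": (332, 512), "Whisper-Base": (533, 644),
              "Whisper-Small": (1106, 906), "Whisper-Turbo": (2065, 2084)}
CPP_MEM = {"Whisper-Tiny": (332, 31), "Whisper-Base": (533, 54),
           "Whisper-Small": (1106, 146), "Whisper-Turbo": (2065, 86)}


def total_mb(table: dict[str, tuple[int, int]]) -> dict[str, int]:
    """Sum CMM and OS usage per model, assuming the two pools are additive."""
    return {model: cmm + os_mb for model, (cmm, os_mb) in table.items()}
```

Under that assumption, Whisper-Turbo needs about 4149 MB with Python but about 2151 MB with C++. The CMM share is identical in both cases; the difference lies entirely in OS memory.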
## Technical Discussion
- GitHub issues
- QQ group: 139953715