SenseVoice (Sensevoice_Api)
- SenseVoice/Sensevoice_Api/.gitattributes +35 -0
- SenseVoice/Sensevoice_Api/Dockerfile +13 -0
- SenseVoice/Sensevoice_Api/README.md +56 -0
- SenseVoice/Sensevoice_Api/iic/SenseVoiceSmall/am.mvn +8 -0
- SenseVoice/Sensevoice_Api/iic/SenseVoiceSmall/chn_jpn_yue_eng_ko_spectok.bpe.model +3 -0
- SenseVoice/Sensevoice_Api/iic/SenseVoiceSmall/config.yaml +97 -0
- SenseVoice/Sensevoice_Api/iic/SenseVoiceSmall/example/.DS_Store +0 -0
- SenseVoice/Sensevoice_Api/iic/SenseVoiceSmall/example/1 +0 -0
- SenseVoice/Sensevoice_Api/iic/SenseVoiceSmall/example/en.mp3 +0 -0
- SenseVoice/Sensevoice_Api/iic/SenseVoiceSmall/example/ja.mp3 +0 -0
- SenseVoice/Sensevoice_Api/iic/SenseVoiceSmall/example/ko.mp3 +0 -0
- SenseVoice/Sensevoice_Api/iic/SenseVoiceSmall/example/yue.mp3 +0 -0
- SenseVoice/Sensevoice_Api/iic/SenseVoiceSmall/example/zh.mp3 +0 -0
- SenseVoice/Sensevoice_Api/iic/SenseVoiceSmall/model.onnx +3 -0
- SenseVoice/Sensevoice_Api/iic/SenseVoiceSmall/model_quant.onnx +3 -0
- SenseVoice/Sensevoice_Api/main.py +120 -0
- SenseVoice/Sensevoice_Api/requirements.txt +8 -0
- SenseVoice/Sensevoice_Api/source.txt +1 -0
SenseVoice/Sensevoice_Api/.gitattributes
ADDED
|
@@ -0,0 +1,35 @@
| 1 |
+
*.7z filter=lfs diff=lfs merge=lfs -text
|
| 2 |
+
*.arrow filter=lfs diff=lfs merge=lfs -text
|
| 3 |
+
*.bin filter=lfs diff=lfs merge=lfs -text
|
| 4 |
+
*.bz2 filter=lfs diff=lfs merge=lfs -text
|
| 5 |
+
*.ckpt filter=lfs diff=lfs merge=lfs -text
|
| 6 |
+
*.ftz filter=lfs diff=lfs merge=lfs -text
|
| 7 |
+
*.gz filter=lfs diff=lfs merge=lfs -text
|
| 8 |
+
*.h5 filter=lfs diff=lfs merge=lfs -text
|
| 9 |
+
*.joblib filter=lfs diff=lfs merge=lfs -text
|
| 10 |
+
*.lfs.* filter=lfs diff=lfs merge=lfs -text
|
| 11 |
+
*.mlmodel filter=lfs diff=lfs merge=lfs -text
|
| 12 |
+
*.model filter=lfs diff=lfs merge=lfs -text
|
| 13 |
+
*.msgpack filter=lfs diff=lfs merge=lfs -text
|
| 14 |
+
*.npy filter=lfs diff=lfs merge=lfs -text
|
| 15 |
+
*.npz filter=lfs diff=lfs merge=lfs -text
|
| 16 |
+
*.onnx filter=lfs diff=lfs merge=lfs -text
|
| 17 |
+
*.ot filter=lfs diff=lfs merge=lfs -text
|
| 18 |
+
*.parquet filter=lfs diff=lfs merge=lfs -text
|
| 19 |
+
*.pb filter=lfs diff=lfs merge=lfs -text
|
| 20 |
+
*.pickle filter=lfs diff=lfs merge=lfs -text
|
| 21 |
+
*.pkl filter=lfs diff=lfs merge=lfs -text
|
| 22 |
+
*.pt filter=lfs diff=lfs merge=lfs -text
|
| 23 |
+
*.pth filter=lfs diff=lfs merge=lfs -text
|
| 24 |
+
*.rar filter=lfs diff=lfs merge=lfs -text
|
| 25 |
+
*.safetensors filter=lfs diff=lfs merge=lfs -text
|
| 26 |
+
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
| 27 |
+
*.tar.* filter=lfs diff=lfs merge=lfs -text
|
| 28 |
+
*.tar filter=lfs diff=lfs merge=lfs -text
|
| 29 |
+
*.tflite filter=lfs diff=lfs merge=lfs -text
|
| 30 |
+
*.tgz filter=lfs diff=lfs merge=lfs -text
|
| 31 |
+
*.wasm filter=lfs diff=lfs merge=lfs -text
|
| 32 |
+
*.xz filter=lfs diff=lfs merge=lfs -text
|
| 33 |
+
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
+
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
+
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
SenseVoice/Sensevoice_Api/Dockerfile
ADDED
|
@@ -0,0 +1,13 @@
| 1 |
+
FROM python:3.8-slim
|
| 2 |
+
|
| 3 |
+
WORKDIR /app
|
| 4 |
+
|
| 5 |
+
COPY requirements.txt .
|
| 6 |
+
COPY main.py .
|
| 7 |
+
COPY iic iic/
|
| 8 |
+
|
| 9 |
+
RUN pip install --upgrade pip
|
| 10 |
+
RUN pip install "torch>=1.13" torchaudio --index-url https://download.pytorch.org/whl/cpu
|
| 11 |
+
RUN pip install -r requirements.txt
|
| 12 |
+
|
| 13 |
+
CMD ["python", "main.py"]
|
SenseVoice/Sensevoice_Api/README.md
ADDED
|
@@ -0,0 +1,56 @@
| 1 |
+
# SenseVoice-Api
|
| 2 |
+
This project publishes the funasr_onnx version of SenseVoice as an API. It was developed with Python 3.10.14.
|
| 3 |
+
|
| 4 |
+
# SenseVoice
|
| 5 |
+
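SenseVoice is an audio foundation model with audio understanding capabilities, including automatic speech recognition (ASR), spoken language identification (LID), speech emotion recognition (SER), and acoustic event classification (AEC) or acoustic event detection (AED). This project introduces the SenseVoice model, reports benchmarks on several task test sets, and describes the environment setup and inference methods needed to try the model.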
|
| 6 |
+
|
| 7 |
+
<a name="核心功能"></a>
|
| 8 |
+
# Core Features 🎯
|
| 9 |
+
**SenseVoice** focuses on high-accuracy multilingual speech recognition, emotion recognition, and audio event detection.
|
| 10 |
+
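- **Multilingual recognition:** Trained on more than 400,000 hours of data, supports more than 50 languages, and outperforms the Whisper model in recognition quality.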
|
| 11 |
+
- **Rich transcription:**
|
| 12 |
+
- Strong emotion recognition that matches or exceeds the current best emotion recognition models on test data.
|
| 13 |
+
- Sound event detection for common human-computer interaction events such as music, applause, laughter, crying, coughing, and sneezing.
|
| 14 |
+
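- **Efficient inference:** The SenseVoice-Small model uses a non-autoregressive end-to-end framework with very low inference latency: 10 s of audio takes only 70 ms, 15 times faster than Whisper-Large.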
|
| 15 |
+
- **Fine-tuning:** Convenient fine-tuning scripts and strategies help users fix long-tail sample problems for their business scenarios.
|
| 16 |
+
- **Service deployment:** A complete service deployment pipeline that supports concurrent requests, with client languages including Python, C++, HTML, Java, and C#.
|
| 17 |
+
|
| 18 |
+
### Docker Deployment (CPU + quantized model)
|
| 19 |
+
```
|
| 20 |
+
# Within mainland China (Aliyun registry)
|
| 21 |
+
docker pull registry.cn-hangzhou.aliyuncs.com/yiminger/sensevoice:latest
|
| 22 |
+
docker run -p 8000:8000 registry.cn-hangzhou.aliyuncs.com/yiminger/sensevoice:latest
|
| 23 |
+
|
| 24 |
+
# Docker hub
|
| 25 |
+
docker pull yiminger/sensevoice:latest
|
| 26 |
+
# Run
|
| 27 |
+
docker run -p 8000:8000 yiminger/sensevoice:latest
|
| 28 |
+
```
|
| 29 |
+
|
| 30 |
+
### Local Installation
|
| 31 |
+
```
|
| 32 |
+
git clone https://github.com/HG-ha/SenseVoice-Api.git && cd SenseVoice-Api
|
| 33 |
+
# Install dependencies
|
| 34 |
+
pip install -r requirements.txt
|
| 35 |
+
# Run
|
| 36 |
+
python main.py
|
| 37 |
+
```
|
| 38 |
+
|
| 39 |
+
### API Testing
|
| 40 |
+
1. Transcribe from a URL
|
| 41 |
+
```
|
| 42 |
+
curl --location --request POST 'http://127.0.0.1:8000/extract_text' \
|
| 43 |
+
--form 'url=https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav'
|
| 44 |
+
```
|
| 45 |
+
2. Transcribe from a file
|
| 46 |
+
```
|
| 47 |
+
curl --request POST \
|
| 48 |
+
--url http://127.0.0.1:8000/extract_text \
|
| 49 |
+
--header 'content-type: multipart/form-data' \
|
| 50 |
+
--form 'file=@asr_example_zh.wav'
|
| 51 |
+
```
|
| 52 |
+
|
| 53 |
+
### API Documentation
|
| 54 |
+
```
|
| 55 |
+
http://127.0.0.1:8000/docs
|
| 56 |
+
```
|
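The URL-based curl call in the README can also be issued from Python. The following is a minimal sketch using only the standard library. It assumes the service is listening on `127.0.0.1:8000` as in the README and that the endpoint accepts a URL-encoded form body (the README itself uses multipart via `curl --form`). The helper names `build_url_request` and `transcribe_url` are hypothetical and not part of the project.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the service from the README is running locally on port 8000.
ENDPOINT = "http://127.0.0.1:8000/extract_text"


def build_url_request(audio_url: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the `url` form field."""
    body = urllib.parse.urlencode({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def transcribe_url(audio_url: str, endpoint: str = ENDPOINT) -> dict:
    """Send the request and decode the JSON reply (requires a running server)."""
    with urllib.request.urlopen(build_url_request(audio_url, endpoint), timeout=60) as resp:
        return json.load(resp)
```

Calling `transcribe_url(...)` with the sample WAV URL from the README should return a JSON object with the `message`, `results` and `label_result` fields declared in `main.py`, provided the server is up.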
SenseVoice/Sensevoice_Api/iic/SenseVoiceSmall/am.mvn
ADDED
|
@@ -0,0 +1,8 @@
| 1 |
+
<Nnet>
|
| 2 |
+
<Splice> 560 560
|
| 3 |
+
[ 0 ]
|
| 4 |
+
<AddShift> 560 560
|
| 5 |
+
<LearnRateCoef> 0 [ -8.311879 -8.600912 -9.615928 -10.43595 -11.21292 -11.88333 -12.36243 -12.63706 -12.8818 -12.83066 -12.89103 -12.95666 -13.19763 -13.40598 -13.49113 -13.5546 -13.55639 -13.51915 -13.68284 -13.53289 -13.42107 -13.65519 -13.50713 -13.75251 -13.76715 -13.87408 -13.73109 -13.70412 -13.56073 -13.53488 -13.54895 -13.56228 -13.59408 -13.62047 -13.64198 -13.66109 -13.62669 -13.58297 -13.57387 -13.4739 -13.53063 -13.48348 -13.61047 -13.64716 -13.71546 -13.79184 -13.90614 -14.03098 -14.18205 -14.35881 -14.48419 -14.60172 -14.70591 -14.83362 -14.92122 -15.00622 -15.05122 -15.03119 -14.99028 -14.92302 -14.86927 -14.82691 -14.7972 -14.76909 -14.71356 -14.61277 -14.51696 -14.42252 -14.36405 -14.30451 -14.23161 -14.19851 -14.16633 -14.15649 -14.10504 -13.99518 -13.79562 -13.3996 -12.7767 -11.71208 -8.311879 -8.600912 -9.615928 -10.43595 -11.21292 -11.88333 -12.36243 -12.63706 -12.8818 -12.83066 -12.89103 -12.95666 -13.19763 -13.40598 -13.49113 -13.5546 -13.55639 -13.51915 -13.68284 -13.53289 -13.42107 -13.65519 -13.50713 -13.75251 -13.76715 -13.87408 -13.73109 -13.70412 -13.56073 -13.53488 -13.54895 -13.56228 -13.59408 -13.62047 -13.64198 -13.66109 -13.62669 -13.58297 -13.57387 -13.4739 -13.53063 -13.48348 -13.61047 -13.64716 -13.71546 -13.79184 -13.90614 -14.03098 -14.18205 -14.35881 -14.48419 -14.60172 -14.70591 -14.83362 -14.92122 -15.00622 -15.05122 -15.03119 -14.99028 -14.92302 -14.86927 -14.82691 -14.7972 -14.76909 -14.71356 -14.61277 -14.51696 -14.42252 -14.36405 -14.30451 -14.23161 -14.19851 -14.16633 -14.15649 -14.10504 -13.99518 -13.79562 -13.3996 -12.7767 -11.71208 -8.311879 -8.600912 -9.615928 -10.43595 -11.21292 -11.88333 -12.36243 -12.63706 -12.8818 -12.83066 -12.89103 -12.95666 -13.19763 -13.40598 -13.49113 -13.5546 -13.55639 -13.51915 -13.68284 -13.53289 -13.42107 -13.65519 -13.50713 -13.75251 -13.76715 -13.87408 -13.73109 -13.70412 -13.56073 -13.53488 -13.54895 -13.56228 -13.59408 -13.62047 -13.64198 -13.66109 -13.62669 -13.58297 -13.57387 
-13.4739 -13.53063 -13.48348 -13.61047 -13.64716 -13.71546 -13.79184 -13.90614 -14.03098 -14.18205 -14.35881 -14.48419 -14.60172 -14.70591 -14.83362 -14.92122 -15.00622 -15.05122 -15.03119 -14.99028 -14.92302 -14.86927 -14.82691 -14.7972 -14.76909 -14.71356 -14.61277 -14.51696 -14.42252 -14.36405 -14.30451 -14.23161 -14.19851 -14.16633 -14.15649 -14.10504 -13.99518 -13.79562 -13.3996 -12.7767 -11.71208 -8.311879 -8.600912 -9.615928 -10.43595 -11.21292 -11.88333 -12.36243 -12.63706 -12.8818 -12.83066 -12.89103 -12.95666 -13.19763 -13.40598 -13.49113 -13.5546 -13.55639 -13.51915 -13.68284 -13.53289 -13.42107 -13.65519 -13.50713 -13.75251 -13.76715 -13.87408 -13.73109 -13.70412 -13.56073 -13.53488 -13.54895 -13.56228 -13.59408 -13.62047 -13.64198 -13.66109 -13.62669 -13.58297 -13.57387 -13.4739 -13.53063 -13.48348 -13.61047 -13.64716 -13.71546 -13.79184 -13.90614 -14.03098 -14.18205 -14.35881 -14.48419 -14.60172 -14.70591 -14.83362 -14.92122 -15.00622 -15.05122 -15.03119 -14.99028 -14.92302 -14.86927 -14.82691 -14.7972 -14.76909 -14.71356 -14.61277 -14.51696 -14.42252 -14.36405 -14.30451 -14.23161 -14.19851 -14.16633 -14.15649 -14.10504 -13.99518 -13.79562 -13.3996 -12.7767 -11.71208 -8.311879 -8.600912 -9.615928 -10.43595 -11.21292 -11.88333 -12.36243 -12.63706 -12.8818 -12.83066 -12.89103 -12.95666 -13.19763 -13.40598 -13.49113 -13.5546 -13.55639 -13.51915 -13.68284 -13.53289 -13.42107 -13.65519 -13.50713 -13.75251 -13.76715 -13.87408 -13.73109 -13.70412 -13.56073 -13.53488 -13.54895 -13.56228 -13.59408 -13.62047 -13.64198 -13.66109 -13.62669 -13.58297 -13.57387 -13.4739 -13.53063 -13.48348 -13.61047 -13.64716 -13.71546 -13.79184 -13.90614 -14.03098 -14.18205 -14.35881 -14.48419 -14.60172 -14.70591 -14.83362 -14.92122 -15.00622 -15.05122 -15.03119 -14.99028 -14.92302 -14.86927 -14.82691 -14.7972 -14.76909 -14.71356 -14.61277 -14.51696 -14.42252 -14.36405 -14.30451 -14.23161 -14.19851 -14.16633 -14.15649 -14.10504 -13.99518 -13.79562 -13.3996 -12.7767 -11.71208 
-8.311879 -8.600912 -9.615928 -10.43595 -11.21292 -11.88333 -12.36243 -12.63706 -12.8818 -12.83066 -12.89103 -12.95666 -13.19763 -13.40598 -13.49113 -13.5546 -13.55639 -13.51915 -13.68284 -13.53289 -13.42107 -13.65519 -13.50713 -13.75251 -13.76715 -13.87408 -13.73109 -13.70412 -13.56073 -13.53488 -13.54895 -13.56228 -13.59408 -13.62047 -13.64198 -13.66109 -13.62669 -13.58297 -13.57387 -13.4739 -13.53063 -13.48348 -13.61047 -13.64716 -13.71546 -13.79184 -13.90614 -14.03098 -14.18205 -14.35881 -14.48419 -14.60172 -14.70591 -14.83362 -14.92122 -15.00622 -15.05122 -15.03119 -14.99028 -14.92302 -14.86927 -14.82691 -14.7972 -14.76909 -14.71356 -14.61277 -14.51696 -14.42252 -14.36405 -14.30451 -14.23161 -14.19851 -14.16633 -14.15649 -14.10504 -13.99518 -13.79562 -13.3996 -12.7767 -11.71208 -8.311879 -8.600912 -9.615928 -10.43595 -11.21292 -11.88333 -12.36243 -12.63706 -12.8818 -12.83066 -12.89103 -12.95666 -13.19763 -13.40598 -13.49113 -13.5546 -13.55639 -13.51915 -13.68284 -13.53289 -13.42107 -13.65519 -13.50713 -13.75251 -13.76715 -13.87408 -13.73109 -13.70412 -13.56073 -13.53488 -13.54895 -13.56228 -13.59408 -13.62047 -13.64198 -13.66109 -13.62669 -13.58297 -13.57387 -13.4739 -13.53063 -13.48348 -13.61047 -13.64716 -13.71546 -13.79184 -13.90614 -14.03098 -14.18205 -14.35881 -14.48419 -14.60172 -14.70591 -14.83362 -14.92122 -15.00622 -15.05122 -15.03119 -14.99028 -14.92302 -14.86927 -14.82691 -14.7972 -14.76909 -14.71356 -14.61277 -14.51696 -14.42252 -14.36405 -14.30451 -14.23161 -14.19851 -14.16633 -14.15649 -14.10504 -13.99518 -13.79562 -13.3996 -12.7767 -11.71208 ]
|
| 6 |
+
<Rescale> 560 560
|
| 7 |
+
<LearnRateCoef> 0 [ 0.155775 0.154484 0.1527379 0.1518718 0.1506028 0.1489256 0.147067 0.1447061 0.1436307 0.1443568 0.1451849 0.1455157 0.1452821 0.1445717 0.1439195 0.1435867 0.1436018 0.1438781 0.1442086 0.1448844 0.1454756 0.145663 0.146268 0.1467386 0.1472724 0.147664 0.1480913 0.1483739 0.1488841 0.1493636 0.1497088 0.1500379 0.1502916 0.1505389 0.1506787 0.1507102 0.1505992 0.1505445 0.1505938 0.1508133 0.1509569 0.1512396 0.1514625 0.1516195 0.1516156 0.1515561 0.1514966 0.1513976 0.1512612 0.151076 0.1510596 0.1510431 0.151077 0.1511168 0.1511917 0.151023 0.1508045 0.1505885 0.1503493 0.1502373 0.1501726 0.1500762 0.1500065 0.1499782 0.150057 0.1502658 0.150469 0.1505335 0.1505505 0.1505328 0.1504275 0.1502438 0.1499674 0.1497118 0.1494661 0.1493102 0.1493681 0.1495501 0.1499738 0.1509654 0.155775 0.154484 0.1527379 0.1518718 0.1506028 0.1489256 0.147067 0.1447061 0.1436307 0.1443568 0.1451849 0.1455157 0.1452821 0.1445717 0.1439195 0.1435867 0.1436018 0.1438781 0.1442086 0.1448844 0.1454756 0.145663 0.146268 0.1467386 0.1472724 0.147664 0.1480913 0.1483739 0.1488841 0.1493636 0.1497088 0.1500379 0.1502916 0.1505389 0.1506787 0.1507102 0.1505992 0.1505445 0.1505938 0.1508133 0.1509569 0.1512396 0.1514625 0.1516195 0.1516156 0.1515561 0.1514966 0.1513976 0.1512612 0.151076 0.1510596 0.1510431 0.151077 0.1511168 0.1511917 0.151023 0.1508045 0.1505885 0.1503493 0.1502373 0.1501726 0.1500762 0.1500065 0.1499782 0.150057 0.1502658 0.150469 0.1505335 0.1505505 0.1505328 0.1504275 0.1502438 0.1499674 0.1497118 0.1494661 0.1493102 0.1493681 0.1495501 0.1499738 0.1509654 0.155775 0.154484 0.1527379 0.1518718 0.1506028 0.1489256 0.147067 0.1447061 0.1436307 0.1443568 0.1451849 0.1455157 0.1452821 0.1445717 0.1439195 0.1435867 0.1436018 0.1438781 0.1442086 0.1448844 0.1454756 0.145663 0.146268 0.1467386 0.1472724 0.147664 0.1480913 0.1483739 0.1488841 0.1493636 0.1497088 0.1500379 0.1502916 0.1505389 0.1506787 0.1507102 0.1505992 0.1505445 0.1505938 0.1508133 
0.1509569 0.1512396 0.1514625 0.1516195 0.1516156 0.1515561 0.1514966 0.1513976 0.1512612 0.151076 0.1510596 0.1510431 0.151077 0.1511168 0.1511917 0.151023 0.1508045 0.1505885 0.1503493 0.1502373 0.1501726 0.1500762 0.1500065 0.1499782 0.150057 0.1502658 0.150469 0.1505335 0.1505505 0.1505328 0.1504275 0.1502438 0.1499674 0.1497118 0.1494661 0.1493102 0.1493681 0.1495501 0.1499738 0.1509654 0.155775 0.154484 0.1527379 0.1518718 0.1506028 0.1489256 0.147067 0.1447061 0.1436307 0.1443568 0.1451849 0.1455157 0.1452821 0.1445717 0.1439195 0.1435867 0.1436018 0.1438781 0.1442086 0.1448844 0.1454756 0.145663 0.146268 0.1467386 0.1472724 0.147664 0.1480913 0.1483739 0.1488841 0.1493636 0.1497088 0.1500379 0.1502916 0.1505389 0.1506787 0.1507102 0.1505992 0.1505445 0.1505938 0.1508133 0.1509569 0.1512396 0.1514625 0.1516195 0.1516156 0.1515561 0.1514966 0.1513976 0.1512612 0.151076 0.1510596 0.1510431 0.151077 0.1511168 0.1511917 0.151023 0.1508045 0.1505885 0.1503493 0.1502373 0.1501726 0.1500762 0.1500065 0.1499782 0.150057 0.1502658 0.150469 0.1505335 0.1505505 0.1505328 0.1504275 0.1502438 0.1499674 0.1497118 0.1494661 0.1493102 0.1493681 0.1495501 0.1499738 0.1509654 0.155775 0.154484 0.1527379 0.1518718 0.1506028 0.1489256 0.147067 0.1447061 0.1436307 0.1443568 0.1451849 0.1455157 0.1452821 0.1445717 0.1439195 0.1435867 0.1436018 0.1438781 0.1442086 0.1448844 0.1454756 0.145663 0.146268 0.1467386 0.1472724 0.147664 0.1480913 0.1483739 0.1488841 0.1493636 0.1497088 0.1500379 0.1502916 0.1505389 0.1506787 0.1507102 0.1505992 0.1505445 0.1505938 0.1508133 0.1509569 0.1512396 0.1514625 0.1516195 0.1516156 0.1515561 0.1514966 0.1513976 0.1512612 0.151076 0.1510596 0.1510431 0.151077 0.1511168 0.1511917 0.151023 0.1508045 0.1505885 0.1503493 0.1502373 0.1501726 0.1500762 0.1500065 0.1499782 0.150057 0.1502658 0.150469 0.1505335 0.1505505 0.1505328 0.1504275 0.1502438 0.1499674 0.1497118 0.1494661 0.1493102 0.1493681 0.1495501 0.1499738 0.1509654 0.155775 0.154484 
0.1527379 0.1518718 0.1506028 0.1489256 0.147067 0.1447061 0.1436307 0.1443568 0.1451849 0.1455157 0.1452821 0.1445717 0.1439195 0.1435867 0.1436018 0.1438781 0.1442086 0.1448844 0.1454756 0.145663 0.146268 0.1467386 0.1472724 0.147664 0.1480913 0.1483739 0.1488841 0.1493636 0.1497088 0.1500379 0.1502916 0.1505389 0.1506787 0.1507102 0.1505992 0.1505445 0.1505938 0.1508133 0.1509569 0.1512396 0.1514625 0.1516195 0.1516156 0.1515561 0.1514966 0.1513976 0.1512612 0.151076 0.1510596 0.1510431 0.151077 0.1511168 0.1511917 0.151023 0.1508045 0.1505885 0.1503493 0.1502373 0.1501726 0.1500762 0.1500065 0.1499782 0.150057 0.1502658 0.150469 0.1505335 0.1505505 0.1505328 0.1504275 0.1502438 0.1499674 0.1497118 0.1494661 0.1493102 0.1493681 0.1495501 0.1499738 0.1509654 0.155775 0.154484 0.1527379 0.1518718 0.1506028 0.1489256 0.147067 0.1447061 0.1436307 0.1443568 0.1451849 0.1455157 0.1452821 0.1445717 0.1439195 0.1435867 0.1436018 0.1438781 0.1442086 0.1448844 0.1454756 0.145663 0.146268 0.1467386 0.1472724 0.147664 0.1480913 0.1483739 0.1488841 0.1493636 0.1497088 0.1500379 0.1502916 0.1505389 0.1506787 0.1507102 0.1505992 0.1505445 0.1505938 0.1508133 0.1509569 0.1512396 0.1514625 0.1516195 0.1516156 0.1515561 0.1514966 0.1513976 0.1512612 0.151076 0.1510596 0.1510431 0.151077 0.1511168 0.1511917 0.151023 0.1508045 0.1505885 0.1503493 0.1502373 0.1501726 0.1500762 0.1500065 0.1499782 0.150057 0.1502658 0.150469 0.1505335 0.1505505 0.1505328 0.1504275 0.1502438 0.1499674 0.1497118 0.1494661 0.1493102 0.1493681 0.1495501 0.1499738 0.1509654 ]
|
| 8 |
+
</Nnet>
|
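The `am.mvn` file above stores two 560-dimensional vectors, one under `<AddShift>` and one under `<Rescale>`. A plausible reading, stated here as an assumption rather than taken from the commit, is that each feature frame is normalized as `(x + shift) * scale`. The sketch below applies that formula to the first three entries of each vector in pure Python; `apply_cmvn` is a hypothetical helper name.

```python
from typing import List

# First three entries copied from the <AddShift> and <Rescale> vectors above.
SHIFT = [-8.311879, -8.600912, -9.615928]
SCALE = [0.155775, 0.154484, 0.1527379]


def apply_cmvn(frame: List[float], shift: List[float], scale: List[float]) -> List[float]:
    """Normalize one feature frame: add the shift, then multiply by the scale."""
    if not (len(frame) == len(shift) == len(scale)):
        raise ValueError("frame, shift and scale must have the same length")
    return [(x + s) * g for x, s, g in zip(frame, shift, scale)]


# A frame equal to the negated shift normalizes to all zeros.
normalized = apply_cmvn([8.311879, 8.600912, 9.615928], SHIFT, SCALE)
```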
SenseVoice/Sensevoice_Api/iic/SenseVoiceSmall/chn_jpn_yue_eng_ko_spectok.bpe.model
ADDED
|
@@ -0,0 +1,3 @@
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:aa87f86064c3730d799ddf7af3c04659151102cba548bce325cf06ba4da4e6a8
|
| 3 |
+
size 377341
|
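The three lines above are a Git LFS pointer, not the tokenizer model itself: they record the spec version, the SHA-256 digest of the real file, and its size in bytes. As an illustration, the sketch below parses such a pointer and checks downloaded bytes against it. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical.

```python
import hashlib
from typing import Dict

# Pointer text copied from the diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:aa87f86064c3730d799ddf7af3c04659151102cba548bce325cf06ba4da4e6a8
size 377341
"""


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(data: bytes, pointer: Dict[str, str]) -> bool:
    """Check file bytes against the pointer's size and SHA-256 digest."""
    algo, _, digest = pointer["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return len(data) == int(pointer["size"]) and hashlib.sha256(data).hexdigest() == digest


pointer = parse_lfs_pointer(POINTER_TEXT)
```

The same check applies to the `model.onnx` and `model_quant.onnx` pointers later in the diff, which is a cheap way to detect a clone made without `git lfs pull`.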
SenseVoice/Sensevoice_Api/iic/SenseVoiceSmall/config.yaml
ADDED
|
@@ -0,0 +1,97 @@
| 1 |
+
encoder: SenseVoiceEncoderSmall
|
| 2 |
+
encoder_conf:
|
| 3 |
+
output_size: 512
|
| 4 |
+
attention_heads: 4
|
| 5 |
+
linear_units: 2048
|
| 6 |
+
num_blocks: 50
|
| 7 |
+
tp_blocks: 20
|
| 8 |
+
dropout_rate: 0.1
|
| 9 |
+
positional_dropout_rate: 0.1
|
| 10 |
+
attention_dropout_rate: 0.1
|
| 11 |
+
input_layer: pe
|
| 12 |
+
pos_enc_class: SinusoidalPositionEncoder
|
| 13 |
+
normalize_before: true
|
| 14 |
+
kernel_size: 11
|
| 15 |
+
sanm_shfit: 0
|
| 16 |
+
selfattention_layer_type: sanm
|
| 17 |
+
|
| 18 |
+
|
| 19 |
+
model: SenseVoiceSmall
|
| 20 |
+
model_conf:
|
| 21 |
+
length_normalized_loss: true
|
| 22 |
+
sos: 1
|
| 23 |
+
eos: 2
|
| 24 |
+
ignore_id: -1
|
| 25 |
+
|
| 26 |
+
tokenizer: SentencepiecesTokenizer
|
| 27 |
+
tokenizer_conf:
|
| 28 |
+
bpemodel: null
|
| 29 |
+
unk_symbol: <unk>
|
| 30 |
+
split_with_space: true
|
| 31 |
+
|
| 32 |
+
frontend: WavFrontend
|
| 33 |
+
frontend_conf:
|
| 34 |
+
fs: 16000
|
| 35 |
+
window: hamming
|
| 36 |
+
n_mels: 80
|
| 37 |
+
frame_length: 25
|
| 38 |
+
frame_shift: 10
|
| 39 |
+
lfr_m: 7
|
| 40 |
+
lfr_n: 6
|
| 41 |
+
cmvn_file: null
|
| 42 |
+
|
| 43 |
+
|
| 44 |
+
dataset: SenseVoiceCTCDataset
|
| 45 |
+
dataset_conf:
|
| 46 |
+
index_ds: IndexDSJsonl
|
| 47 |
+
batch_sampler: EspnetStyleBatchSampler
|
| 48 |
+
data_split_num: 32
|
| 49 |
+
batch_type: token
|
| 50 |
+
batch_size: 14000
|
| 51 |
+
max_token_length: 2000
|
| 52 |
+
min_token_length: 60
|
| 53 |
+
max_source_length: 2000
|
| 54 |
+
min_source_length: 60
|
| 55 |
+
max_target_length: 200
|
| 56 |
+
min_target_length: 0
|
| 57 |
+
shuffle: true
|
| 58 |
+
num_workers: 4
|
| 59 |
+
sos: ${model_conf.sos}
|
| 60 |
+
eos: ${model_conf.eos}
|
| 61 |
+
IndexDSJsonl: IndexDSJsonl
|
| 62 |
+
retry: 20
|
| 63 |
+
|
| 64 |
+
train_conf:
|
| 65 |
+
accum_grad: 1
|
| 66 |
+
grad_clip: 5
|
| 67 |
+
max_epoch: 20
|
| 68 |
+
keep_nbest_models: 10
|
| 69 |
+
avg_nbest_model: 10
|
| 70 |
+
log_interval: 100
|
| 71 |
+
resume: true
|
| 72 |
+
validate_interval: 10000
|
| 73 |
+
save_checkpoint_interval: 10000
|
| 74 |
+
|
| 75 |
+
optim: adamw
|
| 76 |
+
optim_conf:
|
| 77 |
+
lr: 0.00002
|
| 78 |
+
scheduler: warmuplr
|
| 79 |
+
scheduler_conf:
|
| 80 |
+
warmup_steps: 25000
|
| 81 |
+
|
| 82 |
+
specaug: SpecAugLFR
|
| 83 |
+
specaug_conf:
|
| 84 |
+
apply_time_warp: false
|
| 85 |
+
time_warp_window: 5
|
| 86 |
+
time_warp_mode: bicubic
|
| 87 |
+
apply_freq_mask: true
|
| 88 |
+
freq_mask_width_range:
|
| 89 |
+
- 0
|
| 90 |
+
- 30
|
| 91 |
+
lfr_rate: 6
|
| 92 |
+
num_freq_mask: 1
|
| 93 |
+
apply_time_mask: true
|
| 94 |
+
time_mask_width_range:
|
| 95 |
+
- 0
|
| 96 |
+
- 12
|
| 97 |
+
num_time_mask: 1
|
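The `frontend_conf` values in `config.yaml` imply a few concrete sizes. The arithmetic below is a sketch derived only from those values. In particular, `n_mels * lfr_m` gives 560, which matches the 560 dimensions declared in `am.mvn`. The `lfr_frame_count` helper is hypothetical and assumes that low-frame-rate stacking rounds the output length up.

```python
# Values copied from frontend_conf in config.yaml.
FS = 16000            # sample rate in Hz
FRAME_LENGTH_MS = 25  # analysis window length
FRAME_SHIFT_MS = 10   # hop between windows
N_MELS = 80           # mel filterbank channels
LFR_M = 7             # frames stacked per output frame
LFR_N = 6             # frames skipped between output frames

samples_per_frame = FS * FRAME_LENGTH_MS // 1000  # samples in one window
samples_per_shift = FS * FRAME_SHIFT_MS // 1000   # samples per hop
feature_dim = N_MELS * LFR_M                      # stacked feature size
output_frame_ms = FRAME_SHIFT_MS * LFR_N          # time step of stacked frames


def lfr_frame_count(num_frames: int, lfr_n: int = LFR_N) -> int:
    """Stacked frame count, assuming the length is rounded up (ceil division)."""
    return -(-num_frames // lfr_n)
```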
SenseVoice/Sensevoice_Api/iic/SenseVoiceSmall/example/.DS_Store
ADDED
|
Binary file (6.15 kB). View file
|
|
|
SenseVoice/Sensevoice_Api/iic/SenseVoiceSmall/example/1
ADDED
|
File without changes
|
SenseVoice/Sensevoice_Api/iic/SenseVoiceSmall/example/en.mp3
ADDED
|
Binary file (57.4 kB). View file
|
|
|
SenseVoice/Sensevoice_Api/iic/SenseVoiceSmall/example/ja.mp3
ADDED
|
Binary file (57.8 kB). View file
|
|
|
SenseVoice/Sensevoice_Api/iic/SenseVoiceSmall/example/ko.mp3
ADDED
|
Binary file (27.9 kB). View file
|
|
|
SenseVoice/Sensevoice_Api/iic/SenseVoiceSmall/example/yue.mp3
ADDED
|
Binary file (31.2 kB). View file
|
|
|
SenseVoice/Sensevoice_Api/iic/SenseVoiceSmall/example/zh.mp3
ADDED
|
Binary file (45 kB). View file
|
|
|
SenseVoice/Sensevoice_Api/iic/SenseVoiceSmall/model.onnx
ADDED
|
@@ -0,0 +1,3 @@
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:ecaedf07c74e48ee3481204453f03f7d8a383ee226d136dece265e94457af660
|
| 3 |
+
size 937615371
|
SenseVoice/Sensevoice_Api/iic/SenseVoiceSmall/model_quant.onnx
ADDED
|
@@ -0,0 +1,3 @@
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:b466b19006784340a9f09af96f37778363ccc50917db02d4dc10ca260d73434c
|
| 3 |
+
size 241217542
|
SenseVoice/Sensevoice_Api/main.py
ADDED
|
@@ -0,0 +1,120 @@
| 1 |
+
# -*- coding: utf-8 -*-
|
| 2 |
+
"""
|
| 3 |
+
Author: 一铭
|
| 4 |
+
Date : 2024-08-28
|
| 5 |
+
|
| 6 |
+
Github: https://github.com/HG-ha
|
| 7 |
+
Home : https://api2.wer.plus
|
| 8 |
+
|
| 9 |
+
Description:
|
| 10 |
+
Based on the Alibaba DAMO Academy project: https://github.com/FunAudioLLM/SenseVoice
|
| 11 |
+
|
| 12 |
+
This program wraps the ONNX model in a FastAPI service and provides an interface for reading audio from a network URL or an uploaded file and predicting its content.
|
| 13 |
+
|
| 14 |
+
If you need to use CUDA, install onnxruntime-gpu instead of onnxruntime.
|
| 15 |
+
"""
|
| 16 |
+
|
| 17 |
+
import librosa
|
| 18 |
+
import numpy as np
|
| 19 |
+
import aiohttp
|
| 20 |
+
from fastapi import FastAPI, Form, UploadFile, HTTPException
|
| 21 |
+
from pydantic import HttpUrl, ValidationError, BaseModel, Field
|
| 22 |
+
from typing import List, Union
|
| 23 |
+
from funasr_onnx import SenseVoiceSmall
|
| 24 |
+
from funasr_onnx.utils.postprocess_utils import rich_transcription_postprocess
|
| 25 |
+
from io import BytesIO
|
| 26 |
+
|
| 27 |
+
|
| 28 |
+
class ApiResponse(BaseModel):
|
| 29 |
+
message: str = Field(..., description="Status message indicating the success of the operation.")
|
| 30 |
+
results: str = Field(..., description="Transcription with labels removed.")
|
| 31 |
+
label_result: str = Field(..., description="Default model output, including labels.")
|
| 32 |
+
|
| 33 |
+
|
| 34 |
+
app = FastAPI()
|
| 35 |
+
|
| 36 |
+
async def from_url_load_audio(audio: str) -> BytesIO:
|
| 37 |
+
async with aiohttp.ClientSession() as session:
|
| 38 |
+
async with session.get(
|
| 39 |
+
audio,
|
| 40 |
+
headers={
|
| 41 |
+
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 Edg/127.0.0.0"
|
| 42 |
+
},
|
| 43 |
+
) as response:
|
| 44 |
+
if response.status != 200:
|
| 45 |
+
raise HTTPException(
|
| 46 |
+
status_code=400,
|
| 47 |
+
detail=f"Failed to download audio: {response.status}",
|
| 48 |
+
)
|
| 49 |
+
audio_bytes = await response.read()
|
| 50 |
+
return BytesIO(audio_bytes)
|
| 51 |
+
|
| 52 |
+
@app.post("/extract_text",response_model=ApiResponse)
|
| 53 |
+
async def upload_url(url: Union[HttpUrl, None] = Form(None), file: Union[UploadFile, None] = Form(None)):
|
| 54 |
+
if file:
|
| 55 |
+
audio = BytesIO(await file.read())
|
| 56 |
+
elif url:
|
| 57 |
+
try:
|
| 58 |
+
audio = await from_url_load_audio(str(url))
|
| 59 |
+
except Exception as e:
|
| 60 |
+
raise HTTPException(status_code=500, detail=str(e))
|
| 61 |
+
|
| 62 |
+
else:
|
| 63 |
+
raise HTTPException(status_code=400, detail="No valid audio source provided.")
|
| 64 |
+
try:
|
| 65 |
+
res = model(audio, language=language, use_itn=True)
|
| 66 |
+
return {
|
| 67 |
+
"message": "input processed successfully",
|
| 68 |
+
"results": rich_transcription_postprocess(res[0]),
|
| 69 |
+
"label_result": res[0]
|
| 70 |
+
}
|
| 71 |
+
except ValidationError as e:
|
| 72 |
+
raise HTTPException(status_code=400, detail=e.errors())
|
| 73 |
+
except Exception as e:
|
| 74 |
+
raise HTTPException(status_code=500, detail=str(e))
|
| 75 |
+
|
| 76 |
+
|
| 77 |
+
if __name__ == "__main__":
|
| 78 |
+
|
| 79 |
+
model_dir = "iic/SenseVoiceSmall"
|
| 80 |
+
device_id = 0 # Use GPU 0, automatically use CPU when not available
|
| 81 |
+
batch_size = 16
|
| 82 |
+
language = "auto"
|
| 83 |
+
quantize = True # Quantized model (model_quant.onnx): smaller and faster, but accuracy may be lower
|
| 84 |
+
# quantize = False # Standard model: model.onnx
|
| 85 |
+
|
| 86 |
+
# Override the built-in load_data method to work around an accuracy bug with np.ndarray input.
|
| 87 |
+
# The array returned by librosa.load cannot be passed in directly; doing so makes accuracy for other languages extremely poor.
|
| 88 |
+
# No specific reason is known.
|
| 89 |
+
def load_data(self, wav_content: Union[str, np.ndarray, List[str], BytesIO], fs: int = None) -> List:
|
| 90 |
+
def load_wav(path: str) -> np.ndarray:
|
| 91 |
+
waveform, _ = librosa.load(path, sr=fs)
|
| 92 |
+
return waveform
|
| 93 |
+
|
| 94 |
+
if isinstance(wav_content, np.ndarray):
|
| 95 |
+
return [wav_content]
|
| 96 |
+
|
| 97 |
+
if isinstance(wav_content, str):
|
| 98 |
+
return [load_wav(wav_content)]
|
| 99 |
+
|
| 100 |
+
if isinstance(wav_content, list):
|
| 101 |
+
return [load_wav(path) for path in wav_content]
|
| 102 |
+
|
| 103 |
+
if isinstance(wav_content, BytesIO):
|
| 104 |
+
return [load_wav(wav_content)]
|
| 105 |
+
|
| 106 |
+
raise TypeError(f"The type of {wav_content} is not in [str, np.ndarray, list, BytesIO]")
|
| 107 |
+
|
| 108 |
+
SenseVoiceSmall.load_data = load_data
|
| 109 |
+
|
| 110 |
+
model = SenseVoiceSmall(
|
| 111 |
+
model_dir,
|
| 112 |
+
quantize=quantize,
|
| 113 |
+
device_id=device_id,
|
| 114 |
+
batch_size=batch_size
|
| 115 |
+
)
|
| 116 |
+
|
| 117 |
+
print("\n\nDocs: http://127.0.0.1:8000/docs\n")
|
| 118 |
+
import uvicorn
|
| 119 |
+
|
| 120 |
+
uvicorn.run(app, host="0.0.0.0", port=8000)
|
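The endpoint in `main.py` returns two strings: `label_result` (the raw model output) and `results` (the same text after `rich_transcription_postprocess`). To illustrate the difference, the sketch below strips label tokens with a regular expression. It assumes the labels appear as `<|...|>` tokens, and it is a simplified stand-in, not the library's actual post-processing, which may do more than remove tokens. The `strip_labels` name and the sample string are hypothetical.

```python
import re

# Assumption: labels appear as <|...|> tokens in the raw model output.
LABEL_PATTERN = re.compile(r"<\|[^|]*\|>")


def strip_labels(label_result: str) -> str:
    """Remove every <|...|> token and trim surrounding whitespace."""
    return LABEL_PATTERN.sub("", label_result).strip()


# Hypothetical raw output for illustration, not taken from a real run.
example = "<|en|><|NEUTRAL|><|Speech|><|withitn|>Hello world."
plain = strip_labels(example)
```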
SenseVoice/Sensevoice_Api/requirements.txt
ADDED
|
@@ -0,0 +1,8 @@
| 1 |
+
funasr_onnx==0.4.1
|
| 2 |
+
fastapi==0.112.2
|
| 3 |
+
numpy==1.24.4
|
| 4 |
+
uvicorn==0.30.6
|
| 5 |
+
librosa==0.10.2
|
| 6 |
+
aiohttp==3.10.5
|
| 7 |
+
python-multipart==0.0.9
|
| 8 |
+
jieba==0.42.1
|
SenseVoice/Sensevoice_Api/source.txt
ADDED
|
@@ -0,0 +1 @@
| 1 |
+
https://huggingface.co/mingl/Sensevoice_Api
|