Update README.md
README.md
CHANGED
|
@@ -2,7 +2,6 @@
|
|
| 2 |
license: apache-2.0
|
| 3 |
---
|
| 4 |
# **TEN VAD**
|
| 5 |
-
|
| 6 |
***A Low-Latency, Lightweight and High-Performance Streaming VAD***
|
| 7 |
|
| 8 |
|
|
@@ -18,7 +17,6 @@ license: apache-2.0
|
|
| 18 |
|
| 19 |
The precision-recall curves below compare WebRTC VAD (pitch-based), Silero VAD, and TEN VAD. The evaluation was conducted on the precisely, manually annotated TEN-VAD-TestSet, whose audio files come from LibriSpeech, GigaSpeech, the DNS Challenge, and other sources. As the curves show, TEN VAD achieves the best performance of the three. Cross-validation experiments on large internal real-world datasets reproduce these findings. The **TEN-VAD-TestSet with annotated labels** is released in the "TEN-VAD-TestSet" directory of this repository.
|
| 20 |
|
| 21 |
-
<br>
|
| 22 |
|
| 23 |
<div style="text-align:">
|
| 24 |
<img src="./images/PR_Curves_TEN-VAD-TestSet.png" width="800">
|
|
@@ -30,14 +28,14 @@ Note that the default threshold of 0.5 is used to generate binary speech indicat
|
|
| 30 |
cd ./examples
|
| 31 |
python plot_pr_curves.py
|
| 32 |
```
|
| 33 |
-
|
| 34 |
|
| 35 |
### **2. Agent-Friendly:**
|
| 36 |
As illustrated in the figure below, TEN VAD rapidly detects speech-to-non-speech transitions, whereas Silero VAD lags by several hundred milliseconds, which increases end-to-end latency in human-agent interaction systems. In addition, as the audio segment from 6.5 s to 7.0 s shows, Silero VAD fails to identify short silences between adjacent speech segments.
|
| 37 |
<div style="text-align:">
|
| 38 |
<img src="./images/Agent-Friendly-image.png" width="800">
|
| 39 |
</div>
|
| 40 |
-
|
| 41 |
|
| 42 |
### **3. Lightweight:**
|
| 43 |
We evaluated the RTF (Real-Time Factor) on five platforms, each with a different CPU. TEN VAD has much lower computational complexity and a smaller library size than Silero VAD.
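
For readers unfamiliar with the metric, RTF is conventionally the time spent processing divided by the duration of the audio processed, so values below 1 mean faster than real time. The sketch below shows one way to measure it; `measure_rtf` and `dummy_detector` are hypothetical names used for illustration only, and the stand-in detector is not TEN VAD, so the number it prints says nothing about the benchmark results reported here.

```python
import time


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent processing / duration of the audio processed."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(process_fn, samples, sample_rate_hz=16_000):
    """Time a single call of process_fn(samples) and return its RTF."""
    start = time.perf_counter()
    process_fn(samples)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples) / sample_rate_hz)


def dummy_detector(samples):
    # Hypothetical stand-in for a real VAD call: mean absolute amplitude.
    return sum(abs(s) for s in samples) / len(samples)


# Ten seconds of silence at 16 kHz as placeholder input.
ten_seconds = [0] * (10 * 16_000)
rtf = measure_rtf(dummy_detector, ten_seconds)
print(f"RTF = {rtf:.5f}")
```

To time a real detector, pass its per-buffer processing function in place of `dummy_detector`; averaging over several runs gives a steadier figure than a single call.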
|
|
@@ -113,24 +111,19 @@ We evaluated the RTF (Real-Time Factor) across five distinct platforms, each equ
|
|
| 113 |
padding: 8px;
|
| 114 |
}
|
| 115 |
</style>
|
| 116 |
-
<br>
|
| 117 |
|
| 118 |
### **4. Multiple programming languages and platforms:**
|
| 119 |
TEN VAD provides cross-platform C compatibility across five operating systems (Linux x64, Windows, macOS, Android, iOS), with Python bindings optimized for Linux x64.
|
| 120 |
-
<br>
|
| 121 |
-
<br>
|
| 122 |
|
| 123 |
|
| 124 |
### **5. Supported sampling rate and hop size:**
|
| 125 |
TEN VAD operates on 16 kHz audio input with configurable hop sizes (optimized frame configurations: 160 or 256 samples, i.e. 10 ms or 16 ms). Audio at other sampling rates must be resampled to 16 kHz.
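
The relationship between hop size and frame duration is plain arithmetic: duration in milliseconds is 1000 × hop size / sampling rate. The sketch below checks the two configurations named above and shows one way to cut a 16 kHz buffer into hop-sized frames. The helper names `hop_duration_ms` and `split_into_hops` are hypothetical and not part of the TEN VAD API, and dropping the trailing partial frame is an assumption made for this example.

```python
SAMPLE_RATE_HZ = 16_000  # the input rate stated above


def hop_duration_ms(hop_size, sample_rate_hz=SAMPLE_RATE_HZ):
    """Duration of one hop in milliseconds."""
    return 1000.0 * hop_size / sample_rate_hz


def split_into_hops(samples, hop_size):
    """Yield consecutive, non-overlapping frames of exactly hop_size samples.

    A trailing partial frame is dropped (an assumption of this sketch).
    """
    n_full = len(samples) // hop_size
    for i in range(n_full):
        yield samples[i * hop_size:(i + 1) * hop_size]


# One second of silence as placeholder audio.
one_second = [0] * SAMPLE_RATE_HZ
frames_160 = list(split_into_hops(one_second, 160))
frames_256 = list(split_into_hops(one_second, 256))

print(hop_duration_ms(160), len(frames_160))  # 10.0 100
print(hop_duration_ms(256), len(frames_256))  # 16.0 62
```

With a hop of 256 samples, one second of audio does not divide evenly (62 × 256 = 15872), so the last 128 samples are left over; a streaming caller would normally carry them into the next buffer.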
|
| 126 |
-
|
| 127 |
-
<br>
|
| 128 |
|
| 129 |
## **Installation**
|
| 130 |
```
|
| 131 |
git clone https://huggingface.co/TEN-framework/ten-vad
|
| 132 |
```
|
| 133 |
-
<br>
|
| 134 |
|
| 135 |
## **Quick Start**
|
| 136 |
The project supports five major platforms with dynamic library linking.
|
|
@@ -180,7 +173,6 @@ The project supports five major platforms with dynamic library linking.
|
|
| 180 |
<td> 1. not simulator <br> 2. not iPad </td>
|
| 181 |
</tr>
|
| 182 |
</table>
|
| 183 |
-
<br>
|
| 184 |
|
| 185 |
|
| 186 |
### **Python Usage**
|
|
@@ -201,7 +193,6 @@ You can install the above mentioned dependencies via requirements.txt:
|
|
| 201 |
```
|
| 202 |
pip install -r requirements.txt
|
| 203 |
```
|
| 204 |
-
<br>
|
| 205 |
|
| 206 |
#### **Usage**
|
| 207 |
Note: To use TEN VAD from Python, either **git clone** the repository or install it with **pip**.
|
|
@@ -222,7 +213,6 @@ cd ./examples
|
|
| 222 |
```
|
| 223 |
python test.py s0724-s0730.wav out.txt
|
| 224 |
```
|
| 225 |
-
<br>
|
| 226 |
|
| 227 |
##### **By using pip:**
|
| 228 |
|
|
@@ -237,7 +227,6 @@ pip install -U --force-reinstall -v git+https://github.com/TEN-framework/ten-vad
|
|
| 237 |
```
|
| 238 |
from ten_vad import TenVad
|
| 239 |
```
|
| 240 |
-
<br>
|
| 241 |
|
| 242 |
### **C Usage**
|
| 243 |
#### **Build Scripts**
|
|
@@ -267,7 +256,6 @@ Runtime library path configuration:
|
|
| 267 |
- Run demo with sample audio s0724-s0730.wav
|
| 268 |
- Processed results saved to out.txt
|
| 269 |
|
| 270 |
-
<br>
|
| 271 |
|
| 272 |
Detailed usage for each platform is as follows: <br>
|
| 273 |
|
|
@@ -282,7 +270,6 @@ The detailed usage methods of each platform are as follows <br>
|
|
| 282 |
1) cd ./examples
|
| 283 |
2) ./build-and-deploy-linux.sh
|
| 284 |
```
|
| 285 |
-
<br>
|
| 286 |
|
| 287 |
#### **2. Windows**
|
| 288 |
##### **Requirements**
|
|
@@ -298,7 +285,6 @@ The detailed usage methods of each platform are as follows <br>
|
|
| 298 |
- Visual Studio version (default: 2019)
|
| 299 |
3) ./build-and-deploy-windows.bat
|
| 300 |
```
|
| 301 |
-
<br>
|
| 302 |
|
| 303 |
#### **3. macOS**
|
| 304 |
##### **Requirements**
|
|
@@ -313,7 +299,6 @@ The detailed usage methods of each platform are as follows <br>
|
|
| 313 |
- Alternative: x86_64 (Intel)
|
| 314 |
3) ./build-and-deploy-mac.sh
|
| 315 |
```
|
| 316 |
-
<br>
|
| 317 |
|
| 318 |
#### **4. Android**
|
| 319 |
##### **Requirements**
|
|
@@ -330,7 +315,6 @@ The detailed usage methods of each platform are as follows <br>
|
|
| 330 |
- Toolchain: aarch64-linux-android-clang (default) or custom NDK toolchain
|
| 331 |
4) ./build-and-deploy-android.sh
|
| 332 |
```
|
| 333 |
-
<br>
|
| 334 |
|
| 335 |
#### **5. iOS**
|
| 336 |
##### **Requirements**
|
|
@@ -381,7 +365,6 @@ cd ./examples
|
|
| 381 |
- Specify your Certification
|
| 382 |
|
| 383 |
3.5. Build in Xcode and run demo on your device.
|
| 384 |
-
<br>
|
| 385 |
|
| 386 |
## **Citations**
|
| 387 |
```
|
|
@@ -396,7 +379,6 @@ cd ./examples
|
|
| 396 |
email = {developer@ten.ai}
|
| 397 |
}
|
| 398 |
```
|
| 399 |
-
<br>
|
| 400 |
|
| 401 |
## **License**
|
| 402 |
This project is Apache 2.0 licensed.
|
|
|
|
| 2 |
license: apache-2.0
|
| 3 |
---
|
| 4 |
# **TEN VAD**
|
|
|
|
| 5 |
***A Low-Latency, Lightweight and High-Performance Streaming VAD***
|
| 6 |
|
| 7 |
|
|
|
|
| 17 |
|
| 18 |
The precision-recall curves below compare WebRTC VAD (pitch-based), Silero VAD, and TEN VAD. The evaluation was conducted on the precisely, manually annotated TEN-VAD-TestSet, whose audio files come from LibriSpeech, GigaSpeech, the DNS Challenge, and other sources. As the curves show, TEN VAD achieves the best performance of the three. Cross-validation experiments on large internal real-world datasets reproduce these findings. The **TEN-VAD-TestSet with annotated labels** is released in the "TEN-VAD-TestSet" directory of this repository.
|
| 19 |
|
|
|
|
| 20 |
|
| 21 |
<div style="text-align:">
|
| 22 |
<img src="./images/PR_Curves_TEN-VAD-TestSet.png" width="800">
|
|
|
|
| 28 |
cd ./examples
|
| 29 |
python plot_pr_curves.py
|
| 30 |
```
|
| 31 |
+
|
| 32 |
|
| 33 |
### **2. Agent-Friendly:**
|
| 34 |
As illustrated in the figure below, TEN VAD rapidly detects speech-to-non-speech transitions, whereas Silero VAD lags by several hundred milliseconds, which increases end-to-end latency in human-agent interaction systems. In addition, as the audio segment from 6.5 s to 7.0 s shows, Silero VAD fails to identify short silences between adjacent speech segments.
|
| 35 |
<div style="text-align:">
|
| 36 |
<img src="./images/Agent-Friendly-image.png" width="800">
|
| 37 |
</div>
|
| 38 |
+
|
| 39 |
|
| 40 |
### **3. Lightweight:**
|
| 41 |
We evaluated the RTF (Real-Time Factor) on five platforms, each with a different CPU. TEN VAD has much lower computational complexity and a smaller library size than Silero VAD.
|
|
|
|
| 111 |
padding: 8px;
|
| 112 |
}
|
| 113 |
</style>
|
|
|
|
| 114 |
|
| 115 |
### **4. Multiple programming languages and platforms:**
|
| 116 |
TEN VAD provides cross-platform C compatibility across five operating systems (Linux x64, Windows, macOS, Android, iOS), with Python bindings optimized for Linux x64.
|
|
|
|
|
|
|
| 117 |
|
| 118 |
|
| 119 |
### **5. Supported sampling rate and hop size:**
|
| 120 |
TEN VAD operates on 16 kHz audio input with configurable hop sizes (optimized frame configurations: 160 or 256 samples, i.e. 10 ms or 16 ms). Audio at other sampling rates must be resampled to 16 kHz.
|
| 121 |
+
|
|
|
|
| 122 |
|
| 123 |
## **Installation**
|
| 124 |
```
|
| 125 |
git clone https://huggingface.co/TEN-framework/ten-vad
|
| 126 |
```
|
|
|
|
| 127 |
|
| 128 |
## **Quick Start**
|
| 129 |
The project supports five major platforms with dynamic library linking.
|
|
|
|
| 173 |
<td> 1. not simulator <br> 2. not iPad </td>
|
| 174 |
</tr>
|
| 175 |
</table>
|
|
|
|
| 176 |
|
| 177 |
|
| 178 |
### **Python Usage**
|
|
|
|
| 193 |
```
|
| 194 |
pip install -r requirements.txt
|
| 195 |
```
|
|
|
|
| 196 |
|
| 197 |
#### **Usage**
|
| 198 |
Note: To use TEN VAD from Python, either **git clone** the repository or install it with **pip**.
|
|
|
|
| 213 |
```
|
| 214 |
python test.py s0724-s0730.wav out.txt
|
| 215 |
```
|
|
|
|
| 216 |
|
| 217 |
##### **By using pip:**
|
| 218 |
|
|
|
|
| 227 |
```
|
| 228 |
from ten_vad import TenVad
|
| 229 |
```
|
|
|
|
| 230 |
|
| 231 |
### **C Usage**
|
| 232 |
#### **Build Scripts**
|
|
|
|
| 256 |
- Run demo with sample audio s0724-s0730.wav
|
| 257 |
- Processed results saved to out.txt
|
| 258 |
|
|
|
|
| 259 |
|
| 260 |
Detailed usage for each platform is as follows: <br>
|
| 261 |
|
|
|
|
| 270 |
1) cd ./examples
|
| 271 |
2) ./build-and-deploy-linux.sh
|
| 272 |
```
|
|
|
|
| 273 |
|
| 274 |
#### **2. Windows**
|
| 275 |
##### **Requirements**
|
|
|
|
| 285 |
- Visual Studio version (default: 2019)
|
| 286 |
3) ./build-and-deploy-windows.bat
|
| 287 |
```
|
|
|
|
| 288 |
|
| 289 |
#### **3. macOS**
|
| 290 |
##### **Requirements**
|
|
|
|
| 299 |
- Alternative: x86_64 (Intel)
|
| 300 |
3) ./build-and-deploy-mac.sh
|
| 301 |
```
|
|
|
|
| 302 |
|
| 303 |
#### **4. Android**
|
| 304 |
##### **Requirements**
|
|
|
|
| 315 |
- Toolchain: aarch64-linux-android-clang (default) or custom NDK toolchain
|
| 316 |
4) ./build-and-deploy-android.sh
|
| 317 |
```
|
|
|
|
| 318 |
|
| 319 |
#### **5. iOS**
|
| 320 |
##### **Requirements**
|
|
|
|
| 365 |
- Specify your Certification
|
| 366 |
|
| 367 |
3.5. Build in Xcode and run demo on your device.
|
|
|
|
| 368 |
|
| 369 |
## **Citations**
|
| 370 |
```
|
|
|
|
| 379 |
email = {developer@ten.ai}
|
| 380 |
}
|
| 381 |
```
|
|
|
|
| 382 |
|
| 383 |
## **License**
|
| 384 |
This project is Apache 2.0 licensed.
|