# Endpointing Model Benchmark Report
**Model:** `/data/smart-turn-v3.1-gpu.onnx`
**Generated:** 2026-01-07 17:45:59 UTC
## Accuracy Results
**Total Samples:** 31,527
**Unique Languages:** 🇸🇦 Arabic, 🇧🇩 Bengali, 🇩🇰 Danish, 🇩🇪 German, 🇬🇧 🇺🇸 English, 🇫🇮 Finnish, 🇫🇷 French, 🇮🇳 Hindi, 🇮🇩 Indonesian, 🇮🇹 Italian, 🇯🇵 Japanese, 🇰🇷 Korean, 🇮🇳 Marathi, 🇳🇱 Dutch, 🇳🇴 Norwegian, 🇵🇱 Polish, 🇵🇹 Portuguese, 🇷🇺 Russian, 🇪🇸 Spanish, 🇹🇷 Turkish, 🇺🇦 Ukrainian, 🇻🇳 Vietnamese, 🇨🇳 Chinese
**Unique Datasets:** chirp3_1, chirp3_2, chirp3_3_short, human_5, human_convcollector_1, liva_1, midcentury_1, mundo_1, orpheus_endfiller_1, orpheus_grammar_1, orpheus_midfiller_1, rime_2
### Overall Performance
| Metric | Sample Count | Accuracy (%) | Precision | Recall | F1 | FPR (%) | FNR (%) |
| :------ | -----------: | -----------: | --------: | -----: | ----: | ------: | ------: |
| Overall | 31,527 | 91.64 | 0.894 | 0.944 | 0.918 | 5.55 | 2.81 |
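
The columns in this row can be cross-checked against each other. The sketch below is a sanity check rather than part of the benchmark tooling, and its variable names are illustrative. It recomputes F1 as the harmonic mean of precision and recall, and adds up the three percentage columns. Because accuracy, FPR, and FNR sum to about 100%, the FPR and FNR columns appear to be false positives and false negatives as a share of all samples, not rates conditioned on the true class. That reading is an inference from the numbers, not something the report states.

```python
# Overall row, copied from the table above.
accuracy_pct, precision, recall, f1_reported = 91.64, 0.894, 0.944, 0.918
fpr_pct, fnr_pct = 5.55, 2.81


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


f1 = f1_score(precision, recall)

# If FPR/FNR were shares of all samples, every sample would be either
# correct, a false positive, or a false negative, so these sum to ~100.
total_pct = accuracy_pct + fpr_pct + fnr_pct

print(f"F1 from precision and recall: {f1:.3f} (reported {f1_reported})")
print(f"accuracy + FPR + FNR = {total_pct:.2f}%")
```

The same two checks hold for the per-language and per-dataset rows, within rounding.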
### Performance by Language
| Language | Sample Count | Accuracy (%) | Precision | Recall | F1 | FPR (%) | FNR (%) |
| :------------ | -----------: | -----------: | --------: | -----: | ----: | ------: | ------: |
| 🇯🇵 Japanese | 834 | 95.68 | 0.944 | 0.971 | 0.958 | 2.88 | 1.44 |
| 🇳🇱 Dutch | 1,398 | 95.42 | 0.950 | 0.963 | 0.956 | 2.65 | 1.93 |
| 🇹🇷 Turkish | 966 | 95.34 | 0.935 | 0.973 | 0.954 | 3.31 | 1.35 |
| 🇫🇷 French | 1,252 | 95.29 | 0.950 | 0.958 | 0.954 | 2.56 | 2.16 |
| 🇰🇷 Korean | 889 | 95.16 | 0.965 | 0.937 | 0.951 | 1.69 | 3.15 |
| 🇩🇪 German | 1,322 | 95.16 | 0.936 | 0.970 | 0.952 | 3.33 | 1.51 |
| 🇵🇹 Portuguese | 1,398 | 94.85 | 0.942 | 0.953 | 0.947 | 2.86 | 2.29 |
| 🇮🇹 Italian | 782 | 94.50 | 0.922 | 0.972 | 0.946 | 4.09 | 1.41 |
| 🇵🇱 Polish | 974 | 94.35 | 0.921 | 0.963 | 0.942 | 3.90 | 1.75 |
| 🇮🇩 Indonesian | 971 | 93.10 | 0.905 | 0.960 | 0.932 | 4.94 | 1.96 |
| 🇷🇺 Russian | 1,468 | 92.64 | 0.911 | 0.953 | 0.932 | 4.90 | 2.45 |
| 🇮🇳 Hindi | 1,284 | 92.52 | 0.919 | 0.939 | 0.929 | 4.28 | 3.19 |
| 🇺🇦 Ukrainian | 929 | 92.03 | 0.900 | 0.933 | 0.917 | 4.84 | 3.12 |
| 🇬🇧 🇺🇸 English | 7,820 | 91.94 | 0.889 | 0.954 | 0.921 | 5.82 | 2.24 |
| 🇩🇰 Danish | 779 | 91.14 | 0.880 | 0.954 | 0.916 | 6.55 | 2.31 |
| 🇫🇮 Finnish | 1,010 | 90.50 | 0.859 | 0.968 | 0.910 | 7.92 | 1.58 |
| 🇳🇴 Norwegian | 1,014 | 89.84 | 0.865 | 0.950 | 0.905 | 7.59 | 2.56 |
| 🇪🇸 Spanish | 1,783 | 89.62 | 0.871 | 0.924 | 0.897 | 6.67 | 3.70 |
| 🇨🇳 Chinese | 929 | 88.37 | 0.850 | 0.937 | 0.891 | 8.40 | 3.23 |
| 🇸🇦 Arabic | 947 | 87.01 | 0.838 | 0.923 | 0.878 | 9.08 | 3.91 |
| 🇮🇳 Marathi | 774 | 84.88 | 0.833 | 0.878 | 0.855 | 8.91 | 6.20 |
| 🇧🇩 Bengali | 1,000 | 81.20 | 0.801 | 0.820 | 0.810 | 10.00 | 8.80 |
| 🇻🇳 Vietnamese | 1,004 | 81.08 | 0.780 | 0.862 | 0.819 | 12.05 | 6.87 |
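
To see what one row implies in raw counts, the sketch below works through the Bengali row, chosen because its 1,000 samples make the arithmetic easy to follow. It assumes the FPR and FNR columns are percentages of all samples, a reading the report does not state but which the row supports: recall of 0.820 would correspond to a conventional miss rate of about 18%, while the FNR column reads 8.80%. The code searches for the integer true-positive count that reproduces the rounded precision and recall, then derives the class-conditional error rates. All names are illustrative.

```python
# Bengali row, copied from the table above.
n, precision, recall = 1000, 0.801, 0.820
accuracy_pct, fpr_pct, fnr_pct = 81.20, 10.00, 8.80

# Assumption: FPR/FNR are shares of all n samples, so they map directly to counts.
fp = round(n * fpr_pct / 100)            # 100 false positives
fn = round(n * fnr_pct / 100)            # 88 false negatives
correct = round(n * accuracy_pct / 100)  # 812 correct predictions (TP + TN)

# Find every integer TP count whose precision and recall round to the table values.
candidates = [
    tp
    for tp in range(correct + 1)
    if round(tp / (tp + fp), 3) == precision
    and round(tp / (tp + fn), 3) == recall
]

tp = candidates[0]
tn = correct - tp

# Class-conditional ("conventional") error rates implied by those counts.
conventional_fpr_pct = 100 * fp / (fp + tn)
conventional_fnr_pct = 100 * fn / (fn + tp)

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
print(f"FP / (FP + TN) = {conventional_fpr_pct:.2f}%")
print(f"FN / (FN + TP) = {conventional_fnr_pct:.2f}%")
```

Under that assumption only one true-positive count fits the row, and the class-conditional error rates come out roughly twice the values in the FPR and FNR columns. This matters when comparing against benchmarks that report conventional rates.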
### Performance by Dataset
| Dataset | Sample Count | Accuracy (%) | Precision | Recall | F1 | FPR (%) | FNR (%) |
| :-------------------- | -----------: | -----------: | --------: | -----: | ----: | ------: | ------: |
| orpheus_endfiller_1 | 181 | 97.24 | 1.000 | 0.946 | 0.972 | 0.00 | 2.76 |
| rime_2 | 394 | 97.21 | 0.959 | 0.976 | 0.967 | 1.78 | 1.02 |
| human_5 | 402 | 95.02 | 0.939 | 0.949 | 0.944 | 2.74 | 2.24 |
| liva_1 | 3,831 | 94.23 | 0.929 | 0.959 | 0.944 | 3.68 | 2.09 |
| chirp3_1 | 16,254 | 93.53 | 0.919 | 0.955 | 0.937 | 4.22 | 2.25 |
| orpheus_grammar_1 | 163 | 89.57 | 0.878 | 0.929 | 0.903 | 6.75 | 3.68 |
| orpheus_midfiller_1 | 140 | 89.29 | 0.853 | 0.921 | 0.885 | 7.14 | 3.57 |
| chirp3_2 | 8,428 | 87.81 | 0.850 | 0.916 | 0.882 | 8.03 | 4.15 |
| chirp3_3_short | 104 | 85.58 | 0.867 | 0.812 | 0.839 | 5.77 | 8.65 |
| mundo_1 | 496 | 84.68 | 0.840 | 0.854 | 0.847 | 8.06 | 7.26 |
| human_convcollector_1 | 90 | 84.44 | 0.761 | 0.921 | 0.833 | 12.22 | 3.33 |
| midcentury_1 | 1,044 | 84.39 | 0.766 | 0.974 | 0.858 | 14.37 | 1.25 |
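
The per-dataset rows can be reconciled with the overall figure by weighting each accuracy by its sample count. The sketch below is a consistency check on the numbers in the table above, not part of the benchmark tooling. It also shows how strongly the two largest datasets, `chirp3_1` and `chirp3_2`, drive the headline accuracy.

```python
# (sample count, accuracy %) per dataset, copied from the table above.
rows = {
    "orpheus_endfiller_1": (181, 97.24),
    "rime_2": (394, 97.21),
    "human_5": (402, 95.02),
    "liva_1": (3831, 94.23),
    "chirp3_1": (16254, 93.53),
    "orpheus_grammar_1": (163, 89.57),
    "orpheus_midfiller_1": (140, 89.29),
    "chirp3_2": (8428, 87.81),
    "chirp3_3_short": (104, 85.58),
    "mundo_1": (496, 84.68),
    "human_convcollector_1": (90, 84.44),
    "midcentury_1": (1044, 84.39),
}

total_samples = sum(count for count, _ in rows.values())

# Sample-weighted mean accuracy across datasets.
weighted_accuracy = sum(count * acc for count, acc in rows.values()) / total_samples

# Share of all samples contributed by the two largest datasets.
chirp_share = (rows["chirp3_1"][0] + rows["chirp3_2"][0]) / total_samples

print(f"total samples: {total_samples}")
print(f"weighted accuracy: {weighted_accuracy:.2f}%")
print(f"chirp3_1 + chirp3_2 share of samples: {chirp_share:.1%}")
```

An unweighted mean of the twelve accuracies would give equal weight to datasets with fewer than 200 samples, so it is not comparable to the overall row.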