# Endpointing Model Benchmark Report
**Model:** `/data/smart-turn-v3.0.onnx`
**Generated:** 2025-12-03 16:04:09 UTC
## Accuracy Results
**Total Samples:** 31,473
**Unique Languages:** 🇸🇦 Arabic, 🇧🇩 Bengali, 🇩🇰 Danish, 🇩🇪 German, 🇬🇧 🇺🇸 English, 🇫🇮 Finnish, 🇫🇷 French, 🇮🇳 Hindi, 🇮🇩 Indonesian, 🇮🇹 Italian, 🇯🇵 Japanese, 🇰🇷 Korean, 🇮🇳 Marathi, 🇳🇱 Dutch, 🇳🇴 Norwegian, 🇵🇱 Polish, 🇵🇹 Portuguese, 🇷🇺 Russian, 🇪🇸 Spanish, 🇹🇷 Turkish, 🇺🇦 Ukrainian, 🇻🇳 Vietnamese, 🇨🇳 Chinese
**Unique Datasets:** chirp3_1, chirp3_2, human_5, human_convcollector_1, liva_1, midcentury_1, mundo_1, orpheus_endfiller_1, orpheus_grammar_1, orpheus_midfiller_1, rime_2
### Overall Performance
| Metric | Sample Count | Accuracy (%) | False Positives (%) | False Negatives (%) |
|--------|--------------|--------------|---------------------|---------------------|
| Overall | 31,473 | 91.60 | 4.68 | 3.72 |
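
In every row of this report, the accuracy, false-positive, and false-negative percentages add up to roughly 100, which suggests that all three are expressed as a share of the total sample count rather than of each class. The sketch below shows one way such figures could be computed under that assumption. The function name `endpoint_metrics`, the label convention (1 = turn complete, 0 = turn incomplete), and the toy data are illustrative assumptions and are not taken from the benchmark code.

```python
from typing import Dict, Sequence


def endpoint_metrics(labels: Sequence[int], preds: Sequence[int]) -> Dict[str, float]:
    """Return accuracy, false positives and false negatives as % of all samples.

    Assumed convention (not confirmed by the report):
    1 = turn complete, 0 = turn incomplete.
    """
    if len(labels) != len(preds) or len(labels) == 0:
        raise ValueError("labels and preds must be non-empty and the same length")

    total = len(labels)
    pairs = list(zip(labels, preds))

    correct = sum(1 for y, p in pairs if y == p)
    # False positive: model says "complete" while the turn is still open.
    false_pos = sum(1 for y, p in pairs if y == 0 and p == 1)
    # False negative: model says "incomplete" although the turn has ended.
    false_neg = sum(1 for y, p in pairs if y == 1 and p == 0)

    return {
        "samples": total,
        "accuracy": 100 * correct / total,
        "false_positives": 100 * false_pos / total,
        "false_negatives": 100 * false_neg / total,
    }


# Toy data for illustration only; these are not benchmark samples.
labels = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
preds = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1]
metrics = endpoint_metrics(labels, preds)
print(metrics)
```

With this definition the three percentages always sum to 100 before rounding, which matches the pattern seen in the tables.
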
### Performance by Language
| Language | Sample Count | Accuracy (%) | False Positives (%) | False Negatives (%) |
|----------|--------------|--------------|---------------------|---------------------|
| 🇹🇷 Turkish | 966 | 97.10 | 1.66 | 1.24 |
| 🇯🇵 Japanese | 834 | 96.88 | 1.92 | 1.20 |
| 🇰🇷 Korean | 890 | 96.74 | 1.12 | 2.13 |
| 🇩🇪 German | 1,322 | 96.22 | 2.42 | 1.36 |
| 🇫🇷 French | 1,253 | 96.17 | 1.52 | 2.31 |
| 🇳🇱 Dutch | 1,401 | 96.15 | 2.00 | 1.86 |
| 🇵🇹 Portuguese | 1,398 | 95.42 | 2.79 | 1.79 |
| 🇮🇹 Italian | 782 | 94.88 | 3.07 | 2.05 |
| 🇫🇮 Finnish | 1,010 | 94.85 | 3.17 | 1.98 |
| 🇮🇩 Indonesian | 971 | 94.54 | 4.12 | 1.34 |
| 🇺🇦 Ukrainian | 929 | 94.51 | 2.80 | 2.69 |
| 🇵🇱 Polish | 976 | 94.47 | 2.87 | 2.66 |
| 🇳🇴 Norwegian | 1,014 | 93.98 | 3.55 | 2.47 |
| 🇷🇺 Russian | 1,470 | 93.54 | 3.33 | 3.13 |
| 🇮🇳 Hindi | 1,295 | 93.36 | 4.40 | 2.24 |
| 🇩🇰 Danish | 779 | 93.07 | 4.88 | 2.05 |
| 🇸🇦 Arabic | 947 | 88.60 | 6.97 | 4.44 |
| 🇨🇳 Chinese | 945 | 88.57 | 4.76 | 6.67 |
| 🇬🇧 🇺🇸 English | 7,722 | 88.31 | 6.00 | 5.70 |
| 🇮🇳 Marathi | 774 | 87.47 | 8.53 | 4.01 |
| 🇪🇸 Spanish | 1,791 | 86.71 | 4.69 | 8.60 |
| 🇧🇩 Bengali | 1,000 | 84.10 | 10.90 | 5.00 |
| 🇻🇳 Vietnamese | 1,004 | 81.57 | 14.94 | 3.49 |
### Performance by Dataset
| Dataset | Sample Count | Accuracy (%) | False Positives (%) | False Negatives (%) |
|---------|--------------|--------------|---------------------|---------------------|
| rime_2 | 396 | 99.75 | 0.00 | 0.25 |
| human_5 | 402 | 96.27 | 1.00 | 2.74 |
| chirp3_1 | 16,300 | 94.53 | 2.93 | 2.53 |
| orpheus_endfiller_1 | 182 | 94.51 | 0.00 | 5.49 |
| orpheus_grammar_1 | 163 | 92.64 | 3.68 | 3.68 |
| orpheus_midfiller_1 | 140 | 91.43 | 3.57 | 5.00 |
| human_convcollector_1 | 90 | 91.11 | 3.33 | 5.56 |
| chirp3_2 | 8,428 | 90.27 | 6.68 | 3.05 |
| midcentury_1 | 1,044 | 85.44 | 11.78 | 2.78 |
| liva_1 | 3,832 | 84.68 | 6.92 | 8.40 |
| mundo_1 | 496 | 72.78 | 5.24 | 21.98 |
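
As a consistency check, the overall accuracy can be recovered from the per-dataset rows as a sample-weighted average. The sketch below does this with the counts and accuracies copied from the table above. The names `ROWS` and `weighted_accuracy` are hypothetical helpers introduced for illustration, and because the inputs are already rounded to two decimals, the result should only be expected to match the reported figure approximately.

```python
from typing import List, Tuple

# (dataset, sample_count, accuracy_percent), copied from the table above.
ROWS: List[Tuple[str, int, float]] = [
    ("rime_2", 396, 99.75),
    ("human_5", 402, 96.27),
    ("chirp3_1", 16300, 94.53),
    ("orpheus_endfiller_1", 182, 94.51),
    ("orpheus_grammar_1", 163, 92.64),
    ("orpheus_midfiller_1", 140, 91.43),
    ("human_convcollector_1", 90, 91.11),
    ("chirp3_2", 8428, 90.27),
    ("midcentury_1", 1044, 85.44),
    ("liva_1", 3832, 84.68),
    ("mundo_1", 496, 72.78),
]


def weighted_accuracy(rows: List[Tuple[str, int, float]]) -> Tuple[int, float]:
    """Return (total samples, sample-weighted accuracy in percent)."""
    total = sum(count for _, count, _ in rows)
    # Approximate number of correct predictions per dataset, then pool them.
    correct = sum(count * acc / 100 for _, count, acc in rows)
    return total, 100 * correct / total


total_samples, overall_accuracy = weighted_accuracy(ROWS)
print(total_samples, round(overall_accuracy, 2))
```

The dataset counts sum to the 31,473 samples listed under Overall Performance, and the weighted average agrees with the reported 91.60% to two decimal places. The weighting also shows why the two largest datasets, chirp3_1 and chirp3_2, dominate the headline number, while a weak result on a small set such as mundo_1 barely moves it.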