
Endpointing Model Benchmark Report

Model: /data/smart-turn-v3.0.onnx

Generated: 2025-12-03 16:04:09 UTC

Accuracy Results

Total Samples: 31,473

Unique Languages: 🇸🇦 Arabic, 🇧🇩 Bengali, 🇩🇰 Danish, 🇩🇪 German, 🇬🇧 🇺🇸 English, 🇫🇮 Finnish, 🇫🇷 French, 🇮🇳 Hindi, 🇮🇩 Indonesian, 🇮🇹 Italian, 🇯🇵 Japanese, 🇰🇷 Korean, 🇮🇳 Marathi, 🇳🇱 Dutch, 🇳🇴 Norwegian, 🇵🇱 Polish, 🇵🇹 Portuguese, 🇷🇺 Russian, 🇪🇸 Spanish, 🇹🇷 Turkish, 🇺🇦 Ukrainian, 🇻🇳 Vietnamese, 🇨🇳 Chinese

Unique Datasets: chirp3_1, chirp3_2, human_5, human_convcollector_1, liva_1, midcentury_1, mundo_1, orpheus_endfiller_1, orpheus_grammar_1, orpheus_midfiller_1, rime_2

Overall Performance

| Metric | Sample Count | Accuracy (%) | False Positives (%) | False Negatives (%) |
| --- | ---: | ---: | ---: | ---: |
| Overall | 31,473 | 91.60 | 4.68 | 3.72 |
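
The three percentages in each row sum to 100 (up to rounding), which suggests that false positives and false negatives are reported as a share of all samples rather than as per-class rates. A minimal sketch of that computation follows. It assumes binary labels where 1 means the turn is complete and 0 means it is incomplete; the function name `endpointing_metrics` and the label convention are illustrative and not taken from the benchmark code.

```python
def endpointing_metrics(labels, predictions):
    """Return accuracy, false-positive and false-negative percentages.

    Assumption: 1 = turn complete, 0 = turn incomplete, and every rate is
    expressed as a share of all samples, so the three values sum to 100.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    total = len(labels)
    if total == 0:
        raise ValueError("need at least one sample")

    pairs = list(zip(labels, predictions))
    correct = sum(1 for y, p in pairs if y == p)
    # Predicted "complete" while the speaker had not finished.
    false_pos = sum(1 for y, p in pairs if y == 0 and p == 1)
    # Predicted "incomplete" although the turn had ended.
    false_neg = sum(1 for y, p in pairs if y == 1 and p == 0)

    return {
        "accuracy": 100.0 * correct / total,
        "false_positives": 100.0 * false_pos / total,
        "false_negatives": 100.0 * false_neg / total,
    }
```

Under this convention, a false positive corresponds to cutting the speaker off early and a false negative to waiting too long before responding.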

Performance by Language

| Language | Sample Count | Accuracy (%) | False Positives (%) | False Negatives (%) |
| --- | ---: | ---: | ---: | ---: |
| 🇹🇷 Turkish | 966 | 97.10 | 1.66 | 1.24 |
| 🇯🇵 Japanese | 834 | 96.88 | 1.92 | 1.20 |
| 🇰🇷 Korean | 890 | 96.74 | 1.12 | 2.13 |
| 🇩🇪 German | 1,322 | 96.22 | 2.42 | 1.36 |
| 🇫🇷 French | 1,253 | 96.17 | 1.52 | 2.31 |
| 🇳🇱 Dutch | 1,401 | 96.15 | 2.00 | 1.86 |
| 🇵🇹 Portuguese | 1,398 | 95.42 | 2.79 | 1.79 |
| 🇮🇹 Italian | 782 | 94.88 | 3.07 | 2.05 |
| 🇫🇮 Finnish | 1,010 | 94.85 | 3.17 | 1.98 |
| 🇮🇩 Indonesian | 971 | 94.54 | 4.12 | 1.34 |
| 🇺🇦 Ukrainian | 929 | 94.51 | 2.80 | 2.69 |
| 🇵🇱 Polish | 976 | 94.47 | 2.87 | 2.66 |
| 🇳🇴 Norwegian | 1,014 | 93.98 | 3.55 | 2.47 |
| 🇷🇺 Russian | 1,470 | 93.54 | 3.33 | 3.13 |
| 🇮🇳 Hindi | 1,295 | 93.36 | 4.40 | 2.24 |
| 🇩🇰 Danish | 779 | 93.07 | 4.88 | 2.05 |
| 🇸🇦 Arabic | 947 | 88.60 | 6.97 | 4.44 |
| 🇨🇳 Chinese | 945 | 88.57 | 4.76 | 6.67 |
| 🇬🇧 🇺🇸 English | 7,722 | 88.31 | 6.00 | 5.70 |
| 🇮🇳 Marathi | 774 | 87.47 | 8.53 | 4.01 |
| 🇪🇸 Spanish | 1,791 | 86.71 | 4.69 | 8.60 |
| 🇧🇩 Bengali | 1,000 | 84.10 | 10.90 | 5.00 |
| 🇻🇳 Vietnamese | 1,004 | 81.57 | 14.94 | 3.49 |

Performance by Dataset

| Dataset | Sample Count | Accuracy (%) | False Positives (%) | False Negatives (%) |
| --- | ---: | ---: | ---: | ---: |
| rime_2 | 396 | 99.75 | 0.00 | 0.25 |
| human_5 | 402 | 96.27 | 1.00 | 2.74 |
| chirp3_1 | 16,300 | 94.53 | 2.93 | 2.53 |
| orpheus_endfiller_1 | 182 | 94.51 | 0.00 | 5.49 |
| orpheus_grammar_1 | 163 | 92.64 | 3.68 | 3.68 |
| orpheus_midfiller_1 | 140 | 91.43 | 3.57 | 5.00 |
| human_convcollector_1 | 90 | 91.11 | 3.33 | 5.56 |
| chirp3_2 | 8,428 | 90.27 | 6.68 | 3.05 |
| midcentury_1 | 1,044 | 85.44 | 11.78 | 2.78 |
| liva_1 | 3,832 | 84.68 | 6.92 | 8.40 |
| mundo_1 | 496 | 72.78 | 5.24 | 21.98 |
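
As a consistency check, the overall accuracy should equal the sample-weighted mean of the per-dataset accuracies, not their simple average. The sketch below recomputes it from the per-dataset sample counts and accuracies reported above; the names `ROWS` and `weighted_accuracy` are illustrative.

```python
# (dataset, sample count, accuracy %) copied from the per-dataset results.
ROWS = [
    ("rime_2", 396, 99.75),
    ("human_5", 402, 96.27),
    ("chirp3_1", 16300, 94.53),
    ("orpheus_endfiller_1", 182, 94.51),
    ("orpheus_grammar_1", 163, 92.64),
    ("orpheus_midfiller_1", 140, 91.43),
    ("human_convcollector_1", 90, 91.11),
    ("chirp3_2", 8428, 90.27),
    ("midcentury_1", 1044, 85.44),
    ("liva_1", 3832, 84.68),
    ("mundo_1", 496, 72.78),
]


def weighted_accuracy(rows):
    """Return (total samples, sample-weighted mean accuracy in percent)."""
    total = sum(count for _, count, _ in rows)
    weighted = sum(count * acc for _, count, acc in rows) / total
    return total, weighted


total, overall = weighted_accuracy(ROWS)
print(total, round(overall, 2))
```

The result agrees with the reported 31,473 samples and 91.60% overall accuracy, up to rounding of the per-dataset figures. Because chirp3_1 and chirp3_2 together account for most of the samples, they dominate the overall number, while small sets such as mundo_1 have little influence on it despite their much lower accuracy.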