SpeechBrain ECAPA-TDNN: ONNX + CoreML
Pre-converted speechbrain/lang-id-voxlingua107-ecapa model for spoken language identification. It classifies an audio clip into one of 107 languages.
Converted for use with Kesha Voice Kit, an open-source voice toolkit.
Files
| File | Format | Size | Description |
|---|---|---|---|
| lang-id-ecapa.onnx | ONNX | ~760KB | Model graph |
| lang-id-ecapa.onnx.data | ONNX | ~85MB | Model weights (external data) |
| lang-id-ecapa.mlpackage.tar.gz | CoreML | ~40MB | CoreML model archive (macOS) |
| labels.json | JSON | <1KB | 107 language codes (mostly ISO 639-1) |
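Because the weights live in a separate external-data file, ONNX Runtime typically expects lang-id-ecapa.onnx.data to sit in the same directory as lang-id-ecapa.onnx. The following is a minimal sketch of a pre-flight check; the helper name `missing_model_files` and the idea of keeping all files in one directory are assumptions made for illustration, not part of this repository.

```python
from pathlib import Path

# File names come from the table above; keeping them in one directory is an assumption.
REQUIRED_FILES = ("lang-id-ecapa.onnx", "lang-id-ecapa.onnx.data", "labels.json")


def missing_model_files(model_dir):
    """Return the required file names that are absent from model_dir (hypothetical helper)."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty list means all three files needed for ONNX inference are present; a non-empty list names what still has to be downloaded.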
Usage with Kesha Voice Kit
bun install -g @drakulavich/kesha-voice-kit
kesha install # downloads this model automatically
kesha --json audio.ogg # transcribe + detect language
Usage with ONNX Runtime (Python)
import json

import numpy as np
import onnxruntime as ort

# lang-id-ecapa.onnx.data (external weights) must sit next to the .onnx file
session = ort.InferenceSession("lang-id-ecapa.onnx")
with open("labels.json") as f:
    labels = json.load(f)

# Input: 16kHz mono float32 waveform, shape [1, samples]
audio = np.random.randn(1, 160000).astype(np.float32)  # 10 seconds of noise as a placeholder
result = session.run(None, {"waveform": audio})
probs = result[0][0]  # softmax probabilities, one per language
top_idx = int(np.argmax(probs))
print(f"Language: {labels[top_idx]} (confidence: {probs[top_idx]:.4f})")
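The example above reports only the single best guess. For short or noisy clips it can help to look at the runners-up as well. Here is a minimal sketch of a top-k ranking over the probability vector; the function name `top_k_languages` is hypothetical, and the sketch assumes `labels` is a list ordered like the model output.

```python
def top_k_languages(probs, labels, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    # Rank indices by probability; works for lists and 1-D NumPy arrays alike.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(labels[i], float(probs[i])) for i in ranked[:k]]


# Toy example with made-up probabilities, not real model output.
toy_labels = ["en", "de", "fr", "es"]
toy_probs = [0.10, 0.65, 0.20, 0.05]
print(top_k_languages(toy_probs, toy_labels, k=2))  # [('de', 0.65), ('fr', 0.2)]
```

With the real model, `top_k_languages(probs, labels)` can be called directly on the `probs` array from the snippet above.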
Usage with ONNX Runtime (Rust)
use ort::session::Session;

fn main() -> ort::Result<()> {
    let _session = Session::builder()?.commit_from_file("lang-id-ecapa.onnx")?;
    // Input: "waveform" [1, samples] float32
    // Output: "language_probs" [1, 107] float32
    Ok(())
}
Model Details
- Architecture: ECAPA-TDNN (originally for speaker recognition, adapted for language ID)
- Training data: VoxLingua107, 6628 hours of speech across 107 languages
- Input: Raw waveform at 16kHz mono ([1, samples] float32)
- Output: Language probabilities ([1, 107] float32, softmax applied)
- Error rate: 6.7% on VoxLingua107 dev set
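The input contract above (16kHz, mono, float32) means real recordings usually need some preparation first. The sketch below uses only the standard library to read a 16-bit PCM WAV file, reject other sample rates, and average channels down to mono. The helper name `load_wav_16k_mono` is hypothetical, and scaling samples to the range [-1.0, 1.0) is an assumption based on how audio loaders commonly normalize PCM data; it does not resample, so audio at other rates must be converted beforehand.

```python
import array
import sys
import wave


def load_wav_16k_mono(path):
    """Read a 16-bit PCM WAV file and return mono samples scaled to [-1.0, 1.0)."""
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != 16000:
            raise ValueError(f"expected 16000 Hz, got {wav.getframerate()} Hz")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        channels = wav.getnchannels()
        pcm = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    # Average the interleaved channels of each frame to get a mono signal.
    mono = [sum(pcm[i:i + channels]) / channels for i in range(0, len(pcm), channels)]
    # Assumed normalization: map int16 range onto [-1.0, 1.0).
    return [sample / 32768.0 for sample in mono]
```

To feed the result to the model, wrap it into the expected shape, for example `np.asarray(samples, dtype=np.float32)[None, :]`, which yields a `[1, samples]` array.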
Supported Languages
ab, af, am, ar, as, az, ba, be, bg, bn, bo, br, ca, ceb, cs, cy, da, de, el, en, eo, es, et, eu, fa, fi, fo, fr, gl, gn, gu, ha, haw, he, hi, hr, ht, hu, hy, ia, id, is, it, ja, jw, ka, kk, km, kn, ko, la, lb, ln, lo, lt, lv, mg, mi, mk, ml, mn, mr, ms, mt, my, ne, nl, nn, no, oc, pa, pl, ps, pt, ro, ru, sa, sd, si, sk, sl, sn, so, sq, sr, su, sv, sw, ta, te, tg, th, tk, tl, tr, tt, uk, ur, uz, vi, war, yi, yo, zh
Conversion
Converted from PyTorch using torch.onnx.export (ONNX) and torch.export + coremltools (CoreML).
Conversion script: scripts/convert-lang-id-model.py
License
Apache 2.0 (same as the original SpeechBrain model)