SpeechBrain ECAPA-TDNN: ONNX + CoreML

Pre-converted build of the speechbrain/lang-id-voxlingua107-ecapa model for spoken language identification. Given an audio clip, it predicts which of 107 languages is being spoken.

Converted for use with Kesha Voice Kit, an open-source voice toolkit.

Files

File                            Format  Size    Description
lang-id-ecapa.onnx              ONNX    ~760KB  Model graph
lang-id-ecapa.onnx.data         ONNX    ~85MB   Model weights (external data)
lang-id-ecapa.mlpackage.tar.gz  CoreML  ~40MB   CoreML model archive (macOS)
labels.json                     JSON    <1KB    107 language codes (mostly two-letter ISO 639-1)
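
Because the weights are stored as ONNX external data, ONNX Runtime needs lang-id-ecapa.onnx.data to sit next to lang-id-ecapa.onnx when the session is created. A minimal sketch of a pre-flight check, assuming all files were downloaded into a single directory; the helper name missing_model_files is hypothetical and not part of Kesha Voice Kit or ONNX Runtime:

```python
from pathlib import Path

# File names taken from the Files table. The CoreML archive is left out
# because the ONNX path does not need it.
REQUIRED_FILES = ("lang-id-ecapa.onnx", "lang-id-ecapa.onnx.data", "labels.json")


def missing_model_files(model_dir):
    """Return the required file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty list means the directory is ready to be passed to ONNX Runtime.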

Usage with Kesha Voice Kit

bun install -g @drakulavich/kesha-voice-kit
kesha install          # downloads this model automatically
kesha --json audio.ogg # transcribe + detect language

Usage with ONNX Runtime (Python)

import onnxruntime as ort
import numpy as np
import json

session = ort.InferenceSession("lang-id-ecapa.onnx")
with open("labels.json") as f:
    labels = json.load(f)

# Input: 16 kHz mono float32 waveform, shape [1, samples]
audio = np.random.randn(1, 160000).astype(np.float32)  # 10 s of placeholder noise; substitute real audio
result = session.run(None, {"waveform": audio})
probs = result[0][0]

top_idx = np.argmax(probs)
print(f"Language: {labels[top_idx]} (confidence: {probs[top_idx]:.4f})")
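
When the top prediction is uncertain, it can help to look at the runners-up as well. The sketch below ranks the k most probable languages. It assumes, as the snippet above does, that labels.json is a list ordered by output index; the helper name top_k_languages is hypothetical. It uses only the standard library and accepts a plain list or a NumPy vector such as probs.

```python
import heapq


def top_k_languages(probs, labels, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    # Pick the indices of the k largest probabilities, in descending order.
    best = heapq.nlargest(k, range(len(probs)), key=lambda i: probs[i])
    return [(labels[i], float(probs[i])) for i in best]


# Toy values for illustration only, not real model output.
print(top_k_languages([0.1, 0.7, 0.2], ["de", "en", "fr"], k=2))
# prints [('en', 0.7), ('fr', 0.2)]
```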

Usage with ONNX Runtime (Rust)

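use ort::session::Session;

fn main() -> ort::Result<()> {
    // The external weights file lang-id-ecapa.onnx.data must sit next to the graph.
    let _session = Session::builder()?.commit_from_file("lang-id-ecapa.onnx")?;
    // Input: "waveform" [1, samples] float32
    // Output: "language_probs" [1, 107] float32
    Ok(())
}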

Model Details

  • Architecture: ECAPA-TDNN (originally for speaker recognition, adapted for language ID)
  • Training data: VoxLingua107, 6628 hours of speech across 107 languages
  • Input: Raw waveform at 16kHz mono ([1, samples] float32)
  • Output: Language probabilities ([1, 107] float32, softmax applied)
  • Error rate: 6.7% on VoxLingua107 dev set
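
The input contract above (16 kHz, mono, float32) can be met for an uncompressed WAV file with the standard library alone. The following is a sketch, not the toolkit's own loader: the function name load_wav_16k_mono is hypothetical, it handles only 16-bit PCM, it rejects other sample rates instead of resampling, and scaling samples to the range [-1.0, 1.0) is an assumption about what the model expects.

```python
import array
import sys
import wave


def load_wav_16k_mono(path):
    """Read a 16-bit PCM WAV file and return its samples as floats in [-1.0, 1.0)."""
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != 16000:
            raise ValueError(f"expected 16000 Hz, got {wav.getframerate()} Hz")
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        raw = wav.readframes(wav.getnframes())
    pcm = array.array("h", raw)  # signed 16-bit integers
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    # Scale to floats; 32768 is the magnitude of the most negative int16 value.
    return [sample / 32768.0 for sample in pcm]
```

The returned list can then be shaped for the model with, for example, np.asarray(samples, dtype=np.float32)[None, :], which yields the [1, samples] layout.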

Supported Languages

ab, af, am, ar, as, az, ba, be, bg, bn, bo, br, ca, ceb, cs, cy, da, de, el, en, eo, es, et, eu, fa, fi, fo, fr, gl, gn, gu, ha, haw, he, hi, hr, ht, hu, hy, ia, id, is, it, ja, jw, ka, kk, km, kn, ko, la, lb, ln, lo, lt, lv, mg, mi, mk, ml, mn, mr, ms, mt, my, ne, nl, nn, no, oc, pa, pl, ps, pt, ro, ru, sa, sd, si, sk, sl, sn, so, sq, sr, su, sv, sw, ta, te, tg, th, tk, tl, tr, tt, uk, ur, uz, vi, war, yi, yo, zh

Conversion

Converted from PyTorch using torch.onnx.export (ONNX) and torch.export + coremltools (CoreML).

Conversion script: scripts/convert-lang-id-model.py

License

Apache 2.0 (same as the original SpeechBrain model)
