SpeechBrain ECAPA-TDNN — ONNX + CoreML

Pre-converted speechbrain/lang-id-voxlingua107-ecapa model for spoken language identification. It identifies 107 languages from audio.

Converted for use with Kesha Voice Kit, an open-source voice toolkit.

Files

File	Format	Size	Description
`lang-id-ecapa.onnx`	ONNX	~760KB	Model graph
`lang-id-ecapa.onnx.data`	ONNX	~85MB	Model weights (external data)
`lang-id-ecapa.mlpackage.tar.gz`	CoreML	~40MB	CoreML model archive (macOS)
`labels.json`	JSON	<1KB	107 language codes (mostly two-letter ISO 639-1; `ceb`, `haw`, and `war` are three-letter)
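
Because the ONNX weights live in a separate external-data file and the CoreML model ships as an archive, it can help to check the download before loading anything. The following is a minimal sketch under stated assumptions: the helper names `missing_onnx_files` and `extract_coreml_package` are hypothetical, and the internal layout of the archive is not documented here, so the second helper simply reports whatever top-level entries it finds.

```python
import tarfile
from pathlib import Path


def missing_onnx_files(model_dir):
    """Return the names of the ONNX-side files that are absent from model_dir."""
    required = ["lang-id-ecapa.onnx", "lang-id-ecapa.onnx.data", "labels.json"]
    return [name for name in required if not (Path(model_dir) / name).is_file()]


def extract_coreml_package(archive_path, dest_dir):
    """Unpack the .tar.gz archive into dest_dir and return its top-level entries."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        # Use the safer "data" extraction filter where this Python version has it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest, **kwargs)
    return sorted(p.name for p in dest.iterdir())
```

An empty list from `missing_onnx_files(".")` means the graph, its weights, and the labels all sit in the same directory, which is where ONNX Runtime looks for external data.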

Usage with Kesha Voice Kit

bun install -g @drakulavich/kesha-voice-kit
kesha install          # downloads this model automatically
kesha --json audio.ogg # transcribe + detect language

Usage with ONNX Runtime (Python)

import onnxruntime as ort
import numpy as np
import json

session = ort.InferenceSession("lang-id-ecapa.onnx")
with open("labels.json") as f:
    labels = json.load(f)

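# Input: 16 kHz mono float32 waveform, shape [1, samples].
# Random noise stands in for real audio here, so the prediction is meaningless.
audio = np.random.randn(1, 160000).astype(np.float32)  # 10 seconds
result = session.run(None, {"waveform": audio})
probs = result[0][0]  # shape [107], softmax already applied

top_idx = int(np.argmax(probs))
print(f"Language: {labels[top_idx]} (confidence: {probs[top_idx]:.4f})")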
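
A single top prediction hides how close the runners-up are, which matters for short or noisy clips. The sketch below ranks the output and returns the best few candidates. It assumes, as the snippet above does, that `labels.json` is a list ordered by class index; the helper name `top_k_languages` and the four-class toy data are hypothetical.

```python
def top_k_languages(probs, labels, k=3):
    """Return the k most probable (label, probability) pairs, highest first."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    # Sort class indices by probability, descending.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(labels[i], float(probs[i])) for i in order[:k]]


# Toy example with four classes instead of the model's 107.
demo_labels = ["en", "de", "fr", "es"]
demo_probs = [0.10, 0.60, 0.25, 0.05]
for code, p in top_k_languages(demo_probs, demo_labels, k=2):
    print(f"{code}: {p:.2f}")
```

The function accepts a plain list or the NumPy vector `probs` from the session output, so `top_k_languages(probs, labels)` can be called directly after `session.run`.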

Usage with ONNX Runtime (Rust)

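use ort::session::Session;

fn main() -> ort::Result<()> {
    // Load the model graph; the external weights file (lang-id-ecapa.onnx.data)
    // must sit in the same directory.
    let _session = Session::builder()?.commit_from_file("lang-id-ecapa.onnx")?;
    // Input: "waveform" [1, samples] float32
    // Output: "language_probs" [1, 107] float32
    Ok(())
}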

Model Details

Architecture: ECAPA-TDNN (originally for speaker recognition, adapted for language ID)
Training data: VoxLingua107 — 6628 hours of speech across 107 languages
Input: Raw waveform at 16kHz mono ([1, samples] float32)
Output: Language probabilities ([1, 107] float32, softmax applied)
Error rate: 6.7% on VoxLingua107 dev set
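
To make the input format concrete, here is a minimal sketch that turns a 16-bit PCM WAV file into the `[1, samples]` float32 array described above. It rests on assumptions not stated in this card: that samples should be scaled to roughly [-1.0, 1.0) (the usual convention for audio loaded as floats), and that the file is already at 16 kHz, since the sketch rejects other rates rather than resampling. The function name `load_wav_as_waveform` is hypothetical.

```python
import wave

import numpy as np


def load_wav_as_waveform(path):
    """Read a 16-bit PCM, 16 kHz WAV file into a [1, samples] float32 array."""
    with wave.open(path, "rb") as wf:
        if wf.getframerate() != 16000:
            raise ValueError(f"expected 16000 Hz, got {wf.getframerate()} Hz")
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        channels = wf.getnchannels()
        frames = wf.readframes(wf.getnframes())
    # WAV PCM data is little-endian; interleaved channels become columns.
    pcm = np.frombuffer(frames, dtype="<i2").reshape(-1, channels)
    # Average the channels to get mono, then scale to [-1.0, 1.0) (assumed range).
    mono = pcm.astype(np.float32).mean(axis=1) / 32768.0
    return mono[np.newaxis, :].astype(np.float32)
```

The returned array can be passed as the `waveform` input in place of the random placeholder audio. Compressed formats such as OGG would need decoding and resampling first, which this sketch does not cover.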

Supported Languages

ab, af, am, ar, as, az, ba, be, bg, bn, bo, br, ca, ceb, cs, cy, da, de, el, en, eo, es, et, eu, fa, fi, fo, fr, gl, gn, gu, ha, haw, he, hi, hr, ht, hu, hy, ia, id, is, it, ja, jw, ka, kk, km, kn, ko, la, lb, ln, lo, lt, lv, mg, mi, mk, ml, mn, mr, ms, mt, my, ne, nl, nn, no, oc, pa, pl, ps, pt, ro, ru, sa, sd, si, sk, sl, sn, so, sq, sr, su, sv, sw, ta, te, tg, th, tk, tl, tr, tt, uk, ur, uz, vi, war, yi, yo, zh

Conversion

Converted from PyTorch using torch.onnx.export (ONNX) and torch.export + coremltools (CoreML).

Conversion script: scripts/convert-lang-id-model.py

License

Apache 2.0 (same as the original SpeechBrain model)
