GGUF for ECAPA-TDNN LID in CrispASR (CommonLanguage 45-lang variant)

by cstr - opened 17 days ago

Thanks for the SpeechBrain ECAPA-TDNN LID models! Both VoxLingua107 and CommonLanguage are now wired into CrispASR. The ECAPA-TDNN runtime (src/ecapa_lid.cpp) runs as a single ggml graph: ~4.1 s on CPU, ~6× faster than the SpeechBrain reference, with 100% accuracy on the LID smoke set.

The CommonLanguage variant (45 languages, full-name labels such as "English", "German", …) is available as one of the --lid-backend ecapa choices, alongside the 107-language VoxLingua variant (ISO codes such as "en", "de", …):

cstr/ecapa-lid-107-GGUF — VoxLingua107 (default)
cstr/ecapa-lid-commonlanguage-GGUF — CommonLanguage (45 langs, full names)

We default to the 107-language VoxLingua model in --lid-backend ecapa because its ISO codes plug straight into the ASR backends' -l <code> flag without a name→ISO lookup. The CommonLanguage variant is useful when the downstream consumer wants a human-readable language string (e.g. for display in a UI rather than for routing):

./build/bin/crispasr --backend wav2vec2 -m auto -l auto \
    --lid-backend ecapa --lid-model ecapa-lid-commonlanguage-q4_k.gguf \
    -f audio.wav
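
If a CommonLanguage label ever does need to feed the -l <code> flag, the name→ISO lookup mentioned above could be sketched roughly as below. This is only an illustration, not part of CrispASR: `NAME_TO_ISO` and `label_to_iso` are hypothetical names, the table covers just a handful of languages, and the exact label spellings beyond "English" and "German" are assumptions that should be checked against the model's label list.

```python
from typing import Optional

# Partial, illustrative table from full-name labels to ISO 639-1 codes.
# Only "English" and "German" are label spellings quoted in this thread;
# the other entries are assumed spellings and should be verified.
NAME_TO_ISO = {
    "English": "en",
    "German": "de",
    "French": "fr",
    "Spanish": "es",
    "Italian": "it",
}


def label_to_iso(label: str) -> Optional[str]:
    """Return the ISO 639-1 code for a full-name label, or None if unmapped."""
    return NAME_TO_ISO.get(label.strip())
```

Returning None for unmapped labels leaves the fallback decision (for example, keeping -l auto) to the caller instead of guessing a code.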
