GGUF for ECAPA-TDNN LID in CrispASR (CommonLanguage 45-lang variant)

#6
by cstr - opened

Thanks for the SpeechBrain ECAPA-TDNN LID models! Both VoxLingua107 and CommonLanguage are wired into CrispASR — the ECAPA-TDNN runtime (src/ecapa_lid.cpp) is a single ggml graph (~4.1 s on CPU, ~6× faster than the SpeechBrain reference, 100% accuracy on the LID smoke set).

The CommonLanguage variant (45 languages, full-name labels such as "English" and "German") is available as one of the --lid-backend ecapa choices, alongside the 107-language VoxLingua107 variant (ISO codes such as "en" and "de").

We default to the 107-language VoxLingua107 model in --lid-backend ecapa because its ISO codes plug straight into the ASR backends' -l <code> flag without a name→ISO lookup. The CommonLanguage variant is useful when the downstream target wants a human-readable language string (e.g. for UX display rather than routing). To select it explicitly, pass its GGUF file via --lid-model:

./build/bin/crispasr --backend wav2vec2 -m auto -l auto \
    --lid-backend ecapa --lid-model ecapa-lid-commonlanguage-q4_k.gguf \
    -f audio.wav
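
If you do want to route on the CommonLanguage output, the name→ISO lookup mentioned above has to live somewhere on the caller's side. Here is a minimal sketch of what that could look like, not something CrispASR ships: the helper name `label_to_iso` and the table are hypothetical, the table is deliberately partial, and only the "English" and "German" label spellings are taken from this post (the other spellings are assumptions).

```python
from typing import Optional

# Partial, illustrative table mapping full-name LID labels to ISO 639-1 codes.
# Only "English" and "German" are label spellings quoted in this thread;
# the remaining spellings are assumed and should be checked against the
# model's actual label list.
NAME_TO_ISO = {
    "english": "en",
    "german": "de",
    "french": "fr",
    "spanish": "es",
    "italian": "it",
}


def label_to_iso(label: str) -> Optional[str]:
    """Hypothetical helper: return the ISO 639-1 code for a full-name label.

    Matching is case-insensitive and ignores surrounding whitespace.
    Returns None when the label is not in the table, so the caller can
    fall back to a default language or skip routing.
    """
    return NAME_TO_ISO.get(label.strip().lower())
```

Returning None for unknown labels keeps the fallback decision with the caller, which matters here because the table above covers only a handful of the 45 languages.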
