Use Omnilingual ASR Without fairseq2: Native HF Conversions Available

#1
by aadel4 - opened

Hi! Since this model is based on the Wav2Vec2.0 architecture, it can serve as a drop-in replacement in existing Wav2Vec2.0 pipelines, provided you can get it running without fairseq2.

I converted the 300M and 1B CTC (V1 and V2) and SSL checkpoints to native Hugging Face format (Wav2Vec2ForCTC). You can load them and run inference with standard transformers in a few lines of code, no fairseq2 required. I was not able to convert the 3B+ variants due to GPU limitations. The conversion code is open source, and contributions are welcome.

300M models:
https://huggingface.co/aadel4/omniASR-CTC-300M
https://huggingface.co/aadel4/omniASR-CTC-300M-v2
https://huggingface.co/aadel4/omniASR-W2V-300M

1B models:
https://huggingface.co/aadel4/omniASR-CTC-1B
https://huggingface.co/aadel4/omniASR-CTC-1B-v2
https://huggingface.co/aadel4/omniASR-W2V-1B

Conversion code: https://github.com/ahmedadelattia/omnilingual_to_hf
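
A minimal sketch of what loading one of the CTC checkpoints with plain transformers might look like. It makes a few assumptions that are not confirmed here: that the repo ships a processor config (tokenizer plus feature extractor) that `AutoProcessor` can load, that the model expects 16 kHz mono audio like other Wav2Vec2 models, and that greedy (argmax) CTC decoding is acceptable. The `transcribe` helper is a hypothetical name used only for illustration; check the model cards for the recommended usage.

```python
MODEL_ID = "aadel4/omniASR-CTC-300M"  # one of the converted CTC checkpoints listed above
TARGET_SR = 16_000  # assumption: 16 kHz input, the usual rate for Wav2Vec2 models


def transcribe(waveform, model_id=MODEL_ID):
    """Greedy CTC transcription of a 1-D float waveform (mono, 16 kHz assumed).

    Hypothetical helper for illustration; not part of the conversion repo.
    """
    # Imported lazily so the module can be imported without torch installed.
    import torch
    from transformers import AutoProcessor, Wav2Vec2ForCTC

    # Assumption: the repo includes processor files loadable by AutoProcessor.
    processor = AutoProcessor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    # Convert the raw waveform into model inputs.
    inputs = processor(waveform, sampling_rate=TARGET_SR, return_tensors="pt")

    # Forward pass without gradients, then pick the best token per frame.
    with torch.no_grad():
        logits = model(**inputs).logits
    predicted_ids = torch.argmax(logits, dim=-1)

    # batch_decode collapses repeated tokens and removes CTC blanks.
    return processor.batch_decode(predicted_ids)[0]
```

To try it, load an audio file as a mono float array resampled to 16 kHz (for example with librosa or torchaudio) and call `transcribe(waveform)`. Swapping `model_id` for another CTC repo from the list should work the same way, assuming those repos are laid out identically.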