Use Omnilingual ASR Without fairseq2 — Native HF Conversions Available

by aadel4 - opened Mar 15

Hi! Since this model is based on the Wav2Vec2.0 architecture, it can be a drop-in replacement for existing Wav2Vec2.0 pipelines, but only if you can get it running without fairseq2.

I converted the 300M and 1B CTC (V1 and V2) and SSL checkpoints to native Hugging Face format (Wav2Vec2ForCTC). You can load them and run inference with standard transformers in a few lines of code, no fairseq2 required. I was not able to convert the 3B+ variants due to GPU limitations, but the conversion code is open source and contributions are welcome.

300M models:
https://huggingface.co/aadel4/omniASR-CTC-300M
https://huggingface.co/aadel4/omniASR-CTC-300M-v2
https://huggingface.co/aadel4/omniASR-W2V-300M

1B models:
https://huggingface.co/aadel4/omniASR-CTC-1B
https://huggingface.co/aadel4/omniASR-CTC-1B-v2
https://huggingface.co/aadel4/omniASR-W2V-1B

Conversion code: https://github.com/ahmedadelattia/omnilingual_to_hf
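
For anyone who wants a starting point, here is a minimal sketch of greedy CTC inference with plain transformers. The repo IDs come from the links above. Everything else is an assumption on my side and not checked against the model cards: that each CTC repo ships processor/tokenizer files loadable with `AutoProcessor`, that input audio should be 16 kHz mono as with standard Wav2Vec2 models, and that the W2V (SSL) checkpoints are pretrained encoders without a trained CTC head. The `transcribe` helper and the `CHECKPOINTS` table are hypothetical names for illustration.

```python
# Repo IDs taken from the links in this post.
CHECKPOINTS = {
    ("300M", "ctc"): "aadel4/omniASR-CTC-300M",
    ("300M", "ctc-v2"): "aadel4/omniASR-CTC-300M-v2",
    ("300M", "ssl"): "aadel4/omniASR-W2V-300M",
    ("1B", "ctc"): "aadel4/omniASR-CTC-1B",
    ("1B", "ctc-v2"): "aadel4/omniASR-CTC-1B-v2",
    ("1B", "ssl"): "aadel4/omniASR-W2V-1B",
}


def transcribe(waveform, size="300M", variant="ctc-v2"):
    """Greedy CTC transcription of a 1-D float waveform (assumed 16 kHz mono)."""
    # Assumption: SSL (W2V) checkpoints have no trained CTC head,
    # so they are rejected here instead of producing meaningless text.
    if variant == "ssl":
        raise ValueError("pick a CTC variant for transcription")
    repo = CHECKPOINTS[(size, variant)]

    # Imported lazily so the table above is usable without torch installed.
    import torch
    from transformers import AutoProcessor, Wav2Vec2ForCTC

    # Assumption: the repo ships processor/tokenizer files next to the weights.
    processor = AutoProcessor.from_pretrained(repo)
    model = Wav2Vec2ForCTC.from_pretrained(repo).eval()

    inputs = processor(waveform, sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (batch, frames, vocab)
    ids = torch.argmax(logits, dim=-1)   # best token per frame
    # batch_decode collapses repeated tokens and drops CTC blanks.
    return processor.batch_decode(ids)[0]
```

To use it, load your audio with any library that gives you a float array, resample to 16 kHz if needed, and call `transcribe(waveform)` or `transcribe(waveform, size="1B")`. If a repo's model card shows a different loading recipe, follow that instead.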
