VibeVoice ASR is part of Transformers from v5.3.0

#20
by bezzam - opened

Here is the checkpoint compatible with Transformers 🤗: https://huggingface.co/microsoft/VibeVoice-ASR-HF
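Since the integration only ships from v5.3.0 onward, a quick check of the installed version can rule out an outdated install before loading the checkpoint. This is a minimal sketch using only the standard library; the helper names `release_tuple` and `supports_vibevoice_asr` are made up for illustration and are not part of Transformers. It compares only the numeric release segment, so a pre-release such as `5.3.0.dev0` is treated like `5.3.0`, which is a simplification.

```python
import re
from importlib import metadata

MIN_VERSION = "5.3.0"  # minimum version, taken from the thread title


def release_tuple(version: str, width: int = 3) -> tuple:
    """Return the leading numeric release segment, zero-padded to `width` parts.

    Suffixes such as '.dev0' or 'rc1' are ignored (a simplification).
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version string: {version!r}")
    parts = [int(p) for p in match.group().split(".")]
    parts += [0] * (width - len(parts))  # so '5.3' compares equal to '5.3.0'
    return tuple(parts)


def supports_vibevoice_asr(installed: str, minimum: str = MIN_VERSION) -> bool:
    """Hypothetical helper: True if `installed` is at least `minimum`."""
    return release_tuple(installed) >= release_tuple(minimum)


if __name__ == "__main__":
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        print("transformers is not installed")
    else:
        print(f"transformers {installed}: supported = {supports_vibevoice_asr(installed)}")
```

If the check reports an older version, upgrading Transformers is the first thing to try.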

TODO: update the LoRA fine-tuning tutorial, as the state dict has changed. You can also see the mapping from the original checkpoint to Transformers here.

I tried it with a 90-second sample file, but it immediately ran out of memory (OOM) on a 24 GB 4090. The Gradio demo could handle a 30-minute file without issues.

@andypotato thanks for trying it out! Could you try adjusting the acoustic tokenizer chunk size as described here? Maybe the Gradio demo used a different value.
