microsoft/VibeVoice-ASR-HF

Audio-Text-to-Text

automatic-speech-recognition

bezzam (HF Staff) committed on Mar 2

Commit 6df8b1e (verified) · 1 parent: ce17950

Update README.md

Files changed (1)

README.md +6 -5

@@ -69,11 +69,8 @@ library_name: transformers
 **VibeVoice-ASR** is a unified speech-to-text model designed to handle **60-minute long-form audio** in a single pass, generating structured transcriptions containing **Who (Speaker), When (Timestamps), and What (Content)**, with support for **Customized Hotwords** and over **50 languages**.
-➡️ **Code:** [microsoft/VibeVoice](https://github.com/microsoft/VibeVoice)<br>
 ➡️ **Demo:** [VibeVoice-ASR-Demo](https://aka.ms/vibevoice-asr)<br>
 ➡️ **Report:** [VibeVoice-ASR Technical Report](https://arxiv.org/pdf/2601.18184)<br>
-➡️ **Finetuning:** [Finetuning](https://github.com/microsoft/VibeVoice/blob/main/finetuning-asr/README.md)<br>
-➡️ **vLLM:** [vLLM-VibeVoice-ASR](https://github.com/microsoft/VibeVoice/blob/main/docs/vibevoice-vllm-asr.md)<br>
 <p align="left">
   <img src="figures/VibeVoice_ASR_archi.png" alt="VibeVoice-ASR Architecture" height="250px">
@@ -100,7 +97,11 @@ library_name: transformers
 ### Setup
-Until VibeVoice ASR is part of an official Transformers release, it can be used by installing from the source code:
+```
+pip install transformers
+```
+However, if you're here early and VibeVoice ASR is not yet part of an official Transformers release, it can be used by installing from the source code:
 ```
 pip install git+https://github.com/huggingface/transformers.git
 ```
@@ -110,7 +111,7 @@ pip install git+https://github.com/huggingface/transformers.git
 ```python
 from transformers import AutoProcessor, VibeVoiceForConditionalGeneration
-model_id = "microsoft/VibeVoice-ASR-HF
+model_id = "microsoft/VibeVoice-ASR-HF"
 processor = AutoProcessor.from_pretrained(model_id)
 model = VibeVoiceAsrForConditionalGeneration.from_pretrained(model_id)
 ```
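
The Setup text offers two install paths depending on whether the installed Transformers release already ships VibeVoice ASR. A minimal sketch of that check follows. It assumes the class is exported as `VibeVoiceAsrForConditionalGeneration`, the name used in the `from_pretrained` call; the import line in the snippet names `VibeVoiceForConditionalGeneration` instead, so one of the two presumably needs to change. The helper names `has_model_class` and `install_hint` are hypothetical and not part of Transformers.

```python
import importlib
import importlib.util
from typing import Optional

# Assumption: class name taken from the from_pretrained call in the README snippet.
MODEL_CLASS = "VibeVoiceAsrForConditionalGeneration"

# Both commands are quoted from the README's Setup section.
RELEASE_INSTALL = "pip install transformers"
SOURCE_INSTALL = "pip install git+https://github.com/huggingface/transformers.git"


def has_model_class(class_name: str = MODEL_CLASS) -> bool:
    """Return True if the installed transformers package exposes class_name."""
    try:
        transformers = importlib.import_module("transformers")
        getattr(transformers, class_name)
    except Exception:
        # transformers is not installed, the class is missing,
        # or importing the class's backend failed.
        return False
    return True


def install_hint(class_name: str = MODEL_CLASS) -> Optional[str]:
    """Suggest an install command, or None if the class is already importable."""
    if importlib.util.find_spec("transformers") is None:
        # Nothing installed yet: try the official release first.
        return RELEASE_INSTALL
    if has_model_class(class_name):
        return None
    # Installed release predates the model: fall back to the source build.
    return SOURCE_INSTALL


if __name__ == "__main__":
    hint = install_hint()
    print("VibeVoice ASR class is available" if hint is None else f"Try: {hint}")
```

If `install_hint()` returns `None`, the `from_pretrained` calls in the snippet should resolve, provided the import line names the same class that is instantiated.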