Update README.md
README.md
CHANGED
@@ -69,11 +69,8 @@ library_name: transformers
 
 **VibeVoice-ASR** is a unified speech-to-text model designed to handle **60-minute long-form audio** in a single pass, generating structured transcriptions containing **Who (Speaker), When (Timestamps), and What (Content)**, with support for **Customized Hotwords** and over **50 languages**.
 
-➡️ **Code:** [microsoft/VibeVoice](https://github.com/microsoft/VibeVoice)<br>
 ➡️ **Demo:** [VibeVoice-ASR-Demo](https://aka.ms/vibevoice-asr)<br>
 ➡️ **Report:** [VibeVoice-ASR Technical Report](https://arxiv.org/pdf/2601.18184)<br>
-➡️ **Finetuning:** [Finetuning](https://github.com/microsoft/VibeVoice/blob/main/finetuning-asr/README.md)<br>
-➡️ **vLLM:** [vLLM-VibeVoice-ASR](https://github.com/microsoft/VibeVoice/blob/main/docs/vibevoice-vllm-asr.md)<br>
 
 <p align="left">
 <img src="figures/VibeVoice_ASR_archi.png" alt="VibeVoice-ASR Architecture" height="250px">
@@ -100,7 +97,11 @@ library_name: transformers
 
 ### Setup
 
-
+```
+pip install transformers
+```
+
+However, if you're here early and VibeVoice ASR is not yet part of an official Transformers release, it can be used by installing from the source code:
 ```
 pip install git+https://github.com/huggingface/transformers.git
 ```
@@ -110,7 +111,7 @@ pip install git+https://github.com/huggingface/transformers.git
 ```python
 from transformers import AutoProcessor, VibeVoiceForConditionalGeneration
 
-model_id = "microsoft/VibeVoice-ASR-HF
+model_id = "microsoft/VibeVoice-ASR-HF"
 processor = AutoProcessor.from_pretrained(model_id)
 model = VibeVoiceAsrForConditionalGeneration.from_pretrained(model_id)
 ```
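
The Setup hunk adds a fallback for readers whose installed Transformers release does not yet include VibeVoice ASR. A minimal sketch of how a script might detect that case before loading the model follows. The helper name `exports_class` is hypothetical. The class name it checks, `VibeVoiceAsrForConditionalGeneration`, is taken from the README's `from_pretrained` call; the README's import line names `VibeVoiceForConditionalGeneration` instead, so which name a given release actually exports is an assumption to verify.

```python
import importlib


def exports_class(module_name: str, class_name: str) -> bool:
    """Return True if `module_name` can be imported and exposes `class_name`."""
    try:
        module = importlib.import_module(module_name)
        getattr(module, class_name)
    except Exception:
        # Module not installed, attribute missing, or a lazy import failed.
        return False
    return True


if __name__ == "__main__":
    # Assumption: class name copied from the README's from_pretrained call.
    if exports_class("transformers", "VibeVoiceAsrForConditionalGeneration"):
        print("Installed transformers exposes the VibeVoice ASR class.")
    else:
        print("Class not found; the README suggests installing from source:")
        print("pip install git+https://github.com/huggingface/transformers.git")
```

Running the check first gives a clear message instead of an `ImportError` or `NameError` at model-loading time.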