bezzam (HF Staff) committed (verified)
Commit ce17950 · Parent(s): 8623f43

Update README.md

Files changed (1): README.md (+12 −12)
README.md CHANGED
@@ -62,7 +62,7 @@ library_name: transformers
 ---
 
 
-## VibeVoice-ASR
+## VibeVoice-ASR (Transformers-compatible version)
 [![GitHub](https://img.shields.io/badge/GitHub-Repo-black?logo=github)](https://github.com/microsoft/VibeVoice)
 [![Live Playground](https://img.shields.io/badge/Live-Playground-green?logo=gradio)](https://aka.ms/vibevoice-asr)
 [![Technical Report](https://img.shields.io/badge/arXiv-2601.18184-b31b1b?logo=arxiv)](https://arxiv.org/pdf/2601.18184)
@@ -100,9 +100,9 @@ library_name: transformers
 
 ### Setup
 
-VibeVoice ASR is not yet merged into Transformers but can be used by pulling the source code from the following fork:
+Until VibeVoice ASR is part of an official Transformers release, it can be used by installing Transformers from source:
 ```
-pip install git+https://github.com/ebezzam/transformers.git@vibevoice_asr
+pip install git+https://github.com/huggingface/transformers.git
 ```
 
 ### Loading model
@@ -110,7 +110,7 @@ pip install git+https://github.com/ebezzam/transformers.git@vibevoice_asr
 ```python
 from transformers import AutoProcessor, VibeVoiceAsrForConditionalGeneration
 
-model_id = "bezzam/VibeVoice-ASR-7B
+model_id = "microsoft/VibeVoice-ASR-HF"
 processor = AutoProcessor.from_pretrained(model_id)
 model = VibeVoiceAsrForConditionalGeneration.from_pretrained(model_id)
 ```
@@ -128,7 +128,7 @@ The example below transcribes the following audio.
 ```python
 from transformers import AutoProcessor, VibeVoiceAsrForConditionalGeneration
 
-model_id = "bezzam/VibeVoice-ASR-7B"
+model_id = "microsoft/VibeVoice-ASR-HF"
 processor = AutoProcessor.from_pretrained(model_id)
 model = VibeVoiceAsrForConditionalGeneration.from_pretrained(model_id, device_map="auto")
 print(f"Model loaded on {model.device} with dtype {model.dtype}")
@@ -199,7 +199,7 @@ Below we transcribe an audio where the speaker (with a German accent) talks abou
 ```python
 from transformers import AutoProcessor, VibeVoiceAsrForConditionalGeneration
 
-model_id = "bezzam/VibeVoice-ASR-7B"
+model_id = "microsoft/VibeVoice-ASR-HF"
 processor = AutoProcessor.from_pretrained(model_id)
 model = VibeVoiceAsrForConditionalGeneration.from_pretrained(model_id, device_map="auto")
 print(f"Model loaded on {model.device} with dtype {model.dtype}")
@@ -237,7 +237,7 @@ Batch inference is possible by passing a list of audio and (if provided) a list
 ```python
 from transformers import AutoProcessor, VibeVoiceAsrForConditionalGeneration
 
-model_id = "bezzam/VibeVoice-ASR-7B"
+model_id = "microsoft/VibeVoice-ASR-HF"
 audio = [
     "https://huggingface.co/datasets/bezzam/vibevoice_samples/resolve/main/realtime_model/vibevoice_tts_german.wav",
     "https://huggingface.co/datasets/bezzam/vibevoice_samples/resolve/main/example_output/VibeVoice-1.5B_output.wav"
@@ -266,7 +266,7 @@ However, if chunks of 60 seconds are too large for your device, the `tokenizer_c
 from transformers import AutoProcessor, VibeVoiceAsrForConditionalGeneration
 
 tokenizer_chunk_size = 64000  # default is 1440000 (60s @ 24kHz)
-model_id = "bezzam/VibeVoice-ASR-7B"
+model_id = "microsoft/VibeVoice-ASR-HF"
 audio = [
     "https://huggingface.co/datasets/bezzam/vibevoice_samples/resolve/main/realtime_model/vibevoice_tts_german.wav",
     "https://huggingface.co/datasets/bezzam/vibevoice_samples/resolve/main/example_output/VibeVoice-1.5B_output.wav"
@@ -290,7 +290,7 @@ VibeVoice ASR also accepts chat template inputs (`apply_transcription_request` i
 ```python
 from transformers import AutoProcessor, VibeVoiceAsrForConditionalGeneration
 
-model_id = "bezzam/VibeVoice-ASR-7B"
+model_id = "microsoft/VibeVoice-ASR-HF"
 processor = AutoProcessor.from_pretrained(model_id)
 model = VibeVoiceAsrForConditionalGeneration.from_pretrained(model_id, device_map="auto")
 
@@ -339,7 +339,7 @@ VibeVoice ASR can be trained with the loss outputted by the model.
 ```python
 from transformers import AutoProcessor, VibeVoiceAsrForConditionalGeneration
 
-model_id = "bezzam/VibeVoice-ASR-7B"
+model_id = "microsoft/VibeVoice-ASR-HF"
 processor = AutoProcessor.from_pretrained(model_id)
 model = VibeVoiceAsrForConditionalGeneration.from_pretrained(model_id, device_map="auto")
 model.train()
@@ -392,7 +392,7 @@ import time
 import torch
 from transformers import AutoProcessor, VibeVoiceAsrForConditionalGeneration
 
-model_id = "bezzam/VibeVoice-ASR-7B"
+model_id = "microsoft/VibeVoice-ASR-HF"
 
 num_warmup = 5
 num_runs = 20
@@ -475,7 +475,7 @@ The model can be used as a pipeline, but you will have to define your own method
 ```python
 from transformers import pipeline
 
-model_id = "bezzam/VibeVoice-ASR-7B"
+model_id = "microsoft/VibeVoice-ASR-HF"
 pipe = pipeline("any-to-any", model=model_id, device_map="auto")
 chat_template = [
     {