frontierai committed on
Commit e49f300 · verified · 1 Parent(s): 53ed0b9

Update README.md

Files changed (1)
  1. README.md +5 -2
README.md CHANGED
@@ -13,13 +13,15 @@ library_name: transformers
  ---
 
 
- ## VibeVoice-ASR: Long-Form Rich Transcription with User Prompts
+ ## VibeVoice-ASR
+ [![Live Playground](https://img.shields.io/badge/Live-Playground-green?logo=gradio)](https://aka.ms/vibevoice-asr)
+
 
  **VibeVoice-ASR** is the latest addition to the **VibeVoice** family. While the original VibeVoice / VibeVoice-Realtime focused on expressive TTS, **VibeVoice-ASR** focuses on understanding long-form speech with high precision and rich metadata.
 
  It is a unified speech-to-text model designed to handle **1-hour long-form audio** in a single pass, generating structured transcriptions containing **Who (Speaker), When (Timestamps), and What (Content)**, with support for **User-Customized Context**.
 
- ➡️ **Code:** [microsoft/VibeVoice-Code](https://github.com/microsoft/VibeVoice)
+ ➡️ **Code:** [microsoft/VibeVoice](https://github.com/microsoft/VibeVoice)
 
  <p align="left">
  <img src="figures/VibeVoice_ASR_archi.png" alt="VibeVoice-ASR Architecture" height="250px">
@@ -37,6 +39,7 @@ It is a unified speech-to-text model designed to handle **1-hour long-form audio
  - **📝 Rich Transcription (Who, When, What)**:
  The model performs ASR, Diarization, and Timestamping simultaneously. The output is a structured sequence indicating *who* said *what* at *which time*.
 
+ [Try it here.](https://aka.ms/vibevoice-asr)
 
  ## Installation and Usage
 