YaoyaoChang committed on
Commit 2d0b945 · 1 Parent(s): 4c769d9

update README

Files changed (1)
  1. README.md +6 -5
README.md CHANGED
@@ -29,14 +29,15 @@ library_name: transformers
 
 ## 🔥 Key Features
 
-- **🕒 60-min Single-Pass Processing**:
-Unlike conventional ASR models that slice audio into short chunks (often losing global context), VibeVoice ASR accepts up to **60 minutes** of continuous audio input within 64K length. This ensures consistent speaker tracking and semantic coherence across the entire hour.
+- **🕒 60-minute Single-Pass Processing**:
+Unlike conventional ASR models that slice audio into short chunks (often losing global context), VibeVoice ASR accepts up to **60 minutes** of continuous audio input within 64K token length. This ensures consistent speaker tracking and semantic coherence across the entire hour.
 
-- **👀 Optional Context Injection**:
-Users can provide customized context (e.g., specific names, technical terms, or background info) to guide the recognition process, significantly improving accuracy on domain-specific content.
+- **👀 Customized Hotwords**:
+Users can provide customized hotwords (e.g., specific names, technical terms, or background info) to guide the recognition process, significantly improving accuracy on domain-specific content.
 
 - **📝 Rich Transcription (Who, When, What)**:
-The model performs ASR, Diarization, and Timestamping simultaneously. The output is a structured sequence indicating *who* said *what* at *which time*.
+The model jointly performs ASR, diarization, and timestamping, producing a structured output that indicates *who* said *what* and *when*.
+
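To make the "who said what and when" output concrete, here is a minimal sketch of consuming such a structured transcript. The `[start-end] Speaker N: text` line format below is a hypothetical illustration, not the actual VibeVoice ASR output schema, which this diff does not specify.

```python
import re

# Hypothetical line format for illustration only:
#   "[start-end] Speaker N: text"
# The real VibeVoice ASR output schema may differ.
LINE = re.compile(r"\[(\d+(?:\.\d+)?)-(\d+(?:\.\d+)?)\]\s+(Speaker \d+):\s+(.*)")

def parse_transcript(text):
    """Parse lines like '[0.0-4.2] Speaker 1: Hello.' into records."""
    records = []
    for line in text.strip().splitlines():
        m = LINE.match(line.strip())
        if m:
            start, end, speaker, utterance = m.groups()
            records.append({
                "start": float(start),   # segment start time in seconds
                "end": float(end),       # segment end time in seconds
                "speaker": speaker,      # diarized speaker label
                "text": utterance,       # recognized speech
            })
    return records

sample = """
[0.0-4.2] Speaker 1: Welcome to the show.
[4.2-9.8] Speaker 2: Thanks for having me.
"""
records = parse_transcript(sample)
print(records[0]["speaker"])  # prints "Speaker 1"
```

A record-per-segment structure like this keeps the three outputs (diarization, timestamps, text) aligned, which is the point of producing them jointly rather than from separate models.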