quazim commited on
Commit
2abc3e4
·
verified ·
1 Parent(s): a49184e

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +5 -1
README.md CHANGED
@@ -24,13 +24,17 @@ tags:
24
  It provides **streaming transcription**, **word timestamps**, and **scalable performance** for use cases like real-time captioning, meetings, and on-device voice interfaces.
25
 
26
 
27
- ## 📊 Quality Benchmarks
28
 
29
  TheWhisper is a fine-tuned Whisper model that can process audio chunks of any size up to 30 seconds. Unlike the original Whisper models, it doesn't require padding audio with silence to reach 30 seconds. We conducted quality benchmarking across different chunk sizes: 10, 15, 20, and 30 seconds. For quality benchmarks, we used the multilingual benchmarks [Open ASR Leaderboard](https://github.com/huggingface/open_asr_leaderboard#evaluate-a-model).
30
 
31
  <img width="1547" height="531" alt="vanilla whisper (1)" src="https://github.com/user-attachments/assets/f0c86e58-d834-4ac7-a06b-df3a7ae3e9e9" />
32
  <img width="1547" height="458" alt="TheStage AI Whisper (1)" src="https://github.com/user-attachments/assets/17fb45a3-b33d-4c83-b843-69b0f0aa3f65" />
33
 
 
 
 
 
34
 
35
  ### 10s chunks
36
 
 
24
  It provides **streaming transcription**, **word timestamps**, and **scalable performance** for use cases like real-time captioning, meetings, and on-device voice interfaces.
25
 
26
 
27
+ ## 📊 Benchmarks
28
 
29
  TheWhisper is a fine-tuned Whisper model that can process audio chunks of any size up to 30 seconds. Unlike the original Whisper models, it doesn't require padding audio with silence to reach 30 seconds. We conducted quality benchmarking across different chunk sizes: 10, 15, 20, and 30 seconds. For quality benchmarks, we used the multilingual benchmarks [Open ASR Leaderboard](https://github.com/huggingface/open_asr_leaderboard#evaluate-a-model).
30
 
31
  <img width="1547" height="531" alt="vanilla whisper (1)" src="https://github.com/user-attachments/assets/f0c86e58-d834-4ac7-a06b-df3a7ae3e9e9" />
32
  <img width="1547" height="458" alt="TheStage AI Whisper (1)" src="https://github.com/user-attachments/assets/17fb45a3-b33d-4c83-b843-69b0f0aa3f65" />
33
 
34
+ <img width="1547" height="531" src="https://cdn.thestage.ai/production/cms_file_upload/1764602147-b10162ae-e6f7-4307-bcb0-54b94528221c/NVIDIA, RTX-5090 (1).png">
35
+
36
+ For comprehensive performance and quality benchmarks see [TheWhisper](https://github.com/TheStageAI/TheWhisper/blob/main/benchmark/README.md).
37
+
38
 
39
  ### 10s chunks
40