hynt committed · Commit c122fdc (verified) · 1 Parent(s): bbb7e3c

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -5,7 +5,7 @@ license: cc-by-nc-nd-4.0
 
 ## 🔍 Overview
 The **Vietnamese Streaming Speech-to-Text (ASR)** model is built on the **ZipFormer architecture with chunk size 16,32,64** — an improved variant of the Conformer — featuring only **30 million parameters** yet.
-On CPU, the model can transcribe a **12-second audio clip in just 0.3 seconds**, significantly faster than most traditional ASR systems without requiring a GPU.
+On CPU, the model can transcribe a **1-second audio chunk in just 0.05 seconds**, designed for streaming-based tasks with low latency requirements.
 
 ---
 
@@ -67,8 +67,8 @@ Comprehensive details about **training data**, **optimization strategies**, **ar
 
 | **Device** | **Audio Length** | **Inference Time** |
 |-------------|------------------|--------------------|
-| CPU (Hugging Face Basic) | 12 seconds | **0.3 s** |
-| GPU (RTX 3090) | 12 seconds | **< 0.1 s** |
+| CPU (Hugging Face Basic) | 1 seconds audio chunk | **0.05 s** |
+| GPU (RTX 3090) | 1 seconds audio chunk | **< 0.01 s** |
 
 ---
 
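
The timing figures changed in this commit can be compared through the real-time factor (RTF), which is processing time divided by audio duration. The sketch below is only an illustration of that arithmetic using the numbers quoted in the README. The helper `real_time_factor` is a hypothetical name, not part of the model's code, and the GPU rows are treated as upper bounds because the README reports them as "less than" values.

```python
def real_time_factor(inference_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return inference_seconds / audio_seconds


# CPU figures quoted in the README before and after this commit.
old_cpu_rtf = real_time_factor(0.3, 12.0)   # 12-second clip in 0.3 s
new_cpu_rtf = real_time_factor(0.05, 1.0)   # 1-second chunk in 0.05 s

# The GPU rows are reported as "< 0.1 s" and "< 0.01 s",
# so these values are ceilings on the RTF, not measurements.
old_gpu_rtf_ceiling = real_time_factor(0.1, 12.0)
new_gpu_rtf_ceiling = real_time_factor(0.01, 1.0)

rows = [
    ("CPU, 12 s clip (old)", old_cpu_rtf),
    ("CPU, 1 s chunk (new)", new_cpu_rtf),
    ("GPU, 12 s clip (old, ceiling)", old_gpu_rtf_ceiling),
    ("GPU, 1 s chunk (new, ceiling)", new_gpu_rtf_ceiling),
]
for label, rtf in rows:
    # An RTF below 1.0 means the audio is processed faster than it plays.
    print(f"{label}: RTF {rtf:.4f}, faster than real time: {rtf < 1.0}")
```

Every figure gives an RTF well below 1. The old and new rows are not directly comparable, since one times a whole 12-second clip and the other a single 1-second chunk. For a streaming model the per-chunk time is arguably the more relevant number, because it bounds the delay added each time a new chunk arrives.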