Update README.md
README.md
@@ -5,7 +5,7 @@ license: cc-by-nc-nd-4.0
 
 ## 🔍 Overview
 The **Vietnamese Streaming Speech-to-Text (ASR)** model is built on the **ZipFormer architecture** (an improved variant of the Conformer) with **chunk sizes 16, 32, and 64**, and has only **30 million parameters**.
-On CPU, the model can transcribe a **
+On CPU, the model can transcribe a **1-second audio chunk in just 0.05 seconds**, which suits streaming tasks with low-latency requirements.
 
 ---
 
@@ -67,8 +67,8 @@ Comprehensive details about **training data**, **optimization strategies**, **ar
 
 | **Device** | **Audio Length** | **Inference Time** |
 |-------------|------------------|--------------------|
-| CPU (Hugging Face Basic) |
-| GPU (RTX 3090) |
+| CPU (Hugging Face Basic) | 1-second audio chunk | **0.05 s** |
+| GPU (RTX 3090) | 1-second audio chunk | **< 0.01 s** |
 
 ---
 
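The latency figures added in this change can be restated as a real-time factor (RTF), which is processing time divided by audio duration. The Python sketch below does that arithmetic and also converts the chunk sizes 16, 32, and 64 into audio durations. The frame-rate assumption comes from us, not from the README. It follows the convention of the icefall Zipformer recipes, where streaming chunk sizes are counted in encoder frames at 50 Hz (20 ms per frame). The helper names `real_time_factor` and `chunk_seconds` are hypothetical and exist only for illustration.

```python
# Worked latency arithmetic for the figures quoted in the README change.
# ASSUMPTION (not stated in the README): chunk sizes are counted in encoder
# frames at a 50 Hz frame rate (20 ms per frame), as in icefall's Zipformer recipes.

FRAME_RATE_HZ = 50  # assumed encoder frame rate


def real_time_factor(inference_s: float, audio_s: float) -> float:
    """Processing time divided by audio duration; below 1.0 means faster than real time."""
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return inference_s / audio_s


def chunk_seconds(chunk_frames: int, frame_rate_hz: int = FRAME_RATE_HZ) -> float:
    """Seconds of audio covered by one streaming chunk of `chunk_frames` frames."""
    return chunk_frames / frame_rate_hz


if __name__ == "__main__":
    # CPU row of the table: a 1-second chunk is decoded in 0.05 s.
    rtf_cpu = real_time_factor(0.05, 1.0)
    print(f"CPU RTF: {rtf_cpu:.2f} ({1 / rtf_cpu:.0f}x faster than real time)")
    # GPU row only gives an upper bound (< 0.01 s), so the RTF is only bounded too.
    print(f"GPU RTF: < {real_time_factor(0.01, 1.0):.2f}")
    # Chunk sizes from the Overview, converted under the 50 Hz assumption.
    for frames in (16, 32, 64):
        print(f"chunk of {frames} frames -> {chunk_seconds(frames):.2f} s of audio")
```

Under these assumptions, the CPU figure corresponds to an RTF of 0.05, or about 20 times faster than real time. The three chunk sizes cover roughly 0.32 s, 0.64 s, and 1.28 s of audio. Because the GPU row reports only an upper bound, its RTF is likewise known only to be below 0.01.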