Commit ·
83937ae
1
Parent(s): 33d759a
Update README.md
README.md
CHANGED
@@ -10,8 +10,6 @@ tags:
 - audio2text
 - S2T
 - STT
-datasets:
-- doof-ferb/vlsp2020_vinai_100h
 metrics:
 - wer
 model-index:
@@ -32,15 +30,7 @@ This is a fine-tuned version of [openai/whisper-base](https://huggingface.co/ope
 
 ## Fine-tuning Results
 
-- **
-- **Word Error Rate (WER)**: 20.3964
-
-| Training Loss | Epoch | Step | Validation Loss | Wer |
-|:-------------:|:------:|:----:|:---------------:|:-------:|
-| 0.5199 | 0.5967 | 1000 | 0.5043 | 25.5525 |
-| 0.3967 | 1.1933 | 2000 | 0.4336 | 21.2506 |
-| 0.3459 | 1.7900 | 3000 | 0.4086 | 20.7572 |
-| 0.3208 | 2.0883 | 3500 | 0.4049 | 20.3964 |
+- **Word Error Rate (WER)**: 16.9148
 
 > Evaluation was performed on a held-out test set with diverse regional accents and speaking styles.
 
@@ -52,8 +42,8 @@ This model works with the WhisperProcessor to pre-process audio inputs into log-
 
 ## Dataset
 
-- Total Duration: 100 hours of high-quality Vietnamese speech data
-- Sources: Public Vietnamese datasets
+- Total Duration: More 100 hours of high-quality Vietnamese speech data
+- Sources: Public Vietnamese datasets
 - Format: 16kHz WAV files with corresponding text transcripts
 - Preprocessing: Audio was normalized and segmented. Transcripts were cleaned and tokenized.
 
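The headline change in this commit is the reported Word Error Rate, which moves from 20.3964 to 16.9148. The commit does not say how that figure was computed, so the following is only a minimal sketch of the standard definition (word-level edit distance divided by the number of reference words, expressed as a percentage). The function name `word_error_rate` is hypothetical, and the sketch applies no text normalization, which a real evaluation pipeline may well do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage (assumed definition, not taken from the commit)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr

    return 100.0 * prev[-1] / len(ref)
```

Under this definition, one substituted word in a four-word reference gives a WER of 25.0, so a value such as 16.9148 corresponds to roughly one word error per six reference words.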