namphungdn134 committed on
Commit 83937ae · 1 Parent(s): 33d759a

Update README.md

Files changed (1)
  1. README.md +3 -13
README.md CHANGED
@@ -10,8 +10,6 @@ tags:
  - audio2text
  - S2T
  - STT
- datasets:
- - doof-ferb/vlsp2020_vinai_100h
  metrics:
  - wer
  model-index:
@@ -32,15 +30,7 @@ This is a fine-tuned version of [openai/whisper-base](https://huggingface.co/ope
 
  ## 📊 Fine-tuning Results
 
- - **Loss**: 0.4049
- - **Word Error Rate (WER)**: 20.3964
-
- | Training Loss | Epoch | Step | Validation Loss | Wer |
- |:-------------:|:------:|:----:|:---------------:|:-------:|
- | 0.5199 | 0.5967 | 1000 | 0.5043 | 25.5525 |
- | 0.3967 | 1.1933 | 2000 | 0.4336 | 21.2506 |
- | 0.3459 | 1.7900 | 3000 | 0.4086 | 20.7572 |
- | 0.3208 | 2.0883 | 3500 | 0.4049 | 20.3964 |
+ - **Word Error Rate (WER)**: 16.9148
 
  > Evaluation was performed on a held-out test set with diverse regional accents and speaking styles.
 
@@ -52,8 +42,8 @@ This model works with the WhisperProcessor to pre-process audio inputs into log-
 
  ## 📁 Dataset
 
- - Total Duration: 100 hours of high-quality Vietnamese speech data
- - Sources: Public Vietnamese datasets including the [vlsp2020_vinai_100h](https://huggingface.co/doof-ferb/vlsp2020_vinai_100h) dataset
+ - Total Duration: More 100 hours of high-quality Vietnamese speech data
+ - Sources: Public Vietnamese datasets
  - Format: 16kHz WAV files with corresponding text transcripts
  - Preprocessing: Audio was normalized and segmented. Transcripts were cleaned and tokenized.
 
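
For readers comparing the WER values in this diff (20.3964 removed, 16.9148 added), the following is a minimal sketch of how word error rate is conventionally computed: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The function name `wer` and the scaling to a percentage are assumptions made for illustration. The README does not state which tool or text normalization produced its figures, so this is not necessarily the author's evaluation code.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate as a percentage (hypothetical helper, not from the repo).

    Computes the word-level Levenshtein distance (substitutions, deletions,
    insertions) and divides it by the number of words in the reference.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)
```

Under this definition a score can exceed 100 when the hypothesis contains many inserted words. Scores are also sensitive to casing and punctuation, so figures are only comparable when the same transcript normalization is applied to both sides.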