Committed by emredeveloper
Commit 19fc62a · verified · 1 Parent(s): cfb5334

Update README.md

  1. README.md +37 -70
README.md CHANGED
@@ -1,7 +1,14 @@
- ```markdown
  ---
  language: en
  license: mit
  model-index:
  - name: whisper-small-tr
  results:
@@ -15,25 +22,23 @@ model-index:
  - type: cer
  value: 1.95
  name: Character Error Rate
- widget:
- - audio: https://huggingface.co/datasets/NgoHoang/Vietnamese_Speech_Recognition/resolve/main/Test/audio/common_voice_vi_24070014.mp3
  ---

  # whisper-small-tr - Fine-tuned Whisper Small for Turkish ASR

- This model is a fine-tuned version of the `openai/whisper-small` base model by OpenAI, optimized for Turkish Automatic Speech Recognition (ASR).

  ## Model Description

- Whisper models are powerful multilingual and multitask models pre-trained on a large variety of audio data. This project aims to significantly enhance the performance of the `whisper-small` model specifically for Turkish, by fine-tuning it on the `Codyfederer/tr-full-dataset` dataset.

  ## Training Data

- The model was primarily trained on the Turkish audio and transcription dataset named `Codyfederer/tr-full-dataset`. From this dataset, 3000 samples were selected and split into 90% for training and 10% for testing.

  ## Training Parameters

- The training was performed using the Hugging Face `Trainer` class with the following `Seq2SeqTrainingArguments`:

  - `output_dir`: `./whisper-small-tr`
  - `per_device_train_batch_size`: 16
@@ -42,94 +47,56 @@ The training was performed using the Hugging Face `Trainer` class with the follo
  - `warmup_steps`: 50
  - `num_train_epochs`: 3
  - `weight_decay`: 0.005
- - `gradient_checkpointing`: `True` (For memory optimization)
- - `fp16`: `True` (For faster training)
- - `eval_strategy`: `"steps"`
  - `per_device_eval_batch_size`: 8
- - `predict_with_generate`: `True`
  - `generation_max_length`: 225
  - `save_steps`: 200
  - `eval_steps`: 200
  - `logging_steps`: 25
- - `report_to`: `["tensorboard"]`
- - `load_best_model_at_end`: `True`
- - `metric_for_best_model`: `"wer"` (Lower is better)
- - `greater_is_better`: `False`
- - `push_to_hub`: `True`
- - `hub_model_id`: `whisper-small-tr`
- - `optim`: `adamw_torch`
  - `dataloader_num_workers`: 4
- - `dataloader_pin_memory`: `True`
  - `save_total_limit`: 2

  ## Performance

- Evaluation results of the model on the test set:

- - **Word Error Rate (WER)**: 7.75%
- - **Character Error Rate (CER)**: 1.95%
- - **Loss**: 0.1321

- #### Comparison with Base Model (on example audio)

- In a comparison conducted with a new audio file (`/content/audio.mp3`):

- - **Base Whisper Model**: WER: 23.53% | CER: 2.82%
- - **Fine-Tuned Model**: WER: 11.76% | CER: 2.11%

- These results demonstrate a significant improvement in the fine-tuned model's performance for the Turkish ASR task compared to the base model.

- ## How to Use
-
- You can easily use this model with the Hugging Face `transformers` library:

  ```python
  from transformers import pipeline
  import torch

- # Load the model
  pipeline = pipeline(
      task="automatic-speech-recognition",
-     model="emredeveloper/whisper-small-tr", # Your username/repo name
      chunk_length_s=30,
      device="cuda" if torch.cuda.is_available() else "cpu",
  )

- # Transcribe an audio file
- audio_file = "path/to/your/audio.flac" # Specify the path to your audio file
  text = pipeline(audio_file)["text"]
- print(text)
- ```
-
- ### Gradio Demo
-
- You can also create a Gradio demo to interactively test the model:
-
- ```python
- import gradio as gr
- from transformers import pipeline
- import torch
-
- pipeline = pipeline(
-     task="automatic-speech-recognition",
-     model="emredeveloper/whisper-small-tr", # Your username/repo name
-     chunk_length_s=30,
-     device="cuda" if torch.cuda.is_available() else "cpu",
- )
-
- def transcribe(audio):
-     if audio is None:
-         return ""
-     text = pipeline(audio)["text"]
-     return text
-
- iface = gr.Interface(
-     fn=transcribe,
-     inputs=gr.Audio(sources=["microphone", "upload"], type="filepath"),
-     outputs="text",
-     title="Fine-Tuned Whisper Turkish Demo",
-     description="Record your voice or upload a Turkish audio file to see the model in action.",
- )
-
- iface.launch()
- ```
 
 
  ---
  language: en
  license: mit
+ tags:
+ - audio
+ - speech-recognition
+ - whisper
+ - turkish
+ - asr
+ datasets:
+ - Codyfederer/tr-full-dataset
  model-index:
  - name: whisper-small-tr
  results:
 
  - type: cer
  value: 1.95
  name: Character Error Rate
  ---

  # whisper-small-tr - Fine-tuned Whisper Small for Turkish ASR

+ This model is a fine-tuned version of the `openai/whisper-small` base model, optimized for Turkish Automatic Speech Recognition (ASR).

  ## Model Description

+ Whisper models are multilingual and multitask models pre-trained on diverse audio data. This project fine-tunes the `whisper-small` model on the `Codyfederer/tr-full-dataset` to improve Turkish ASR performance.

  ## Training Data

+ The model uses the `Codyfederer/tr-full-dataset`, consisting of 3000 Turkish audio-transcription samples, split into 90% training and 10% testing.

  ## Training Parameters

+ Training utilized the Hugging Face `Trainer` with the following `Seq2SeqTrainingArguments`:

  - `output_dir`: `./whisper-small-tr`
  - `per_device_train_batch_size`: 16
 
  - `warmup_steps`: 50
  - `num_train_epochs`: 3
  - `weight_decay`: 0.005
+ - `gradient_checkpointing`: True
+ - `fp16`: True
+ - `eval_strategy`: "steps"
  - `per_device_eval_batch_size`: 8
+ - `predict_with_generate`: True
  - `generation_max_length`: 225
  - `save_steps`: 200
  - `eval_steps`: 200
  - `logging_steps`: 25
+ - `report_to`: ["tensorboard"]
+ - `load_best_model_at_end`: True
+ - `metric_for_best_model`: "wer"
+ - `greater_is_better`: False
+ - `push_to_hub`: True
+ - `hub_model_id`: whisper-small-tr
+ - `optim`: adamw_torch
  - `dataloader_num_workers`: 4
+ - `dataloader_pin_memory`: True
  - `save_total_limit`: 2

  ## Performance

+ Test set evaluation results:

+ - Word Error Rate (WER): 7.75%
+ - Character Error Rate (CER): 1.95%
+ - Loss: 0.1321

+ ### Comparison with Base Model

+ For an example audio file (`/content/audio.mp3`):

+ - Base Whisper Model: WER 23.53%, CER 2.82%
+ - Fine-Tuned Model: WER 11.76%, CER 2.11%

+ The fine-tuned model shows significant improvement in Turkish ASR performance.

+ ## Usage

  ```python
  from transformers import pipeline
  import torch

  pipeline = pipeline(
      task="automatic-speech-recognition",
+     model="emredeveloper/whisper-small-tr",
      chunk_length_s=30,
      device="cuda" if torch.cuda.is_available() else "cpu",
  )

+ audio_file = "path/to/your/audio.flac"
  text = pipeline(audio_file)["text"]
+ print(text)
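
Note on the Performance section: the README reports WER and CER but does not show how they were computed. The sketch below assumes the usual definitions (edit distance over words or characters, divided by the reference length) and uses only the standard library. The function names and the sample sentences are illustrative and do not come from the repository, and the model card does not say which metric library or text normalization produced its figures.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical sentences, not taken from the dataset.
    reference = "bugün hava çok güzel"
    hypothesis = "bugün hava cok güzel"
    print(f"WER: {wer(reference, hypothesis):.2%}")
    print(f"CER: {cer(reference, hypothesis):.2%}")
```

As a rough check on the single-file comparison, 4 word errors against a 17-word reference give 4/17 ≈ 23.53% and 2 errors give 2/17 ≈ 11.76%. The reported figures would be consistent with a reference of about that length, though the README does not state it.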