emredeveloper committed on
Commit
cfb5334
·
verified ·
1 Parent(s): da34152

Create README.md

Files changed (1)
  1. README.md +135 -0
README.md ADDED
@@ -0,0 +1,135 @@
---
language: tr
license: mit
model-index:
- name: whisper-small-tr
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    metrics:
    - type: wer
      value: 7.75
      name: Word Error Rate
    - type: cer
      value: 1.95
      name: Character Error Rate
widget:
- audio: https://huggingface.co/datasets/NgoHoang/Vietnamese_Speech_Recognition/resolve/main/Test/audio/common_voice_vi_24070014.mp3
---

# whisper-small-tr - Fine-tuned Whisper Small for Turkish ASR

This model is a fine-tuned version of OpenAI's `openai/whisper-small` base model, optimized for Turkish Automatic Speech Recognition (ASR).

## Model Description

Whisper models are multilingual, multitask speech models pre-trained on a large and varied corpus of audio. This project fine-tunes `whisper-small` on the `Codyfederer/tr-full-dataset` dataset to improve its accuracy on Turkish.

## Training Data

The model was primarily trained on `Codyfederer/tr-full-dataset`, a dataset of Turkish audio with transcriptions. From this dataset, 3000 samples were selected and split into 90% for training and 10% for testing.
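
The selection and split described above could be reproduced roughly as follows. This is a minimal sketch, not the author's script: it assumes the dataset exposes a `train` split, that the 3000 samples were taken after a seeded shuffle, and that the seed is 42. None of these details are stated in this card, and `split_sizes` and `load_subset_and_split` are hypothetical helper names.

```python
def split_sizes(num_samples=3000, test_fraction=0.1):
    """Return (train_size, test_size) for the given test fraction."""
    test_size = round(num_samples * test_fraction)
    return num_samples - test_size, test_size


def load_subset_and_split(num_samples=3000, test_fraction=0.1, seed=42):
    """Download the dataset, keep `num_samples` examples, and split them.

    Assumptions (not stated in the model card): a "train" split exists,
    the subset is taken after a seeded shuffle, and the seed is 42.
    """
    # Imported lazily so split_sizes() works without `datasets` installed.
    from datasets import load_dataset

    dataset = load_dataset("Codyfederer/tr-full-dataset", split="train")
    subset = dataset.shuffle(seed=seed).select(range(num_samples))
    # Returns a DatasetDict with "train" and "test" keys.
    return subset.train_test_split(test_size=test_fraction, seed=seed)


if __name__ == "__main__":
    print(split_sizes())  # (2700, 300)
```

With the numbers from this card, the split yields 2700 training and 300 test samples.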

## Training Parameters

The training was performed using the Hugging Face `Trainer` class with the following `Seq2SeqTrainingArguments`:

- `output_dir`: `./whisper-small-tr`
- `per_device_train_batch_size`: 16
- `gradient_accumulation_steps`: 1
- `learning_rate`: 3e-5
- `warmup_steps`: 50
- `num_train_epochs`: 3
- `weight_decay`: 0.005
- `gradient_checkpointing`: `True` (For memory optimization)
- `fp16`: `True` (For faster training)
- `eval_strategy`: `"steps"`
- `per_device_eval_batch_size`: 8
- `predict_with_generate`: `True`
- `generation_max_length`: 225
- `save_steps`: 200
- `eval_steps`: 200
- `logging_steps`: 25
- `report_to`: `["tensorboard"]`
- `load_best_model_at_end`: `True`
- `metric_for_best_model`: `"wer"` (Lower is better)
- `greater_is_better`: `False`
- `push_to_hub`: `True`
- `hub_model_id`: `whisper-small-tr`
- `optim`: `adamw_torch`
- `dataloader_num_workers`: 4
- `dataloader_pin_memory`: `True`
- `save_total_limit`: 2
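
The list above maps directly onto keyword arguments. The sketch below collects them in a dictionary and estimates how many optimizer steps the run takes. It assumes 2700 training samples (90% of 3000) and a single GPU, neither of which is stated explicitly here. `TRAINING_KWARGS`, `build_training_arguments`, `total_optimizer_steps`, and `evaluation_steps` are hypothetical names used only for illustration.

```python
import math

# Values copied from the list above; nothing is tuned or added.
TRAINING_KWARGS = {
    "output_dir": "./whisper-small-tr",
    "per_device_train_batch_size": 16,
    "gradient_accumulation_steps": 1,
    "learning_rate": 3e-5,
    "warmup_steps": 50,
    "num_train_epochs": 3,
    "weight_decay": 0.005,
    "gradient_checkpointing": True,
    "fp16": True,  # needs a CUDA GPU; set to False on CPU
    "eval_strategy": "steps",  # older transformers releases call this evaluation_strategy
    "per_device_eval_batch_size": 8,
    "predict_with_generate": True,
    "generation_max_length": 225,
    "save_steps": 200,
    "eval_steps": 200,
    "logging_steps": 25,
    "report_to": ["tensorboard"],
    "load_best_model_at_end": True,
    "metric_for_best_model": "wer",
    "greater_is_better": False,
    "push_to_hub": True,
    "hub_model_id": "whisper-small-tr",
    "optim": "adamw_torch",
    "dataloader_num_workers": 4,
    "dataloader_pin_memory": True,
    "save_total_limit": 2,
}


def build_training_arguments(**overrides):
    """Create Seq2SeqTrainingArguments from the values above."""
    # Imported lazily so the step arithmetic below runs without transformers.
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(**{**TRAINING_KWARGS, **overrides})


def total_optimizer_steps(num_train_samples, num_devices=1):
    """Estimate optimizer steps for the whole run (assumes no dropped batches)."""
    batch = TRAINING_KWARGS["per_device_train_batch_size"] * num_devices
    batches_per_epoch = math.ceil(num_train_samples / batch)
    steps_per_epoch = max(
        batches_per_epoch // TRAINING_KWARGS["gradient_accumulation_steps"], 1
    )
    return math.ceil(steps_per_epoch * TRAINING_KWARGS["num_train_epochs"])


def evaluation_steps(total_steps):
    """Steps at which evaluation runs, given eval_steps."""
    every = TRAINING_KWARGS["eval_steps"]
    return list(range(every, total_steps + 1, every))


if __name__ == "__main__":
    steps = total_optimizer_steps(2700)
    print(steps)                    # 507
    print(evaluation_steps(steps))  # [200, 400]
```

Under these assumptions the run lasts about 507 optimizer steps, so the 50 warmup steps cover roughly the first tenth of training, and evaluation and checkpointing happen at steps 200 and 400.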

## Performance

Evaluation results of the model on the test set:

- **Word Error Rate (WER)**: 7.75%
- **Character Error Rate (CER)**: 1.95%
- **Loss**: 0.1321
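
For readers unfamiliar with these metrics: WER is the word-level edit distance between the reference and the predicted transcript divided by the number of reference words, and CER is the same ratio computed over characters. The sketch below is a plain-Python illustration of those definitions. This card does not say which library or text normalization produced the reported numbers, so the function names and the example sentences are assumptions.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (words or characters)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    reference = "ben okula erken geldim"
    hypothesis = "ben okula erkek geldim"  # one wrong word, one wrong character
    print(wer(reference, hypothesis))  # 0.25
    print(round(cer(reference, hypothesis), 4))  # 0.0455
```

A single wrong character costs a full word in WER but only one character in CER, which is why CER is usually much lower than WER for the same transcript.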

### Comparison with Base Model (on example audio)

In a comparison on a separate example audio file (`/content/audio.mp3`):

- **Base Whisper Model**: WER: 23.53% | CER: 2.82%
- **Fine-Tuned Model**: WER: 11.76% | CER: 2.11%

On this file, the fine-tuned model roughly halves the WER of the base model (23.53% to 11.76%) and lowers the CER from 2.82% to 2.11%.
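
The size of the improvement is easier to read as a relative reduction. The short sketch below computes it from the four numbers reported above. `relative_reduction` is a hypothetical helper name. The final check rests on an assumption: the WER values happen to match 4 and 2 word errors on a 17-word reference, but the card does not state the reference length.

```python
def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


base = {"wer": 23.53, "cer": 2.82}        # base Whisper model, in percent
fine_tuned = {"wer": 11.76, "cer": 2.11}  # fine-tuned model, in percent

if __name__ == "__main__":
    for metric in base:
        change = relative_reduction(base[metric], fine_tuned[metric])
        print(f"{metric.upper()}: {change:.1%} relative reduction")
    # WER: 50.0% relative reduction
    # CER: 25.2% relative reduction

    # Assumption: a 17-word reference would explain both WER values.
    print(round(100 * 4 / 17, 2), round(100 * 2 / 17, 2))  # 23.53 11.76
```

If the reference really has about 17 words, the difference between the two models on this file amounts to two words.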

## How to Use

You can use this model with the Hugging Face `transformers` library:

```python
import torch
from transformers import pipeline

# Load the model into an ASR pipeline. The variable is named `asr` so that it
# does not shadow the imported `pipeline` function.
asr = pipeline(
    task="automatic-speech-recognition",
    model="emredeveloper/whisper-small-tr",  # Hub repo ID: username/repo name
    chunk_length_s=30,  # process long recordings in 30-second chunks
    device="cuda" if torch.cuda.is_available() else "cpu",
)

# Transcribe an audio file
audio_file = "path/to/your/audio.flac"  # Specify the path to your audio file
text = asr(audio_file)["text"]
print(text)
```

### Gradio Demo

You can also create a Gradio demo to test the model interactively:

```python
import gradio as gr
import torch
from transformers import pipeline

# Named `asr` so that it does not shadow the imported `pipeline` function.
asr = pipeline(
    task="automatic-speech-recognition",
    model="emredeveloper/whisper-small-tr",  # Hub repo ID: username/repo name
    chunk_length_s=30,
    device="cuda" if torch.cuda.is_available() else "cpu",
)


def transcribe(audio):
    """Return the transcript for an audio file path, or "" if nothing was given."""
    if audio is None:
        return ""
    return asr(audio)["text"]


iface = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(sources=["microphone", "upload"], type="filepath"),
    outputs="text",
    title="Fine-Tuned Whisper Turkish Demo",
    description="Record your voice or upload a Turkish audio file to see the model in action.",
)

iface.launch()
```