+```markdown
+---
+language: tr
+license: mit
+model-index:
+- name: whisper-small-tr
+  results:
+  - task:
+      type: automatic-speech-recognition
+      name: Automatic Speech Recognition
+    metrics:
+    - type: wer
+      value: 7.75
+      name: Word Error Rate
+    - type: cer
+      value: 1.95
+      name: Character Error Rate
+widget:
+- audio: https://huggingface.co/datasets/NgoHoang/Vietnamese_Speech_Recognition/resolve/main/Test/audio/common_voice_vi_24070014.mp3
+---
+# whisper-small-tr - Fine-tuned Whisper Small for Turkish ASR
+This model is a fine-tuned version of the `openai/whisper-small` base model by OpenAI, optimized for Turkish Automatic Speech Recognition (ASR).
+## Model Description
+Whisper models are multilingual, multitask speech models pre-trained on a large and varied collection of audio data. This project fine-tunes `whisper-small` on the `Codyfederer/tr-full-dataset` dataset to improve its accuracy on Turkish speech.
+## Training Data
+The model was primarily trained on `Codyfederer/tr-full-dataset`, a dataset of Turkish audio with transcriptions. A subset of 3,000 samples was selected and split 90/10, giving 2,700 samples for training and 300 for testing.
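The 90/10 split described above can be sketched without any library dependencies. This is only an illustration: the card does not say whether the 3,000 samples were drawn at random or taken from the start of the dataset, and `n_total=10_000` and `seed=42` are hypothetical values chosen for the example. With the `datasets` library, `Dataset.select` followed by `Dataset.train_test_split(test_size=0.1)` would play the same role.

```python
import random


def split_indices(n_total, n_subset=3000, test_fraction=0.1, seed=42):
    """Pick n_subset random indices from range(n_total) and split them into train/test."""
    rng = random.Random(seed)                      # fixed seed keeps the split reproducible
    subset = rng.sample(range(n_total), n_subset)  # sample without replacement
    n_test = round(n_subset * test_fraction)       # 10% of 3000 -> 300
    return subset[n_test:], subset[:n_test]


# n_total is a placeholder; the real dataset size is not stated in this card.
train_idx, test_idx = split_indices(n_total=10_000)
print(len(train_idx), len(test_idx))  # 2700 300
```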
+## Training Parameters
+Training used the Hugging Face `Seq2SeqTrainer` class with the following `Seq2SeqTrainingArguments`:
+- `output_dir`: `./whisper-small-tr`
+- `per_device_train_batch_size`: 16
+- `gradient_accumulation_steps`: 1
+- `learning_rate`: 3e-5
+- `warmup_steps`: 50
+- `num_train_epochs`: 3
+- `weight_decay`: 0.005
+- `gradient_checkpointing`: `True` (For memory optimization)
+- `fp16`: `True` (For faster training)
+- `eval_strategy`: `"steps"`
+- `per_device_eval_batch_size`: 8
+- `predict_with_generate`: `True`
+- `generation_max_length`: 225
+- `save_steps`: 200
+- `eval_steps`: 200
+- `logging_steps`: 25
+- `report_to`: `["tensorboard"]`
+- `load_best_model_at_end`: `True`
+- `metric_for_best_model`: `"wer"` (Lower is better)
+- `greater_is_better`: `False`
+- `push_to_hub`: `True`
+- `hub_model_id`: `whisper-small-tr`
+- `optim`: `adamw_torch`
+- `dataloader_num_workers`: 4
+- `dataloader_pin_memory`: `True`
+- `save_total_limit`: 2
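As a rough sketch, the settings listed above can be collected in a plain dictionary and passed to `Seq2SeqTrainingArguments`. The helper names `steps_summary` and `build_training_args` are made up for this example, and the step arithmetic assumes a single device and the 2,700 training samples implied by the 90/10 split. Older releases of `transformers` name the `eval_strategy` argument `evaluation_strategy`.

```python
import math

# Values copied from the list above.
training_kwargs = {
    "output_dir": "./whisper-small-tr",
    "per_device_train_batch_size": 16,
    "gradient_accumulation_steps": 1,
    "learning_rate": 3e-5,
    "warmup_steps": 50,
    "num_train_epochs": 3,
    "weight_decay": 0.005,
    "gradient_checkpointing": True,
    "fp16": True,  # needs a CUDA device
    "eval_strategy": "steps",
    "per_device_eval_batch_size": 8,
    "predict_with_generate": True,
    "generation_max_length": 225,
    "save_steps": 200,
    "eval_steps": 200,
    "logging_steps": 25,
    "report_to": ["tensorboard"],
    "load_best_model_at_end": True,
    "metric_for_best_model": "wer",
    "greater_is_better": False,
    "push_to_hub": True,
    "hub_model_id": "whisper-small-tr",
    "optim": "adamw_torch",
    "dataloader_num_workers": 4,
    "dataloader_pin_memory": True,
    "save_total_limit": 2,
}


def steps_summary(n_train, kwargs):
    """Estimate steps per epoch, total steps, and evaluation steps (single device assumed)."""
    effective_batch = kwargs["per_device_train_batch_size"] * kwargs["gradient_accumulation_steps"]
    steps_per_epoch = math.ceil(n_train / effective_batch)
    total_steps = steps_per_epoch * kwargs["num_train_epochs"]
    eval_points = list(range(kwargs["eval_steps"], total_steps + 1, kwargs["eval_steps"]))
    return steps_per_epoch, total_steps, eval_points


def build_training_args(kwargs=training_kwargs):
    """Create the arguments object; imported lazily so the arithmetic above runs without transformers."""
    from transformers import Seq2SeqTrainingArguments
    return Seq2SeqTrainingArguments(**kwargs)


print(steps_summary(2700, training_kwargs))  # (169, 507, [200, 400])
```

Under these assumptions the run takes about 507 optimizer steps, so the model is evaluated and checkpointed twice (at steps 200 and 400), and `load_best_model_at_end` restores whichever of those checkpoints has the lower WER.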
+## Performance
+Evaluation results of the model on the test set:
+- **Word Error Rate (WER)**: 7.75%
+- **Character Error Rate (CER)**: 1.95%
+- **Loss**: 0.1321
+### Comparison with Base Model (on example audio)
+Both models were also run on a single new audio file (`/content/audio.mp3`):
+- **Base Whisper Model**: WER: 23.53% | CER: 2.82%
+- **Fine-Tuned Model**: WER: 11.76% | CER: 2.11%
+On this example, fine-tuning halves the WER (23.53% to 11.76%) and lowers the CER from 2.82% to 2.11%. Because this is a single file, the test-set metrics above are the more reliable measure of the model's Turkish ASR performance.
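For readers unfamiliar with the two metrics, here is a minimal, dependency-free sketch of how WER and CER are computed from an edit distance. The card does not say which implementation produced the scores above (packages such as `jiwer` are a common choice), nor whether the text was normalized first, so this is not necessarily the exact procedure used. The Turkish sentences are invented for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent: character-level edits divided by reference length."""
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


# Illustrative sentences, not taken from the dataset.
reference = "bugün hava çok güzel"
hypothesis = "bugün hava çok güzeldi"
print(wer(reference, hypothesis))  # 25.0 (1 wrong word out of 4)
print(cer(reference, hypothesis))  # 10.0 (2 extra characters out of 20)
```

The comparison figures are what one would get from 4 and 2 word errors against a 17-word reference (4/17 ≈ 23.53%, 2/17 ≈ 11.76%), although the card does not state the reference length.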
+## How to Use
+You can easily use this model with the Hugging Face `transformers` library:
+```python
+from transformers import pipeline
+import torch
+# Load the fine-tuned model as an ASR pipeline.
+# The result is named `asr` so it does not shadow the imported `pipeline` function.
+asr = pipeline(
+    task="automatic-speech-recognition",
+    model="emredeveloper/whisper-small-tr",
+    chunk_length_s=30,  # process long audio in 30-second chunks
+    device="cuda" if torch.cuda.is_available() else "cpu",
+)
+# Transcribe an audio file
+audio_file = "path/to/your/audio.flac"  # replace with the path to your audio file
+text = asr(audio_file)["text"]
+print(text)
+```
+### Gradio Demo
+You can also create a Gradio demo to interactively test the model:
+```python
+import gradio as gr
+from transformers import pipeline
+import torch
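+# Load the model once at startup so every request reuses it.
+# The result is named `asr` so it does not shadow the imported `pipeline` function.
+asr = pipeline(
+    task="automatic-speech-recognition",
+    model="emredeveloper/whisper-small-tr",
+    chunk_length_s=30,  # process long audio in 30-second chunks
+    device="cuda" if torch.cuda.is_available() else "cpu",
+)
+def transcribe(audio):
+    # Gradio passes None when nothing has been recorded or uploaded
+    if audio is None:
+        return ""
+    return asr(audio)["text"]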
+iface = gr.Interface(
+    fn=transcribe,
+    inputs=gr.Audio(sources=["microphone", "upload"], type="filepath"),
+    outputs="text",
+    title="Fine-Tuned Whisper Turkish Demo",
+    description="Record your voice or upload a Turkish audio file to see the model in action.",
+)
+iface.launch()
+```