emredeveloper/whisper-small-tr

@@ -1,7 +1,14 @@
-```markdown
 ---
 language: en
 license: mit
 model-index:
 - name: whisper-small-tr
   results:
@@ -15,25 +22,23 @@ model-index:
     - type: cer
       value: 1.95
       name: Character Error Rate
-widget:
-- audio: https://huggingface.co/datasets/NgoHoang/Vietnamese_Speech_Recognition/resolve/main/Test/audio/common_voice_vi_24070014.mp3
 ---
 # whisper-small-tr - Fine-tuned Whisper Small for Turkish ASR
-This model is a fine-tuned version of the `openai/whisper-small` base model by OpenAI, optimized for Turkish Automatic Speech Recognition (ASR).
 ## Model Description
-Whisper models are powerful multilingual and multitask models pre-trained on a large variety of audio data. This project aims to significantly enhance the performance of the `whisper-small` model specifically for Turkish, by fine-tuning it on the `Codyfederer/tr-full-dataset` dataset.
 ## Training Data
-The model was primarily trained on the Turkish audio and transcription dataset named `Codyfederer/tr-full-dataset`. From this dataset, 3000 samples were selected and split into 90% for training and 10% for testing.
 ## Training Parameters
-The training was performed using the Hugging Face `Trainer` class with the following `Seq2SeqTrainingArguments`:
 - `output_dir`: `./whisper-small-tr`
 - `per_device_train_batch_size`: 16
@@ -42,94 +47,56 @@ The training was performed using the Hugging Face `Trainer` class with the follo
 - `warmup_steps`: 50
 - `num_train_epochs`: 3
 - `weight_decay`: 0.005
-- `gradient_checkpointing`: `True` (For memory optimization)
-- `fp16`: `True` (For faster training)
-- `eval_strategy`: `"steps"`
 - `per_device_eval_batch_size`: 8
-- `predict_with_generate`: `True`
 - `generation_max_length`: 225
 - `save_steps`: 200
 - `eval_steps`: 200
 - `logging_steps`: 25
-- `report_to`: `["tensorboard"]`
-- `load_best_model_at_end`: `True`
-- `metric_for_best_model`: `"wer"` (Lower is better)
-- `greater_is_better`: `False`
-- `push_to_hub`: `True`
-- `hub_model_id`: `whisper-small-tr`
-- `optim`: `adamw_torch`
 - `dataloader_num_workers`: 4
-- `dataloader_pin_memory`: `True`
 - `save_total_limit`: 2
 ## Performance
-Evaluation results of the model on the test set:
-- **Word Error Rate (WER)**: 7.75%
-- **Character Error Rate (CER)**: 1.95%
-- **Loss**: 0.1321
-#### Comparison with Base Model (on example audio)
-In a comparison conducted with a new audio file (`/content/audio.mp3`):
-- **Base Whisper Model**: WER: 23.53% | CER: 2.82%
-- **Fine-Tuned Model**: WER: 11.76% | CER: 2.11%
-These results demonstrate a significant improvement in the fine-tuned model's performance for the Turkish ASR task compared to the base model.
-## How to Use
-You can easily use this model with the Hugging Face `transformers` library:
 ```python
 from transformers import pipeline
 import torch
-# Load the model
 pipeline = pipeline(
     task="automatic-speech-recognition",
-    model="emredeveloper/whisper-small-tr", # Your username/repo name
     chunk_length_s=30,
     device="cuda" if torch.cuda.is_available() else "cpu",
 )
-# Transcribe an audio file
-audio_file = "path/to/your/audio.flac" # Specify the path to your audio file
 text = pipeline(audio_file)["text"]
-print(text)
-```
-### Gradio Demo
-You can also create a Gradio demo to interactively test the model:
-```python
-import gradio as gr
-from transformers import pipeline
-import torch
-pipeline = pipeline(
-    task="automatic-speech-recognition",
-    model="emredeveloper/whisper-small-tr", # Your username/repo name
-    chunk_length_s=30,
-    device="cuda" if torch.cuda.is_available() else "cpu",
-)
-def transcribe(audio):
-    if audio is None:
-        return ""
-    text = pipeline(audio)["text"]
-    return text
-iface = gr.Interface(
-    fn=transcribe,
-    inputs=gr.Audio(sources=["microphone", "upload"], type="filepath"),
-    outputs="text",
-    title="Fine-Tuned Whisper Turkish Demo",
-    description="Record your voice or upload a Turkish audio file to see the model in action.",
-)
-iface.launch()
-```

 ---
+language: tr
 license: mit
+tags:
+- audio
+- speech-recognition
+- whisper
+- turkish
+- asr
+datasets:
+- Codyfederer/tr-full-dataset
 model-index:
 - name: whisper-small-tr
   results:
     - type: cer
       value: 1.95
       name: Character Error Rate
 ---
 # whisper-small-tr - Fine-tuned Whisper Small for Turkish ASR
+This model is a fine-tuned version of the `openai/whisper-small` base model, optimized for Turkish Automatic Speech Recognition (ASR).
 ## Model Description
+Whisper is a family of multilingual, multitask speech models pre-trained on a large and diverse audio corpus. This project fine-tunes `whisper-small` on Turkish audio from the `Codyfederer/tr-full-dataset` dataset to improve its Turkish transcription accuracy.
 ## Training Data
+The model was trained on 3000 Turkish audio-transcription samples selected from the `Codyfederer/tr-full-dataset` dataset, split 90% / 10% into a training set (2700 samples) and a test set (300 samples).
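The card does not say how the 3000 samples were chosen or how the split was seeded (with Hugging Face Datasets this is commonly done with `Dataset.train_test_split(test_size=0.1, seed=...)`). The library-free sketch below only illustrates the resulting split sizes; the `split_samples` helper, the seed value, and the placeholder records are hypothetical.

```python
import random


def split_samples(samples, test_fraction=0.1, seed=42):
    """Shuffle a copy of `samples` with a fixed seed and split it into (train, test).

    Hypothetical helper: the seed and shuffling procedure are assumptions,
    not the author's documented method.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle for a given seed
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder records standing in for the 3000 selected audio-transcription pairs
samples = [{"id": i} for i in range(3000)]
train, test = split_samples(samples)
print(len(train), len(test))  # 2700 300
```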
 ## Training Parameters
+Training used the Hugging Face `Trainer` API with the following `Seq2SeqTrainingArguments`:
 - `output_dir`: `./whisper-small-tr`
 - `per_device_train_batch_size`: 16
 - `warmup_steps`: 50
 - `num_train_epochs`: 3
 - `weight_decay`: 0.005
+- `gradient_checkpointing`: `True` (reduces activation memory at the cost of extra compute)
+- `fp16`: `True` (mixed-precision training for speed and memory savings)
+- `eval_strategy`: `"steps"`
 - `per_device_eval_batch_size`: 8
+- `predict_with_generate`: `True`
 - `generation_max_length`: 225
 - `save_steps`: 200
 - `eval_steps`: 200
 - `logging_steps`: 25
+- `report_to`: `["tensorboard"]`
+- `load_best_model_at_end`: `True`
+- `metric_for_best_model`: `"wer"` (lower is better)
+- `greater_is_better`: `False`
+- `push_to_hub`: `True`
+- `hub_model_id`: `whisper-small-tr`
+- `optim`: `adamw_torch`
 - `dataloader_num_workers`: 4
+- `dataloader_pin_memory`: `True`
 - `save_total_limit`: 2
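The arguments above imply a rough step schedule. The sketch below estimates it under stated assumptions: a single GPU, `gradient_accumulation_steps=1` (some arguments are hidden in this diff, so this may not match the real run), the 2700-sample training split, and the `Trainer` default `dataloader_drop_last=False`. The `training_schedule` helper is hypothetical; it also mirrors the `Trainer` rule that, with `load_best_model_at_end`, `save_steps` must be a multiple of `eval_steps`.

```python
import math


def training_schedule(n_train, batch_size, epochs, grad_accum=1, n_devices=1,
                      warmup_steps=0, eval_steps=1, save_steps=1, logging_steps=1,
                      load_best_model_at_end=False):
    """Estimate optimizer-step counts for a step-based Trainer run (hypothetical helper)."""
    if load_best_model_at_end and save_steps % eval_steps != 0:
        # Checkpoints must line up with evaluations so the best one can be reloaded
        raise ValueError("save_steps must be a multiple of eval_steps")
    # The partial last batch is kept (drop_last=False), hence the ceiling
    batches_per_epoch = math.ceil(n_train / (batch_size * n_devices))
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    total_steps = steps_per_epoch * epochs
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_share": warmup_steps / total_steps,
        # Step-based evaluation and saving fire at every multiple of their interval
        "eval_at": list(range(eval_steps, total_steps + 1, eval_steps)),
        "save_at": list(range(save_steps, total_steps + 1, save_steps)),
        "log_points": total_steps // logging_steps,
    }


plan = training_schedule(n_train=2700, batch_size=16, epochs=3, warmup_steps=50,
                         eval_steps=200, save_steps=200, logging_steps=25,
                         load_best_model_at_end=True)
print(plan["steps_per_epoch"], plan["total_steps"], plan["eval_at"])  # 169 507 [200, 400]
print(f"warmup covers {plan['warmup_share']:.1%} of training")  # warmup covers 9.9% of training
```

Under these assumptions the run is about 507 optimizer steps, so the scheduled evaluations fall at steps 200 and 400 only; gradient accumulation or multiple GPUs would shrink these counts proportionally.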
 ## Performance
+Results on the held-out test set (300 samples):
+- Word Error Rate (WER): 7.75%
+- Character Error Rate (CER): 1.95%
+- Loss: 0.1321
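WER and CER are the Levenshtein edit distance between reference and hypothesis (substitutions + deletions + insertions) divided by the number of reference words or characters. The card does not state which implementation produced its numbers (the Hugging Face `evaluate` "wer" and "cer" metrics are a common choice), so the stdlib sketch below, with hypothetical helpers `edit_distance` and `error_rate`, only illustrates the computation at corpus level.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum substitutions + deletions + insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (cost 0 when r == h)
            )
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit="word"):
    """Corpus-level WER (unit="word") or CER (unit="char"): total edits / total reference units."""
    tokenize = str.split if unit == "word" else list  # CER counts every character, spaces included
    edits = sum(edit_distance(tokenize(r), tokenize(h)) for r, h in zip(refs, hyps))
    total = sum(len(tokenize(r)) for r in refs)
    return edits / total


refs = ["bugün hava çok güzel"]
hyps = ["bugün hava çok güzeldi"]
print(error_rate(refs, hyps, unit="word"))  # 0.25
print(error_rate(refs, hyps, unit="char"))  # 0.1
```

Text normalization (casing, punctuation) before scoring can change WER noticeably, and the card does not say what was applied. If you lowercase Turkish text, note that Python's `str.lower()` maps `I` to `i` rather than the dotless `ı`, so Turkish-aware casing needs explicit handling.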
+### Comparison with Base Model
+On a single new audio file (`/content/audio.mp3`):
+- Base Whisper Model: WER 23.53%, CER 2.82%
+- Fine-Tuned Model: WER 11.76%, CER 2.11%
+On this file, the fine-tuned model roughly halves the WER and lowers the CER relative to the base model.
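The relative error reductions behind this comparison can be checked with a few lines of arithmetic. The reported WERs are also consistent with, for example, a 17-word reference transcript (4 versus 2 word errors); that is an inference from the rounding, not something the card states, but it shows how small the sample is. The `relative_reduction` helper is hypothetical.

```python
def relative_reduction(base, tuned):
    """Fraction of the base model's error rate removed by fine-tuning."""
    return (base - tuned) / base


print(f"WER: {relative_reduction(23.53, 11.76):.1%}")  # WER: 50.0%
print(f"CER: {relative_reduction(2.82, 2.11):.1%}")    # CER: 25.2%

# 4/17 and 2/17 round to the reported WERs (an inference, not stated in the card)
print(f"{4 / 17:.2%} {2 / 17:.2%}")  # 23.53% 11.76%
```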
+## Usage
 ```python
 from transformers import pipeline
 import torch
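+# Load the fine-tuned model; chunk_length_s=30 splits long recordings into 30-second chunks
+asr = pipeline(
     task="automatic-speech-recognition",
+    model="emredeveloper/whisper-small-tr",
     chunk_length_s=30,
     device="cuda" if torch.cuda.is_available() else "cpu",
 )
+# Transcribe a local file (reading audio from a file path requires ffmpeg)
+audio_file = "path/to/your/audio.flac"
+text = asr(audio_file)["text"]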
+print(text)
+```