---
language: tr
license: mit
tags:
- audio
- speech-recognition
- whisper
- turkish
- asr
datasets:
- Codyfederer/tr-full-dataset
model-index:
- name: whisper-small-tr
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    metrics:
    - type: wer
      value: 7.75
      name: Word Error Rate
    - type: cer
      value: 1.95
      name: Character Error Rate
---
# whisper-small-tr - Fine-tuned Whisper Small for Turkish ASR
This model is a fine-tuned version of `openai/whisper-small` optimized for Turkish Automatic Speech Recognition (ASR).
## Model Description
Whisper is a pre-trained model for automatic speech recognition and speech translation. This version has been fine-tuned on Turkish audio data to improve performance on Turkish speech recognition tasks.
- **Base Model:** openai/whisper-small
- **Language:** Turkish (tr)
- **Task:** Automatic Speech Recognition
- **Dataset:** Codyfederer/tr-full-dataset
## Training Data
The model uses the `Codyfederer/tr-full-dataset`, consisting of 3,000 Turkish audio-transcription samples, split into 90% training and 10% testing.
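The 90/10 split of 3,000 samples yields 2,700 training and 300 test examples. A minimal pure-Python sketch of such a shuffled split (the seed and fraction here are illustrative; the actual split was presumably done with the `datasets` library's `train_test_split`):

```python
import random

def split_dataset(samples, test_fraction=0.1, seed=42):
    """Shuffle samples and partition them into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

train, test = split_dataset(list(range(3000)))
print(len(train), len(test))  # 2700 300
```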
## Training Parameters
Training utilized the Hugging Face `Trainer` with the following `Seq2SeqTrainingArguments`:
- `output_dir`: `./whisper-small-tr`
- `per_device_train_batch_size`: 16
- `gradient_accumulation_steps`: 1
- `learning_rate`: 3e-5
- `warmup_steps`: 50
- `num_train_epochs`: 3
- `weight_decay`: 0.005
- `gradient_checkpointing`: True
- `fp16`: True
- `eval_strategy`: "steps"
- `per_device_eval_batch_size`: 8
- `predict_with_generate`: True
- `generation_max_length`: 225
- `save_steps`: 200
- `eval_steps`: 200
- `logging_steps`: 25
- `report_to`: ["tensorboard"]
- `load_best_model_at_end`: True
- `metric_for_best_model`: "wer"
- `greater_is_better`: False
- `push_to_hub`: True
- `hub_model_id`: whisper-small-tr
- `optim`: adamw_torch
- `dataloader_num_workers`: 4
- `dataloader_pin_memory`: True
- `save_total_limit`: 2
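Assembled as code, the configuration above would look like the following sketch (it assumes a recent `transformers` release where the argument is named `eval_strategy`; older versions call it `evaluation_strategy`):

```python
from transformers import Seq2SeqTrainingArguments

# Training configuration as described in the list above
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-tr",
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,
    learning_rate=3e-5,
    warmup_steps=50,
    num_train_epochs=3,
    weight_decay=0.005,
    gradient_checkpointing=True,
    fp16=True,
    eval_strategy="steps",
    per_device_eval_batch_size=8,
    predict_with_generate=True,
    generation_max_length=225,
    save_steps=200,
    eval_steps=200,
    logging_steps=25,
    report_to=["tensorboard"],
    load_best_model_at_end=True,
    metric_for_best_model="wer",
    greater_is_better=False,
    push_to_hub=True,
    hub_model_id="whisper-small-tr",
    optim="adamw_torch",
    dataloader_num_workers=4,
    dataloader_pin_memory=True,
    save_total_limit=2,
)
```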
## Performance
Test set evaluation results:
- **Word Error Rate (WER):** 7.75%
- **Character Error Rate (CER):** 1.95%
- **Loss:** 0.1321
The fine-tuned model shows significant improvement in Turkish ASR performance compared to the base model.
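Both metrics are edit-distance ratios: the minimum number of insertions, deletions, and substitutions needed to turn the hypothesis into the reference, divided by the reference length (words for WER, characters for CER). A minimal pure-Python sketch, for illustration only (the reported scores were computed on the held-out test set, typically with a library such as `evaluate` or `jiwer`):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            if ref[i - 1] == hyp[j - 1]:
                dp[j] = prev  # tokens match: no edit
            else:
                dp[j] = 1 + min(prev, dp[j], dp[j - 1])
            prev = cur
    return dp[-1]

def wer(reference, hypothesis):
    """Word error rate: word-level edits / reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

# One substituted word out of three -> WER of 1/3
print(wer("merhaba dünya nasılsın", "merhaba dünya nasilsin"))
```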
## Usage
### Basic Usage
```python
from transformers import pipeline
import torch
pipe = pipeline(
    task="automatic-speech-recognition",
    model="emredeveloper/whisper-small-tr",
    chunk_length_s=30,
    device="cuda" if torch.cuda.is_available() else "cpu",
)
audio_file = "path/to/your/audio.mp3"
result = pipe(audio_file)
print(result["text"])
```
### Gradio Demo
```python
import gradio as gr
from transformers import pipeline
pipe = pipeline(
    "automatic-speech-recognition",
    model="emredeveloper/whisper-small-tr",
)

def transcribe(audio):
    if audio is None:
        return ""
    return pipe(audio)["text"]

demo = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(sources=["microphone", "upload"], type="filepath"),
    outputs="text",
    title="Turkish Speech Recognition",
    description="Upload or record Turkish audio to transcribe.",
)
demo.launch(share=True)
```
### Advanced Usage
```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch
import librosa
# Load the fine-tuned model and its processor
processor = WhisperProcessor.from_pretrained("emredeveloper/whisper-small-tr")
model = WhisperForConditionalGeneration.from_pretrained("emredeveloper/whisper-small-tr")

# Whisper expects 16 kHz mono audio
audio, sr = librosa.load("audio.mp3", sr=16000)
input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features

# Generate token ids and decode them to text
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])
```
## Limitations
- Trained on 3,000 samples, which may limit generalization
- Performance may vary on noisy audio or non-standard dialects
- Best results with clear audio at 16kHz sampling rate
## Citation
```bibtex
@misc{whisper-small-tr,
  author       = {emredeveloper},
  title        = {whisper-small-tr: Fine-tuned Whisper Small for Turkish ASR},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/emredeveloper/whisper-small-tr}}
}
```
## Acknowledgments
- Base model: [openai/whisper-small](https://huggingface.co/openai/whisper-small)
- Dataset: [Codyfederer/tr-full-dataset](https://huggingface.co/datasets/Codyfederer/tr-full-dataset)
- Built with [Hugging Face Transformers](https://github.com/huggingface/transformers)