---
language: tr
license: mit
tags:
- audio
- speech-recognition
- whisper
- turkish
- asr
datasets:
- Codyfederer/tr-full-dataset
model-index:
- name: whisper-small-tr
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    metrics:
    - type: wer
      value: 7.75
      name: Word Error Rate
    - type: cer
      value: 1.95
      name: Character Error Rate
---

# whisper-small-tr - Fine-tuned Whisper Small for Turkish ASR

This model is a fine-tuned version of `openai/whisper-small` optimized for Turkish Automatic Speech Recognition (ASR).

## Model Description

Whisper is a pre-trained model for automatic speech recognition and speech translation. This version has been fine-tuned on Turkish audio data to improve performance on Turkish speech recognition tasks.

- **Base Model:** openai/whisper-small
- **Language:** Turkish (tr)
- **Task:** Automatic Speech Recognition
- **Dataset:** Codyfederer/tr-full-dataset

## Training Data

The model was trained on `Codyfederer/tr-full-dataset`, which consists of 3,000 Turkish audio-transcription pairs, split into 90% training and 10% testing.
## Training Parameters

Training used the Hugging Face `Trainer` with the following `Seq2SeqTrainingArguments`:

- `output_dir`: `./whisper-small-tr`
- `per_device_train_batch_size`: 16
- `gradient_accumulation_steps`: 1
- `learning_rate`: 3e-5
- `warmup_steps`: 50
- `num_train_epochs`: 3
- `weight_decay`: 0.005
- `gradient_checkpointing`: True
- `fp16`: True
- `eval_strategy`: "steps"
- `per_device_eval_batch_size`: 8
- `predict_with_generate`: True
- `generation_max_length`: 225
- `save_steps`: 200
- `eval_steps`: 200
- `logging_steps`: 25
- `report_to`: ["tensorboard"]
- `load_best_model_at_end`: True
- `metric_for_best_model`: "wer"
- `greater_is_better`: False
- `push_to_hub`: True
- `hub_model_id`: whisper-small-tr
- `optim`: adamw_torch
- `dataloader_num_workers`: 4
- `dataloader_pin_memory`: True
- `save_total_limit`: 2

## Performance

Test set evaluation results:

- **Word Error Rate (WER):** 7.75%
- **Character Error Rate (CER):** 1.95%
- **Loss:** 0.1321

Fine-tuning yields a substantial improvement in Turkish ASR accuracy over the base `openai/whisper-small` model.

## Usage

### Basic Usage

```python
from transformers import pipeline
import torch

# Load the fine-tuned model as an ASR pipeline
pipe = pipeline(
    task="automatic-speech-recognition",
    model="emredeveloper/whisper-small-tr",
    chunk_length_s=30,  # process long audio in 30-second chunks
    device="cuda" if torch.cuda.is_available() else "cpu",
)

audio_file = "path/to/your/audio.mp3"
result = pipe(audio_file)
print(result["text"])
```

### Gradio Demo

```python
import gradio as gr
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="emredeveloper/whisper-small-tr"
)

def transcribe(audio):
    if audio is None:
        return ""
    return pipe(audio)["text"]

demo = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(sources=["microphone", "upload"], type="filepath"),
    outputs="text",
    title="Turkish Speech Recognition",
    description="Upload or record Turkish audio to transcribe."
)

demo.launch(share=True)
```

### Advanced Usage

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch
import librosa

processor = WhisperProcessor.from_pretrained("emredeveloper/whisper-small-tr")
model = WhisperForConditionalGeneration.from_pretrained("emredeveloper/whisper-small-tr")

# Whisper expects 16 kHz mono audio
audio, sr = librosa.load("audio.mp3", sr=16000)

input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])
```

## Limitations

- Trained on 3,000 samples, which may limit generalization
- Performance may vary on noisy audio or non-standard dialects
- Best results with clear audio at 16kHz sampling rate

## Citation

```bibtex
@misc{whisper-small-tr,
  author = {emredeveloper},
  title = {whisper-small-tr: Fine-tuned Whisper Small for Turkish ASR},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/emredeveloper/whisper-small-tr}}
}
```

## Acknowledgments

- Base model: [openai/whisper-small](https://huggingface.co/openai/whisper-small)
- Dataset: [Codyfederer/tr-full-dataset](https://huggingface.co/datasets/Codyfederer/tr-full-dataset)
- Built with [Hugging Face Transformers](https://github.com/huggingface/transformers)