---
language: tr
license: mit
tags:
- audio
- speech-recognition
- whisper
- turkish
- asr
datasets:
- Codyfederer/tr-full-dataset
model-index:
- name: whisper-small-tr
results:
- task:
type: automatic-speech-recognition
name: Automatic Speech Recognition
metrics:
- type: wer
value: 7.75
name: Word Error Rate
- type: cer
value: 1.95
name: Character Error Rate
---
# whisper-small-tr - Fine-tuned Whisper Small for Turkish ASR
This model is a fine-tuned version of `openai/whisper-small` optimized for Turkish Automatic Speech Recognition (ASR).
## Model Description
Whisper is a pre-trained model for automatic speech recognition and speech translation. This version has been fine-tuned on Turkish audio data to improve performance on Turkish speech recognition tasks.
- **Base Model:** openai/whisper-small
- **Language:** Turkish (tr)
- **Task:** Automatic Speech Recognition
- **Dataset:** Codyfederer/tr-full-dataset
## Training Data
The model uses the `Codyfederer/tr-full-dataset`, consisting of 3,000 Turkish audio-transcription samples, split into 90% training and 10% testing.
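The exact splitting procedure is not documented on the dataset card; a generic seeded shuffle-and-split sketch (hypothetical — the 90/10 fractions yield 2,700 training and 300 test samples):

```python
import random

def train_test_split(indices, test_fraction=0.10, seed=42):
    """Shuffle sample indices with a fixed seed and split into train/test."""
    rng = random.Random(seed)
    shuffled = indices[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

train_idx, test_idx = train_test_split(list(range(3000)))
print(len(train_idx), len(test_idx))  # 2700 300
```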
## Training Parameters
Training utilized the Hugging Face `Trainer` with the following `Seq2SeqTrainingArguments`:
- `output_dir`: `./whisper-small-tr`
- `per_device_train_batch_size`: 16
- `gradient_accumulation_steps`: 1
- `learning_rate`: 3e-5
- `warmup_steps`: 50
- `num_train_epochs`: 3
- `weight_decay`: 0.005
- `gradient_checkpointing`: True
- `fp16`: True
- `eval_strategy`: "steps"
- `per_device_eval_batch_size`: 8
- `predict_with_generate`: True
- `generation_max_length`: 225
- `save_steps`: 200
- `eval_steps`: 200
- `logging_steps`: 25
- `report_to`: ["tensorboard"]
- `load_best_model_at_end`: True
- `metric_for_best_model`: "wer"
- `greater_is_better`: False
- `push_to_hub`: True
- `hub_model_id`: whisper-small-tr
- `optim`: adamw_torch
- `dataloader_num_workers`: 4
- `dataloader_pin_memory`: True
- `save_total_limit`: 2
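For convenience, the list above corresponds roughly to the following `Seq2SeqTrainingArguments` construction (assuming a recent `transformers` release, where the parameter is named `eval_strategy` rather than the older `evaluation_strategy`):

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-tr",
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,
    learning_rate=3e-5,
    warmup_steps=50,
    num_train_epochs=3,
    weight_decay=0.005,
    gradient_checkpointing=True,
    fp16=True,
    eval_strategy="steps",
    per_device_eval_batch_size=8,
    predict_with_generate=True,
    generation_max_length=225,
    save_steps=200,
    eval_steps=200,
    logging_steps=25,
    report_to=["tensorboard"],
    load_best_model_at_end=True,
    metric_for_best_model="wer",
    greater_is_better=False,
    push_to_hub=True,
    hub_model_id="whisper-small-tr",
    optim="adamw_torch",
    dataloader_num_workers=4,
    dataloader_pin_memory=True,
    save_total_limit=2,
)
```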
## Performance
Test set evaluation results:
- **Word Error Rate (WER):** 7.75%
- **Character Error Rate (CER):** 1.95%
- **Loss:** 0.1321
Fine-tuning substantially improves Turkish transcription accuracy over the zero-shot `openai/whisper-small` baseline; note that baseline WER/CER figures on this test set are not reported here.
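For reference, WER and CER are word- and character-level Levenshtein edit distances normalized by reference length. In practice libraries such as `jiwer` or `evaluate` are used; a minimal pure-Python sketch of the computation:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (single-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(hyp) + 1):
            cur = dp[j]
            # deletion, insertion, substitution/match
            dp[j] = min(dp[j] + 1, dp[j - 1] + 1, prev + (ref[i - 1] != hyp[j - 1]))
            prev = cur
    return dp[-1]

def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)

def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / max(len(reference), 1)
```

For example, `wer("merhaba dünya nasılsın", "merhaba dünya nasilsin")` gives 1/3, since one of three reference words is substituted.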
## Usage
### Basic Usage
```python
from transformers import pipeline
import torch

# chunk_length_s=30 lets the pipeline transcribe audio longer than
# Whisper's 30-second input window by chunking and stitching results.
pipe = pipeline(
    task="automatic-speech-recognition",
    model="emredeveloper/whisper-small-tr",
    chunk_length_s=30,
    device="cuda" if torch.cuda.is_available() else "cpu",
)

audio_file = "path/to/your/audio.mp3"
result = pipe(audio_file)
print(result["text"])
```
### Gradio Demo
```python
import gradio as gr
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="emredeveloper/whisper-small-tr",
)

def transcribe(audio):
    # Gradio passes None when no audio was recorded or uploaded.
    if audio is None:
        return ""
    return pipe(audio)["text"]

demo = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(sources=["microphone", "upload"], type="filepath"),
    outputs="text",
    title="Turkish Speech Recognition",
    description="Upload or record Turkish audio to transcribe.",
)

demo.launch(share=True)
```
### Advanced Usage
```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa

processor = WhisperProcessor.from_pretrained("emredeveloper/whisper-small-tr")
model = WhisperForConditionalGeneration.from_pretrained("emredeveloper/whisper-small-tr")

# Whisper expects 16 kHz mono audio; librosa resamples on load.
audio, sr = librosa.load("audio.mp3", sr=16000)

# Convert the waveform to log-Mel spectrogram features.
input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features

predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])
```
## Limitations
- Trained on only 3,000 samples, which may limit generalization to unseen speakers and domains
- Performance may degrade on noisy audio or non-standard dialects
- Best results are obtained with clear audio at a 16 kHz sampling rate
## Citation
```bibtex
@misc{whisper-small-tr,
  author       = {emredeveloper},
  title        = {whisper-small-tr: Fine-tuned Whisper Small for Turkish ASR},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/emredeveloper/whisper-small-tr}}
}
```
## Acknowledgments
- Base model: [openai/whisper-small](https://huggingface.co/openai/whisper-small)
- Dataset: [Codyfederer/tr-full-dataset](https://huggingface.co/datasets/Codyfederer/tr-full-dataset)
- Built with [Hugging Face Transformers](https://github.com/huggingface/transformers)