---
language: tr
license: mit
tags:
- audio
- speech-recognition
- whisper
- turkish
- asr
datasets:
- Codyfederer/tr-full-dataset
model-index:
- name: whisper-small-tr
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      type: Codyfederer/tr-full-dataset
      name: Codyfederer/tr-full-dataset
    metrics:
    - type: wer
      value: 7.75
      name: Word Error Rate
    - type: cer
      value: 1.95
      name: Character Error Rate
---

# whisper-small-tr - Fine-tuned Whisper Small for Turkish ASR

This model is a fine-tuned version of `openai/whisper-small` optimized for Turkish Automatic Speech Recognition (ASR).

## Model Description

Whisper is a pre-trained model for automatic speech recognition and speech translation. This version has been fine-tuned on Turkish audio data to improve performance on Turkish speech recognition tasks.

- **Base Model:** openai/whisper-small
- **Language:** Turkish (tr)
- **Task:** Automatic Speech Recognition
- **Dataset:** Codyfederer/tr-full-dataset

## Training Data

The model was fine-tuned on `Codyfederer/tr-full-dataset`, which consists of 3,000 Turkish audio-transcription pairs, split into 90% training and 10% testing.
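The card does not state how the split was produced; a seeded shuffle like the sketch below (an illustrative assumption, not the authors' code) reproduces the 2,700/300 proportions:

```python
import random

def split_dataset(samples, test_fraction=0.1, seed=42):
    """Shuffle deterministically and split into train/test lists."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# With 3,000 samples this yields 2,700 training and 300 test examples.
samples = list(range(3000))  # stand-ins for (audio, transcription) pairs
train, test = split_dataset(samples)
print(len(train), len(test))  # 2700 300
```

In practice the same split is usually done with `datasets`' `train_test_split(test_size=0.1, seed=...)`.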
## Training Parameters

Training used the Hugging Face `Seq2SeqTrainer` with the following `Seq2SeqTrainingArguments`:

- `output_dir`: `./whisper-small-tr`
- `per_device_train_batch_size`: 16
- `gradient_accumulation_steps`: 1
- `learning_rate`: 3e-5
- `warmup_steps`: 50
- `num_train_epochs`: 3
- `weight_decay`: 0.005
- `gradient_checkpointing`: True
- `fp16`: True
- `eval_strategy`: "steps"
- `per_device_eval_batch_size`: 8
- `predict_with_generate`: True
- `generation_max_length`: 225
- `save_steps`: 200
- `eval_steps`: 200
- `logging_steps`: 25
- `report_to`: ["tensorboard"]
- `load_best_model_at_end`: True
- `metric_for_best_model`: "wer"
- `greater_is_better`: False
- `push_to_hub`: True
- `hub_model_id`: "whisper-small-tr"
- `optim`: "adamw_torch"
- `dataloader_num_workers`: 4
- `dataloader_pin_memory`: True
- `save_total_limit`: 2
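As a quick sanity check on the schedule above (assuming a single GPU and the 2,700-example training split implied by the 90/10 split; the exact step count is not stated in the card):

```python
import math

# Hyperparameters as listed above.
args = {
    "per_device_train_batch_size": 16,
    "gradient_accumulation_steps": 1,
    "learning_rate": 3e-5,
    "num_train_epochs": 3,
    "warmup_steps": 50,
    "eval_steps": 200,
}

# Effective batch size per optimizer step (single device assumed).
effective_batch = args["per_device_train_batch_size"] * args["gradient_accumulation_steps"]

# Rough optimizer-step count for a 2,700-example training split.
steps_per_epoch = math.ceil(2700 / effective_batch)
total_steps = steps_per_epoch * args["num_train_epochs"]
print(effective_batch, total_steps)  # 16 507
```

With roughly 507 total steps, `eval_steps=200` yields two mid-training evaluations plus the final one, and the 50 warmup steps cover about 10% of training.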
## Performance

Evaluation results on the held-out test set:

- **Word Error Rate (WER):** 7.75%
- **Character Error Rate (CER):** 1.95%
- **Loss:** 0.1321

The fine-tuned model shows a marked improvement in Turkish ASR performance over the base `openai/whisper-small` model.
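WER and CER are word- and character-level edit distances normalized by reference length. The card presumably computed them with a library such as `jiwer` or `evaluate` (an assumption); a minimal self-contained sketch of the metric itself:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, using a rolling DP row."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            # deletion, insertion, substitution/match
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (r != h))
    return dp[-1]

def wer(reference, hypothesis):
    """Word error rate: word-level edits / reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)

# One substituted word out of three.
print(wer("merhaba nasılsın bugün", "merhaba nasılsın dün"))  # ≈ 0.333
```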

## Usage

### Basic Usage

```python
from transformers import pipeline
import torch

# Load the fine-tuned pipeline; chunking handles audio longer than 30 s.
pipe = pipeline(
    task="automatic-speech-recognition",
    model="emredeveloper/whisper-small-tr",
    chunk_length_s=30,
    device="cuda" if torch.cuda.is_available() else "cpu",
)

audio_file = "path/to/your/audio.mp3"
result = pipe(audio_file)
print(result["text"])
```
### Gradio Demo

```python
import gradio as gr
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="emredeveloper/whisper-small-tr",
)

def transcribe(audio):
    # Gradio passes None when no audio was recorded or uploaded.
    if audio is None:
        return ""
    return pipe(audio)["text"]

demo = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(sources=["microphone", "upload"], type="filepath"),
    outputs="text",
    title="Turkish Speech Recognition",
    description="Upload or record Turkish audio to transcribe.",
)

demo.launch(share=True)
```
### Advanced Usage

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch
import librosa

processor = WhisperProcessor.from_pretrained("emredeveloper/whisper-small-tr")
model = WhisperForConditionalGeneration.from_pretrained("emredeveloper/whisper-small-tr")

# Whisper expects 16 kHz mono audio; librosa resamples on load.
audio, sr = librosa.load("audio.mp3", sr=16000)
input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features

predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)

print(transcription[0])
```
## Limitations

- Trained on only 3,000 samples, which may limit generalization
- Performance may vary on noisy audio or non-standard dialects
- Best results are obtained with clear audio at a 16 kHz sampling rate
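Audio at other sample rates should be resampled to 16 kHz before inference; `librosa.load(..., sr=16000)` in the advanced example does this for you. As an illustration of what resampling involves, a naive linear-interpolation sketch (illustration only; a real pipeline should use librosa or torchaudio, which also low-pass filter to avoid aliasing):

```python
def resample_linear(samples, src_rate, dst_rate):
    """Naive linear-interpolation resampler (no anti-aliasing filter)."""
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate      # fractional index in the source
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

# 44.1 kHz -> 16 kHz shortens a 1-second signal to 16,000 samples.
one_second = [0.0] * 44100
print(len(resample_linear(one_second, 44100, 16000)))  # 16000
```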
## Citation

```bibtex
@misc{whisper-small-tr,
  author       = {emredeveloper},
  title        = {whisper-small-tr: Fine-tuned Whisper Small for Turkish ASR},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/emredeveloper/whisper-small-tr}}
}
```
## Acknowledgments

- Base model: [openai/whisper-small](https://huggingface.co/openai/whisper-small)
- Dataset: [Codyfederer/tr-full-dataset](https://huggingface.co/datasets/Codyfederer/tr-full-dataset)
- Built with [Hugging Face Transformers](https://github.com/huggingface/transformers)