---
language: tr
license: mit
tags:
- audio
- speech-recognition
- whisper
- turkish
- asr
datasets:
- Codyfederer/tr-full-dataset
model-index:
- name: whisper-small-tr
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    metrics:
    - type: wer
      value: 7.75
      name: Word Error Rate
    - type: cer
      value: 1.95
      name: Character Error Rate
---

# whisper-small-tr - Fine-tuned Whisper Small for Turkish ASR

This model is a fine-tuned version of `openai/whisper-small` optimized for Turkish Automatic Speech Recognition (ASR).

## Model Description

Whisper is a pre-trained model for automatic speech recognition and speech translation. This version has been fine-tuned on Turkish audio data to improve performance on Turkish speech recognition tasks.

- **Base Model:** openai/whisper-small
- **Language:** Turkish (tr)
- **Task:** Automatic Speech Recognition
- **Dataset:** Codyfederer/tr-full-dataset

## Training Data

The model was trained on `Codyfederer/tr-full-dataset`, which consists of 3,000 Turkish audio-transcription pairs, split into 90% training and 10% testing.
## Training Parameters

Training used the Hugging Face `Trainer` with the following `Seq2SeqTrainingArguments`:

- `output_dir`: `./whisper-small-tr`
- `per_device_train_batch_size`: 16
- `gradient_accumulation_steps`: 1
- `learning_rate`: 3e-5
- `warmup_steps`: 50
- `num_train_epochs`: 3
- `weight_decay`: 0.005
- `gradient_checkpointing`: True
- `fp16`: True
- `eval_strategy`: "steps"
- `per_device_eval_batch_size`: 8
- `predict_with_generate`: True
- `generation_max_length`: 225
- `save_steps`: 200
- `eval_steps`: 200
- `logging_steps`: 25
- `report_to`: ["tensorboard"]
- `load_best_model_at_end`: True
- `metric_for_best_model`: "wer"
- `greater_is_better`: False
- `push_to_hub`: True
- `hub_model_id`: whisper-small-tr
- `optim`: adamw_torch
- `dataloader_num_workers`: 4
- `dataloader_pin_memory`: True
- `save_total_limit`: 2

## Performance

Test set evaluation results:

- **Word Error Rate (WER):** 7.75%
- **Character Error Rate (CER):** 1.95%
- **Loss:** 0.1321

Fine-tuning yields a substantial improvement in Turkish ASR accuracy over the base `openai/whisper-small` model.

## Usage

### Basic Usage

```python
from transformers import pipeline
import torch

# Load the fine-tuned model as an ASR pipeline
pipe = pipeline(
    task="automatic-speech-recognition",
    model="emredeveloper/whisper-small-tr",
    chunk_length_s=30,  # process long audio in 30-second chunks
    device="cuda" if torch.cuda.is_available() else "cpu",
)

audio_file = "path/to/your/audio.mp3"
result = pipe(audio_file)
print(result["text"])
```

### Gradio Demo

```python
import gradio as gr
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="emredeveloper/whisper-small-tr"
)

def transcribe(audio):
    if audio is None:
        return ""
    return pipe(audio)["text"]

demo = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(sources=["microphone", "upload"], type="filepath"),
    outputs="text",
    title="Turkish Speech Recognition",
    description="Upload or record Turkish audio to transcribe."
)

demo.launch(share=True)
```

### Advanced Usage

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch
import librosa

processor = WhisperProcessor.from_pretrained("emredeveloper/whisper-small-tr")
model = WhisperForConditionalGeneration.from_pretrained("emredeveloper/whisper-small-tr")

# Whisper expects 16 kHz mono audio
audio, sr = librosa.load("audio.mp3", sr=16000)

input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])
```

## Limitations

- Trained on 3,000 samples, which may limit generalization
- Performance may vary on noisy audio or non-standard dialects
- Best results with clear audio at 16kHz sampling rate

## Citation

```bibtex
@misc{whisper-small-tr,
  author = {emredeveloper},
  title = {whisper-small-tr: Fine-tuned Whisper Small for Turkish ASR},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/emredeveloper/whisper-small-tr}}
}
```

## Acknowledgments

- Base model: [openai/whisper-small](https://huggingface.co/openai/whisper-small)
- Dataset: [Codyfederer/tr-full-dataset](https://huggingface.co/datasets/Codyfederer/tr-full-dataset)
- Built with [Hugging Face Transformers](https://github.com/huggingface/transformers)