Whisper Medium Fine-Tuned on Custom English Dataset

This model is a fine-tuned version of OpenAI's whisper-medium, optimized for transcribing English speech from a custom dataset.

πŸ› οΈ Model Details

  • Base Model: openai/whisper-medium
  • Fine-tuned by: Winardi (Research by Ms. Tong Rong)
  • Language: English (monolingual)
  • Framework: PyTorch, Hugging Face Transformers

πŸ“š Training Data

The model was fine-tuned on a proprietary English audio dataset, indexed by the manifest file metadata(clean1).csv. Corrupted and low-quality audio files were excluded before training. The data was split as follows:

  • Training: 80%
  • Validation: 10%
  • Testing: 10% (used only for evaluation, not during training)
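For reference, a deterministic 80/10/10 split like the one above can be sketched in plain Python. The manifest filename and column layout in the commented usage are assumptions for illustration, not the exact preprocessing pipeline used for this model:

```python
import csv
import random

def split_manifest(rows, seed=42):
    """Shuffle rows deterministically, then carve out an 80/10/10 split."""
    rng = random.Random(seed)
    rows = list(rows)
    rng.shuffle(rows)
    n = len(rows)
    n_train = int(n * 0.8)
    n_val = int(n * 0.1)
    # The test set takes the remainder, so no row is dropped.
    return (rows[:n_train],
            rows[n_train:n_train + n_val],
            rows[n_train + n_val:])

# Hypothetical usage with a CSV manifest of (audio_path, transcript) rows:
# with open("metadata(clean1).csv", newline="") as f:
#     rows = list(csv.DictReader(f))
# train, val, test = split_manifest(rows)
```

Fixing the shuffle seed makes the split reproducible, so the held-out test rows stay identical across training runs.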

🎯 Intended Use

This model is intended for automatic speech recognition (ASR) in English, especially for environments similar to the training dataset (e.g., single-speaker, clean audio).

πŸ“‰ Performance

  • Metric: Word Error Rate (WER), self-reported on the custom test set
  • WER: 2.07%
  • WER with limited vocabulary: 3.23%
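WER counts word-level substitutions, insertions, and deletions against a reference transcript, divided by the number of reference words. A minimal sketch of the metric (a plain edit-distance implementation, not necessarily the exact scoring script behind the numbers above):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i          # i deletions to reach an empty hypothesis
    for j in range(len(hyp) + 1):
        d[0][j] = j          # j insertions from an empty reference
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)
```

A WER of 2.07% therefore means roughly two word errors for every 100 reference words.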

🚫 Limitations

  • Not robust to heavy background noise or overlapping speech
  • May not perform well on dialects or accents not represented in training data
  • Only supports English input

πŸ’¬ How to Use

```python
from transformers import pipeline

# Load the fine-tuned checkpoint from the Hugging Face Hub
# (requires the transformers and torch packages)
asr = pipeline("automatic-speech-recognition", model="Pengwin30/whisper-medium-fine-tuned")

# Transcribe a local audio file and print the text
result = asr("path/to/audio.wav")
print(result["text"])
```

πŸ“œ License

This model is licensed under the MIT License.

πŸ™ Citation

If you use this model in your work, please cite:

```bibtex
@misc{Pengwin30/whisper-medium-fine-tuned,
  author = {Tong Rong, Winardi},
  title = {Whisper Medium Fine-Tuned on Custom Dataset},
  year = {2025},
  url = {https://huggingface.co/Pengwin30/whisper-medium-fine-tuned}
}
```
Model size: 0.8B parameters Β· Tensor type: F32 Β· Format: Safetensors