# Model Card for Vardis/Whisper-Small-Greek
This model is a fine-tuned version of OpenAI's Whisper-small model for Greek speech recognition. It has been trained on multiple Greek speech datasets and evaluated using WER (Word Error Rate) and CER (Character Error Rate).
## Model Details

### Model Description
This model is a Whisper-small ASR model fine-tuned for Greek language transcription. It supports automatic speech recognition for general Greek audio data and can be integrated into downstream applications requiring Greek speech-to-text capabilities.
- Developed by: Vardis Georgilas
- Model type: Automatic Speech Recognition (ASR)
- Language(s): Greek (el)
- Finetuned from model: openai/whisper-small
## Training Details

### Training Data
- Vardis/Greek_Mosel
- Mozilla Common Voice 11.0 (Greek)
- Google Fleurs (Greek)
### Training Procedure

- Fine-tuned from openai/whisper-small.
### Speeds, Sizes, Times

- Training duration: ~5 h 50 min (2000 steps)
- Hardware: 2× NVIDIA T4 GPUs
## Evaluation

### Metrics
- Word Error Rate (WER): word-level substitutions, insertions, and deletions divided by the number of reference words, reported as a percentage
- Character Error Rate (CER): the same edit-distance measure, computed at the character level
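Both metrics reduce to a single Levenshtein edit distance, applied to word sequences for WER and to character sequences for CER. A minimal, dependency-free sketch (in practice, libraries such as `jiwer` or 🤗 `evaluate` are the usual choice):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, O(len(hyp)) memory."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            # dp[j] (old) = deletion, dp[j-1] = insertion, prev = substitution/match
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (r != h))
    return dp[-1]

def wer(ref, hyp):
    """Word Error Rate: word-level edit distance over reference word count."""
    words = ref.split()
    return edit_distance(words, hyp.split()) / len(words)

def cer(ref, hyp):
    """Character Error Rate: character-level edit distance over reference length."""
    return edit_distance(ref, hyp) / len(ref)

print(wer("a b c d", "a x c d"))  # 0.25 — one substituted word out of four
```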
### Results
| Step | Training Loss | Validation Loss | WER | CER |
|---|---|---|---|---|
| 250 | 0.4321 | 0.4399 | 32.01% | 13.72% |
| 500 | 0.3840 | 0.4022 | 29.44% | 12.04% |
| 750 | 0.3437 | 0.3826 | 28.92% | 11.66% |
| 1000 | 0.3272 | 0.3722 | 28.25% | 11.59% |
| 1250 | 0.3182 | 0.3650 | 27.57% | 11.44% |
| 1500 | 0.2932 | 0.3613 | 27.67% | 11.64% |
| 1750 | 0.2654 | 0.3592 | 27.20% | 11.27% |
| 2000 | 0.2747 | 0.3581 | 26.99% | 11.10% |
On the test set:

- WER: 26.54%
- CER: 11.32%
## How to Use
```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from peft import PeftModel
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the base model and attach the Greek fine-tuned LoRA weights
base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small").to(device)
model = PeftModel.from_pretrained(base_model, "Vardis/Whisper-Small-Greek").to(device)
processor = WhisperProcessor.from_pretrained("Vardis/Whisper-Small-Greek")

# Load your audio waveform (e.g., using librosa or torchaudio);
# Whisper expects 16 kHz mono audio
audio_input = ...

# Extract log-Mel features and generate the transcription
inputs = processor(audio_input, sampling_rate=16000, return_tensors="pt").input_features.to(device)
predicted_ids = model.generate(inputs)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription)
```
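Since Whisper operates on 16 kHz mono audio, recordings at other sample rates must be resampled before reaching the processor. A minimal NumPy-only sketch using linear interpolation (an assumption for illustration; in practice `librosa.load(path, sr=16000)` or `torchaudio.transforms.Resample`, which apply proper anti-aliasing, are preferable):

```python
import numpy as np

def resample_linear(wave, orig_sr, target_sr=16000):
    """Resample a 1-D waveform to target_sr via linear interpolation."""
    n_out = int(round(len(wave) * target_sr / orig_sr))
    # Output sample positions expressed on the input sample grid
    src_pos = np.linspace(0, len(wave) - 1, num=n_out)
    return np.interp(src_pos, np.arange(len(wave)), wave)

# One second of a 440 Hz tone recorded at 44.1 kHz
sr = 44100
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
audio_16k = resample_linear(tone, sr)
print(audio_16k.shape)  # (16000,)
```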
## Context / Reference
This model was developed as part of the work described in:
Georgilas, V., Stafylakis, T. (2025). Automatic Speech Recognition for Greek Medical Dictation.
The paper focuses on Greek medical ASR and is not primarily about this model, but it provides context for its development. Users are welcome to use the model freely for research and practical applications.
BibTeX citation:

```bibtex
@misc{georgilas2025greekasr,
  title={Automatic Speech Recognition for Greek Medical Dictation},
  author={Vardis Georgilas and Themos Stafylakis},
  year={2025},
  note={Available at: https://www.arxiv.org/abs/2509.23550}
}
```