Whisper Tiny Italian v1

A fine-tuned version of openai/whisper-tiny for Italian automatic speech recognition.

Model Details

  • Base model: openai/whisper-tiny (39M parameters)
  • Language: Italian
  • Task: Transcription
  • Fine-tuning: Full fine-tuning (all parameters), no LoRA/adapters
  • Hardware: AMD Radeon RX 7900 XTX (24 GB VRAM, ROCm)

Training Data

| Dataset | Samples | Domain |
|---|---|---|
| Common Voice 24 Italian | ~173,000 | Crowd-sourced read speech |
| VoxPopuli Italian | ~22,000 | European Parliament sessions |
| Total | ~195,000 | |

Training Hyperparameters

  • Effective batch size: 64 (per-device batch size 32 × 2 gradient-accumulation steps)
  • Learning rate: 1e-5 (cosine schedule)
  • Warmup steps: 500
  • Total steps: 11,000 (~1.8 epochs)
  • Precision: FP16
  • Optimizer: AdamW (weight decay 0.01)
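Gradient accumulation makes the batch size seen by the optimizer the product of the per-device batch size and the number of accumulation steps. A minimal sketch of that arithmetic (numbers mirror the configuration above; the function name is illustrative, not part of any training API):

```python
def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Batch size seen by the optimizer per update step: gradients from
    `accum_steps` micro-batches are summed before a single optimizer step."""
    return per_device * accum_steps * num_devices

# The configuration above: 32 samples per step, gradients accumulated over 2 steps
print(effective_batch_size(32, 2))  # 64
```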

Evaluation Results

| Metric | Value |
|---|---|
| WER (Common Voice 24 Italian test) | 26.24% |
| Eval loss | 0.3919 |
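WER is the word-level edit distance (substitutions + insertions + deletions) between hypothesis and reference, divided by the number of reference words. A minimal sketch of the standard definition (not the exact evaluation script used for the numbers above):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("buongiorno a tutti", "buongiorno tutti"))  # one deletion out of 3 words
```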

Comparison

| Model | WER (CV Italian) |
|---|---|
| openai/whisper-tiny (base, zero-shot) | ~60% |
| mattiasu96/whisper-tiny-it | 26.5% |
| This model (v1) | 26.24% |

Usage

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torchaudio

processor = WhisperProcessor.from_pretrained("gabrielesilinic/whisper-tiny-it-v1")
model = WhisperForConditionalGeneration.from_pretrained("gabrielesilinic/whisper-tiny-it-v1")

# Load audio and convert it to 16 kHz mono, as Whisper expects
audio, sr = torchaudio.load("audio.wav")
if audio.shape[0] > 1:  # downmix multi-channel audio to mono
    audio = audio.mean(dim=0, keepdim=True)
if sr != 16000:
    audio = torchaudio.functional.resample(audio, sr, 16000)

input_features = processor(audio.squeeze().numpy(), sampling_rate=16000, return_tensors="pt").input_features
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
```

With whisper.cpp (GGML)

This model can be converted to GGML format for use with whisper.cpp, which provides fast CPU and GPU inference.
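A rough sketch of the conversion, assuming the `models/convert-h5-to-ggml.py` script shipped in the whisper.cpp repository (it needs a local clone of openai/whisper for the mel filter assets); paths and the inference binary name are illustrative and may differ between whisper.cpp versions:

```shell
# Clone whisper.cpp and the original Whisper repo (needed by the converter)
git clone https://github.com/ggerganov/whisper.cpp
git clone https://github.com/openai/whisper

# Convert the Hugging Face checkpoint (downloaded to ./whisper-tiny-it-v1) to GGML
python whisper.cpp/models/convert-h5-to-ggml.py ./whisper-tiny-it-v1 ./whisper ./output

# Transcribe Italian audio with the converted model
./whisper.cpp/main -m ./output/ggml-model.bin -l it -f audio.wav
```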

Limitations

  • WER is higher than what whisper-tiny.en achieves on English, so expect noticeably more transcription errors.
  • Italian speech mixed with English terms (anglicisms, product names) currently tends to be transcribed incorrectly.

License

Apache 2.0 (same as the base openai/whisper-tiny model)
