# Whisper Tiny Italian v1

A fine-tuned version of openai/whisper-tiny for Italian automatic speech recognition.

## Model Details
- Base model: openai/whisper-tiny (39M parameters)
- Language: Italian
- Task: Transcription
- Fine-tuning: Full fine-tuning (all parameters), no LoRA/adapters
- Hardware: AMD Radeon RX 7900 XTX (24 GB VRAM, ROCm)
## Training Data
| Dataset | Samples | Domain |
|---|---|---|
| Common Voice 24 Italian | ~173,000 | Crowd-sourced read speech |
| VoxPopuli Italian | ~22,000 | European Parliament sessions |
| **Total** | ~195,000 | |
## Training Hyperparameters
- Effective batch size: 64 (32 × 2 gradient accumulation)
- Learning rate: 1e-5 (cosine schedule)
- Warmup steps: 500
- Total steps: 11,000 (~1.8 epochs)
- Precision: FP16
- Optimizer: AdamW (weight decay 0.01)
## Evaluation Results
| Metric | Value |
|---|---|
| WER (Common Voice 24 Italian test) | 26.24% |
| Eval loss | 0.3919 |
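For reference, WER (word error rate) is the word-level edit distance between hypothesis and reference, divided by the number of reference words. A minimal self-contained sketch; the example sentences are illustrative:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein edit distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution out of four reference words
print(wer("ciao come stai oggi", "ciao come va oggi"))  # → 0.25
```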
## Comparison
| Model | WER (CV Italian) |
|---|---|
| openai/whisper-tiny (base, zero-shot) | ~60%+ |
| mattiasu96/whisper-tiny-it | 26.5% |
| This model (v1) | 26.24% |
## Usage

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torchaudio

processor = WhisperProcessor.from_pretrained("gabrielesilinic/whisper-tiny-it-v1")
model = WhisperForConditionalGeneration.from_pretrained("gabrielesilinic/whisper-tiny-it-v1")

# Load audio and convert to 16 kHz mono (Whisper's expected input)
audio, sr = torchaudio.load("audio.wav")
if audio.shape[0] > 1:  # downmix stereo to mono
    audio = audio.mean(dim=0, keepdim=True)
if sr != 16000:
    audio = torchaudio.functional.resample(audio, sr, 16000)

input_features = processor(audio.squeeze().numpy(), sampling_rate=16000, return_tensors="pt").input_features
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
```
### With whisper.cpp (GGML)

The model can be converted to GGML format for fast CPU/GPU inference with whisper.cpp.
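A possible conversion path, assuming the `convert-h5-to-ggml.py` script shipped in the whisper.cpp repository (which also needs a checkout of the original openai/whisper repo for its mel filter assets); paths and directory layout are illustrative:

```shell
# Clone the converter and its dependency (illustrative layout)
git clone https://github.com/ggerganov/whisper.cpp
git clone https://github.com/openai/whisper

# Download the fine-tuned checkpoint from the Hugging Face Hub
huggingface-cli download gabrielesilinic/whisper-tiny-it-v1 --local-dir whisper-tiny-it-v1

# Convert the Hugging Face checkpoint to GGML (writes ggml-model.bin to the current dir)
python whisper.cpp/models/convert-h5-to-ggml.py whisper-tiny-it-v1 whisper .
```

The resulting `.bin` file can then be passed to the whisper.cpp CLI with its usual `-m` model flag.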
## Limitations

- WER is higher than what openai/whisper-tiny.en achieves on English.
- Italian speech mixed with English terms (code-switching) currently tends to be transcribed incorrectly.
## License
Apache 2.0 (same as the base openai/whisper-tiny model)