whisper-tiny-he / README.md

amitkot

Upload README.md with huggingface_hub

ff42d72 verified 29 days ago

metadata

language: he
license: apache-2.0
library_name: transformers
tags:
  - whisper
  - audio
  - automatic-speech-recognition
  - hebrew
datasets:
  - ivrit-ai/whisper-training
base_model: openai/whisper-tiny
pipeline_tag: automatic-speech-recognition

whisper-tiny-he

A Hebrew fine-tune of Whisper Tiny for automatic speech recognition (ASR).

Training

Base model: openai/whisper-tiny
Dataset: ivrit-ai/whisper-training (~400h Hebrew)
Method: Supervised fine-tuning with Seq2SeqTrainer
Steps: 5,000 (streaming, effective batch size 16)
Hardware: Apple M4 (MPS), fp32
Final eval WER: 0.659 (on a 200-sample test split)
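
For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between reference and hypothesis transcripts, divided by the number of reference words. The sketch below is a minimal pure-Python illustration of that definition. It is not the evaluation code used for this model: the function names are hypothetical, and it assumes plain whitespace tokenization with no text normalization, which may differ from how the 0.659 figure was computed.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of word substitutions, insertions and deletions
    needed to turn `hyp` into `ref` (Levenshtein distance over words)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level WER: total word errors / total reference words.
    Assumption: whitespace tokenization, no casing or punctuation cleanup."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_words = sum(len(r.split()) for r in references)
    if total_words == 0:
        raise ValueError("references contain no words")
    errors = sum(
        edit_distance(r.split(), h.split())
        for r, h in zip(references, hypotheses)
    )
    return errors / total_words


# One substitution (b -> x) and one deletion (d) over 4 reference words.
print(wer(["a b c d"], ["a x c"]))  # 0.5
```

Read this way, a WER of 0.659 would mean roughly 66 word-level errors per 100 reference words.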

Usage

from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("amitkot/whisper-tiny-he")
model = WhisperForConditionalGeneration.from_pretrained("amitkot/whisper-tiny-he")

# Force Hebrew transcription instead of relying on language auto-detection
model.generation_config.language = "he"
model.generation_config.task = "transcribe"
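
The snippet above only loads the model and sets its generation defaults. A possible end-to-end transcription step is sketched below, assuming the audio is already available as a 1-D array of mono float samples at 16 kHz (the rate Whisper's feature extractor expects). The `transcribe` helper and the constant names are illustrative, not part of this repository.

```python
MODEL_ID = "amitkot/whisper-tiny-he"
WHISPER_SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz mono input


def transcribe(samples, sampling_rate=WHISPER_SAMPLE_RATE):
    """Transcribe a 1-D array of mono float audio samples to Hebrew text.

    Hypothetical helper: reloads the model on every call for simplicity.
    """
    if sampling_rate != WHISPER_SAMPLE_RATE:
        raise ValueError(
            f"expected {WHISPER_SAMPLE_RATE} Hz audio, got {sampling_rate} Hz; "
            "resample before calling transcribe()"
        )

    # Imported lazily so that importing this module does not load torch.
    import torch
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    processor = WhisperProcessor.from_pretrained(MODEL_ID)
    model = WhisperForConditionalGeneration.from_pretrained(MODEL_ID)
    model.eval()

    # Convert raw samples to log-mel input features.
    inputs = processor(samples, sampling_rate=sampling_rate, return_tensors="pt")

    with torch.no_grad():
        predicted_ids = model.generate(
            inputs.input_features,
            language="he",
            task="transcribe",
        )

    # Decode token ids to text, dropping special tokens such as timestamps.
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```

One way to obtain suitable input is `samples, sr = librosa.load("audio.wav", sr=16000)`, where `librosa` and the file name are assumptions. Any loader that yields 16 kHz mono floats should work.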

Training pipeline

Trained using whisper-acft-pipeline:

uv run python scripts/finetune.py --config configs/hebrew_tiny_finetune.yaml
