amitkot committed on
Commit 023bbed · verified · 1 parent: 95bdf61

Add model card

Files changed (1): README.md (added, +52 -0)

---
language: he
license: apache-2.0
library_name: transformers
tags:
- whisper
- audio
- automatic-speech-recognition
- hebrew
datasets:
- ivrit-ai/whisper-training
base_model: openai/whisper-small
pipeline_tag: automatic-speech-recognition
---

# whisper-small-he

[Whisper Small](https://huggingface.co/openai/whisper-small) fine-tuned on Hebrew for automatic speech recognition.

## Training

- **Base model**: [openai/whisper-small](https://huggingface.co/openai/whisper-small)
- **Dataset**: [ivrit-ai/whisper-training](https://huggingface.co/datasets/ivrit-ai/whisper-training) (~400 h of Hebrew audio)
- **Method**: Supervised fine-tuning with `Seq2SeqTrainer`
- **Steps**: 5,000 (dataset streamed, effective batch size 16)
- **Hardware**: Apple M4 (PyTorch MPS backend), fp32
- **Best eval WER**: 0.368 at step 4,000 (200-sample test split)

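The eval WER above is the word-level edit distance (substitutions + deletions + insertions) between the model's transcript and the reference, divided by the number of reference words, so 0.368 corresponds to roughly 37 word errors per 100 reference words. The sketch below illustrates that formula in plain Python. It is an assumption-laden illustration, not the evaluation code used for this model: the card does not say which WER implementation or text normalization was applied (libraries such as `jiwer` or Hugging Face `evaluate` are common choices), and `word_error_rate` here only splits on whitespace, so its scores may differ from the reported figure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative sketch: text is split on whitespace with no other normalization.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,                           # deletion: reference word dropped
                curr[j - 1] + 1,                       # insertion: extra hypothesis word
                prev[j - 1] + (ref_word != hyp_word),  # substitution (cost 0 on a match)
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (second "the") over 6 reference words
print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 3))  # 0.333
```

Because insertions count as errors, WER is not bounded by 1: a hypothesis with many extra words can score above 1.0.
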
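## Usage

```python
import numpy as np
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("amitkot/whisper-small-he")
model = WhisperForConditionalGeneration.from_pretrained("amitkot/whisper-small-he")

# Transcribe in Hebrew instead of auto-detecting the language
model.generation_config.language = "he"
model.generation_config.task = "transcribe"

# Placeholder input: replace with mono float32 audio sampled at 16 kHz
audio = np.zeros(16000, dtype=np.float32)
features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
print(processor.batch_decode(model.generate(features), skip_special_tokens=True)[0])
```
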
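Setting `language` and `task` pins the special tokens that Whisper's decoder is conditioned on before it emits any text, so the model neither guesses the language nor translates into English. The sketch below only spells out that prefix as token strings to show what the two settings control. `whisper_decoder_prefix` is a hypothetical helper, not part of `transformers`; the real tokenizer and `generate()` handle these tokens internally as IDs.

```python
# Hypothetical illustration: builds Whisper's decoder prefix as strings, not token IDs.
def whisper_decoder_prefix(language: str, task: str, timestamps: bool = False) -> list[str]:
    """Return the special tokens that precede the transcript in Whisper's decoder."""
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unsupported task: {task!r}")
    prefix = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        # Without timestamp prediction, decoding is also conditioned on <|notimestamps|>
        prefix.append("<|notimestamps|>")
    return prefix


print("".join(whisper_decoder_prefix("he", "transcribe")))
# <|startoftranscript|><|he|><|transcribe|><|notimestamps|>
```
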
## Training pipeline

Trained with [whisper-acft-pipeline](https://github.com/amitkot/whisper-acft-pipeline):

```bash
uv run python scripts/finetune.py --config configs/hebrew_small_finetune.yaml
```

## See also

- [amitkot/whisper-small-he-acft](https://huggingface.co/amitkot/whisper-small-he-acft): ACFT-optimized version of this model for short audio (FUTO Keyboard)
- [amitkot/whisper-tiny-he](https://huggingface.co/amitkot/whisper-tiny-he): smaller, faster variant