---
library_name: transformers
language:
- fa
license: apache-2.0
base_model: openai/whisper-tiny
tags:
- generated_from_trainer
datasets:
- mozilla-foundation/common_voice_17_0
model-index:
- name: Whisper tiny Fa - Common Voice
  results: []
---
# Whisper tiny Fa - Common Voice

This model is a fine-tuned version of [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) on the Persian (`fa`) subset of the Common Voice 17.0 dataset.
## Model description

Whisper-tiny-fa is an automatic speech recognition (ASR) model adapted for Persian (Farsi) speech. It builds on OpenAI's Whisper-tiny checkpoint and uses transfer learning to specialize it for transcribing Persian audio. Typical applications include voice assistants, captioning, and speech-driven user interfaces.

- Base model: openai/whisper-tiny
- Fine-tuned on: Common Voice 17.0 Persian subset
- Languages supported: Persian (Farsi)
- Model type: Encoder-decoder transformer (speech-to-text)
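As a rough illustration of what the encoder-decoder model consumes, the sketch below works through Whisper's input arithmetic. It assumes the fine-tuned checkpoint keeps the default `WhisperFeatureExtractor` settings of openai/whisper-tiny (16 kHz audio, 30-second windows, a hop length of 160 samples, 80 log-Mel bins); the repository's `preprocessor_config.json` is the place to confirm this. The constant names and the `input_shapes` helper are illustrative, not part of any library.

```python
# Back-of-the-envelope input sizes for Whisper-tiny.
# Assumption: the fine-tuned checkpoint keeps openai/whisper-tiny's default
# WhisperFeatureExtractor settings; verify against preprocessor_config.json.
SAMPLING_RATE = 16_000  # samples per second expected by the feature extractor
CHUNK_LENGTH_S = 30     # each input is padded/truncated to a 30-second window
HOP_LENGTH = 160        # samples between successive STFT frames (10 ms)
N_MELS = 80             # log-Mel bins used by whisper-tiny


def input_shapes() -> dict:
    """Return the per-window sample count, log-Mel feature shape, and encoder length."""
    n_samples = SAMPLING_RATE * CHUNK_LENGTH_S  # raw samples in one window
    n_frames = n_samples // HOP_LENGTH          # log-Mel frames in one window
    # The encoder's second convolution has stride 2, halving the time axis.
    encoder_positions = n_frames // 2
    return {
        "n_samples": n_samples,
        "mel_features": (N_MELS, n_frames),
        "encoder_positions": encoder_positions,
    }


print(input_shapes())
```

Because the model attends over a single 30-second window at a time, longer recordings have to be processed window by window; the `automatic-speech-recognition` pipeline's `chunk_length_s` argument is one way to do this.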
## Intended uses & limitations

### Intended uses

- Transcribing Persian (Farsi) speech to text from audio files or microphone input.
- Voice-controlled applications and speech interfaces for Persian speakers.
- Generating subtitles and closed captions in Persian for audio/video content.
### Limitations

- The model is fine-tuned for Persian and may perform poorly on other languages.
- Performance may degrade on low-quality or noisy audio, and on accents or dialects that are poorly represented in the training data.
- It is not suitable for real-time applications with strict latency constraints, due to model size and processing requirements.
## Training and evaluation data

- Dataset: Common Voice 17.0 (Persian subset)
- Data split: The training, validation, and test splits provided by Common Voice were used.
- Preprocessing: Audio files were resampled to 16 kHz and normalized. Transcripts were cleaned and normalized to standard Persian orthography.
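The card does not spell out the transcript normalization rules, so the following is only a minimal sketch of the kind of Persian text normalization that is commonly applied before training and scoring: NFKC normalization, mapping Arabic yeh/kaf to their Persian forms, stripping short-vowel diacritics and punctuation, and collapsing whitespace while keeping the zero-width non-joiner (ZWNJ, U+200C). The function name `normalize_fa` and the exact character lists are hypothetical.

```python
import re
import unicodedata

# Hypothetical normalization rules: the model card does not list the exact
# steps used for this model, so treat these as an illustrative sketch.
ARABIC_TO_PERSIAN = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0649": "\u06CC",  # ARABIC LETTER ALEF MAKSURA -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
})

# Arabic short-vowel diacritics (U+064B..U+0652) and the superscript alef (U+0670).
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")

# Arabic comma, semicolon, question mark, full stop, plus common ASCII marks.
PUNCT_CHARS = "\u060C\u061B\u061F\u06D4" + ".,!?;:\"'()«»-"
PUNCTUATION = re.compile("[" + re.escape(PUNCT_CHARS) + "]")


def normalize_fa(text: str) -> str:
    """Normalize a Persian transcript (sketch); the ZWNJ (U+200C) is preserved."""
    text = unicodedata.normalize("NFKC", text)  # fold presentation forms
    text = text.translate(ARABIC_TO_PERSIAN)    # unify yeh/kaf variants
    text = DIACRITICS.sub("", text)             # drop short-vowel marks
    text = PUNCTUATION.sub(" ", text)           # punctuation -> space
    return re.sub(r"\s+", " ", text).strip()    # collapse whitespace (ZWNJ is not whitespace)


sample = "\u0643\u062A\u0627\u0628\u064A \u062E\u0648\u0628!"  # Arabic kaf and yeh, trailing "!"
print(normalize_fa(sample))
```

Applying the same function to both references and model outputs keeps scoring consistent; a different set of rules will shift WER and CER.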
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 4000
- mixed_precision_training: Native AMP
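For reference, the hyperparameters above map onto `transformers.Seq2SeqTrainingArguments` keyword arguments roughly as sketched below. This is a reconstruction, not the original training script, and `fp16=True` is an assumption, since "Native AMP" could also mean bf16. The helper `linear_warmup_lr` (a hypothetical name) reproduces the multiplier applied by the `linear` scheduler (`get_linear_schedule_with_warmup`): a linear ramp from 0 to the peak rate over the warmup steps, then a linear decay to 0 at the final step.

```python
# Reported hyperparameters as Seq2SeqTrainingArguments keyword arguments
# (a hypothetical reconstruction; the original training script is not published).
TRAINING_KWARGS = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_steps": 500,
    "max_steps": 4000,
    "fp16": True,  # assumption: "Native AMP" could also correspond to bf16
}


def linear_warmup_lr(
    step: int,
    base_lr: float = TRAINING_KWARGS["learning_rate"],
    warmup_steps: int = TRAINING_KWARGS["warmup_steps"],
    total_steps: int = TRAINING_KWARGS["max_steps"],
) -> float:
    """Learning rate at `step`: linear warmup to base_lr, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 250, 500, 2250, 4000):
    print(step, linear_warmup_lr(step))
```

To use the dictionary, it would be passed together with an `output_dir`, e.g. `Seq2SeqTrainingArguments(output_dir=..., **TRAINING_KWARGS)`; `fp16=True` generally requires a CUDA GPU.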
### Test results

- Best test WER (Word Error Rate): 0.915
- Best test CER (Character Error Rate): 0.428
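The card does not state how WER and CER were computed (a common choice is the `evaluate` library's `wer` and `cer` metrics) or which text normalization was applied before scoring. As a reference point, here is a pure-Python sketch of the standard corpus-level definitions: the total number of substitutions, deletions, and insertions divided by the number of reference words (WER) or characters (CER). The helper names `edit_distance` and `error_rate` are illustrative. Because insertions are counted, the error rate is not capped at 1.0.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution, or a match at no cost
            )
        prev = curr
    return prev[-1]


def error_rate(refs: List[str], hyps: List[str], unit: str = "word") -> float:
    """Corpus-level WER (unit="word") or CER (unit="char").

    Total edits over all pairs divided by the total number of reference units;
    for CER, spaces count as characters in this sketch.
    """
    tokenize = str.split if unit == "word" else list
    edits = sum(edit_distance(tokenize(r), tokenize(h)) for r, h in zip(refs, hyps))
    total = sum(len(tokenize(r)) for r in refs)
    return edits / total


refs = ["the cat sat on the mat"]
hyps = ["the cat sit on mat"]
print("WER:", error_rate(refs, hyps))
print("CER:", error_rate(refs, hyps, unit="char"))
```

Scores from a sketch like this are only comparable to the figures above if references and hypotheses go through the same normalization that was used for evaluation.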
### Usage

The snippet below transcribes a local audio file with the Transformers `pipeline` API; `sample.mp3` is a placeholder for your own file.

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

# Use the first GPU in half precision if available; otherwise run on CPU in float32.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Load the fine-tuned checkpoint from the Hugging Face Hub.
model_id = "aictsharif/whisper-tiny-fa"
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

# The processor bundles the tokenizer and the log-Mel feature extractor.
processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

# Transcribe a local file; decoding audio from a file path requires ffmpeg.
result = pipe("sample.mp3")
print(result["text"])
```
### Framework versions

- Transformers 4.51.3
- Pytorch 2.5.1+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1