---
base_model: openai/whisper-small
library_name: peft
tags:
- whisper
- lora
- transformers
- speech-to-text
- persian
- stt
- fine-tune
- adapter
datasets:
- vhdm/persian-voice-v1
language:
- fa
metrics:
- cer
- wer
---

# Whisper-Small Persian STT — LoRA Fine-Tuned

A fine-tuned version of **openai/whisper-small** for **Persian speech-to-text (ASR)** using **LoRA**.  
It targets Persian conversational speech recorded at a quality comparable to its training dataset.

---

## Model Details

### Model Description
This model is a **LoRA fine-tuned Whisper Small** focused on **Persian (fa)** speech recognition.  
It aims to improve transcription accuracy over the base model on standard Persian audio segments (16 kHz, mono, normalized WAV).

- **Developed by:** *Mehdi Pouladrag*
- **Model type:** Speech-to-Text (ASR) — Whisper Small (Seq2Seq Transformer)
- **Language(s):** Persian (fa)
- **License:** MIT
- **Finetuned from:** `openai/whisper-small`
- **Dataset:** `vhdm/persian-voice-v1` (single dataset)
- **Training technique:** LoRA (Low-Rank Adaptation)
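
The input format named above (16 kHz, mono WAV) can be checked before transcription. The following is a minimal sketch using only Python's standard `wave` module; the function name `check_wav` and the constants are illustrative and not part of this repository. It only reads uncompressed PCM WAV files and does not resample or convert anything.

```python
import wave

TARGET_RATE = 16_000   # Hz, the sampling rate named in this card
TARGET_CHANNELS = 1    # mono


def check_wav(path):
    """Return a list of problems; an empty list means the file matches the expected format."""
    problems = []
    # wave.open raises wave.Error for files that are not PCM WAV.
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != TARGET_RATE:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected {TARGET_RATE} Hz"
            )
        if wav.getnchannels() != TARGET_CHANNELS:
            problems.append(
                f"{wav.getnchannels()} channels, expected {TARGET_CHANNELS} (mono)"
            )
        if wav.getnframes() == 0:
            problems.append("file contains no audio frames")
    return problems
```

Files that fail the check would need to be resampled or downmixed with an audio tool of your choice before being passed to the model.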

### Model Sources
- **Repository:** https://github.com/Mehdipoladrag/Fine-tuning-Whisper-Model

---

## Uses

### Direct Use
- Convert Persian speech to text
- Subtitle generation for Persian audio
- Conversational ASR
- Podcast / video transcription
- General Persian content recognition
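
A minimal sketch of direct use with `transformers` and `peft`, under several assumptions: `ADAPTER_ID` is a placeholder and must be replaced with the actual Hub ID or local path of this adapter; the processor is loaded from the base model; and passing `language` and `task` to `generate` assumes a reasonably recent `transformers` release. The helper names `load_model` and `transcribe` are illustrative.

```python
BASE_MODEL_ID = "openai/whisper-small"
ADAPTER_ID = "your-username/your-adapter-repo"  # placeholder, not a real repository
SAMPLE_RATE = 16_000  # Hz, the rate Whisper's feature extractor expects

# Force Persian transcription instead of relying on language auto-detection.
GENERATE_KWARGS = {"language": "fa", "task": "transcribe"}


def load_model(adapter_id=ADAPTER_ID):
    """Load the base Whisper model and attach the LoRA adapter weights."""
    # Heavy dependencies are imported here so that importing this module stays cheap.
    from peft import PeftModel
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    processor = WhisperProcessor.from_pretrained(BASE_MODEL_ID)
    base = WhisperForConditionalGeneration.from_pretrained(BASE_MODEL_ID)
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return processor, model


def transcribe(audio, processor, model):
    """Transcribe a 1-D float array of mono samples at 16 kHz."""
    import torch

    inputs = processor(audio, sampling_rate=SAMPLE_RATE, return_tensors="pt")
    with torch.no_grad():
        ids = model.generate(input_features=inputs.input_features, **GENERATE_KWARGS)
    return processor.batch_decode(ids, skip_special_tokens=True)[0]
```

Here `audio` can come from any loader that yields 16 kHz mono float samples. Whisper processes audio in 30-second windows, so longer recordings need to be split into chunks first.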

### Downstream Use
- Integrate into ASR pipelines
- Use in real-time Persian voice applications
- Further fine-tuning on custom Persian domains (medical, legal, etc.)

### Out-of-Scope Use
- Non-Persian audio
- Low-quality, noisy, or overlapping multi-speaker speech
- Misuse for surveillance or unethical monitoring

---

## Bias, Risks, and Limitations
- Whisper may still struggle with dialect-heavy, noisy, or low-quality audio.
- The training dataset is relatively small (roughly 6,099 audio-subtitle pairs), so:
  - Certain accents may be underrepresented.
  - The model may hallucinate or mistranscribe in rare cases.

### Recommendations
Users should:
- Provide clean 16 kHz mono WAV audio
- Use domain-specific fine-tuning if necessary
- Validate outputs before critical use
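
One way to validate outputs is to measure WER and CER (the metrics listed in this card's metadata) on a small sample you have transcribed by hand. The sketch below is a plain-Python illustration based on Levenshtein distance, not the evaluation code used for this model; the function names are illustrative, and no text normalization is applied, so both strings should be normalized the same way beforehand.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance (spaces included) over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Both functions return a ratio where 0.0 means an exact match; values can exceed 1.0 when the hypothesis is much longer than the reference.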