---
base_model: openai/whisper-small
library_name: peft
tags:
- whisper
- lora
- transformers
- speech-to-text
- persian
- stt
- fine-tune
- adapter
datasets:
- vhdm/persian-voice-v1
language:
- fa
metrics:
- cer
- wer
---

# Whisper-Small Persian STT: LoRA Fine-Tuned

A fine-tuned version of **openai/whisper-small** for **Persian speech-to-text (ASR)** using **LoRA**.
This model is optimized for Persian conversational speech and clean, dataset-quality audio.

---

## Model Details

### Model Description

This model is a **LoRA fine-tuned Whisper Small** focused on **Persian (fa)** speech recognition.
It improves transcription accuracy on standard Persian audio segments (16 kHz, mono, normalized WAV).

- **Developed by:** *Mehdi Pouladrag*
- **Model type:** Speech-to-Text (ASR), Whisper Small (Seq2Seq Transformer)
- **Language(s):** Persian (fa)
- **License:** MIT
- **Finetuned from:** `openai/whisper-small`
- **Dataset:** `vhdm/persian-voice-v1` (single dataset)
- **Training technique:** LoRA (Low-Rank Adaptation)

### Model Sources

- **Repository:** https://github.com/Mehdipoladrag/Fine-tuning-Whisper-Model

---

## Uses

### Direct Use

- Convert Persian speech to text
- Subtitle generation for Persian audio
- Conversational ASR
- Podcast / video transcription
- General Persian content recognition

### Downstream Use

- Integrate into ASR pipelines
- Use in real-time Persian voice applications
- Further fine-tuning on custom Persian domains (medical, legal, etc.)

### Out-of-Scope Use

- Non-Persian audio
- Low-quality or noisy audio with overlapping multi-speaker speech
- Misuse for surveillance or unethical monitoring

---

## Bias, Risks, and Limitations

- Whisper may still struggle with dialect-heavy, noisy, or low-quality audio.
- The training dataset is relatively small (~6,099 audio–subtitle pairs), so:
  - Certain accents may be underrepresented.
  - The model may hallucinate or mis-transcribe in rare cases.

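
Since the model may mis-transcribe some inputs, it can help to measure error on a small set of your own reference transcripts. The card lists CER and WER as its metrics; the sketch below shows one common way to compute both from Levenshtein edit distance. It is not the author's evaluation script: the function names `edit_distance`, `wer`, and `cer` are illustrative, and Persian text normalization (for example, unifying Arabic and Persian forms of ی and ک) is deliberately left out.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete an element of the reference
                curr[j - 1] + 1,     # insert an element of the hypothesis
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    # Collapse runs of whitespace so spacing differences are not over-counted.
    ref_chars = " ".join(reference.split())
    hyp_chars = " ".join(hypothesis.split())
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    reference = "سلام حال شما چطور است"
    hypothesis = "سلام حال شما چطوره"
    print(f"WER: {wer(reference, hypothesis):.2f}")
    print(f"CER: {cer(reference, hypothesis):.2f}")
```

For the pair above, the word-level alignment needs one substitution and one deletion against five reference words, so WER is 2/5 = 0.4. CER is usually the more forgiving number for Persian, where a single attached suffix changes a whole word.
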
### Recommendations

Users should:

- Provide clean 16 kHz mono WAV audio
- Use domain-specific fine-tuning if necessary
- Validate outputs before critical use
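
The first recommendation can be sketched with the Python standard library alone. The example below converts a 16-bit PCM WAV file to 16 kHz mono and peak-normalizes it. Several parts are assumptions rather than details from this card: the helper names, the 90% peak target, and the use of plain linear interpolation, which has no anti-aliasing filter and is only meant to illustrate the steps. A dedicated audio library is likely a better choice for production resampling.

```python
import array
import sys
import wave

TARGET_RATE = 16_000  # Hz, the sample rate recommended above


def to_mono(samples, channels):
    """Average interleaved channels into a single channel."""
    if channels == 1:
        return list(samples)
    return [sum(samples[i:i + channels]) / channels
            for i in range(0, len(samples) - channels + 1, channels)]


def resample_linear(samples, src_rate, dst_rate=TARGET_RATE):
    """Resample by linear interpolation (illustrative, no anti-aliasing)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate
    last = len(samples) - 1
    out = []
    for k in range(n_out):
        pos = min(k * step, last)      # position in the source signal
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, last)]
        out.append(samples[i] * (1 - frac) + nxt * frac)
    return out


def peak_normalize(samples, peak=0.9 * 32767):
    """Scale so the loudest sample reaches `peak` (assumed target: 90% of full scale)."""
    loudest = max((abs(s) for s in samples), default=0)
    if loudest == 0:
        return list(samples)           # silence: nothing to scale
    gain = peak / loudest
    return [s * gain for s in samples]


def prepare_wav(src_path, dst_path):
    """Write a 16 kHz mono, peak-normalized copy of a 16-bit PCM WAV file."""
    with wave.open(src_path, "rb") as src:
        if src.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV")
        channels, rate = src.getnchannels(), src.getframerate()
        pcm = array.array("h")
        pcm.frombytes(src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()                 # WAV samples are little-endian
    samples = peak_normalize(resample_linear(to_mono(pcm, channels), rate))
    # Round and clamp to the signed 16-bit range before writing.
    out = array.array("h", (max(-32768, min(32767, round(s))) for s in samples))
    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(TARGET_RATE)
        dst.writeframes(out.tobytes())
```

A call such as `prepare_wav("input.wav", "input_16k.wav")` (hypothetical file names) produces a file that matches the format described in this card. Compressed formats and other bit depths would need decoding first.
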