
	---
	base_model: openai/whisper-small
	library_name: peft
	tags:
	- whisper
	- lora
	- transformers
	- speech-to-text
	- persian
	- stt
	- fine-tune
	- adapter
	datasets:
	- vhdm/persian-voice-v1
	language:
	- fa
	metrics:
	- cer
	- wer
	---

	# Whisper-Small Persian STT — LoRA Fine-Tuned

	A LoRA fine-tuned version of `openai/whisper-small` for Persian automatic speech recognition (ASR).
	It is optimized for Persian conversational speech and clean, dataset-quality audio.

	---

	## Model Details

	### Model Description
	This model is Whisper Small with a LoRA adapter fine-tuned for Persian (fa) speech recognition.
	It is intended to improve transcription accuracy on standard Persian audio segments (16 kHz, mono, normalized WAV).

	- Developed by: Mehdi Pouladrag
	- Model type: Speech-to-Text (ASR) — Whisper Small (Seq2Seq Transformer)
	- Language(s): Persian (fa)
	- License: MIT
	- Finetuned from: `openai/whisper-small`
	- Dataset: `vhdm/persian-voice-v1` (single dataset)
	- Training technique: LoRA (Low-Rank Adaptation)
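
As a rough illustration of the LoRA technique named above, the sketch below adds a trainable low-rank update to one frozen projection matrix. The hidden size of 768 matches Whisper Small; the rank and scaling factor are hypothetical values chosen for illustration and are not taken from this model's training configuration.

```python
import numpy as np

D_MODEL = 768   # hidden size of whisper-small
RANK = 8        # hypothetical LoRA rank, not taken from this card
ALPHA = 16      # hypothetical scaling factor, not taken from this card

rng = np.random.default_rng(0)

# Frozen pretrained projection weight (a random stand-in here).
W = rng.standard_normal((D_MODEL, D_MODEL))

# Trainable low-rank factors. B starts at zero, so the adapted layer
# initially behaves exactly like the frozen one.
A = rng.standard_normal((RANK, D_MODEL)) * 0.01
B = np.zeros((D_MODEL, RANK))


def lora_forward(x, W, A, B, alpha=ALPHA, rank=RANK):
    """Compute x @ (W + (alpha / rank) * B @ A).T without forming the sum."""
    return x @ W.T + (alpha / rank) * (x @ A.T) @ B.T


x = rng.standard_normal((4, D_MODEL))
y = lora_forward(x, W, A, B)

full_params = W.size            # parameters in the frozen matrix
lora_params = A.size + B.size   # parameters that would actually be trained
print(full_params, lora_params)
```

Under these assumed values, the adapter trains 12,288 parameters for a matrix that holds 589,824, which is why only a small adapter file needs to be stored alongside the base model.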

	### Model Sources
	- Repository: https://github.com/Mehdipoladrag/Fine-tuning-Whisper-Model

	---

	## Uses

	### Direct Use
	- Convert Persian speech to text
	- Subtitle generation for Persian audio
	- Conversational ASR
	- Podcast / video transcription
	- General Persian content recognition
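
A minimal transcription sketch for the uses listed above, assuming the adapter is published on the Hub as `DevMehdip/whisper-small-fa-lora` (an assumed repository id) and that recent versions of `transformers`, `peft`, and `torch` are installed. The function names are illustrative, and the processor is loaded from the base model because this card does not say whether the adapter repository ships one.

```python
BASE_MODEL_ID = "openai/whisper-small"
ADAPTER_ID = "DevMehdip/whisper-small-fa-lora"  # assumed Hub repository id
SAMPLE_RATE = 16_000


def load_model():
    """Load the base Whisper model and attach the LoRA adapter."""
    # Heavy imports are deferred so this module can be imported without them.
    from peft import PeftModel
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    processor = WhisperProcessor.from_pretrained(BASE_MODEL_ID)
    base = WhisperForConditionalGeneration.from_pretrained(BASE_MODEL_ID)
    model = PeftModel.from_pretrained(base, ADAPTER_ID)
    model.eval()
    return processor, model


def transcribe(audio, processor, model):
    """Transcribe a 1-D float waveform sampled at 16 kHz into Persian text."""
    import torch

    inputs = processor(audio, sampling_rate=SAMPLE_RATE, return_tensors="pt")
    with torch.no_grad():
        token_ids = model.generate(
            input_features=inputs.input_features,
            language="fa",       # force Persian decoding
            task="transcribe",   # transcribe rather than translate
        )
    return processor.batch_decode(token_ids, skip_special_tokens=True)[0]
```

Typical use would be `processor, model = load_model()` followed by `text = transcribe(waveform, processor, model)`, where `waveform` is a mono NumPy array at 16 kHz. Older `transformers` releases may not accept the `language` and `task` arguments to `generate`.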

	### Downstream Use
	- Integrate into ASR pipelines
	- Use in real-time Persian voice applications
	- Further fine-tuning on custom Persian domains (medical, legal, etc.)
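
When integrating the model into a pipeline, long recordings usually need to be split first, since Whisper processes audio in 30-second windows. The helper below is a hypothetical sketch (the name `chunk_bounds` and its parameters are not part of this repository) that computes sample index ranges for fixed-length windows with optional overlap.

```python
def chunk_bounds(num_samples, sample_rate=16_000, window_s=30.0, overlap_s=0.0):
    """Return (start, end) sample indices that cover the whole recording."""
    window = int(window_s * sample_rate)
    step = window - int(overlap_s * sample_rate)
    if window <= 0 or step <= 0:
        raise ValueError("window_s must be positive and larger than overlap_s")

    bounds = []
    start = 0
    while start < num_samples:
        end = min(start + window, num_samples)
        bounds.append((start, end))
        if end == num_samples:
            break  # the last window reached the end of the audio
        start += step
    return bounds


# A 65-second recording at 16 kHz splits into two full windows and a remainder.
print(chunk_bounds(65 * 16_000))
# → [(0, 480000), (480000, 960000), (960000, 1040000)]
```

Each range can then be sliced out of the waveform and transcribed separately. A small overlap reduces the chance of cutting a word at a boundary, at the cost of having to merge duplicated text afterwards.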

	### Out-of-Scope Use
	- Non-Persian audio
	- Low-quality or noisy audio with overlapping speakers
	- Misuse for surveillance or unethical monitoring

	---

	## Bias, Risks, and Limitations
	- Whisper may still struggle with dialect-heavy, noisy, or low-quality audio.
	- The training dataset is relatively small (about 6,099 audio-subtitle pairs), so:
	  - Certain accents may be underrepresented.
	  - The model may hallucinate or mis-transcribe in rare cases.
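
One way to quantify these limitations on your own data is to compute the two metrics listed in this card's metadata, word error rate (WER) and character error rate (CER), against reference transcripts. The sketch below is a plain-Python implementation for illustration, not the evaluation code used for this model; it counts spaces as characters in CER and applies no text normalization.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


print(wer("a b c d", "a x c d"))  # one substituted word out of four
# → 0.25
```

Because an inserted hallucination counts as extra edits, both rates can exceed 1.0 when the model outputs far more text than the reference contains.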

	### Recommendations
	Users should:
	- Provide clean 16 kHz mono WAV audio
	- Use domain-specific fine-tuning if necessary
	- Validate outputs before critical use
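
The first recommendation can be checked automatically before transcription. The sketch below uses only Python's standard `wave` module to confirm that a file is 16 kHz mono; the function name `check_wav` is hypothetical, and the check does not assess noise level or loudness normalization.

```python
import wave


def check_wav(path, expected_rate=16_000, expected_channels=1):
    """Return a list of format problems; an empty list means the file is fine."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()

    problems = []
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != expected_channels:
        problems.append(f"file has {channels} channel(s), expected {expected_channels}")
    return problems
```

Files that fail the check would need to be resampled or downmixed with an audio tool of your choice before being passed to the model.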