---
language: en
license: apache-2.0
tags:
- automatic-speech-recognition
- speech
- audio
- transformers
- peft
- lora
- adapter
library_name: transformers
pipeline_tag: automatic-speech-recognition
---

# Bruno7/ksa-whisper-model

## Model Description

A fine-tuned Whisper adapter for Arabic automatic speech recognition, targeting the Saudi dialect.

## Base Model

This adapter is designed to work with: `openai/whisper-large-v3`
## Usage

```python
import torch
from transformers import pipeline
from peft import PeftConfig, PeftModel

# Load the adapter configuration to find the base model
config = PeftConfig.from_pretrained("Bruno7/ksa-whisper-model")

# Build an ASR pipeline around the base model
pipe = pipeline(
    "automatic-speech-recognition",
    model=config.base_model_name_or_path,
    device="cuda" if torch.cuda.is_available() else "cpu",
)

# Apply the LoRA adapter to the pipeline's model
pipe.model = PeftModel.from_pretrained(pipe.model, "Bruno7/ksa-whisper-model")

# Transcribe an audio file
result = pipe("path_to_audio.wav")
print(result["text"])
```

### Alternative Usage (Direct Loading)

```python
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
from peft import PeftModel

# Load base model and processor
processor = AutoProcessor.from_pretrained("openai/whisper-large-v3")
model = AutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-large-v3")

# Apply the adapter
model = PeftModel.from_pretrained(model, "Bruno7/ksa-whisper-model")

# Example inference (audio_array is a 16 kHz mono float waveform you supply)
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")
generated_ids = model.generate(inputs.input_features)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```

## Model Architecture

This is a PEFT (Parameter-Efficient Fine-Tuning) adapter that adapts the base Whisper model to a specific domain or language. It uses LoRA (Low-Rank Adaptation), which freezes the base weights and trains only small low-rank update matrices, keeping the number of trainable parameters minimal.
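To illustrate where the parameter savings come from, here is a minimal numpy sketch of a single LoRA-updated weight matrix. The hidden size `d = 1280` (whisper-large-v3's model dimension) and rank `r = 8` are illustrative choices, not this adapter's actual configuration:

```python
import numpy as np

d = 1280  # hidden size of one projection layer (illustrative)
r = 8     # LoRA rank (illustrative)

# Frozen base weight: never updated during fine-tuning
W = np.random.randn(d, d)

# LoRA trains only two low-rank factors; the effective weight is W + B @ A.
# B starts at zero so the update contributes nothing before training.
A = np.random.randn(r, d) * 0.01
B = np.zeros((d, r))
W_effective = W + B @ A  # equals W exactly at initialization

base_params = W.size
lora_params = A.size + B.size
print(f"trainable fraction: {lora_params / base_params:.4f}")  # 0.0125
```

For this layer, LoRA trains 2 × r × d = 20,480 parameters instead of d² = 1,638,400, about 1.25% of the original count.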

## Inference

Apply this adapter on top of the base model for domain-specific speech recognition, as shown in the usage examples above.

## Limitations

- Requires the base model (`openai/whisper-large-v3`) to be loaded separately
- Performance may vary with audio quality and with accents other than the Saudi dialect
- Requires audio preprocessing (16 kHz mono input) for optimal results
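Whisper models expect 16 kHz mono input. A minimal preprocessing sketch using numpy and scipy (the 44.1 kHz stereo source is just an example; the `preprocess` helper is illustrative, not part of this repository):

```python
import numpy as np
from scipy.signal import resample_poly

def preprocess(waveform: np.ndarray, source_rate: int, target_rate: int = 16000) -> np.ndarray:
    """Downmix to mono and resample to the rate Whisper expects."""
    if waveform.ndim == 2:                # (samples, channels) -> mono
        waveform = waveform.mean(axis=1)
    g = np.gcd(source_rate, target_rate)  # reduce the resampling ratio
    return resample_poly(waveform, target_rate // g, source_rate // g).astype(np.float32)

# Example: one second of stereo audio at 44.1 kHz
stereo = np.random.randn(44100, 2)
mono_16k = preprocess(stereo, 44100)
print(mono_16k.shape)  # (16000,)
```

The resulting array can be passed to the processor in the direct-loading example above via `processor(mono_16k, sampling_rate=16000, return_tensors="pt")`.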