---
language: en
license: apache-2.0
tags:
- automatic-speech-recognition
- speech
- audio
- transformers
- peft
- lora
- adapter
library_name: transformers
pipeline_tag: automatic-speech-recognition
---

# Bruno7/ksa-whisper-model

## Model Description

A fine-tuned Whisper adapter for Arabic automatic speech recognition, targeting the Saudi dialect.

## Base Model

This adapter is designed to work with: `openai/whisper-large-v3`
## Usage

```python
import torch
from transformers import pipeline
from peft import PeftConfig, PeftModel

# Load the adapter configuration to find the base model
config = PeftConfig.from_pretrained("Bruno7/ksa-whisper-model")

# Build an ASR pipeline around the base model
pipe = pipeline(
    "automatic-speech-recognition",
    model=config.base_model_name_or_path,
    device="cuda" if torch.cuda.is_available() else "cpu",
)

# Apply the LoRA adapter to the pipeline's model
pipe.model = PeftModel.from_pretrained(pipe.model, "Bruno7/ksa-whisper-model")

# Transcribe an audio file
result = pipe("path_to_audio.wav")
print(result["text"])
```

### Alternative Usage (Direct Loading)

```python
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
from peft import PeftModel

# Load base model and processor
processor = AutoProcessor.from_pretrained("openai/whisper-large-v3")
model = AutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-large-v3")

# Apply the adapter
model = PeftModel.from_pretrained(model, "Bruno7/ksa-whisper-model")

# Example inference (audio_array is a 16 kHz mono float waveform you supply)
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")
generated_ids = model.generate(inputs.input_features)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```

## Model Architecture

This is a PEFT (Parameter-Efficient Fine-Tuning) adapter that adapts the base Whisper model to a specific domain or language. It uses LoRA (Low-Rank Adaptation), which freezes the base weights and trains only small low-rank update matrices, keeping the number of trainable parameters minimal.
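To illustrate where the parameter savings come from, here is a minimal numpy sketch of a single LoRA-updated weight matrix. The hidden size `d = 1280` (whisper-large-v3's model dimension) and rank `r = 8` are illustrative choices, not this adapter's actual configuration:

```python
import numpy as np

d = 1280  # hidden size of one projection layer (illustrative)
r = 8     # LoRA rank (illustrative)

# Frozen base weight: never updated during fine-tuning
W = np.random.randn(d, d)

# LoRA trains only two low-rank factors; the effective weight is W + B @ A.
# B starts at zero so the update contributes nothing before training.
A = np.random.randn(r, d) * 0.01
B = np.zeros((d, r))
W_effective = W + B @ A  # equals W exactly at initialization

base_params = W.size
lora_params = A.size + B.size
print(f"trainable fraction: {lora_params / base_params:.4f}")  # 0.0125
```

For this layer, LoRA trains 2 × r × d = 20,480 parameters instead of d² = 1,638,400, about 1.25% of the original count.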

## Inference

Apply this adapter on top of the base model for domain-specific speech recognition, as shown in the usage examples above.

## Limitations

- Requires the base model (`openai/whisper-large-v3`) to be loaded separately
- Performance may vary with audio quality and with accents other than the Saudi dialect
- Requires audio preprocessing (16 kHz mono input) for optimal results
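Whisper models expect 16 kHz mono input. A minimal preprocessing sketch using numpy and scipy (the 44.1 kHz stereo source is just an example; the `preprocess` helper is illustrative, not part of this repository):

```python
import numpy as np
from scipy.signal import resample_poly

def preprocess(waveform: np.ndarray, source_rate: int, target_rate: int = 16000) -> np.ndarray:
    """Downmix to mono and resample to the rate Whisper expects."""
    if waveform.ndim == 2:                # (samples, channels) -> mono
        waveform = waveform.mean(axis=1)
    g = np.gcd(source_rate, target_rate)  # reduce the resampling ratio
    return resample_poly(waveform, target_rate // g, source_rate // g).astype(np.float32)

# Example: one second of stereo audio at 44.1 kHz
stereo = np.random.randn(44100, 2)
mono_16k = preprocess(stereo, 44100)
print(mono_16k.shape)  # (16000,)
```

The resulting array can be passed to the processor in the direct-loading example above via `processor(mono_16k, sampling_rate=16000, return_tensors="pt")`.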