Bruno7

Upload ASR model - Documentation

98c22a8 verified 7 months ago

metadata

language: ar
license: apache-2.0
tags:
  - automatic-speech-recognition
  - speech
  - audio
  - transformers
  - peft
  - lora
  - adapter
library_name: transformers
pipeline_tag: automatic-speech-recognition

Bruno7/ksa-whisper-model

Model Description

A Whisper model fine-tuned for Arabic speech recognition in the Saudi dialect.

Base Model

This adapter is designed to work with: openai/whisper-large-v3

Usage

import torch
from transformers import pipeline
from peft import PeftModel, PeftConfig

# Load the adapter configuration
config = PeftConfig.from_pretrained("Bruno7/ksa-whisper-model")

# Load base model and apply adapter
pipe = pipeline(
    "automatic-speech-recognition",
    model=config.base_model_name_or_path,
    device="cuda" if torch.cuda.is_available() else "cpu"
)

# Load and apply the adapter
model = PeftModel.from_pretrained(pipe.model, "Bruno7/ksa-whisper-model")
pipe.model = model

# Process audio
result = pipe("path_to_audio.wav")
print(result["text"])

Alternative Usage (Direct Loading)

from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
from peft import PeftModel

# Load base model and processor
processor = AutoProcessor.from_pretrained("openai/whisper-large-v3")
model = AutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-large-v3")

# Apply adapter
model = PeftModel.from_pretrained(model, "Bruno7/ksa-whisper-model")

# Your inference code here
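
The inference step left open above could look roughly like the following. This is a minimal sketch, not the author's code: the helper name `transcribe` and the `language="arabic"` default are assumptions made for illustration, and `audio` is assumed to be a 1-D float waveform already sampled at 16 kHz.

```python
def transcribe(model, processor, audio, sampling_rate=16_000, language="arabic"):
    """Transcribe one mono waveform with a Whisper model plus adapter.

    `transcribe` is a hypothetical helper, not part of transformers or peft.
    """
    # Convert the raw waveform into log-mel input features.
    inputs = processor(audio, sampling_rate=sampling_rate, return_tensors="pt")
    input_features = inputs.input_features.to(model.device)

    # Generate token ids; language and task are passed to Whisper's generate().
    generated_ids = model.generate(
        input_features=input_features,
        language=language,
        task="transcribe",
    )

    # Decode the token ids back to text, dropping special tokens.
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
```

With the `model` and `processor` objects loaded above, a call would be `text = transcribe(model, processor, waveform)`, where `waveform` is a NumPy array or list of samples.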

Model Architecture

This is a PEFT (Parameter-Efficient Fine-Tuning) adapter for a base Whisper model, not a standalone checkpoint. It uses LoRA (Low-Rank Adaptation): the base model's weights stay frozen and only small low-rank update matrices are trained, which keeps the number of trainable parameters small.
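
To give a sense of the savings, the sketch below compares the parameter count of one full weight matrix with its LoRA update. The rank of 8 is a hypothetical value chosen for illustration; the rank this adapter actually uses is recorded in its `adapter_config.json`. The width of 1280 is the hidden size of Whisper large models.

```python
def lora_param_counts(d_in, d_out, rank):
    """Return (full, lora) parameter counts for one linear layer.

    A full update trains a d_out x d_in matrix. LoRA instead trains two
    factors, A (rank x d_in) and B (d_out x rank).
    """
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora


# One 1280 x 1280 attention projection, with an assumed rank of 8.
full, lora = lora_param_counts(1280, 1280, rank=8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
# prints "full: 1,638,400  lora: 20,480  ratio: 1.25%"
```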

Inference

Apply this adapter to the base model (openai/whisper-large-v3) to transcribe Arabic speech in the Saudi dialect.

Limitations

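The base model must be loaded separately; the adapter does not contain the base model's weights
Performance may vary with audio quality and with accents
Input audio should be preprocessed (for example, converted to the 16 kHz mono format Whisper expects) for best results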
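
As one possible form of the preprocessing mentioned above, the sketch below downmixes to mono and resamples to 16 kHz with NumPy. The function name `prepare_audio` is hypothetical, and linear interpolation is a deliberately simple stand-in: it applies no anti-aliasing filter, so a dedicated resampler is likely the better choice for real use.

```python
import numpy as np

TARGET_SR = 16_000  # Whisper's feature extractor expects 16 kHz input


def prepare_audio(samples, sample_rate, target_sr=TARGET_SR):
    """Return a mono float32 waveform at `target_sr`.

    `samples` is either 1-D (mono) or 2-D with shape (num_samples, num_channels).
    """
    audio = np.asarray(samples, dtype=np.float32)

    # Downmix multi-channel audio by averaging the channels.
    if audio.ndim == 2:
        audio = audio.mean(axis=1)

    # Resample by linear interpolation (simple, but not anti-aliased).
    if sample_rate != target_sr:
        n_out = int(round(audio.shape[0] / sample_rate * target_sr))
        old_t = np.arange(audio.shape[0]) / sample_rate
        new_t = np.arange(n_out) / target_sr
        audio = np.interp(new_t, old_t, audio).astype(np.float32)

    return audio
```

The returned array can be passed directly to the pipeline or to the processor with `sampling_rate=16000`.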