---
language: en
license: apache-2.0
tags:
- automatic-speech-recognition
- speech
- audio
- transformers
- peft
- lora
- adapter

library_name: transformers
pipeline_tag: automatic-speech-recognition
---

# Bruno7/ksa-whisper

## Model Description
A PEFT (LoRA) adapter for Arabic automatic speech recognition, fine-tuned for the Saudi dialect on top of `openai/whisper-large-v3`.

## Base Model
This adapter is designed to work with: `openai/whisper-large-v3`

## Usage

```python
import torch
from transformers import pipeline
from peft import PeftModel, PeftConfig

# Load the adapter configuration
config = PeftConfig.from_pretrained("Bruno7/ksa-whisper")

# Load base model and apply adapter
pipe = pipeline(
    "automatic-speech-recognition",
    model=config.base_model_name_or_path,
    device="cuda" if torch.cuda.is_available() else "cpu"
)

# Load and apply the adapter
model = PeftModel.from_pretrained(pipe.model, "Bruno7/ksa-whisper")
pipe.model = model

# Process audio
result = pipe("path_to_audio.wav")
print(result["text"])
```

### Alternative Usage (Direct Loading)
```python
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
from peft import PeftModel

# Load base model and processor
processor = AutoProcessor.from_pretrained("openai/whisper-large-v3")
model = AutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-large-v3")

# Apply adapter
model = PeftModel.from_pretrained(model, "Bruno7/ksa-whisper")

# Example inference (assumes `audio` is a 16 kHz mono float array)
inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")
predicted_ids = model.generate(inputs.input_features)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```

## Model Architecture

This is a PEFT (Parameter-Efficient Fine-Tuning) adapter model that modifies a base Whisper model for improved performance on specific domains or languages. The adapter uses LoRA (Low-Rank Adaptation) techniques to efficiently fine-tune the model while keeping the parameter count minimal.
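The parameter savings come from factorizing the weight update: instead of training a full d×k matrix, LoRA trains two low-rank factors B (d×r) and A (r×k) with r much smaller than d and k. A minimal NumPy sketch of the idea, with example dimensions rather than Whisper's actual layer sizes or this adapter's rank:

```python
import numpy as np

d, k, r = 1024, 1024, 8  # illustrative dims; Whisper layers and the rank differ
rng = np.random.default_rng(0)

W = rng.standard_normal((d, k))         # frozen base weight
A = rng.standard_normal((r, k)) * 0.01  # trainable low-rank factor
B = np.zeros((d, r))                    # zero-initialized: no change at start

W_adapted = W + B @ A  # effective weight after adaptation

full_params = d * k
lora_params = r * (d + k)
print(lora_params / full_params)  # trainable fraction, here ~1.6%
```

Only A and B are updated during fine-tuning, so the adapter checkpoint stays small relative to the base model.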

## Inference

Apply the adapter to the base model as shown above; the combined model performs Saudi-dialect Arabic transcription.

## Limitations

- Requires the base model to be loaded separately
- Performance may vary with different audio qualities and accents
- Requires audio preprocessing (16 kHz mono input) for optimal results
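Whisper models expect 16 kHz mono audio. A minimal preprocessing sketch using plain NumPy linear interpolation (a band-limited resampler such as `torchaudio.transforms.Resample` is preferable in practice); the function name is illustrative:

```python
import numpy as np

def to_whisper_input(audio: np.ndarray, sr: int, target_sr: int = 16_000) -> np.ndarray:
    """Downmix to mono and resample to 16 kHz (Whisper's expected rate)."""
    if audio.ndim == 2:  # (channels, samples) -> mono
        audio = audio.mean(axis=0)
    if sr != target_sr:
        duration = audio.shape[0] / sr
        n_out = int(round(duration * target_sr))
        old_t = np.linspace(0.0, duration, audio.shape[0], endpoint=False)
        new_t = np.linspace(0.0, duration, n_out, endpoint=False)
        audio = np.interp(new_t, old_t, audio)
    return audio.astype(np.float32)

# e.g. one second of 44.1 kHz stereo -> 16,000 mono samples
stereo = np.random.randn(2, 44_100)
mono_16k = to_whisper_input(stereo, 44_100)
print(mono_16k.shape)
```

The resulting array can be passed to the processor via `processor(mono_16k, sampling_rate=16_000, return_tensors="pt")`.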