---
language: en
license: apache-2.0
tags:
- automatic-speech-recognition
- speech
- audio
- transformers
- peft
- lora
- adapter

library_name: transformers
pipeline_tag: automatic-speech-recognition
---

# Bruno7/ksa-whisper-model

## Model Description
A LoRA adapter for Whisper, fine-tuned for Arabic automatic speech recognition with a focus on the Saudi (KSA) dialect.

## Base Model
This adapter is designed to work with: `openai/whisper-large-v3`

## Usage

```python
import torch
from transformers import pipeline
from peft import PeftModel, PeftConfig

# Load the adapter configuration
config = PeftConfig.from_pretrained("Bruno7/ksa-whisper-model")

# Build an ASR pipeline around the base model
pipe = pipeline(
    "automatic-speech-recognition",
    model=config.base_model_name_or_path,
    device="cuda" if torch.cuda.is_available() else "cpu"
)

# Load and apply the adapter
model = PeftModel.from_pretrained(pipe.model, "Bruno7/ksa-whisper-model")
pipe.model = model

# Process audio
result = pipe("path_to_audio.wav")
print(result["text"])
```

### Alternative Usage (Direct Loading)
```python
import torch
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
from peft import PeftModel

# Load base model and processor
processor = AutoProcessor.from_pretrained("openai/whisper-large-v3")
model = AutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-large-v3")

# Apply adapter
model = PeftModel.from_pretrained(model, "Bruno7/ksa-whisper-model")
model.eval()

# Inference sketch: `audio` is assumed to be a 1-D float array sampled at 16 kHz
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    predicted_ids = model.generate(input_features=inputs.input_features)
text = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```

## Model Architecture

This is a PEFT (Parameter-Efficient Fine-Tuning) adapter model that modifies a base Whisper model for improved performance on specific domains or languages. The adapter uses LoRA (Low-Rank Adaptation) techniques to efficiently fine-tune the model while keeping the parameter count minimal.

## Inference

This adapter can be applied to the base model for domain-specific speech recognition tasks.

## Limitations

- Requires the base model to be loaded separately
- Performance may vary with different audio qualities and accents
- Requires audio preprocessing for optimal results
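On the last point: Whisper-family models expect 16 kHz mono audio, so stereo or differently sampled recordings should be down-mixed and resampled before being passed to the processor. The helper below is a minimal sketch using linear interpolation; a dedicated resampler (e.g. torchaudio's) is preferable for production quality.

```python
import torch
import torch.nn.functional as F

TARGET_SR = 16000  # Whisper-family models expect 16 kHz mono audio


def prepare_audio(waveform: torch.Tensor, sr: int) -> torch.Tensor:
    """Down-mix to mono and resample to 16 kHz.

    Linear interpolation is a rough resampler used here for illustration;
    prefer a windowed-sinc resampler for real workloads.
    """
    if waveform.dim() == 2 and waveform.size(0) > 1:
        waveform = waveform.mean(dim=0, keepdim=True)  # stereo -> mono
    if sr != TARGET_SR:
        new_len = int(round(waveform.size(-1) * TARGET_SR / sr))
        waveform = F.interpolate(waveform.unsqueeze(0), size=new_len,
                                 mode="linear", align_corners=False).squeeze(0)
    return waveform


# One second of 44.1 kHz stereo noise -> 16 000 mono samples
audio = prepare_audio(torch.randn(2, 44100), 44100)
print(tuple(audio.shape))  # (1, 16000)
```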