This is a LoRA adapter for `Qwen/Qwen2-Audio-7B-Instruct`, fine-tuned on Moroccan Darija (an Arabic dialect) for three tasks.
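LoRA leaves the base model's weights frozen and trains a small low-rank update on top of selected layers. The NumPy sketch below shows that idea for a single linear layer, plus the algebra behind folding the update back into one dense matrix. PEFT's `merge_and_unload()` performs this merge for a real adapter. The layer sizes, rank `r`, and `alpha` are hypothetical and are not this adapter's configuration, which is stored in its `adapter_config.json`.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha):
    """Frozen linear layer W plus a low-rank update B @ A, scaled by alpha / r.

    Shapes: W is (d_out, d_in), A is (r, d_in), B is (d_out, r).
    """
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

def merge_lora(W, A, B, alpha):
    """Fold the scaled low-rank update into a single dense weight matrix."""
    r = A.shape[0]
    return W + (alpha / r) * (B @ A)

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 16, 8, 4, 8  # hypothetical sizes, not this adapter's config
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))  # B starts at zero, so an untrained adapter is a no-op
x = rng.standard_normal((3, d_in))

# With B == 0 the adapted layer reproduces the base layer exactly
assert np.allclose(lora_forward(x, W, A, B, alpha), x @ W.T)

# Stand-in for trained values: the unmerged and merged forms give the same output
B = rng.standard_normal((d_out, r))
merged = merge_lora(W, A, B, alpha)
assert np.allclose(lora_forward(x, W, A, B, alpha), x @ merged.T)
```

Only `A` and `B` (here `r * (d_in + d_out)` values) are trained and shipped in the adapter, which is why the adapter is much smaller than the 7B base model.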
Speech recognition results (word and character error rates; lower is better):

| Model | WER (%) | CER (%) |
|---|---|---|
| Baseline (zero-shot) | 121.48 | 94.17 |
| Phase 1 (trained on Casablanca only) | 82.69 | 39.03 |
| Multi-task (this model) | 67.94 | 29.26 |
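WER and CER are edit-distance rates: the number of substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the reference length in words or characters. Because insertions are counted, a model that emits extra text can exceed 100%, as the zero-shot baseline does. The sketch below uses the standard Levenshtein definition. The card does not say how its scores were computed (text normalization, whether spaces count for CER, per-utterance vs. corpus averaging), so treat this as an illustration rather than the evaluation script behind the table.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free when tokens match)
            )
        prev = curr
    return prev[-1]

def wer(reference, hypothesis):
    """Word error rate for one utterance; assumes a non-empty reference."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character error rate; this sketch counts spaces as characters."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

def corpus_wer(references, hypotheses):
    """Total word edits divided by total reference words across a test set."""
    edits = sum(edit_distance(r.split(), h.split()) for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    return edits / words

print(wer("واش نتا مزيان", "واش انت مزيان"))  # one substituted word out of three
print(wer("a b", "a b c d e"))  # 1.5: three insertions push WER above 100%
print(cer("kitten", "sitting"))  # 0.5
```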
To transcribe an audio file:

```python
import librosa
import torch
from peft import PeftModel
from transformers import AutoProcessor, Qwen2AudioForConditionalGeneration

# Load the base model in bfloat16 and attach the LoRA adapter
base_model = Qwen2AudioForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-Audio-7B-Instruct", torch_dtype=torch.bfloat16
).to("cuda")
model = PeftModel.from_pretrained(base_model, "Tilas/qwen2audio-darija-multitask")
processor = AutoProcessor.from_pretrained("Tilas/qwen2audio-darija-multitask")

# Load the audio resampled to 16 kHz, the rate passed to the processor below
audio, sr = librosa.load("audio.wav", sr=16000)

# The audio entry only marks where audio tokens go in the prompt;
# the waveform itself is passed to the processor separately
conversation = [{"role": "user", "content": [
    {"type": "audio", "audio_url": "placeholder"},
    {"type": "text", "text": "Transcribe this speech exactly as spoken, using Arabic script."},
]}]
text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
inputs = processor(text=text, audios=[audio], sampling_rate=16000, return_tensors="pt", padding=True).to("cuda")

# Generate, then decode only the new tokens (generate returns prompt + continuation)
output = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(output[:, inputs.input_ids.size(1):], skip_special_tokens=True)[0])
```
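The example above sends the whole file in one call. Qwen2-Audio uses a Whisper-style feature extractor that works on fixed-length windows (typically 30 seconds, exposed as `processor.feature_extractor.chunk_length`), so longer recordings may be truncated. A minimal sketch, assuming that window limit applies, is to split the waveform into consecutive chunks first. The helper name `split_audio` is hypothetical, and naive fixed-length cuts can split a word across two chunks.

```python
import numpy as np

def split_audio(audio, sampling_rate=16000, max_seconds=30.0):
    """Split a 1-D waveform into consecutive chunks of at most max_seconds each."""
    chunk = int(sampling_rate * max_seconds)
    if chunk < 1:
        raise ValueError("max_seconds is too small for this sampling rate")
    return [audio[start:start + chunk] for start in range(0, len(audio), chunk)]

# 70 seconds of silence at 16 kHz -> chunks of 30 s, 30 s, and 10 s
pieces = split_audio(np.zeros(70 * 16000), sampling_rate=16000, max_seconds=30)
print([len(p) / 16000 for p in pieces])  # [30.0, 30.0, 10.0]
```

Each chunk can then go through the same `processor(...)` and `model.generate(...)` steps as a single file, with the decoded strings joined by spaces. Splitting on silences (for example with a voice activity detector) would avoid cutting words, at the cost of an extra dependency.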
If you use this model, please cite the underlying datasets:

```bibtex
@article{talafha2024casablanca,
  title={Casablanca: Data and Models for Multidialectal Arabic Speech Recognition},
  author={Talafha, Bashar and others},
  journal={arXiv preprint arXiv:2410.04527},
  year={2024}
}
```
Base model: `Qwen/Qwen2-Audio-7B-Instruct`