Qwen2-Audio Fine-tuned for Moroccan Darija (Multi-task)

This is a LoRA adapter for Qwen/Qwen2-Audio-7B-Instruct, fine-tuned on Moroccan Darija (Arabic dialect) for three tasks:

Tasks

  1. ASR: Audio → Darija transcription (Arabic script)
  2. Translation: Audio → English translation
  3. Transliteration: Audio → Darija in Latin script (Arabizi)

Training Details

  • Base model: Qwen2-Audio-7B-Instruct (8.2B params)
  • Method: LoRA (r=32, alpha=32; ~109M trainable params, about 1.28% of the total)
  • Training data: ~19,100 samples
    • Casablanca Morocco (UBC-NLP): ~1,045 ASR samples
    • DODa-audio (AtlasIA): ~12,700 samples (ASR + Translation + Transliteration)
  • Epochs: 2
  • Best eval loss: 0.7265
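
For intuition about the LoRA settings above, the sketch below computes the update scaling factor (alpha / r in standard LoRA) and the number of trainable parameters LoRA adds to one linear layer. The 4096 x 4096 layer shape and the helper names are hypothetical, chosen only for illustration; they are not taken from this adapter's configuration.

```python
def lora_scaling(r: int, alpha: int) -> float:
    """Scaling applied to the low-rank update in standard LoRA: alpha / r."""
    return alpha / r


def lora_params_per_linear(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one linear layer.

    LoRA adds two matrices: A with shape (r, d_in) and B with shape (d_out, r).
    """
    return r * (d_in + d_out)


# Values from the training details above
scaling = lora_scaling(r=32, alpha=32)

# Hypothetical layer shape, for illustration only
added = lora_params_per_linear(d_in=4096, d_out=4096, r=32)

print(f"scaling={scaling}, added params per layer={added}")
```

With r equal to alpha, the low-rank update is applied at a scale of 1. The total trainable count depends on which modules the adapter targets, which is not stated here.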

Results on Casablanca Morocco Test Set (held out)

| Model | WER | CER |
|---|---|---|
| Baseline (zero-shot) | 121.48% | 94.17% |
| Phase 1 (Casablanca only) | 82.69% | 39.03% |
| Multi-task (this model) | 67.94% | 29.26% |
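
WER and CER are edit-distance rates: the number of substitutions, deletions, and insertions divided by the reference length, counted over words or characters. Because insertions count as errors, the rate can exceed 100%, as in the zero-shot baseline. The sketch below is a minimal pure-Python illustration of the metric, not the evaluation script behind the table; any text normalization used in the actual evaluation is not specified here, and the example strings are made up.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (single-row dynamic programming)."""
    row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev_diag, row[0] = row[0], i
        for j, h in enumerate(hyp, start=1):
            cur = min(
                row[j] + 1,            # deletion
                row[j - 1] + 1,        # insertion
                prev_diag + (r != h),  # substitution (or match, cost 0)
            )
            prev_diag, row[j] = row[j], cur
    return row[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# A hypothesis with many extra words pushes WER above 100%.
print(f"{wer('one two', 'one two three four five'):.0%}")  # prints 150%
```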

Usage

import torch
import librosa
from transformers import Qwen2AudioForConditionalGeneration, AutoProcessor
from peft import PeftModel

# Load base model + adapter
base_model = Qwen2AudioForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-Audio-7B-Instruct", torch_dtype=torch.bfloat16
).to("cuda")
model = PeftModel.from_pretrained(base_model, "Tilas/qwen2audio-darija-multitask")
processor = AutoProcessor.from_pretrained("Tilas/qwen2audio-darija-multitask")

# Transcribe
audio, sr = librosa.load("audio.wav", sr=16000)
conversation = [{"role": "user", "content": [
    {"type": "audio", "audio_url": "placeholder"},
    {"type": "text", "text": "Transcribe this speech exactly as spoken, using Arabic script."},
]}]
text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
inputs = processor(text=text, audios=[audio], sampling_rate=16000, return_tensors="pt", padding=True).to("cuda")
output = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(output[:, inputs.input_ids.size(1):], skip_special_tokens=True)[0])
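
The usage example above shows only the transcription prompt. A minimal sketch of how the conversation could be built for each of the three tasks follows. The `build_conversation` helper is hypothetical, and the translation and transliteration prompt strings are assumptions made for illustration: the exact wording used during fine-tuning is not given here, and the adapter may respond best to the prompts it was trained with.

```python
# The ASR prompt is taken from the usage example above.
# The other two prompts are ASSUMED wordings, for illustration only.
TASK_PROMPTS = {
    "asr": "Transcribe this speech exactly as spoken, using Arabic script.",
    "translation": "Translate this speech into English.",  # assumed
    "transliteration": "Transcribe this speech in Latin script (Arabizi).",  # assumed
}


def build_conversation(task: str) -> list:
    """Return a single-turn conversation for the given task (hypothetical helper)."""
    if task not in TASK_PROMPTS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(TASK_PROMPTS)}")
    return [{"role": "user", "content": [
        {"type": "audio", "audio_url": "placeholder"},
        {"type": "text", "text": TASK_PROMPTS[task]},
    ]}]


conversation = build_conversation("translation")
print(conversation[0]["content"][1]["text"])
```

The returned list can be passed to `processor.apply_chat_template` in place of the hand-written `conversation` in the usage example.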

Citation

If you use this model, please cite the underlying datasets:

@article{talafha2024casablanca,
  title={Casablanca: Data and Models for Multidialectal Arabic Speech Recognition},
  author={Talafha, Bashar and others},
  journal={arXiv preprint arXiv:2410.04527},
  year={2024}
}