ArTST-v2 Dialects: Arabic Dialect ASR

ArTST-v2 Dialects is an automatic speech recognition (ASR) model that transcribes Arabic speech across multiple Arabic dialects.

The model is based on ArTST, an Arabic Text and Speech Transformer model developed by the Speech Lab at Mohamed bin Zayed University of Artificial Intelligence (MBZUAI).

This checkpoint was fine-tuned for automatic speech recognition (ASR) on multiple Arabic dialects.

For more details about the original ArTST model, please refer to the official repository:
https://github.com/mbzuai-nlp/ArTST


Model Description

This model is designed to transcribe spoken Arabic into text, with a focus on dialectal Arabic speech. It primarily supports the Arabic language and has been fine-tuned to improve recognition performance across multiple Arabic dialects.


Model Usage

import torch
import soundfile as sf

from transformers import (
    SpeechT5ForSpeechToText,
    SpeechT5Processor,
    SpeechT5Tokenizer,
)

device = "cuda" if torch.cuda.is_available() else "cpu"

model_id = "Mohammed01/ArTST-v2-Dialects"

tokenizer = SpeechT5Tokenizer.from_pretrained(model_id)
processor = SpeechT5Processor.from_pretrained(model_id, tokenizer=tokenizer)
model = SpeechT5ForSpeechToText.from_pretrained(model_id).to(device)

# Load the audio file; the model expects 16 kHz mono input,
# so resample the waveform first if it was recorded at another rate.
audio, sampling_rate = sf.read("audio.wav")

inputs = processor(
    audio=audio,
    sampling_rate=sampling_rate,
    return_tensors="pt"
)

inputs = {key: value.to(device) for key, value in inputs.items()}

predicted_ids = model.generate(
    **inputs,
    max_length=250
)

transcription = processor.batch_decode(
    predicted_ids,
    skip_special_tokens=True
)

print(transcription[0])
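SpeechT5-based checkpoints are trained on 16 kHz audio, while soundfile returns whatever sample rate the file was recorded at. The sketch below shows one way to bring arbitrary audio to 16 kHz before passing it to the processor; the helper name and the plain linear-interpolation approach are illustrative choices here (for production use, a dedicated resampler such as torchaudio or librosa is preferable).

```python
import numpy as np

TARGET_SR = 16_000  # sample rate expected by SpeechT5-based ASR models


def resample_to_16k(audio: np.ndarray, sampling_rate: int) -> np.ndarray:
    """Resample a waveform to 16 kHz mono via linear interpolation.

    A minimal sketch; real pipelines should use a proper polyphase
    resampler (torchaudio.transforms.Resample, librosa.resample, ...).
    """
    if audio.ndim > 1:
        # Collapse multi-channel audio to mono by averaging channels.
        audio = audio.mean(axis=1)
    if sampling_rate == TARGET_SR:
        return audio
    duration = audio.shape[0] / sampling_rate
    n_target = int(round(duration * TARGET_SR))
    old_times = np.linspace(0.0, duration, num=audio.shape[0], endpoint=False)
    new_times = np.linspace(0.0, duration, num=n_target, endpoint=False)
    return np.interp(new_times, old_times, audio)


# Example: one second of 44.1 kHz audio becomes 16,000 samples.
wave = np.random.randn(44_100)
resampled = resample_to_16k(wave, 44_100)
print(resampled.shape)  # (16000,)
```

The resampled array can then be passed to the processor with `sampling_rate=16_000` in place of the raw file's rate.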
Model size: 0.2B parameters (F32, Safetensors).