ArTST-v2 Dialects: Arabic Dialect ASR

ArTST-v2 Dialects is an automatic speech recognition (ASR) model that transcribes Arabic speech across multiple Arabic dialects.

The model is based on ArTST, an Arabic Text and Speech Transformer model developed by the Speech Lab at Mohamed bin Zayed University of Artificial Intelligence (MBZUAI).

This checkpoint was fine-tuned for automatic speech recognition (ASR) on multiple Arabic dialects.

For more details about the original ArTST model, please refer to the official repository:
https://github.com/mbzuai-nlp/ArTST


Model Description

This model is designed to transcribe spoken Arabic into text, with a focus on dialectal Arabic speech. It primarily supports the Arabic language and has been fine-tuned to improve recognition performance across multiple Arabic dialects.


Model Usage

import torch
import soundfile as sf

from transformers import (
    SpeechT5ForSpeechToText,
    SpeechT5Processor,
    SpeechT5Tokenizer,
)

device = "cuda" if torch.cuda.is_available() else "cpu"

model_id = "Mohammed01/ArTST-v2-Dialects"

tokenizer = SpeechT5Tokenizer.from_pretrained(model_id)
processor = SpeechT5Processor.from_pretrained(model_id, tokenizer=tokenizer)
model = SpeechT5ForSpeechToText.from_pretrained(model_id).to(device)

# Load the audio file; the model expects 16 kHz mono input,
# so resample the waveform first if it was recorded at another rate.
audio, sampling_rate = sf.read("audio.wav")

inputs = processor(
    audio=audio,
    sampling_rate=sampling_rate,
    return_tensors="pt"
)

inputs = {key: value.to(device) for key, value in inputs.items()}

predicted_ids = model.generate(
    **inputs,
    max_length=250
)

transcription = processor.batch_decode(
    predicted_ids,
    skip_special_tokens=True
)

print(transcription[0])
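SpeechT5-based checkpoints are trained on 16 kHz audio, while soundfile returns whatever sample rate the file was recorded at. The sketch below shows one way to bring arbitrary audio to 16 kHz before passing it to the processor; the helper name and the plain linear-interpolation approach are illustrative choices here (for production use, a dedicated resampler such as torchaudio or librosa is preferable).

```python
import numpy as np

TARGET_SR = 16_000  # sample rate expected by SpeechT5-based ASR models


def resample_to_16k(audio: np.ndarray, sampling_rate: int) -> np.ndarray:
    """Resample a waveform to 16 kHz mono via linear interpolation.

    A minimal sketch; real pipelines should use a proper polyphase
    resampler (torchaudio.transforms.Resample, librosa.resample, ...).
    """
    if audio.ndim > 1:
        # Collapse multi-channel audio to mono by averaging channels.
        audio = audio.mean(axis=1)
    if sampling_rate == TARGET_SR:
        return audio
    duration = audio.shape[0] / sampling_rate
    n_target = int(round(duration * TARGET_SR))
    old_times = np.linspace(0.0, duration, num=audio.shape[0], endpoint=False)
    new_times = np.linspace(0.0, duration, num=n_target, endpoint=False)
    return np.interp(new_times, old_times, audio)


# Example: one second of 44.1 kHz audio becomes 16,000 samples.
wave = np.random.randn(44_100)
resampled = resample_to_16k(wave, 44_100)
print(resampled.shape)  # (16000,)
```

The resampled array can then be passed to the processor with `sampling_rate=16_000` in place of the raw file's rate.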
Model size: 0.2B parameters (F32, Safetensors).