English → Moroccan Darija Translation (NLLB-200)

This model is a fine-tuned version of facebook/nllb-200-distilled-600M for machine translation from English to Moroccan Darija.

It is intended for informal, conversational Darija rather than Modern Standard Arabic.

Languages

  • Source: English (eng_Latn)
  • Target: Moroccan Darija (ary_Arab)
  • Primary script: Arabic
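Because the target is Darija in Arabic script, a quick script check on the output can catch cases where a translation comes back in Latin characters. The following is a minimal sketch, not part of the model itself: the helper name `arabic_letter_ratio` and the choice of Unicode ranges are assumptions made for illustration, and the sample sentence is a hand-written example rather than model output.

```python
# Hypothetical helper: estimate how much of a string is written in Arabic script.
# The ranges below cover the main Arabic Unicode blocks (an assumption about
# which blocks matter for Darija text).
ARABIC_RANGES = (
    (0x0600, 0x06FF),  # Arabic
    (0x0750, 0x077F),  # Arabic Supplement
    (0x08A0, 0x08FF),  # Arabic Extended-A
    (0xFB50, 0xFDFF),  # Arabic Presentation Forms-A
    (0xFE70, 0xFEFF),  # Arabic Presentation Forms-B
)


def _is_arabic(ch: str) -> bool:
    """True if the character's code point falls in one of the ranges above."""
    code = ord(ch)
    return any(lo <= code <= hi for lo, hi in ARABIC_RANGES)


def arabic_letter_ratio(text: str) -> float:
    """Fraction of alphabetic characters that are Arabic-script letters.

    Punctuation, digits, and whitespace are ignored. Returns 0.0 when the
    text contains no letters at all.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(1 for ch in letters if _is_arabic(ch))
    return arabic / len(letters)
```

A translation with a low ratio (for example, below 0.5) may be worth inspecting by hand; the threshold is a judgment call, not something the model defines.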

Training Data

The model was fine-tuned on a custom English–Darija parallel dataset compiled from the Darija Open Dataset.

Usage

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "mwkhettab/nllb-200-en-darija"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

text = "How was your day?"

# truncation=True is required for max_length to actually cap the input length
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

outputs = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("ary_Arab"),
    max_new_tokens=128,
    num_beams=5,
)

translation = tokenizer.decode(
    outputs[0],
    skip_special_tokens=True,
).strip()

print(translation)
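
To translate several sentences at once, the same calls can be applied to padded batches. This is a minimal sketch under a few assumptions: `batched` and `translate_batch` are hypothetical helper names, `batch_size=8` is an arbitrary default, and `tokenizer` and `model` are the objects loaded in the snippet above. The generation settings are copied from that snippet.

```python
from typing import Iterator, List, Sequence

TARGET_LANG = "ary_Arab"  # target language code from the Languages section


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def translate_batch(texts, tokenizer, model, batch_size=8):
    """Translate a list of English sentences one padded batch at a time.

    `tokenizer` and `model` are passed in rather than loaded here, so the
    function can be reused with an already-loaded model.
    """
    bos_id = tokenizer.convert_tokens_to_ids(TARGET_LANG)
    translations = []
    for chunk in batched(texts, batch_size):
        # padding=True pads each batch to its longest sentence;
        # truncation=True enforces the max_length cap.
        inputs = tokenizer(
            chunk,
            return_tensors="pt",
            padding=True,
            truncation=True,
            max_length=512,
        )
        outputs = model.generate(
            **inputs,
            forced_bos_token_id=bos_id,
            max_new_tokens=128,
            num_beams=5,
        )
        decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)
        translations.extend(t.strip() for t in decoded)
    return translations
```

With the model loaded as above, a call such as `translate_batch(["How was your day?", "See you tomorrow."], tokenizer, model)` returns one translation per input sentence, in the same order.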