Darija-English Translation (NLLB-200 Distilled)
Part of a collection: a pair of fine-tuned NLLB-200-distilled-600M models for bidirectional translation between Moroccan Darija and English.
This model is a fine-tuned version of facebook/nllb-200-distilled-600M for
machine translation from English to Moroccan Darija.
It is intended for informal, conversational Darija rather than Modern Standard Arabic.
Source language: English (eng_Latn)
Target language: Moroccan Darija (ary_Arab)
The model was fine-tuned on a custom English-Darija parallel dataset compiled from the Darija Open Dataset.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the fine-tuned tokenizer and model from the Hugging Face Hub.
model_name = "mwkhettab/nllb-200-en-darija"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

text = "How was your day?"
# truncation=True is required for max_length to take effect.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Force the decoder to start with the Moroccan Darija language token.
outputs = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("ary_Arab"),
    max_new_tokens=128,
    num_beams=5,
)

translation = tokenizer.decode(
    outputs[0],
    skip_special_tokens=True,
).strip()
print(translation)
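To translate several sentences at once, the single-sentence example can be wrapped in a small batching helper. The following is a minimal sketch, not part of the model or of the transformers library: `chunk` and `translate_batch` are hypothetical helper names, the default batch size of 8 is an arbitrary assumption, and the generation settings are simply copied from the example above. The tokenizer and model are passed in as arguments, so the helper itself needs no extra imports.

```python
from typing import List, Sequence


def chunk(items: Sequence[str], size: int) -> List[List[str]]:
    """Split items into consecutive lists of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def translate_batch(
    texts: Sequence[str],
    tokenizer,
    model,
    target_lang: str = "ary_Arab",  # NLLB code for Moroccan Darija
    batch_size: int = 8,            # assumption: tune to available memory
) -> List[str]:
    """Translate each input text, processing `batch_size` texts per call."""
    target_id = tokenizer.convert_tokens_to_ids(target_lang)
    results: List[str] = []
    for batch in chunk(texts, batch_size):
        # Pad to the longest sentence in the batch and truncate long inputs.
        inputs = tokenizer(
            batch,
            return_tensors="pt",
            padding=True,
            truncation=True,
            max_length=512,
        )
        outputs = model.generate(
            **inputs,
            forced_bos_token_id=target_id,
            max_new_tokens=128,
            num_beams=5,
        )
        decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)
        results.extend(t.strip() for t in decoded)
    return results
```

With `tokenizer` and `model` loaded as in the example above, `translate_batch(["How was your day?", "See you tomorrow."], tokenizer, model)` should return one Darija string per input, in the same order.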
Base model
facebook/nllb-200-distilled-600M