---
base_model:
- UBC-NLP/AraT5v2-base-1024
language:
- ar
library_name: transformers
license: apache-2.0
metrics:
- bleu
pipeline_tag: translation
tags:
- Syrian
- Shami
- MT
- MSA
- Dialect
- ArabicNLP
---

# SHAMI-MT-2MSA: A Machine Translation Model from Syrian Dialect to MSA

This model is part of the work presented in the paper [SHAMI-MT: A Syrian Arabic Dialect to Modern Standard Arabic Bidirectional Machine Translation System](https://huggingface.co/papers/2508.02268).

## Paper Abstract

The rich linguistic landscape of the Arab world is characterized by a significant gap between Modern Standard Arabic (MSA), the language of formal communication, and the diverse regional dialects used in everyday life. This diglossia presents a formidable challenge for natural language processing, particularly machine translation. This paper introduces **SHAMI-MT**, a bidirectional machine translation system specifically engineered to bridge the communication gap between MSA and the Syrian dialect. We present two specialized models, one for MSA-to-Shami and another for Shami-to-MSA translation, both built upon the state-of-the-art AraT5v2-base-1024 architecture. The models were fine-tuned on the comprehensive Nabra dataset and rigorously evaluated on unseen data from the MADAR corpus. Our MSA-to-Shami model achieved an outstanding average quality score of **4.01 out of 5.0** when judged by OpenAI's GPT-4.1, demonstrating its ability to produce translations that are not only accurate but also dialectally authentic. This work provides a crucial, high-fidelity tool for a previously underserved language pair, advancing the field of dialectal Arabic translation and offering significant applications in content localization, cultural heritage, and intercultural communication.

## Model Description

SHAMI-MT-2MSA is one of the two specialized models that make up the **SHAMI-MT** bidirectional machine translation system. This model translates from the **Syrian dialect (Shami) to Modern Standard Arabic (MSA)**. It is fine-tuned from AraT5v2-base-1024, an Arabic T5-style encoder-decoder model, on the Nabra dataset of Syrian Arabic, and is intended to bridge the gap between formal Arabic and the dialectal variation of everyday Syrian speech.
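
Because the two translation directions are served by separate checkpoints, an application that needs both has to pick a model per request. The sketch below shows one way to do that, assuming a dictionary keyed by `(source, target)` and lazy loading with a cache. Only the `Omartificial-Intelligence-Space/Shami-MT-2MSA` ID comes from this card; the MSA-to-Shami ID is a placeholder, and `ShamiMTRouter` and `default_loader` are hypothetical names used for illustration.

```python
from typing import Any, Callable, Dict, Tuple

# Hub IDs per translation direction. Only the Shami-to-MSA ID comes from this
# model card; the MSA-to-Shami value is a HYPOTHETICAL placeholder.
MODEL_IDS: Dict[Tuple[str, str], str] = {
    ("shami", "msa"): "Omartificial-Intelligence-Space/Shami-MT-2MSA",
    ("msa", "shami"): "REPLACE-WITH-MSA-TO-SHAMI-MODEL-ID",  # hypothetical
}


def default_loader(model_id: str) -> Tuple[Any, Any]:
    """Load a (tokenizer, model) pair from the Hugging Face Hub."""
    # Imported lazily so the routing logic works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


class ShamiMTRouter:
    """Hypothetical helper: pick the model for a direction and load it once."""

    def __init__(self, loader: Callable[[str], Tuple[Any, Any]] = default_loader):
        self._loader = loader
        self._cache: Dict[str, Tuple[Any, Any]] = {}

    def get(self, source: str, target: str) -> Tuple[Any, Any]:
        key = (source.lower(), target.lower())
        if key not in MODEL_IDS:
            raise ValueError(f"Unsupported direction: {source} -> {target}")
        model_id = MODEL_IDS[key]
        if model_id not in self._cache:  # load on first use, then reuse
            self._cache[model_id] = self._loader(model_id)
        return self._cache[model_id]
```

With the default loader, `ShamiMTRouter().get("shami", "msa")` downloads this model on the first call and returns the cached `(tokenizer, model)` pair afterwards; passing a custom `loader` keeps the routing logic testable without network access.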

## Usage

This model can be used directly with the Hugging Face `transformers` library.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load tokenizer and model
model_id = "Omartificial-Intelligence-Space/Shami-MT-2MSA"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Example input: "How are you today?" in Syrian dialect
input_text = "كيفك اليوم؟"
inputs = tokenizer(input_text, return_tensors="pt")

# Generate the translation; max_new_tokens caps the output length
# (without it, the default generation settings may cut translations short)
outputs = model.generate(**inputs, max_new_tokens=128)
translated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(f"Syrian Dialect: {input_text}")
print(f"Modern Standard Arabic: {translated_text}")
```
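
For more than a single sentence, it may help to split a paragraph into sentences and translate them in padded batches. The following is a minimal sketch, not part of the released code: `split_sentences`, `batched`, and `translate` are hypothetical helpers, the punctuation-based splitting rule is a simple heuristic, and `num_beams=4` is an arbitrary illustrative choice rather than a setting reported for this model.

```python
import re
from typing import Iterator, List, Sequence

# Heuristic sentence boundary: whitespace after ".", "!", "?" or the Arabic
# question mark "؟", or any run of newlines.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?؟])\s+|\n+")


def split_sentences(text: str) -> List[str]:
    """Split a paragraph into non-empty, stripped sentence-like segments."""
    return [part.strip() for part in _SENTENCE_BOUNDARY.split(text) if part.strip()]


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def translate(texts: Sequence[str], tokenizer, model, batch_size: int = 8,
              max_new_tokens: int = 128, num_beams: int = 4) -> List[str]:
    """Translate Syrian-dialect strings to MSA, one padded batch at a time."""
    results: List[str] = []
    for batch in batched(texts, batch_size):
        # padding=True aligns lengths within the batch; truncation=True clips
        # inputs longer than the tokenizer's configured maximum length.
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=num_beams)
        results.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return results


# Example usage (tokenizer and model loaded as in the usage block above):
# segments = split_sentences("كيفك اليوم؟ شو عم تعمل؟")
# for src, tgt in zip(segments, translate(segments, tokenizer, model)):
#     print(src, "->", tgt)
```

Batching with padding lets the model process several sentences per `generate` call; keeping segments short also reduces the chance that truncation silently drops the end of a long paragraph.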

## Citation

If you use this model in your research, please cite the main paper and the dataset paper:

```bibtex
@article{sibaee2025shamimt,
  title={SHAMI-MT: A Syrian Arabic Dialect to Modern Standard Arabic Bidirectional Machine Translation System},
  author={Sibaee, Serry and Nacar, Omer},
  year={2025},
  journal={Hugging Face Papers},
  url={https://huggingface.co/papers/2508.02268}
}

@article{nayouf2023nabra,
  title={Nâbra: Syrian Arabic dialects with morphological annotations},
  author={Nayouf, Amal and Hammouda, Tymaa Hasanain and Jarrar, Mustafa and Zaraket, Fadi A and Kurdy, Mohamad-Bassam},
  journal={arXiv preprint arXiv:2310.17315},
  year={2023}
}
```

## Contact & Support

For questions, issues, or contributions, please visit the [model repository](https://huggingface.co/Omartificial-Intelligence-Space/Shami-MT-2MSA) or contact the development team.