nielsr (HF Staff) committed
Commit b17777c · verified · 1 Parent(s): 89f6f3b

Improve model card: Add pipeline tag, paper info, usage, and refine description


This PR significantly enhances the model card for `Omartificial-Intelligence-Space/Shami-MT-2MSA` by:

- Adding `pipeline_tag: translation` to the metadata, so the model is listed under the translation task on the Hugging Face Hub and is easier to discover.
- Correcting a typo in the main model card title ("ton" to "to").
- Including a direct link to the accompanying paper, [SHAMI-MT: A Syrian Arabic Dialect to Modern Standard Arabic Bidirectional Machine Translation System](https://huggingface.co/papers/2508.02268), at the top of the card.
- Adding the paper's abstract to provide comprehensive context about the model's design, training, and evaluation.
- Clarifying the model description to explicitly state that `SHAMI-MT-2MSA` is the Syrian dialect to MSA component of the broader `SHAMI-MT` bidirectional system.
- Adding a Python sample usage snippet, demonstrating how to load and use the model with the `transformers` library, which improves immediate usability.
- Refining the citation section to include a proper BibTeX entry for the main paper and removing redundant details.
- Removing the "Model Details" section as its content is either covered by the metadata or the main description.

These improvements make the model card more informative, user-friendly, and aligned with best practices on the Hugging Face Hub.
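
As a rough illustration of the metadata change described above, the sketch below checks that a model card's front matter declares `pipeline_tag`. The helper `read_front_matter` and the `SAMPLE_CARD` string are hypothetical and written only for this example; the parser handles just the flat `key: value` and `- item` forms seen in this card and is not a general YAML parser.

```python
def read_front_matter(readme_text: str) -> dict:
    """Parse the flat front matter at the top of a model card.

    Assumption: only top-level `key: value` pairs and `- item` lists appear,
    as in the metadata block of this README. Anything else is ignored.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter block at all

    metadata = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter of the front matter
            break
        if not stripped:
            continue
        if stripped.startswith("- "):
            # List item: attach it to the most recent key that opened a list.
            if current_key is not None and isinstance(metadata[current_key], list):
                metadata[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            value = value.strip()
            # An empty value means a list follows on the next lines.
            metadata[current_key] = value if value else []
    return metadata


# Hypothetical sample mirroring the metadata fields touched by this PR.
SAMPLE_CARD = """---
base_model:
- UBC-NLP/AraT5v2-base-1024
language:
- ar
library_name: transformers
license: apache-2.0
metrics:
- bleu
pipeline_tag: translation
---

# SHAMI-MT-2MSA
"""

meta = read_front_matter(SAMPLE_CARD)
print(meta.get("pipeline_tag"))  # prints: translation
```

A check like this could run before opening a model card PR to confirm the tag is present; for real cards with nested YAML, a full YAML parser would be the safer choice.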

Files changed (1)
  1. README.md +41 -26
README.md CHANGED
@@ -1,12 +1,13 @@
 ---
-license: apache-2.0
+base_model:
+- UBC-NLP/AraT5v2-base-1024
 language:
 - ar
 library_name: transformers
+license: apache-2.0
 metrics:
 - bleu
-base_model:
-- UBC-NLP/AraT5v2-base-1024
+pipeline_tag: translation
 tags:
 - Syrian
 - Shami
@@ -15,35 +16,56 @@ tags:
 - Dialect
 - ArabicNLP
 ---
-# SHAMI-MT-2MSA : A Machine Translation Model From Syrian Dialect ton MSA
 
+# SHAMI-MT-2MSA : A Machine Translation Model From Syrian Dialect to MSA
+
+This model is part of the work presented in the paper [SHAMI-MT: A Syrian Arabic Dialect to Modern Standard Arabic Bidirectional Machine Translation System](https://huggingface.co/papers/2508.02268).
+
+## Paper Abstract
+
+The rich linguistic landscape of the Arab world is characterized by a significant gap between Modern Standard Arabic (MSA), the language of formal communication, and the diverse regional dialects used in everyday life. This diglossia presents a formidable challenge for natural language processing, particularly machine translation. This paper introduces \textbf{SHAMI-MT}, a bidirectional machine translation system specifically engineered to bridge the communication gap between MSA and the Syrian dialect. We present two specialized models, one for MSA-to-Shami and another for Shami-to-MSA translation, both built upon the state-of-the-art AraT5v2-base-1024 architecture. The models were fine-tuned on the comprehensive Nabra dataset and rigorously evaluated on unseen data from the MADAR corpus. Our MSA-to-Shami model achieved an outstanding average quality score of \textbf{4.01 out of 5.0} when judged by OPENAI model GPT-4.1, demonstrating its ability to produce translations that are not only accurate but also dialectally authentic. This work provides a crucial, high-fidelity tool for a previously underserved language pair, advancing the field of dialectal Arabic translation and offering significant applications in content localization, cultural heritage, and intercultural communication.
 
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/628f7a71dd993507cfcbe587/_w3yOhX5YjaMB4Jc_mrib.png)
 
 ## Model Description
 
-SHAMI-MT is a specialized machine translation model designed to translate from **Syrian dialect to Modern Standard Arabic (MSA)**. Built on the robust AraT5v2-base-1024 architecture, this model bridges the gap between formal Arabic and the rich dialectal variations of Syrian Arabic.
+SHAMI-MT-2MSA is one of two specialized models that constitute the **SHAMI-MT** bidirectional machine translation system. This particular model is designed to translate from **Syrian dialect to Modern Standard Arabic (MSA)**. Built on the robust AraT5v2-base-1024 architecture, this model bridges the gap between formal Arabic and the rich dialectal variations of Syrian Arabic.
+
+## Usage
 
-## Model Details
+This model can be used directly with the Hugging Face `transformers` library.
 
-- **Model Type**: Sequence-to-Sequence Translation
-- **Base Model**: UBC-NLP/AraT5v2-base-1024
-- **Language**: Arabic (Syrian Dialect → MSA)
-- **License**: Apache 2.0
-- **Library**: Transformers
+```python
+from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
 
+# Load tokenizer and model
+model_id = "Omartificial-Intelligence-Space/Shami-MT-2MSA"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
+
+# Example input: Syrian Arabic dialect
+input_text = "كيفك اليوم؟" # "How are you today?" in Syrian dialect
+inputs = tokenizer(input_text, return_tensors="pt")
+
+# Generate translation
+outputs = model.generate(**inputs, max_new_tokens=128) # Added max_new_tokens for generation to prevent infinite loop
+translated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
+
+print(f"Syrian Dialect: {input_text}")
+print(f"Modern Standard Arabic: {translated_text}")
+```
 
 ## Citation
 
-If you use this model in your research, please cite:
+If you use this model in your research, please cite the main paper and the dataset paper:
 
 ```bibtex
-@misc{shami-mt-2024,
-title={SHAMI-MT: A Machine Translation Model From MSA to Syrian Dialect},
-author={Omartificial Intelligence Space},
-year={2024},
-publisher={Hugging Face},
-url={https://huggingface.co/Omartificial-Intelligence-Space/Shami-MT}
+@article{sibaee2025shamimt,
+title={SHAMI-MT: A Syrian Arabic Dialect to Modern Standard Arabic Bidirectional Machine Translation System},
+author={Sibaee, Serry and Nacar, Omer},
+year={2025},
+journal={Hugging Face Papers},
+url={https://huggingface.co/papers/2508.02268}
 }
 
 @article{nayouf2023nabra,
@@ -52,15 +74,8 @@ If you use this model in your research, please cite:
 journal={arXiv preprint arXiv:2310.17315},
 year={2023}
 }
-
-@misc{onajar2025shamiMT,
-title={Shami-MT : A Machine Translation from MSA to Syrian Dialect},
-author={Sibaee, Serry and Nacar, Omer},
-year={2025}
-}
 ```
 
 ## Contact & Support
 
-For questions, issues, or contributions, please visit the [model repository](https://huggingface.co/Omartificial-Intelligence-Space/Shami-MT) or contact the development team.
-
+For questions, issues, or contributions, please visit the [model repository](https://huggingface.co/Omartificial-Intelligence-Space/Shami-MT-2MSA) or contact the development team.