Improve model card: Add pipeline tag, paper info, usage, and refine description

This PR improves the model card for `Omartificial-Intelligence-Space/Shami-MT-2MSA` by:

- Adding `pipeline_tag: translation` to the metadata, so the model is listed under the translation task on the Hugging Face Hub and is easier to discover.
- Correcting a typo in the main model card title ("ton" to "to").
- Including a direct link to the accompanying paper, [SHAMI-MT: A Syrian Arabic Dialect to Modern Standard Arabic Bidirectional Machine Translation System](https://huggingface.co/papers/2508.02268), at the top of the card.
- Adding the paper's abstract to provide comprehensive context about the model's design, training, and evaluation.
- Clarifying the model description to explicitly state that `SHAMI-MT-2MSA` is the Syrian dialect to MSA component of the broader `SHAMI-MT` bidirectional system.
- Adding a Python sample usage snippet, demonstrating how to load and use the model with the `transformers` library, which improves immediate usability.
- Refining the citation section to include a proper BibTeX entry for the main paper and removing redundant details.
- Removing the "Model Details" section as its content is either covered by the metadata or the main description.

These improvements make the model card more informative, user-friendly, and aligned with best practices on the Hugging Face Hub.
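
The metadata change in the first bullet can be sanity-checked locally. The following is a minimal sketch, not part of the PR: `parse_front_matter` is a hypothetical helper that handles only the flat scalars and lists used in this card's front matter (it is not a general YAML parser), and `SAMPLE_CARD` is an abbreviated copy of the new metadata with a shortened tag list.

```python
def parse_front_matter(readme_text):
    """Parse flat YAML front matter (scalars and simple lists) from a model card.

    Hypothetical helper for illustration; it does not support nested YAML.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter block at the top of the file

    metadata = {}
    current_list_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter of the front matter
            break
        if not stripped:
            continue
        if stripped.startswith("- ") and current_list_key is not None:
            # List item belonging to the most recent "key:" line
            metadata[current_list_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            key, value = key.strip(), value.strip()
            if value:
                metadata[key] = value  # scalar entry, e.g. "license: apache-2.0"
                current_list_key = None
            else:
                metadata[key] = []  # list entry; items follow on later lines
                current_list_key = key
    return metadata


# Abbreviated copy of the new front matter (tag list shortened for the example)
SAMPLE_CARD = """---
base_model:
- UBC-NLP/AraT5v2-base-1024
language:
- ar
library_name: transformers
license: apache-2.0
metrics:
- bleu
pipeline_tag: translation
tags:
- Syrian
- Shami
---
# SHAMI-MT-2MSA : A Machine Translation Model From Syrian Dialect to MSA
"""

metadata = parse_front_matter(SAMPLE_CARD)
print(metadata["pipeline_tag"])  # translation
print(metadata["base_model"])    # ['UBC-NLP/AraT5v2-base-1024']
```

A check like this could run before merging to confirm that `pipeline_tag` is present and that `license` and `base_model` survived the reordering of the metadata block.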

Files changed (1)

README.md +41 -26

@@ -1,12 +1,13 @@
 ---
-license: apache-2.0
 language:
 - ar
 library_name: transformers
 metrics:
 - bleu
-base_model:
-- UBC-NLP/AraT5v2-base-1024
 tags:
 - Syrian
 - Shami
@@ -15,35 +16,56 @@ tags:
 - Dialect
 - ArabicNLP
 ---
-# SHAMI-MT-2MSA : A Machine Translation Model From Syrian Dialect ton MSA
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/628f7a71dd993507cfcbe587/_w3yOhX5YjaMB4Jc_mrib.png)
 ## Model Description
-SHAMI-MT is a specialized machine translation model designed to translate from **Syrian dialect to Modern Standard Arabic (MSA)**. Built on the robust AraT5v2-base-1024 architecture, this model bridges the gap between formal Arabic and the rich dialectal variations of Syrian Arabic.
-## Model Details
-- **Model Type**: Sequence-to-Sequence Translation
-- **Base Model**: UBC-NLP/AraT5v2-base-1024
-- **Language**: Arabic (Syrian Dialect → MSA)
-- **License**: Apache 2.0
-- **Library**: Transformers
 ## Citation
-If you use this model in your research, please cite:
 ```bibtex
-@misc{shami-mt-2024,
-  title={SHAMI-MT: A Machine Translation Model From MSA to Syrian Dialect},
-  author={Omartificial Intelligence Space},
-  year={2024},
-  publisher={Hugging Face},
-  url={https://huggingface.co/Omartificial-Intelligence-Space/Shami-MT}
 }
 @article{nayouf2023nabra,
@@ -52,15 +74,8 @@ If you use this model in your research, please cite:
   journal={arXiv preprint arXiv:2310.17315},
   year={2023}
 }
-@misc{onajar2025shamiMT,
-  title={Shami-MT : A Machine Translation from MSA to Syrian Dialect},
-  author={Sibaee, Serry and Nacar, Omer},
-  year={2025}
-}
 ```
 ## Contact & Support
-For questions, issues, or contributions, please visit the [model repository](https://huggingface.co/Omartificial-Intelligence-Space/Shami-MT) or contact the development team.

 ---
+base_model:
+- UBC-NLP/AraT5v2-base-1024
 language:
 - ar
 library_name: transformers
+license: apache-2.0
 metrics:
 - bleu
+pipeline_tag: translation
 tags:
 - Syrian
 - Shami
 - Dialect
 - ArabicNLP
 ---
+# SHAMI-MT-2MSA : A Machine Translation Model From Syrian Dialect to MSA
+This model is part of the work presented in the paper [SHAMI-MT: A Syrian Arabic Dialect to Modern Standard Arabic Bidirectional Machine Translation System](https://huggingface.co/papers/2508.02268).
+## Paper Abstract
+The rich linguistic landscape of the Arab world is characterized by a significant gap between Modern Standard Arabic (MSA), the language of formal communication, and the diverse regional dialects used in everyday life. This diglossia presents a formidable challenge for natural language processing, particularly machine translation. This paper introduces **SHAMI-MT**, a bidirectional machine translation system specifically engineered to bridge the communication gap between MSA and the Syrian dialect. We present two specialized models, one for MSA-to-Shami and another for Shami-to-MSA translation, both built upon the state-of-the-art AraT5v2-base-1024 architecture. The models were fine-tuned on the comprehensive Nabra dataset and rigorously evaluated on unseen data from the MADAR corpus. Our MSA-to-Shami model achieved an outstanding average quality score of **4.01 out of 5.0** when judged by OPENAI model GPT-4.1, demonstrating its ability to produce translations that are not only accurate but also dialectally authentic. This work provides a crucial, high-fidelity tool for a previously underserved language pair, advancing the field of dialectal Arabic translation and offering significant applications in content localization, cultural heritage, and intercultural communication.
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/628f7a71dd993507cfcbe587/_w3yOhX5YjaMB4Jc_mrib.png)
 ## Model Description
+SHAMI-MT-2MSA is one of two specialized models that constitute the **SHAMI-MT** bidirectional machine translation system. This particular model is designed to translate from **Syrian dialect to Modern Standard Arabic (MSA)**. Built on the robust AraT5v2-base-1024 architecture, this model bridges the gap between formal Arabic and the rich dialectal variations of Syrian Arabic.
+## Usage
+This model can be used directly with the Hugging Face `transformers` library.
+```python
+from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+# Load tokenizer and model
+model_id = "Omartificial-Intelligence-Space/Shami-MT-2MSA"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
+# Example input: Syrian Arabic dialect
+input_text = "كيفك اليوم؟" # "How are you today?" in Syrian dialect
+inputs = tokenizer(input_text, return_tensors="pt")
+# Generate translation
+outputs = model.generate(**inputs, max_new_tokens=128) # Cap the translation at 128 newly generated tokens
+translated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
+print(f"Syrian Dialect: {input_text}")
+print(f"Modern Standard Arabic: {translated_text}")
+```
 ## Citation
+If you use this model in your research, please cite the main paper and the dataset paper:
 ```bibtex
+@article{sibaee2025shamimt,
+  title={SHAMI-MT: A Syrian Arabic Dialect to Modern Standard Arabic Bidirectional Machine Translation System},
+  author={Sibaee, Serry and Nacar, Omer},
+  year={2025},
+  journal={Hugging Face Papers},
+  url={https://huggingface.co/papers/2508.02268}
 }
 @article{nayouf2023nabra,
   journal={arXiv preprint arXiv:2310.17315},
   year={2023}
 }
 ```
 ## Contact & Support
+For questions, issues, or contributions, please visit the [model repository](https://huggingface.co/Omartificial-Intelligence-Space/Shami-MT-2MSA) or contact the development team.
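
The usage snippet added in this diff translates one sentence per call. As a minimal sketch of a batched variant, assuming the tokenizer and model are loaded exactly as in that snippet, the hypothetical helper `translate_batch` below (not part of the PR or the model card) pads a list of sentences into one batch and decodes all outputs together. It uses only the standard `transformers` calls `tokenizer(...)`, `model.generate(...)`, and `tokenizer.batch_decode(...)`.

```python
def translate_batch(texts, tokenizer, model, max_new_tokens=128):
    """Translate a list of Syrian-dialect sentences to MSA in one generate call.

    Hypothetical helper for illustration. `tokenizer` and `model` are expected
    to be the objects returned by AutoTokenizer.from_pretrained(...) and
    AutoModelForSeq2SeqLM.from_pretrained(...).
    """
    texts = list(texts)
    if not texts:
        return []  # nothing to translate; avoid calling generate on an empty batch

    # padding=True pads the shorter sentences so the batch forms one tensor
    inputs = tokenizer(texts, return_tensors="pt", padding=True)

    # max_new_tokens caps the length of each generated translation
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode every sequence in the batch, dropping padding and end-of-sequence tokens
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

With the `tokenizer` and `model` from the snippet above, a call such as `translate_batch(["كيفك اليوم؟"], tokenizer, model)` would return a list with one MSA string per input sentence. The helper takes the loaded objects as parameters so that the model is downloaded once and reused across calls.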