---
library_name: transformers
license: mit
model_name: MBart-Urdu-Text-Summarization
pipeline_tag: summarization
tags:
- summarization
- mbart
- nlp
- transformers
- text-generation-inference
author: Wali Muhammad Ahmad
private: false
gated: false
inference: true
mask_token: <mask>
widget:
- text: Enter your Urdu paragraph here
transformers_info:
  auto_class: MBartForConditionalGeneration
  processor: AutoTokenizer
language:
- en
- ur
---

# Model Card

MBart-Urdu-Text-Summarization is a fine-tuned MBart model for summarizing Urdu text. It leverages MBart's multilingual pre-training to generate concise and accurate summaries of Urdu paragraphs.

## Model Details

### Model Description

This model is based on the MBart architecture, a sequence-to-sequence Transformer pre-trained on multilingual data, and has been fine-tuned specifically for Urdu text summarization. It can understand and generate text in both English and Urdu, making it suitable for multilingual applications.

### Model Sources

- **Repository:** [WaliMuhammadAhmad/UrduTextSummarizationUsingm-BART](https://github.com/WaliMuhammadAhmad/UrduTextSummarizationUsingm-BART)
- **Paper:** [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210)

## Uses

### Direct Use

The model can be used directly for Urdu text summarization. It is suitable for applications such as news summarization, document summarization, and content condensation.

### Downstream Use

The model can also be fine-tuned further for related downstream tasks such as sentiment analysis, question answering, or machine translation for Urdu and English.

### Out-of-Scope Use

This model is not intended for generating biased, harmful, or misleading content. It should not be used for tasks other than text summarization without further fine-tuning and evaluation.

## Bias, Risks, and Limitations

- The model may reproduce or amplify biases present in the input text or its training data.
- It was trained on a specific dataset and may not generalize well to other domains or languages.
- Performance may degrade on inputs longer than the model's maximum sequence length.
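A common workaround for long inputs is to split the text into overlapping chunks, summarize each chunk, and then join (or re-summarize) the partial summaries. A minimal, illustrative sketch follows; the word-based splitting and the chunk sizes are assumptions for demonstration, not part of the released model — in practice you would measure length with the model's tokenizer rather than by words:

```python
def chunk_text(text, max_words=400, overlap=50):
    """Split text into overlapping word-based chunks.

    A rough stand-in for token-based chunking; real usage should
    count tokens with the model's tokenizer instead of words.
    """
    words = text.split()
    if len(words) <= max_words:
        return [text]
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks

# Each chunk can then be passed to the summarizer, and the partial
# summaries concatenated (or summarized once more) to cover the
# whole document.
example = " ".join(f"word{i}" for i in range(1000))
chunks = chunk_text(example, max_words=400, overlap=50)
print(len(chunks))  # → 3
```

The overlap keeps a little shared context between adjacent chunks so that sentences cut at a boundary still appear whole in at least one chunk.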

### Recommendations

Users should evaluate the model's outputs for bias and appropriateness before deployment. Fine-tuning on domain-specific data is recommended for better performance in specialized applications.
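As a starting point for such domain-specific fine-tuning, the corpus is typically organized as article/summary pairs and split into training and validation sets before tokenization. A minimal, framework-agnostic sketch — the `article` and `summary` field names are hypothetical and should be adapted to your dataset:

```python
import random

def train_val_split(pairs, val_fraction=0.1, seed=42):
    """Shuffle article/summary pairs and split off a validation set.

    `pairs` is a list of dicts like {"article": ..., "summary": ...};
    the key names are illustrative, not mandated by the model.
    """
    rng = random.Random(seed)
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]

# A tiny synthetic Urdu corpus for illustration
corpus = [{"article": f"متن {i}", "summary": f"خلاصہ {i}"} for i in range(100)]
train, val = train_val_split(corpus)
print(len(train), len(val))  # → 90 10
```

Fixing the seed makes the split reproducible across runs, which matters when comparing fine-tuning experiments on the same data.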

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoTokenizer, MBartForConditionalGeneration

# Load the fine-tuned model and its tokenizer from the Hub
model_name = "ihatenlp/MBart-Urdu-Text-Summarization"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)

# Example input text
input_text = "Enter your Urdu paragraph here."

# Tokenize (truncating to the model's maximum length) and generate a summary
inputs = tokenizer(input_text, return_tensors="pt", truncation=True)
summary_ids = model.generate(
    inputs["input_ids"],
    max_length=50,
    num_beams=4,
    early_stopping=True,
)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

print("Summary:", summary)
```

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

## Citation

**BibTeX:**

```bibtex
@misc{liu2020multilingualdenoisingpretrainingneural,
  title={Multilingual Denoising Pre-training for Neural Machine Translation},
  author={Yinhan Liu and Jiatao Gu and Naman Goyal and Xian Li and Sergey Edunov and Marjan Ghazvininejad and Mike Lewis and Luke Zettlemoyer},
  year={2020},
  eprint={2001.08210},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2001.08210},
}
```

## Model Card Authors

- **Wali Muhammad Ahmad**
- **Muhammad Labeeb Tariq**

## Model Card Contact

- **Email:** [wali.muhammad.ahmad@gmail.com](mailto:wali.muhammad.ahmad@gmail.com)
- **Hugging Face Profile:** [Wali Muhammad Ahmad](https://huggingface.co/ihatenlp)