---
library_name: transformers
license: mit
model_name: MBart-Urdu-Text-Summarization
pipeline_tag: summarization
tags:
- text-generation
- mbart
- nlp
- transformers
- text-generation-inference
author: Wali Muhammad Ahmad
private: false
gated: false
inference: true
mask_token: <mask>
widget:
- text: Enter your paragraph here
transformers_info:
  auto_class: MBartForConditionalGeneration
  processor: AutoTokenizer
language:
- en
- ur
---
# Model Card
MBart-Urdu-Text-Summarization is a fine-tuned MBart model designed for summarizing Urdu text. It leverages the multilingual capabilities of MBart to generate concise and accurate summaries of Urdu paragraphs.
## Model Details
### Model Description
This model is based on MBart, a sequence-to-sequence architecture pre-trained on multilingual data, and has been fine-tuned specifically for Urdu text summarization. It can understand and generate text in both English and Urdu, making it suitable for multilingual applications.
### Model Sources
- **Repository:** [WaliMuhammadAhmad/UrduTextSummarizationUsingm-BART](https://github.com/WaliMuhammadAhmad/UrduTextSummarizationUsingm-BART)
- **Paper:** [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210)
## Uses
### Direct Use
This model can be used directly for Urdu text summarization tasks. It is suitable for applications such as news summarization, document summarization, and content generation.
### Downstream Use
The model can be fine-tuned for specific downstream tasks such as sentiment analysis, question answering, or machine translation for Urdu and English.
### Out-of-Scope Use
This model is not intended for generating biased, harmful, or misleading content. It should not be used for tasks outside of text summarization without proper fine-tuning and evaluation.
## Bias, Risks, and Limitations
- The model may generate biased or inappropriate content if the input text contains biases.
- It is trained on a specific dataset and may not generalize well to other domains or languages.
- The model's performance may degrade for very long input texts.
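One practical mitigation for the long-input limitation is to split a document into smaller chunks before summarizing each chunk separately. The sketch below is illustrative, not part of the model: it breaks Urdu text on the full stop character (۔) and groups sentences under a word budget, where the word count is a rough stand-in for the tokenizer's actual token count.

```python
def chunk_urdu_text(text, max_words=200):
    """Split text into chunks of at most max_words words,
    breaking on the Urdu full stop (۔) so sentences stay intact."""
    # Re-attach the full stop to each non-empty sentence
    sentences = [s.strip() + "۔" for s in text.split("۔") if s.strip()]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Flush the current chunk if adding this sentence would exceed the budget
        if current and count + n_words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk can then be summarized individually and the partial summaries concatenated or re-summarized.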
### Recommendations
Users should carefully evaluate the model's outputs for biases and appropriateness. Fine-tuning on domain-specific data is recommended for better performance in specialized applications.
## How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import AutoTokenizer, MBartForConditionalGeneration

# Load the fine-tuned model and its tokenizer
model_name = "ihatenlp/MBart-Urdu-Text-Summarization"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)

# Example input text (replace with an Urdu paragraph)
input_text = "Enter your Urdu paragraph here."

# Tokenize, truncating inputs that exceed the model's maximum length
inputs = tokenizer(input_text, return_tensors="pt", truncation=True)

# Generate a summary with beam search
summary_ids = model.generate(
    inputs["input_ids"],
    max_length=50,
    num_beams=4,
    early_stopping=True,
)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print("Summary:", summary)
```
## Environmental Impact
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
## Citation
**BibTeX:**
```bibtex
@misc{liu2020multilingualdenoisingpretrainingneural,
  title={Multilingual Denoising Pre-training for Neural Machine Translation},
  author={Yinhan Liu and Jiatao Gu and Naman Goyal and Xian Li and Sergey Edunov and Marjan Ghazvininejad and Mike Lewis and Luke Zettlemoyer},
  year={2020},
  eprint={2001.08210},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2001.08210},
}
```
## Model Card Authors
- **Wali Muhammad Ahmad**
- **Muhammad Labeeb Tariq**
## Model Card Contact
- **Email:** wali.muhammad.ahmad@gmail.com
- **Hugging Face Profile:** [Wali Muhammad Ahmad](https://huggingface.co/ihatenlp)