---
library_name: transformers
license: mit
model_name: MBart-Urdu-Text-Summarization
pipeline_tag: summarization
tags:
- summarization
- mbart
- nlp
- transformers
- text-generation-inference
author: Wali Muhammad Ahmad
private: false
gated: false
inference: true
mask_token: <mask>
widget:
- text: Enter your Urdu paragraph here
transformers_info:
  auto_class: MBartForConditionalGeneration
  processor: AutoTokenizer
language:
- en
- ur
---

# Model Card

MBart-Urdu-Text-Summarization is a fine-tuned MBart model for summarizing Urdu text. It leverages MBart's multilingual pre-training to generate concise and accurate summaries of Urdu paragraphs.

## Model Details

### Model Description

This model is based on the MBart architecture, a sequence-to-sequence Transformer pre-trained on multilingual data, and has been fine-tuned specifically for Urdu text summarization. It can understand and generate text in both English and Urdu, making it suitable for multilingual applications.

### Model Sources

- **Repository:** [WaliMuhammadAhmad/UrduTextSummarizationUsingm-BART](https://github.com/WaliMuhammadAhmad/UrduTextSummarizationUsingm-BART)
- **Paper:** [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210)

## Uses

### Direct Use

The model can be used directly for Urdu text summarization. It is suitable for applications such as news summarization, document summarization, and content condensation.

### Downstream Use

The model can also be fine-tuned further for related downstream tasks such as sentiment analysis, question answering, or machine translation for Urdu and English.

### Out-of-Scope Use

This model is not intended for generating biased, harmful, or misleading content. It should not be used for tasks other than text summarization without further fine-tuning and evaluation.

## Bias, Risks, and Limitations

- The model may reproduce or amplify biases present in the input text or its training data.
- It was trained on a specific dataset and may not generalize well to other domains or languages.
- Performance may degrade on inputs longer than the model's maximum sequence length.
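A common workaround for long inputs is to split the text into overlapping chunks, summarize each chunk, and then join (or re-summarize) the partial summaries. A minimal, illustrative sketch follows; the word-based splitting and the chunk sizes are assumptions for demonstration, not part of the released model — in practice you would measure length with the model's tokenizer rather than by words:

```python
def chunk_text(text, max_words=400, overlap=50):
    """Split text into overlapping word-based chunks.

    A rough stand-in for token-based chunking; real usage should
    count tokens with the model's tokenizer instead of words.
    """
    words = text.split()
    if len(words) <= max_words:
        return [text]
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks

# Each chunk can then be passed to the summarizer, and the partial
# summaries concatenated (or summarized once more) to cover the
# whole document.
example = " ".join(f"word{i}" for i in range(1000))
chunks = chunk_text(example, max_words=400, overlap=50)
print(len(chunks))  # → 3
```

The overlap keeps a little shared context between adjacent chunks so that sentences cut at a boundary still appear whole in at least one chunk.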

### Recommendations

Users should evaluate the model's outputs for bias and appropriateness before deployment. Fine-tuning on domain-specific data is recommended for better performance in specialized applications.
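As a starting point for such domain-specific fine-tuning, the corpus is typically organized as article/summary pairs and split into training and validation sets before tokenization. A minimal, framework-agnostic sketch — the `article` and `summary` field names are hypothetical and should be adapted to your dataset:

```python
import random

def train_val_split(pairs, val_fraction=0.1, seed=42):
    """Shuffle article/summary pairs and split off a validation set.

    `pairs` is a list of dicts like {"article": ..., "summary": ...};
    the key names are illustrative, not mandated by the model.
    """
    rng = random.Random(seed)
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]

# A tiny synthetic Urdu corpus for illustration
corpus = [{"article": f"متن {i}", "summary": f"خلاصہ {i}"} for i in range(100)]
train, val = train_val_split(corpus)
print(len(train), len(val))  # → 90 10
```

Fixing the seed makes the split reproducible across runs, which matters when comparing fine-tuning experiments on the same data.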

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoTokenizer, MBartForConditionalGeneration

# Load the fine-tuned model and its tokenizer from the Hub
model_name = "ihatenlp/MBart-Urdu-Text-Summarization"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)

# Example input text
input_text = "Enter your Urdu paragraph here."

# Tokenize (truncating to the model's maximum length) and generate a summary
inputs = tokenizer(input_text, return_tensors="pt", truncation=True)
summary_ids = model.generate(
    inputs["input_ids"],
    max_length=50,
    num_beams=4,
    early_stopping=True,
)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

print("Summary:", summary)
```

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

## Citation

**BibTeX:**

```bibtex
@misc{liu2020multilingualdenoisingpretrainingneural,
  title={Multilingual Denoising Pre-training for Neural Machine Translation},
  author={Yinhan Liu and Jiatao Gu and Naman Goyal and Xian Li and Sergey Edunov and Marjan Ghazvininejad and Mike Lewis and Luke Zettlemoyer},
  year={2020},
  eprint={2001.08210},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2001.08210},
}
```

## Model Card Authors

- **Wali Muhammad Ahmad**
- **Muhammad Labeeb Tariq**

## Model Card Contact

- **Email:** [wali.muhammad.ahmad@gmail.com](mailto:wali.muhammad.ahmad@gmail.com)
- **Hugging Face Profile:** [Wali Muhammad Ahmad](https://huggingface.co/ihatenlp)