---
library_name: transformers
base_model: csebuetnlp/mT5_m2o_english_crossSum
tags:
- text-summarization
- mt5
- multilingual
- fine-tuned-model
model-index:
- name: finetuned_text_summarization_model
  results: []
---
# Finetuned Text Summarization Model

This repository contains a fine-tuned version of **[csebuetnlp/mT5_m2o_english_crossSum](https://huggingface.co/csebuetnlp/mT5_m2o_english_crossSum)** for abstractive text summarization. The model has been optimized for generating concise, coherent English summaries from long-form text.
---

## Model Description

This model is based on the multilingual T5 architecture (mT5) and has been fine-tuned to improve performance on English abstractive summarization tasks. It generates well-structured summaries that preserve essential meaning while reducing verbosity.

The model uses the encoder–decoder architecture of mT5 and benefits from pretrained multilingual representations, which can also help it generalize to noisy or domain-specific English text.
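A quick way to try the model is the standard 🤗 Transformers seq2seq API. A minimal inference sketch (the sample article and the generation settings, such as beam count and length limits, are illustrative assumptions, not values from this card):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "csebuetnlp/mT5_m2o_english_crossSum"  # base model from this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

article = (
    "Scientists announced a breakthrough in battery technology on Monday, "
    "claiming the new cells charge in minutes and last for a decade."
)

# Tokenize, generate, and decode. Beam search and length limits here are
# illustrative defaults, not hyperparameters taken from this model card.
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=512)
summary_ids = model.generate(
    **inputs, num_beams=4, max_length=84, no_repeat_ngram_size=2
)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(summary)
```

For the fine-tuned checkpoint in this repository, replace `model_id` with this repo's identifier.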
---

## Intended Uses & Limitations

### **Intended Uses**
- Abstractive summarization of news articles, reports, social media text, academic paragraphs, and general long-form English content.
- Use in applications such as:
  - content condensation tools
  - research assistants
  - note-generation tools
  - automated documentation systems

### **Limitations**
- May hallucinate facts when the input is ambiguous or very short.
- Not optimized for:
  - non-English summarization
  - extractive summarization
  - legal, medical, or other highly specialized summaries requiring domain accuracy
- Summary quality may decline on very long inputs unless chunking is applied.
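For inputs that exceed the model's context window, a common mitigation is to summarize overlapping chunks and then join (or re-summarize) the partial summaries. A minimal, model-agnostic sketch; the word-based splitting and the 512/64 chunk sizes are illustrative assumptions, not part of this model card:

```python
from typing import Callable, List


def chunk_words(text: str, chunk_size: int = 512, overlap: int = 64) -> List[str]:
    """Split text into word chunks with overlap, so content cut at a chunk
    boundary still appears (partially) in the next chunk."""
    words = text.split()
    if len(words) <= chunk_size:
        return [text]
    step = chunk_size - overlap
    chunks = []
    for i in range(0, len(words), step):
        chunks.append(" ".join(words[i:i + chunk_size]))
        if i + chunk_size >= len(words):
            break
    return chunks


def summarize_long(text: str, summarize: Callable[[str], str],
                   chunk_size: int = 512, overlap: int = 64) -> str:
    """Summarize each chunk independently, then concatenate the partial
    summaries. `summarize` is any single-chunk summarizer, e.g. a call
    into the model in this card."""
    parts = [summarize(chunk) for chunk in chunk_words(text, chunk_size, overlap)]
    return " ".join(parts)
```

A second summarization pass over the concatenated partial summaries usually yields a more coherent final result than plain concatenation.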
---

## Training and Evaluation Data

This model was trained on a combined dataset of English news, long-form articles, and instructional text. The data was preprocessed to remove duplicates, extremely short samples, and malformed text.

The validation set consisted of structurally similar English articles to ensure reliable ROUGE evaluation.
---

## Training Procedure

### **Training Hyperparameters**
The following hyperparameters were used:

- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam (β1 = 0.9, β2 = 0.999, ε = 1e-08)
- lr_scheduler_type: linear
- num_epochs: 3
- mixed_precision_training: Native AMP
---

## Training Results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLSum | Generated Length |
|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:---------:|:----------------:|
| 4.1823 | 1.0 | 190 | 3.7432 | 0.1825 | 0.0547 | 0.1382 | 0.1383 | 33.99 |
| 3.5210 | 2.0 | 380 | 3.1028 | 0.2496 | 0.0913 | 0.1987 | 0.1994 | 36.41 |
| 2.9844 | 3.0 | 570 | 2.8471 | 0.2874 | 0.1185 | 0.2312 | 0.2320 | 37.22 |
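The ROUGE columns above are typically produced by a library such as `evaluate`/`rouge_score`. As a self-contained illustration of what ROUGE-1 measures, it is a clipped unigram-overlap F-measure; a simplified sketch with whitespace tokenization and no stemming, so it will not exactly reproduce library scores:

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap between a generated
    summary and a reference (whitespace tokenization, no stemming)."""
    pred = Counter(prediction.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((pred & ref).values())  # per-token counts clipped to the minimum
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```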
---

## Framework Versions

- Transformers 4.44.2
- PyTorch 2.4.1+cu121
- Datasets 3.0.0
- Tokenizers 0.19.1