---
library_name: transformers
base_model: csebuetnlp/mT5_m2o_english_crossSum
tags:
- text-summarization
- mt5
- multilingual
- fine-tuned-model
model-index:
- name: finetuned_text_summarization_model
  results: []
---
# Finetuned Text Summarization Model

This repository contains a fine-tuned version of **[csebuetnlp/mT5_m2o_english_crossSum](https://huggingface.co/csebuetnlp/mT5_m2o_english_crossSum)** for abstractive text summarization. The model has been optimized for generating concise, coherent English summaries from long-form text.
---

## Model Description

This model is based on the multilingual T5 architecture (mT5) and has been fine-tuned to improve performance on English abstractive summarization tasks. It generates well-structured summaries that preserve essential meaning while reducing verbosity.

The model uses the encoder–decoder architecture of mT5 and benefits from pretrained multilingual representations, which can also help it generalize to noisy or domain-specific English text.
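A quick way to try the model is the standard 🤗 Transformers seq2seq API. A minimal inference sketch (the sample article and the generation settings, such as beam count and length limits, are illustrative assumptions, not values from this card):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "csebuetnlp/mT5_m2o_english_crossSum"  # base model from this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

article = (
    "Scientists announced a breakthrough in battery technology on Monday, "
    "claiming the new cells charge in minutes and last for a decade."
)

# Tokenize, generate, and decode. Beam search and length limits here are
# illustrative defaults, not hyperparameters taken from this model card.
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=512)
summary_ids = model.generate(
    **inputs, num_beams=4, max_length=84, no_repeat_ngram_size=2
)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(summary)
```

For the fine-tuned checkpoint in this repository, replace `model_id` with this repo's identifier.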
---

## Intended Uses & Limitations

### **Intended Uses**
- Abstractive summarization of news articles, reports, social media text, academic paragraphs, and general long-form English content.
- Use in applications such as:
  - content condensation tools
  - research assistants
  - note-generation tools
  - automated documentation systems

### **Limitations**
- May hallucinate facts when the input is ambiguous or very short.
- Not optimized for:
  - non-English summarization
  - extractive summarization
  - legal, medical, or other highly specialized summaries requiring domain accuracy
- Summary quality may decline on very long inputs unless chunking is applied.
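For inputs that exceed the model's context window, a common mitigation is to summarize overlapping chunks and then join (or re-summarize) the partial summaries. A minimal, model-agnostic sketch; the word-based splitting and the 512/64 chunk sizes are illustrative assumptions, not part of this model card:

```python
from typing import Callable, List


def chunk_words(text: str, chunk_size: int = 512, overlap: int = 64) -> List[str]:
    """Split text into word chunks with overlap, so content cut at a chunk
    boundary still appears (partially) in the next chunk."""
    words = text.split()
    if len(words) <= chunk_size:
        return [text]
    step = chunk_size - overlap
    chunks = []
    for i in range(0, len(words), step):
        chunks.append(" ".join(words[i:i + chunk_size]))
        if i + chunk_size >= len(words):
            break
    return chunks


def summarize_long(text: str, summarize: Callable[[str], str],
                   chunk_size: int = 512, overlap: int = 64) -> str:
    """Summarize each chunk independently, then concatenate the partial
    summaries. `summarize` is any single-chunk summarizer, e.g. a call
    into the model in this card."""
    parts = [summarize(chunk) for chunk in chunk_words(text, chunk_size, overlap)]
    return " ".join(parts)
```

A second summarization pass over the concatenated partial summaries usually yields a more coherent final result than plain concatenation.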
---

## Training and Evaluation Data

This model was trained on a combined dataset of English news, long-form articles, and instructional text. The data was preprocessed to remove duplicates, extremely short samples, and malformed text.

The validation set consisted of structurally similar English articles to ensure reliable ROUGE evaluation.
---

## Training Procedure

### **Training Hyperparameters**
The following hyperparameters were used:

- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam (β1 = 0.9, β2 = 0.999, ε = 1e-08)
- lr_scheduler_type: linear
- num_epochs: 3
- mixed_precision_training: Native AMP
---

## Training Results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLSum | Generated Length |
|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:---------:|:----------------:|
| 4.1823 | 1.0 | 190 | 3.7432 | 0.1825 | 0.0547 | 0.1382 | 0.1383 | 33.99 |
| 3.5210 | 2.0 | 380 | 3.1028 | 0.2496 | 0.0913 | 0.1987 | 0.1994 | 36.41 |
| 2.9844 | 3.0 | 570 | 2.8471 | 0.2874 | 0.1185 | 0.2312 | 0.2320 | 37.22 |
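The ROUGE columns above are typically produced by a library such as `evaluate`/`rouge_score`. As a self-contained illustration of what ROUGE-1 measures, it is a clipped unigram-overlap F-measure; a simplified sketch with whitespace tokenization and no stemming, so it will not exactly reproduce library scores:

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap between a generated
    summary and a reference (whitespace tokenization, no stemming)."""
    pred = Counter(prediction.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((pred & ref).values())  # per-token counts clipped to the minimum
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```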
---

## Framework Versions

- Transformers 4.44.2
- PyTorch 2.4.1+cu121
- Datasets 3.0.0
- Tokenizers 0.19.1