---
language: en
base_model: google/flan-t5-base
library_name: peft
tags:
- base_model:adapter:google/flan-t5-base
- lora
- transformers
- summarization
- abstractive-summarization
- generated_from_trainer
model-index:
- name: Abstractive Style Summarizer
results: []
datasets:
- xsum
- cnn_dailymail
- multi_news
license: mit
---
# Abstractive Style Summarizer
This model is a fine-tuned version of [google/flan-t5-base](https://huggingface.co/google/flan-t5-base) using PEFT (LoRA). It is designed to generate abstractive summaries in three distinct styles: **Harsh** (concise), **Balanced** (standard), and **Detailed** (comprehensive).
## Model Details
### Model Description
- **Model type:** Sequence-to-Sequence Transformer (T5)
- **Language(s):** English
- **License:** MIT
- **Finetuned from model:** google/flan-t5-base
- **Training Method:** PEFT (LoRA)
### Model Sources
- **Repository:** [Flatten](https://github.com/LityoPS/Flatten)
- **Base Model:** [google/flan-t5-base](https://huggingface.co/google/flan-t5-base)
## Uses
### Direct Use
The model selects its summarization style from a prefix in the input prompt.
- **Harsh**: Generates very short, punchy summaries (approx. 35% of input length).
- **Balanced**: Generates standard news summaries (approx. 50% of input length).
- **Detailed**: Generates in-depth summaries (approx. 70% of input length).
### Prompt Format
The input text should be prefixed with the desired style:
```
Summarize {Style}: {Input Text}
```
Example: `Summarize Harsh: The Walt Disney Co. announced...`
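The prompt format above can be sketched in code. This is a minimal inference example, not the card's official usage snippet: the adapter path (`path/to/adapter`) is a placeholder you should replace with the actual repository id, and the generation settings (`max_new_tokens`, `num_beams`) are illustrative defaults not taken from this card.

```python
STYLES = ("Harsh", "Balanced", "Detailed")

def build_prompt(style: str, text: str) -> str:
    """Prefix the input text with the style tag the model was trained on."""
    if style not in STYLES:
        raise ValueError(f"style must be one of {STYLES}")
    return f"Summarize {style}: {text}"

def summarize(text: str, style: str = "Balanced",
              adapter_id: str = "path/to/adapter") -> str:
    """Load the base model, attach the LoRA adapter, and generate a summary."""
    # Lazy imports keep the prompt helper usable without these packages.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
    base = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
    model = PeftModel.from_pretrained(base, adapter_id)  # attach LoRA weights

    inputs = tokenizer(build_prompt(style, text),
                       return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=256, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```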
## Training Details
### Training Data
The model was trained on a combined dataset of 12,000 samples, split into 80% Train, 10% Validation, and 10% Test.
| Style | Source Dataset | Size |
| :--- | :--- | :--- |
| **Harsh** | [XSum](https://huggingface.co/datasets/xsum) | 4000 |
| **Balanced** | [CNN/DailyMail](https://huggingface.co/datasets/cnn_dailymail) | 4000 |
| **Detailed** | [Multi-News](https://huggingface.co/datasets/multi_news) | 4000 |
### Training Procedure
#### Training Hyperparameters
- **Learning Rate:** 5e-4
- **Batch Size:** 4 per device
- **Gradient Accumulation Steps:** 2
- **Num Epochs:** 5
- **Optimizer:** AdamW
- **LR Scheduler:** Linear with warmup (ratio 0.05)
- **Mixed Precision:** BF16
#### LoRA Configuration
- **r:** 32
- **lora_alpha:** 64
- **lora_dropout:** 0.05
- **target_modules:** ["q", "k", "v", "o"]
- **bias:** "none"
- **task_type:** "SEQ_2_SEQ_LM"
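The LoRA settings listed above map directly onto a `peft` configuration object. This is a hedged config fragment (not executed during training as-is) that mirrors the card's hyperparameters; the `target_modules` names are T5's attention projections.

```python
from peft import LoraConfig, TaskType

# Config fragment mirroring the hyperparameters listed in this card.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q", "k", "v", "o"],  # T5 attention projection layers
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM,
)
```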
### Evaluation Results
Evaluated on the held-out test set (1,200 samples) at Step 6000.
| Metric | Score |
| :--- | :--- |
| **ROUGE-1** | 0.3925 |
| **ROUGE-2** | 0.1608 |
| **ROUGE-L** | 0.2776 |
| **Validation Loss** | 0.7824 |
## Environmental Impact
- **Hardware Type:** CUDA-enabled GPU
- **Compute:** LoRA fine-tuning (Parameters: 7M trainable / 254M total)
## Framework Versions
- Datasets == 3.6.0
- PyTorch >= 2.5.1
- Transformers >= 4.36.0
- PEFT >= 0.8.0