---
language: en
base_model: google/flan-t5-base
library_name: peft
license: mit
tags:
- base_model:adapter:google/flan-t5-base
- lora
- transformers
- summarization
- abstractive-summarization
- generated_from_trainer
datasets:
- xsum
- cnn_dailymail
- multi_news
model-index:
- name: Abstractive Style Summarizer
  results: []
---

# Abstractive Style Summarizer

This model is a LoRA fine-tune of [google/flan-t5-base](https://huggingface.co/google/flan-t5-base), trained with the PEFT library. It generates abstractive summaries in three distinct styles: **Harsh** (concise), **Balanced** (standard), and **Detailed** (comprehensive).

## Model Details

### Model Description

- **Model type:** Sequence-to-Sequence Transformer (T5)
- **Language(s):** English
- **License:** MIT
- **Finetuned from model:** google/flan-t5-base
- **Training method:** PEFT (LoRA)

### Model Sources

- **Repository:** [Flatten](https://github.com/LityoPS/Flatten)
- **Base model:** [google/flan-t5-base](https://huggingface.co/google/flan-t5-base)
## Uses |
|
|
|
|
|
### Direct Use |
|
|
|
|
|
The model interprets a prefixed prompt to determine the style of the summary. |
|
|
- **Harsh**: Generates very short, punchy summaries (approx. 35% of input length). |
|
|
- **Balanced**: Generates standard news summaries (approx. 50% of input length). |
|
|
- **Detailed**: Generates in-depth summaries (approx. 70% of input length). |
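The length targets above can be sketched as a simple heuristic. The ratios come from this card; the function and its name are illustrative, not part of the model's API:

```python
# Approximate target summary length per style (ratios from this model card).
STYLE_RATIOS = {"Harsh": 0.35, "Balanced": 0.50, "Detailed": 0.70}

def target_length(input_tokens: int, style: str) -> int:
    """Rough token budget for the summary; illustrative, not the model's internals."""
    return max(1, round(input_tokens * STYLE_RATIOS[style]))
```

For a 200-token article, this yields budgets of roughly 70, 100, and 140 tokens respectively.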

### Prompt Format

The input text should be prefixed with the desired style:

```
Summarize {Style}: {Input Text}
```

Example: `Summarize Harsh: The Walt Disney Co. announced...`
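A minimal usage sketch with `transformers` and `peft`. The adapter path is a placeholder, and the helper names and generation settings are illustrative assumptions, not part of this repository:

```python
def build_prompt(style: str, text: str) -> str:
    """Prefix the input with one of the three supported style tags."""
    if style not in ("Harsh", "Balanced", "Detailed"):
        raise ValueError(f"unknown style: {style}")
    return f"Summarize {style}: {text}"

def summarize(text: str, style: str = "Balanced",
              adapter_path: str = "path/to/adapter") -> str:
    """Load the base model, attach the LoRA adapter, and generate a summary."""
    # Imports kept local so build_prompt works without these packages installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
    base = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
    model = PeftModel.from_pretrained(base, adapter_path)  # placeholder path

    inputs = tokenizer(build_prompt(style, text), return_tensors="pt", truncation=True)
    output = model.generate(**inputs, max_new_tokens=256, num_beams=4)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Replace `adapter_path` with the actual adapter repository or local directory; `max_new_tokens` and `num_beams` are reasonable defaults, not the values used in training.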

## Training Details

### Training Data

The model was trained on a combined dataset of 12,000 samples, split into 80% train, 10% validation, and 10% test.

| Style | Source Dataset | Size |
| :--- | :--- | :--- |
| **Harsh** | [XSum](https://huggingface.co/datasets/xsum) | 4,000 |
| **Balanced** | [CNN/DailyMail](https://huggingface.co/datasets/cnn_dailymail) | 4,000 |
| **Detailed** | [Multi-News](https://huggingface.co/datasets/multi_news) | 4,000 |
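The 80/10/10 split implies the following absolute sizes (simple arithmetic from the numbers above; variable names are illustrative):

```python
# Split sizes implied by the 80/10/10 split of 12,000 samples.
TOTAL_SAMPLES = 12_000
splits = {"train": 0.80, "validation": 0.10, "test": 0.10}
sizes = {name: round(TOTAL_SAMPLES * frac) for name, frac in splits.items()}
print(sizes)  # {'train': 9600, 'validation': 1200, 'test': 1200}
```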

### Training Procedure

#### Training Hyperparameters

- **Learning rate:** 5e-4
- **Batch size:** 4 per device
- **Gradient accumulation steps:** 2
- **Num epochs:** 5
- **Optimizer:** AdamW
- **LR scheduler:** Linear with warmup (ratio 0.05)
- **Mixed precision:** BF16

#### LoRA Configuration

- **r:** 32
- **lora_alpha:** 64
- **lora_dropout:** 0.05
- **target_modules:** ["q", "k", "v", "o"]
- **bias:** "none"
- **task_type:** "SEQ_2_SEQ_LM"
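A back-of-the-envelope check of the trainable-parameter count this configuration produces (flan-t5-base dimensions; illustrative arithmetic, not a measured value):

```python
# Trainable parameters added by LoRA with r=32 on q, k, v, o.
d_model = 768                        # hidden size of flan-t5-base
r = 32
# q, k, v, o are targeted in every attention block:
# 12 encoder layers (self-attn) + 12 decoder layers (self-attn + cross-attn).
attention_blocks = 12 + 12 * 2       # 36
matrices = attention_blocks * 4      # 144 targeted d_model x d_model matrices
params_per_matrix = r * d_model * 2  # LoRA A (r x d_model) plus B (d_model x r)
trainable = matrices * params_per_matrix
print(trainable)  # 7077888
```

This lands at about 7.1M, matching the ~7M trainable parameters reported under Environmental Impact.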

### Evaluation Results

Evaluated on the held-out test set (1,200 samples) at step 6,000.

| Metric | Score |
| :--- | :--- |
| **ROUGE-1** | 0.3925 |
| **ROUGE-2** | 0.1608 |
| **ROUGE-L** | 0.2776 |
| **Validation Loss** | 0.7824 |
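ROUGE-1 measures unigram overlap between a generated summary and its reference. A toy sketch of the F1 variant, using only whitespace tokenization (the scores above were presumably produced by a full ROUGE implementation, which also normalizes and stems tokens):

```python
from collections import Counter

def rouge1_f1(prediction: str, reference: str) -> float:
    """Unigram-overlap F1 with whitespace tokenization (a toy approximation)."""
    pred_counts = Counter(prediction.split())
    ref_counts = Counter(reference.split())
    overlap = sum((pred_counts & ref_counts).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)
```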

## Environmental Impact

- **Hardware type:** CUDA-enabled GPU
- **Compute:** LoRA fine-tuning (~7M trainable of ~254M total parameters, ≈2.8%)

## Framework Versions

- datasets==3.6.0
- torch>=2.5.1
- transformers>=4.36.0
- peft>=0.8.0