---
language: en
base_model: google/flan-t5-base
library_name: peft
tags:
- base_model:adapter:google/flan-t5-base
- lora
- transformers
- summarization
- abstractive-summarization
- generated_from_trainer
model-index:
- name: Abstractive Style Summarizer
  results: []
datasets:
- xsum
- cnn_dailymail
- multi_news
license: mit
---

# Abstractive Style Summarizer

This model is a fine-tuned version of [google/flan-t5-base](https://huggingface.co/google/flan-t5-base) trained with PEFT (LoRA). It generates abstractive summaries in three distinct styles: **Harsh** (concise), **Balanced** (standard), and **Detailed** (comprehensive).

## Model Details

### Model Description

- **Model type:** Sequence-to-Sequence Transformer (T5)
- **Language(s):** English
- **License:** MIT
- **Finetuned from model:** google/flan-t5-base
- **Training Method:** PEFT (LoRA)

### Model Sources

- **Repository:** [Flatten](https://github.com/LityoPS/Flatten)
- **Base Model:** [google/flan-t5-base](https://huggingface.co/google/flan-t5-base)

## Uses

### Direct Use

The model interprets a prefixed prompt to determine the style of the summary:

- **Harsh**: very short, punchy summaries (approx. 35% of input length).
- **Balanced**: standard news-style summaries (approx. 50% of input length).
- **Detailed**: in-depth summaries (approx. 70% of input length).

### Prompt Format

The input text should be prefixed with the desired style:

```
Summarize {Style}: {Input Text}
```

Example: `Summarize Harsh: The Walt Disney Co. announced...`

## Training Details

### Training Data

The model was trained on a combined dataset of 12,000 samples, split into 80% train, 10% validation, and 10% test.

| Style | Source Dataset | Size |
| :--- | :--- | :--- |
| **Harsh** | [XSum](https://huggingface.co/datasets/xsum) | 4000 |
| **Balanced** | [CNN/DailyMail](https://huggingface.co/datasets/cnn_dailymail) | 4000 |
| **Detailed** | [Multi-News](https://huggingface.co/datasets/multi_news) | 4000 |

### Training Procedure

#### Training Hyperparameters

- **Learning Rate:** 5e-4
- **Batch Size:** 4 per device
- **Gradient Accumulation Steps:** 2
- **Num Epochs:** 5
- **Optimizer:** AdamW
- **LR Scheduler:** Linear with warmup (ratio 0.05)
- **Mixed Precision:** BF16

#### LoRA Configuration

- **r:** 32
- **lora_alpha:** 64
- **lora_dropout:** 0.05
- **target_modules:** ["q", "k", "v", "o"]
- **bias:** "none"
- **task_type:** "SEQ_2_SEQ_LM"

### Evaluation Results

Evaluated on the held-out test set (1,200 samples) at training step 6000.

| Metric | Score |
| :--- | :--- |
| **ROUGE-1** | 0.3925 |
| **ROUGE-2** | 0.1608 |
| **ROUGE-L** | 0.2776 |
| **Validation Loss** | 0.7824 |

## Environmental Impact

- **Hardware Type:** CUDA-enabled GPU
- **Compute:** LoRA fine-tuning (approx. 7M trainable parameters out of 254M total)

## Framework Versions

- Datasets==3.6.0
- PyTorch>=2.5.1
- Transformers>=4.36.0
- PEFT>=0.8.0
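
## How to Get Started with the Model

The sketch below shows one way to load the base model with this LoRA adapter and apply the prompt format described above. The adapter path (`ADAPTER_ID`) is a placeholder, and the generation settings (`max_new_tokens`, `num_beams`) are illustrative assumptions rather than values specified by this card.

```python
# Minimal inference sketch for the style-prefixed summarizer.
# ADAPTER_ID is a placeholder: replace it with this repository's Hub id
# or a local directory containing the adapter weights.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from peft import PeftModel

BASE_MODEL = "google/flan-t5-base"
ADAPTER_ID = "path/to/abstractive-style-summarizer"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base_model = AutoModelForSeq2SeqLM.from_pretrained(BASE_MODEL)
model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
model.eval()

article = "The Walt Disney Co. announced ..."  # text to summarize
prompt = f"Summarize Balanced: {article}"      # styles: Harsh, Balanced, Detailed

inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=150, num_beams=4)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Switching the `Harsh` / `Balanced` / `Detailed` prefix is the only change needed to move between summary styles; the adapter conditions its output length and level of detail on that prefix.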
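
## Example: Reconstructing the Training Setup

The following is a hedged sketch of how the LoRA adapter and trainer could be rebuilt from the hyperparameters listed under Training Details. It is not the project's training script: the `output_dir`, the tokenized `train_dataset`/`eval_dataset` objects, and the commented-out trainer wiring are assumptions shown only to make the configuration concrete.

```python
# Illustrative reconstruction of the LoRA and training configuration above.
# Dataset preparation (style-prefixed, tokenized train/eval splits) is assumed
# and not shown here.
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)
from peft import LoraConfig, TaskType, get_peft_model

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
base = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q", "k", "v", "o"],  # T5 attention projections
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM,
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # roughly 7M trainable out of ~254M total

training_args = Seq2SeqTrainingArguments(
    output_dir="style-summarizer-lora",  # assumed output path
    learning_rate=5e-4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    num_train_epochs=5,
    lr_scheduler_type="linear",
    warmup_ratio=0.05,
    bf16=True,
)

# trainer = Seq2SeqTrainer(
#     model=model,
#     args=training_args,
#     train_dataset=train_dataset,  # tokenized "Summarize {Style}: ..." examples
#     eval_dataset=eval_dataset,
#     data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
# )
# trainer.train()
```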