---
language: en
base_model: google/flan-t5-base
library_name: peft
tags:
- base_model:adapter:google/flan-t5-base
- lora
- transformers
- summarization
- abstractive-summarization
- generated_from_trainer
model-index:
- name: Abstractive Style Summarizer
  results: []
datasets:
- xsum
- cnn_dailymail
- multi_news
license: mit
---

# Abstractive Style Summarizer

This model is a LoRA adapter for [google/flan-t5-base](https://huggingface.co/google/flan-t5-base), fine-tuned with PEFT. It generates abstractive summaries in three styles, selected by a prompt prefix: **Harsh** (concise), **Balanced** (standard), and **Detailed** (comprehensive).

## Model Details

### Model Description

- **Model type:** Sequence-to-Sequence Transformer (T5)
- **Language(s):** English
- **License:** MIT
- **Finetuned from model:** google/flan-t5-base
- **Training Method:** PEFT (LoRA)

### Model Sources

- **Repository:** [Flatten](https://github.com/LityoPS/Flatten)
- **Base Model:** [google/flan-t5-base](https://huggingface.co/google/flan-t5-base)

## Uses

### Direct Use

The model reads a style prefix at the start of the prompt and adjusts the length and level of detail of the summary accordingly:

- **Harsh**: very short, punchy summaries (approx. 35% of the input length).
- **Balanced**: standard news-style summaries (approx. 50% of the input length).
- **Detailed**: in-depth summaries (approx. 70% of the input length).
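
As a rough illustration of the ratios above, the sketch below turns an input length into an approximate summary length per style. `STYLE_RATIOS` and `target_length` are hypothetical names introduced here; the card does not say how, or whether, these targets are enforced during generation.

```python
# Approximate summary-to-input length ratios quoted in this card.
STYLE_RATIOS = {"Harsh": 0.35, "Balanced": 0.50, "Detailed": 0.70}


def target_length(style: str, input_length: int) -> int:
    """Return the approximate summary length for a style (hypothetical helper)."""
    if style not in STYLE_RATIOS:
        raise ValueError(f"Unknown style: {style!r}")
    return round(input_length * STYLE_RATIOS[style])


if __name__ == "__main__":
    # Print the approximate target for a 400-word input in each style.
    for style in STYLE_RATIOS:
        print(style, target_length(style, 400))
```

The unit (words or tokens) is left open because the card does not specify it.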

### Prompt Format

Prefix the input text with the desired style (`Harsh`, `Balanced`, or `Detailed`):

```
Summarize {Style}: {Input Text}
```

Example: `Summarize Harsh: The Walt Disney Co. announced...`
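
A minimal inference sketch, under stated assumptions: `build_prompt` applies the prefix format above, and `summarize` loads the base model plus a LoRA adapter through `transformers` and `peft`. The `adapter_id` argument is a placeholder you must supply (this card does not state the Hub repository name), and the generation settings (`max_new_tokens`, `num_beams`, the 512-token input limit) are illustrative choices, not the author's.

```python
STYLES = ("Harsh", "Balanced", "Detailed")
BASE_MODEL = "google/flan-t5-base"


def build_prompt(style: str, text: str) -> str:
    """Format the input as 'Summarize {Style}: {Input Text}'."""
    if style not in STYLES:
        raise ValueError(f"style must be one of {STYLES}, got {style!r}")
    return f"Summarize {style}: {text.strip()}"


def summarize(text: str, style: str, adapter_id: str, max_new_tokens: int = 128) -> str:
    """Generate a summary with the base model plus a LoRA adapter.

    adapter_id is a Hub repo ID or local path to the adapter weights.
    It is a placeholder here, not a name taken from this card.
    """
    # Imported lazily so build_prompt works without these packages installed.
    from peft import PeftModel
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    base = AutoModelForSeq2SeqLM.from_pretrained(BASE_MODEL)
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()

    inputs = tokenizer(
        build_prompt(style, text),
        return_tensors="pt",
        truncation=True,
        max_length=512,  # assumed input limit
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize(article, "Harsh", adapter_id=...)` downloads the base model on first use, so expect a delay on the first call.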

## Training Details

### Training Data

The model was trained on a combined dataset of 12,000 samples (4,000 per style), split 80% / 10% / 10% into train (9,600), validation (1,200), and test (1,200) sets.

| Style | Source Dataset | Size |
| :--- | :--- | :--- |
| **Harsh** | [XSum](https://huggingface.co/datasets/xsum) | 4,000 |
| **Balanced** | [CNN/DailyMail](https://huggingface.co/datasets/cnn_dailymail) | 4,000 |
| **Detailed** | [Multi-News](https://huggingface.co/datasets/multi_news) | 4,000 |
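
The split sizes follow directly from the 80/10/10 ratio. The sketch below shuffles sample indices and slices them; the seed and the `split_indices` helper are assumptions for illustration, since the card does not describe how the split was drawn (for example, whether it was stratified by style).

```python
import random


def split_indices(n_samples: int, seed: int = 42):
    """Shuffle indices and split them 80/10/10 into train/validation/test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed is an assumed value
    n_train = n_samples * 80 // 100
    n_val = n_samples * 10 // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train, val, test = split_indices(12_000)
print(len(train), len(val), len(test))  # 9600 1200 1200
```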

### Training Procedure

#### Training Hyperparameters

- **Learning Rate:** 5e-4
- **Batch Size:** 4 per device
- **Gradient Accumulation Steps:** 2
- **Num Epochs:** 5
- **Optimizer:** AdamW
- **LR Scheduler:** Linear with warmup (ratio 0.05)
- **Mixed Precision:** BF16
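
For reference, the settings above map onto `transformers` training arguments roughly as follows, and the step count can be worked out from them. The keyword names are real `Seq2SeqTrainingArguments` parameters, but the exact AdamW variant (`adamw_torch`) and the single-GPU setup are assumptions, not details confirmed by the author.

```python
# Keyword arguments for transformers.Seq2SeqTrainingArguments (sketch).
TRAINING_KWARGS = {
    "learning_rate": 5e-4,
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 2,
    "num_train_epochs": 5,
    "optim": "adamw_torch",  # assumed AdamW variant
    "lr_scheduler_type": "linear",
    "warmup_ratio": 0.05,
    "bf16": True,
}

NUM_TRAIN_SAMPLES = 12_000 * 80 // 100  # 80% of the 12,000 samples
NUM_DEVICES = 1  # assumption: single GPU


def schedule_summary(kwargs=TRAINING_KWARGS):
    """Return (effective batch, steps per epoch, total steps, warmup steps)."""
    effective_batch = (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * NUM_DEVICES
    )
    steps_per_epoch = NUM_TRAIN_SAMPLES // effective_batch
    total_steps = steps_per_epoch * kwargs["num_train_epochs"]
    warmup_steps = round(total_steps * kwargs["warmup_ratio"])
    return effective_batch, steps_per_epoch, total_steps, warmup_steps


print(schedule_summary())  # (8, 1200, 6000, 300)
```

Under these assumptions the run ends at optimizer step 6,000, with the learning rate warming up over roughly the first 300 steps.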

#### LoRA Configuration

- **r:** 32
- **lora_alpha:** 64
- **lora_dropout:** 0.05
- **target_modules:** `["q", "k", "v", "o"]`
- **bias:** `"none"`
- **task_type:** `"SEQ_2_SEQ_LM"`
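
The settings above correspond to a PEFT `LoraConfig` along these lines. This is a sketch, not the author's training script; `LORA_KWARGS`, `make_lora_config`, and `wrap_base_model` are hypothetical names, and `peft` is imported lazily so the dictionary can be inspected without it installed.

```python
LORA_KWARGS = {
    "r": 32,
    "lora_alpha": 64,
    "lora_dropout": 0.05,
    "target_modules": ["q", "k", "v", "o"],  # T5 attention projections
    "bias": "none",
    "task_type": "SEQ_2_SEQ_LM",
}


def make_lora_config():
    """Build a peft.LoraConfig from the values listed in this card."""
    from peft import LoraConfig

    return LoraConfig(**LORA_KWARGS)


def wrap_base_model(base_model):
    """Attach freshly initialized LoRA adapters to a loaded base model."""
    from peft import get_peft_model

    return get_peft_model(base_model, make_lora_config())


# Standard LoRA scales the low-rank update by lora_alpha / r.
scaling = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]
print(scaling)  # 2.0
```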

### Evaluation Results

Evaluated on the held-out test set (1,200 samples) at step 6000.

| Metric | Score |
| :--- | :--- |
| **ROUGE-1** | 0.3925 |
| **ROUGE-2** | 0.1608 |
| **ROUGE-L** | 0.2776 |
| **Validation Loss** | 0.7824 |
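
To make the metrics concrete, here is a simplified, dependency-free sketch of the ROUGE-1 and ROUGE-L F-measures on lowercased, whitespace-tokenized text. It is an illustration only: the card does not name the scoring package used, and standard implementations apply their own tokenization and optional stemming, so their numbers will differ.

```python
from collections import Counter


def _f1(matches: int, cand_len: int, ref_len: int) -> float:
    """Harmonic mean of precision and recall, or 0.0 when nothing matches."""
    if matches == 0:
        return 0.0
    precision = matches / cand_len
    recall = matches / ref_len
    return 2 * precision * recall / (precision + recall)


def rouge1_f(candidate: str, reference: str) -> float:
    """Unigram-overlap F-measure (simplified ROUGE-1)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    overlap = sum((Counter(cand) & Counter(ref)).values())
    return _f1(overlap, len(cand), len(ref))


def rougeL_f(candidate: str, reference: str) -> float:
    """Longest-common-subsequence F-measure (simplified ROUGE-L)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    prev = [0] * (len(ref) + 1)  # LCS lengths for the previous candidate token
    for c in cand:
        curr = [0]
        for j, r in enumerate(ref, start=1):
            curr.append(prev[j - 1] + 1 if c == r else max(prev[j], curr[j - 1]))
        prev = curr
    return _f1(prev[-1], len(cand), len(ref))
```

For example, scoring `"the cat sat"` against `"the cat sat on the mat"` gives precision 1.0 and recall 0.5 for both functions, so an F-measure of about 0.667.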

## Environmental Impact

- **Hardware Type:** CUDA-enabled GPU
- **Compute:** LoRA fine-tuning (7M trainable parameters out of 254M total)
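
The trainable-parameter figure can be sanity-checked by hand. The sketch assumes the published flan-t5-base architecture (hidden size 768, attention inner dimension 768, 12 encoder and 12 decoder layers), which is not stated in this card, together with the LoRA rank of 32 on the `q`, `k`, `v`, and `o` projections. Each adapted projection gains two low-rank matrices of shapes `r x d_in` and `d_out x r`.

```python
D_MODEL = 768        # assumed hidden size of flan-t5-base
INNER_DIM = 768      # assumed num_heads * d_kv (12 * 64)
ENCODER_LAYERS = 12  # assumed
DECODER_LAYERS = 12  # assumed
LORA_RANK = 32       # r from the LoRA configuration
PROJECTIONS = 4      # q, k, v, o


def lora_trainable_params() -> int:
    """Count LoRA parameters added to every attention projection."""
    # Encoder layers have self-attention; decoder layers have
    # self-attention plus cross-attention.
    attention_blocks = ENCODER_LAYERS + 2 * DECODER_LAYERS
    # One r x d_in matrix and one d_out x r matrix per projection.
    per_projection = LORA_RANK * D_MODEL + INNER_DIM * LORA_RANK
    return attention_blocks * PROJECTIONS * per_projection


print(lora_trainable_params())  # 7077888
```

That is about 7.1M adapter parameters, consistent with the 7M trainable figure above; added to the base model's roughly 248M parameters, it lands close to the reported total.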

## Framework Versions

- Datasets==3.6.0
- PyTorch>=2.5.1
- Transformers>=4.36.0
- PEFT>=0.8.0