ccdv/arxiv-summarization
Viewer • Updated • 432k • 13.9k • 124
How to use gabe-zhang/Llama-PaperSummarization-LoRA with PEFT:
from peft import PeftModel
from transformers import AutoModelForCausalLM
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
model = PeftModel.from_pretrained(base_model, "gabe-zhang/Llama-PaperSummarization-LoRA")How to use gabe-zhang/Llama-PaperSummarization-LoRA with Transformers:
# Use a pipeline as a high-level helper
# Warning: Pipeline type "summarization" is no longer supported in transformers v5.
# You must load the model directly (see below) or downgrade to v4.x with:
# 'pip install "transformers<5.0.0'
from transformers import pipeline
pipe = pipeline("summarization", model="gabe-zhang/Llama-PaperSummarization-LoRA") # Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("gabe-zhang/Llama-PaperSummarization-LoRA", dtype="auto")A LoRA fine-tuned adapter for scientific paper summarization, built on meta-llama/Llama-3.2-1B-Instruct.
Evaluated on 6,440 test samples with beam search (beam size = 4):
| Model | ROUGE-1 | ROUGE-2 | ROUGE-3 | ROUGE-L |
|---|---|---|---|---|
| Llama-3.2-1B-Instruct (baseline) | 36.69 | 7.47 | 1.95 | 19.36 |
| Llama-PaperSummarization-LoRA | 41.56 | 11.31 | 2.67 | 21.86 |
+51% ROUGE-2 and +37% ROUGE-3 improvement over baseline.
import torch
from transformers import LlamaForCausalLM, AutoTokenizer
from peft import PeftModel
base_model = LlamaForCausalLM.from_pretrained(
"meta-llama/Llama-3.2-1B-Instruct",
dtype=torch.bfloat16,
device_map="auto",
)
model = PeftModel.from_pretrained(
base_model,
"gabe-zhang/Llama-PaperSummarization-LoRA"
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
| Parameter | Value |
|---|---|
| Base Model | Llama-3.2-1B-Instruct (1.3GB) |
| LoRA Rank | 8 |
| Target Modules | q_proj, v_proj |
| Trainable Parameters | ~850K (0.07%) |
| Context Length | 10,182 tokens |
| Gradient Accumulation | 4 steps |
| Training Steps | 5,000 |
| Evaluation Interval | Every 20 steps |
| Training Time | ~28 hours on RTX A6000 |
Fine-tuned on 10% of ccdv/arxiv-summarization:
| Split | Samples | Avg. Article Tokens | Avg. Abstract Tokens |
|---|---|---|---|
| Train | ~20,000 | 6,038 | 299 |
| Validation | ~640 | 5,894 | 172 |
| Test | 6,440 | 5,905 | 174 |
github.com/gabe-zhang/paper2summary
Built with Llama.
Base model
meta-llama/Llama-3.2-1B-Instruct