TL;DR-Sci: Extreme Summarization of Scientific Papers
A LoRA adapter for T5-base that compresses scientific paper abstracts (150-300 words) into single-sentence TLDRs (15-25 words). Trained on the SciTLDR dataset using Parameter-Efficient Fine-Tuning.
Model Details
- Base model: t5-base (223M parameters)
- PEFT method: LoRA (r=16, α=32, dropout=0.05)
- Target modules: q, k, v, o (all attention projections)
- Trainable parameters: 3.5M / 223M (~1.6%)
- Adapter size: ~14MB
- Language: English
- License: MIT
- Training hardware: Google Colab free tier (NVIDIA T4, 15GB VRAM)
- Training time: ~30-35 minutes (5 epochs)
- Training precision: float32
Why T5-base and not FLAN-T5-base?
We discovered that the google/flan-t5-base checkpoint on HuggingFace has corrupted weight tying — lm_head.weight (norm 3,958) and shared.weight (norm 54,486) are untied with a 14x norm mismatch, causing initial loss of ~9.7 and degenerate outputs. Plain t5-base loads correctly with tied weights and healthy initial loss of ~1.35.
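To reproduce this check yourself, here is a minimal diagnostic sketch (it downloads both checkpoints; exact norms may vary slightly across transformers versions):

```python
from transformers import T5ForConditionalGeneration

# Compare the LM head and shared embedding weights for each checkpoint
for name in ("t5-base", "google/flan-t5-base"):
    m = T5ForConditionalGeneration.from_pretrained(name)
    tied = m.lm_head.weight.data_ptr() == m.shared.weight.data_ptr()
    print(
        f"{name}: tied={tied}, "
        f"lm_head norm={m.lm_head.weight.norm().item():.0f}, "
        f"shared norm={m.shared.weight.norm().item():.0f}"
    )
```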
Quick Start
```python
from peft import PeftModel, PeftConfig
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load model and tokenizer
repo_id = "ArenaRune/scitldr-t5-base-lora"
config = PeftConfig.from_pretrained(repo_id)
base_model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(base_model, repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model.eval()

# Generate TLDR
abstract = (
    "We propose a new simple network architecture, the Transformer, "
    "based solely on attention mechanisms, dispensing with recurrence "
    "and convolutions entirely. Experiments on two machine translation "
    "tasks show these models to be superior in quality while being more "
    "parallelizable and requiring significantly less time to train."
)
inputs = tokenizer(
    "summarize: " + abstract,
    return_tensors="pt",
    max_length=512,
    truncation=True,
)
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
tldr = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(tldr)
```
Training Data
- Dataset: allenai/scitldr (SciTLDR)
- Domain: Computer science research papers
- Task: Abstract → single-sentence TLDR
- Splits: 1,992 train / 619 validation / 618 test
- Input format: `"summarize: " + abstract text`
- Target: First expert-written TLDR per paper
Preprocessing
- Source sentences joined into a single abstract string
- Prepended with T5's native `"summarize: "` prefix
- Tokenized with `text_target=` for proper decoder-side formatting
- Max input length: 512 tokens | Max target length: 64 tokens
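A minimal sketch of this preprocessing, assuming the `Abstract` configuration of `allenai/scitldr` with its `source` (list of sentences) and `target` (list of TLDRs) fields:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-base")
dataset = load_dataset("allenai/scitldr", "Abstract")

def preprocess(example):
    # Join the abstract's sentences and prepend T5's summarization prefix
    abstract = " ".join(example["source"])
    model_inputs = tokenizer(
        "summarize: " + abstract, max_length=512, truncation=True
    )
    # First expert-written TLDR as the target, tokenized on the decoder side
    labels = tokenizer(
        text_target=example["target"][0], max_length=64, truncation=True
    )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, remove_columns=dataset["train"].column_names)
```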
Training Procedure
LoRA Configuration
| Parameter | Value |
|---|---|
| Rank (r) | 16 |
| Alpha (α) | 32 |
| Dropout | 0.05 |
| Target modules | q, k, v, o |
| Bias | none |
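In peft terms, this table corresponds roughly to the following configuration (a sketch, not the exact training script):

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSeq2SeqLM

base_model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    target_modules=["q", "k", "v", "o"],  # all T5 attention projections
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # ~3.5M trainable adapter parameters
```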
Hyperparameters
Best configuration selected from a grid search over 3 learning rates (1e-4, 3e-4, 5e-4) using ROUGE-L on the validation set.
| Parameter | Value |
|---|---|
| Epochs | 5 |
| Batch size | 8 |
| Optimizer | AdamW (weight_decay=0.01) |
| LR schedule | Cosine with 100 warmup steps |
| Gradient clipping | max_norm=1.0 |
| Precision | float32 |
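A sketch of an equivalent setup with Seq2SeqTrainer, reusing `model` and `tokenized` from the sketches above. The exact training loop is not part of this card, and the winning learning rate from the grid is not stated, so the value below is a placeholder from the searched range:

```python
from transformers import (
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

training_args = Seq2SeqTrainingArguments(
    output_dir="scitldr-t5-base-lora",
    num_train_epochs=5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=3e-4,         # placeholder from the searched grid (1e-4, 3e-4, 5e-4)
    weight_decay=0.01,          # AdamW is the default optimizer
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_grad_norm=1.0,          # gradient clipping
    fp16=False,                 # float32 training
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```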
Evaluation
Metrics
- ROUGE-1: Unigram overlap
- ROUGE-2: Bigram overlap
- ROUGE-L: Longest common subsequence (primary metric)
All scores computed with stemming on 100 test samples.
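Scores of this kind can be computed with the evaluate library; `use_stemmer=True` matches the stemming noted above (the predictions and references below are placeholders):

```python
import evaluate

rouge = evaluate.load("rouge")

predictions = ["a single-sentence tldr generated by the model"]
references = ["the expert-written tldr for the same paper"]

scores = rouge.compute(
    predictions=predictions, references=references, use_stemmer=True
)
print(scores["rouge1"], scores["rouge2"], scores["rougeL"])
```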
Comparative Results
| Method | Type | ROUGE-1 | ROUGE-2 | ROUGE-L | Avg Len |
|---|---|---|---|---|---|
| Lead sentence | Extractive | 0.2594 | 0.0926 | 0.1975 | 23.7 |
| Last sentence | Extractive | 0.1526 | 0.0144 | 0.1123 | 29.9 |
| Longest sentence | Extractive | 0.1977 | 0.0456 | 0.1309 | 78.9 |
| T5-base (zero-shot) | Generative | 0.2955 | 0.1105 | 0.2123 | 41.1 |
| T5-base + LoRA (ours) | Generative | 0.3953 | 0.1931 | 0.3344 | 22.1 |
Improvement over Zero-shot Baseline
| Metric | Baseline | Fine-Tuned | Δ | % Change |
|---|---|---|---|---|
| ROUGE-1 | 0.2955 | 0.3953 | +0.0999 | +33.8% |
| ROUGE-2 | 0.1105 | 0.1931 | +0.0826 | +74.8% |
| ROUGE-L | 0.2123 | 0.3344 | +0.1221 | +57.5% |
| Avg Length | 41.1 words | 22.1 words | -19.0 | — |
The fine-tuned model achieves substantial improvements across all ROUGE metrics while generating outputs closer to the target length range (15-25 words) compared to the verbose zero-shot baseline (41 words).
Uses
Intended Use
- Screening tool for researchers scanning large volumes of papers
- Rapid literature review and paper triage
- Generating paper summaries for reading lists or feeds
Limitations
- CS domain only: Trained exclusively on computer science papers. Quality on biomedical, legal, physics, or social science abstracts is untested and likely lower.
- Not a replacement for reading: TLDRs may omit critical caveats, overstate findings, or miss nuance. Always read the full abstract before citing.
- English only: Cannot process or generate TLDRs in other languages.
- No factual verification: May generate plausible-sounding but inaccurate summaries.
Out-of-Scope Use
- Generating authoritative summaries for citation without reading the original paper
- Medical, legal, or safety-critical applications where omitted details could cause harm
- Non-English abstracts
Bias, Risks, and Limitations
- Misrepresentation risk: TLDRs may drop qualifiers (e.g., "under controlled conditions") making results appear more general than they are
- Domain bias: Reflects CS research conventions; may mishandle terminology from other fields
- Temporal bias: Trained on papers from a specific time period; novel terminology may not be handled well
- Automation bias: Users may over-rely on TLDRs and stop reading abstracts
Recommendations
- Always label outputs as machine-generated
- Use as a screening tool only, not as a substitute for reading
- Verify key claims against the original abstract
- Exercise extra caution when applying to non-CS domains
Environmental Impact
- Hardware: NVIDIA Tesla T4 (15GB VRAM)
- Training time: ~30-35 minutes
- Cloud provider: Google Colab (free tier)
- Estimated emissions: Minimal (~0.005 kg CO2eq based on T4 power consumption of ~70W)
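For reference, 70 W over ~35 minutes is roughly 0.04 kWh; the ~0.005 kg CO2eq figure corresponds to an assumed grid carbon intensity of about 0.12 kg CO2eq/kWh.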
Technical Specifications
Model Architecture
- Architecture: Encoder-decoder (T5)
- Base model parameters: 223M (frozen)
- Adapter parameters: 3.5M (trainable)
- Total adapter size: ~14MB
Software
- Python 3.10+
- transformers >= 4.36.0
- peft >= 0.7.0
- torch >= 2.0.0
Citation
If you use this model, please cite the underlying dataset and techniques:
```bibtex
@inproceedings{cachola2020tldr,
  title     = {TLDR: Extreme Summarization of Scientific Documents},
  author    = {Cachola, Isabel and Lo, Kyle and Cohan, Arman and Weld, Daniel},
  booktitle = {Findings of EMNLP},
  year      = {2020}
}

@inproceedings{hu2022lora,
  title     = {LoRA: Low-Rank Adaptation of Large Language Models},
  author    = {Hu, Edward and Shen, Yelong and Wallis, Phillip and Allen-Zhu, Zeyuan and Li, Yuanzhi and Wang, Shean and Wang, Lu and Chen, Weizhu},
  booktitle = {ICLR},
  year      = {2022}
}

@article{raffel2020t5,
  title   = {Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},
  author  = {Raffel, Colin and Shazeer, Noam and Roberts, Adam and Lee, Katherine and Narang, Sharan and Matena, Michael and Zhou, Yanqi and Li, Wei and Liu, Peter},
  journal = {JMLR},
  year    = {2020}
}
```
Model Card Author
ArenaRune