TL;DR-Sci: Extreme Summarization of Scientific Papers

A LoRA adapter for T5-base that compresses scientific paper abstracts (150-300 words) into single-sentence TLDRs (15-25 words). Trained on the SciTLDR dataset using Parameter-Efficient Fine-Tuning.

Model Details

  • Base model: t5-base (223M parameters)
  • PEFT method: LoRA (r=16, α=32, dropout=0.05)
  • Target modules: q, k, v, o (all attention projections)
  • Trainable parameters: 3.5M / 223M (~1.6%)
  • Adapter size: ~14MB
  • Language: English
  • License: MIT
  • Training hardware: Google Colab free tier (NVIDIA T4, 15GB VRAM)
  • Training time: ~30-35 minutes (5 epochs)
  • Training precision: float32
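
The trainable-parameter count and adapter size listed above can be reproduced with simple arithmetic. The sketch below is a rough check, not the training code. It assumes the standard t5-base shape (hidden size 768, 12 encoder and 12 decoder blocks, each decoder block holding a self-attention and a cross-attention layer) and that LoRA adds one (r x d_in) and one (d_out x r) matrix per target projection. None of these shapes are stated in this card.

```python
# Back-of-the-envelope LoRA parameter count for t5-base.
# Architecture constants are assumptions (standard t5-base config).
D_MODEL = 768              # q, k, v, o are all 768 x 768 projections
ENCODER_BLOCKS = 12        # one self-attention layer each
DECODER_BLOCKS = 12        # self-attention plus cross-attention each
RANK = 16                  # LoRA r from this card
TARGETS_PER_ATTENTION = 4  # q, k, v, o

attention_layers = ENCODER_BLOCKS + 2 * DECODER_BLOCKS       # 36
adapted_matrices = attention_layers * TARGETS_PER_ATTENTION  # 144

# Each adapted projection gains A (r x d_in) and B (d_out x r).
params_per_matrix = RANK * D_MODEL + D_MODEL * RANK
lora_params = adapted_matrices * params_per_matrix

# float32 stores 4 bytes per parameter.
adapter_bytes = lora_params * 4

print(f"LoRA parameters: {lora_params:,}")
print(f"Adapter size:    {adapter_bytes / 1e6:.1f} MB")
```

Under these assumptions the count lands at roughly 3.5M parameters and about 14 MB in float32, in line with the figures listed.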

Why T5-base and not FLAN-T5-base?

We found that the google/flan-t5-base checkpoint on HuggingFace has corrupted weight tying: lm_head.weight (norm 3,958) and shared.weight (norm 54,486) are untied with a 14x norm mismatch, causing an initial loss of ~9.7 and degenerate outputs. Plain t5-base loads correctly with tied weights and a healthy initial loss of ~1.35.
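
A check along these lines can be used to inspect a checkpoint before fine-tuning. This is a minimal sketch, not the script used for this card. The helper names `tying_report` and `load_and_report` are hypothetical. It assumes the model exposes `shared` and `lm_head` modules, as T5-style seq2seq models in transformers do.

```python
def tying_report(model):
    """Compare the input embedding and output projection of a T5-style model.

    Expects `model.shared.weight` and `model.lm_head.weight` to be tensors.
    """
    shared = model.shared.weight
    lm_head = model.lm_head.weight
    shared_norm = float(shared.norm())
    lm_head_norm = float(lm_head.norm())
    return {
        # Tied weights point at the same underlying storage.
        "tied": shared.data_ptr() == lm_head.data_ptr(),
        "shared_norm": shared_norm,
        "lm_head_norm": lm_head_norm,
        "norm_ratio": shared_norm / lm_head_norm,
    }


def load_and_report(checkpoint):
    """Load a checkpoint from the Hub and run the check (downloads weights)."""
    from transformers import AutoModelForSeq2SeqLM  # imported lazily

    return tying_report(AutoModelForSeq2SeqLM.from_pretrained(checkpoint))


# Example (not executed here because it downloads both checkpoints):
# for name in ("t5-base", "google/flan-t5-base"):
#     print(name, load_and_report(name))
```

The model config field `tie_word_embeddings` records whether a checkpoint is meant to share these two matrices, so it is worth printing alongside the report.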

Quick Start

from peft import PeftModel, PeftConfig
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load model and tokenizer
repo_id = "ArenaRune/scitldr-t5-base-lora"
config = PeftConfig.from_pretrained(repo_id)
base_model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(base_model, repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model.eval()

# Generate TLDR
abstract = (
    "We propose a new simple network architecture, the Transformer, "
    "based solely on attention mechanisms, dispensing with recurrence "
    "and convolutions entirely. Experiments on two machine translation "
    "tasks show these models to be superior in quality while being more "
    "parallelizable and requiring significantly less time to train."
)

inputs = tokenizer(
    "summarize: " + abstract,
    return_tensors="pt",
    max_length=512,
    truncation=True,
)
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
tldr = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(tldr)

Training Data

  • Dataset: allenai/scitldr (SciTLDR)
  • Domain: Computer science research papers
  • Task: Abstract → single-sentence TLDR
  • Splits: 1,992 train / 619 validation / 618 test
  • Input format: "summarize: " + abstract text
  • Target: First expert-written TLDR per paper

Preprocessing

  1. Source sentences joined into single abstract string
  2. Prepended with T5's native "summarize: " prefix
  3. Tokenized with text_target= for proper decoder-side formatting
  4. Max input length: 512 tokens | Max target length: 64 tokens
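
The four steps above can be sketched as follows. This is an illustration under stated assumptions, not the exact training code. It assumes each record has a `source` field (list of abstract sentences) and a `target` field (list of TLDR strings), and the function names `build_pair` and `tokenize_pair` are hypothetical.

```python
PREFIX = "summarize: "
MAX_INPUT_TOKENS = 512
MAX_TARGET_TOKENS = 64


def build_pair(record):
    """Turn one SciTLDR-style record into (input_text, target_text)."""
    # Step 1: join the source sentences into a single abstract string.
    abstract = " ".join(sentence.strip() for sentence in record["source"])
    # Use the first expert-written TLDR as the target.
    target = record["target"][0]
    # Step 2: prepend the T5 summarization prefix.
    return PREFIX + abstract, target


def tokenize_pair(record, tokenizer):
    """Tokenize one record with a Hugging Face tokenizer passed in by the caller."""
    input_text, target_text = build_pair(record)
    # Step 4: separate length limits for the encoder and decoder sides.
    features = tokenizer(input_text, max_length=MAX_INPUT_TOKENS, truncation=True)
    # Step 3: text_target= routes the string through target-side tokenization.
    labels = tokenizer(
        text_target=target_text, max_length=MAX_TARGET_TOKENS, truncation=True
    )
    features["labels"] = labels["input_ids"]
    return features
```

Two tokenizer calls are used so that the input and target can be truncated to different lengths.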

Training Procedure

LoRA Configuration

| Parameter | Value |
|---|---|
| Rank (r) | 16 |
| Alpha (α) | 32 |
| Dropout | 0.05 |
| Target modules | q, k, v, o |
| Bias | none |
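
In PEFT, the settings above map onto a `LoraConfig` roughly as shown below. This is a sketch rather than the original training script. The helper names are hypothetical, and `peft` is imported lazily so the settings can be inspected without the library installed.

```python
# LoRA settings taken from the table above.
LORA_SETTINGS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": ["q", "k", "v", "o"],  # T5 attention projection names
    "bias": "none",
}


def build_lora_config():
    """Create a PEFT LoraConfig for a seq2seq model such as T5."""
    from peft import LoraConfig, TaskType

    return LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM, **LORA_SETTINGS)


def wrap_with_lora(base_model):
    """Attach LoRA adapters to an already loaded base model."""
    from peft import get_peft_model

    model = get_peft_model(base_model, build_lora_config())
    model.print_trainable_parameters()  # reports trainable vs. total parameters
    return model
```

With r=16 and α=32, the LoRA update is scaled by α/r = 2.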

Hyperparameters

The best configuration was selected by a grid search over three learning rates (1e-4, 3e-4, 5e-4), using ROUGE-L on the validation set as the selection criterion.

| Parameter | Value |
|---|---|
| Epochs | 5 |
| Batch size | 8 |
| Optimizer | AdamW (weight_decay=0.01) |
| LR schedule | Cosine with 100 warmup steps |
| Gradient clipping | max_norm=1.0 |
| Precision | float32 |
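
To make the schedule concrete, the sketch below computes the step count and the learning rate at a few points. It assumes no gradient accumulation, a final partial batch that is kept, and a cosine decay that reaches zero at the last step. The peak learning rate shown is one of the three grid values, chosen only for illustration, since the card does not state which one was selected.

```python
import math

TRAIN_EXAMPLES = 1992
BATCH_SIZE = 8
EPOCHS = 5
WARMUP_STEPS = 100

steps_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)  # 249
total_steps = steps_per_epoch * EPOCHS                    # 1245


def lr_at(step, peak_lr, warmup_steps=WARMUP_STEPS, total=total_steps):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


PEAK_LR = 3e-4  # illustrative assumption, not necessarily the selected value
for step in (0, 50, 100, 672, 1245):
    print(f"step {step:>4}: lr = {lr_at(step, PEAK_LR):.2e}")
```

With these numbers, warmup covers about 8% of the 1,245 optimizer steps.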

Evaluation

Metrics

  • ROUGE-1: Unigram overlap
  • ROUGE-2: Bigram overlap
  • ROUGE-L: Longest common subsequence (primary metric)

All scores are computed with stemming on a 100-sample subset of the test split.
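
For intuition about the primary metric, here is a simplified ROUGE-L F1 in plain Python. It is a sketch only: it tokenizes on whitespace and applies no stemming, so its numbers will not match the scores reported here exactly. A maintained package such as `rouge_score` (with `use_stemmer=True`) is the better choice for real evaluation. The card does not say which implementation was used.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for token_a in a:
        curr = [0]
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 between two strings (lowercased, whitespace tokens)."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens or not cand_tokens:
        return 0.0
    lcs = lcs_length(ref_tokens, cand_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


reference = "a transformer model based solely on attention"
candidate = "a model based on attention"
print(f"ROUGE-L F1: {rouge_l_f1(reference, candidate):.4f}")
```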

Comparative Results

| Method | Type | ROUGE-1 | ROUGE-2 | ROUGE-L | Avg Len (words) |
|---|---|---|---|---|---|
| Lead sentence | Extractive | 0.2594 | 0.0926 | 0.1975 | 23.7 |
| Last sentence | Extractive | 0.1526 | 0.0144 | 0.1123 | 29.9 |
| Longest sentence | Extractive | 0.1977 | 0.0456 | 0.1309 | 78.9 |
| T5-base (zero-shot) | Generative | 0.2955 | 0.1105 | 0.2123 | 41.1 |
| T5-base + LoRA (ours) | Generative | 0.3953 | 0.1931 | 0.3344 | 22.1 |
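
The three extractive baselines are simple enough to sketch in a few lines. The version below assumes the abstract is already split into sentences and that "longest" means the most whitespace-separated words. The exact definitions used for the table are not given in this card.

```python
def lead_sentence(sentences):
    """First sentence of the abstract."""
    return sentences[0]


def last_sentence(sentences):
    """Final sentence of the abstract."""
    return sentences[-1]


def longest_sentence(sentences):
    """Sentence with the most words (first one wins on ties)."""
    return max(sentences, key=lambda sentence: len(sentence.split()))


abstract_sentences = [
    "We propose a new simple network architecture, the Transformer.",
    "Experiments on two machine translation tasks show these models to be "
    "superior in quality while being more parallelizable.",
    "Training time is reduced.",
]
for baseline in (lead_sentence, last_sentence, longest_sentence):
    print(f"{baseline.__name__}: {baseline(abstract_sentences)}")
```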

Improvement over Zero-shot Baseline

| Metric | Baseline | Fine-Tuned | Δ | % Change |
|---|---|---|---|---|
| ROUGE-1 | 0.2955 | 0.3953 | +0.0999 | +33.8% |
| ROUGE-2 | 0.1105 | 0.1931 | +0.0826 | +74.8% |
| ROUGE-L | 0.2123 | 0.3344 | +0.1221 | +57.5% |
| Avg Length | 41.1 words | 22.1 words | -19.0 words | |

The fine-tuned model achieves substantial improvements across all ROUGE metrics while generating outputs closer to the target length range (15-25 words) compared to the verbose zero-shot baseline (41 words).
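
The relative improvements can be recomputed from the reported scores as a sanity check. The sketch below uses only the rounded values printed in this card, so the last digit of a delta may differ from a figure computed on unrounded scores.

```python
# (baseline, fine-tuned) pairs copied from the results reported above.
results = {
    "ROUGE-1": (0.2955, 0.3953),
    "ROUGE-2": (0.1105, 0.1931),
    "ROUGE-L": (0.2123, 0.3344),
    "Avg Length": (41.1, 22.1),
}


def relative_change(baseline, fine_tuned):
    """Percent change of fine_tuned relative to baseline."""
    return (fine_tuned - baseline) / baseline * 100.0


for name, (baseline, fine_tuned) in results.items():
    delta = fine_tuned - baseline
    change = relative_change(baseline, fine_tuned)
    print(f"{name:<10} delta = {delta:+.4f}   change = {change:+.1f}%")
```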

Uses

Intended Use

  • Screening tool for researchers scanning large volumes of papers
  • Rapid literature review and paper triage
  • Generating paper summaries for reading lists or feeds

Limitations

  • CS domain only: Trained exclusively on computer science papers. Quality on biomedical, legal, physics, or social science abstracts is untested and likely lower.
  • Not a replacement for reading: TLDRs may omit critical caveats, overstate findings, or miss nuance. Always read the full abstract before citing.
  • English only: Cannot process or generate TLDRs in other languages.
  • No factual verification: May generate plausible-sounding but inaccurate summaries.

Out-of-Scope Use

  • Generating authoritative summaries for citation without reading the original paper
  • Medical, legal, or safety-critical applications where omitted details could cause harm
  • Non-English abstracts

Bias, Risks, and Limitations

  • Misrepresentation risk: TLDRs may drop qualifiers (e.g., "under controlled conditions") making results appear more general than they are
  • Domain bias: Reflects CS research conventions; may mishandle terminology from other fields
  • Temporal bias: Trained on papers from a specific time period; novel terminology may not be handled well
  • Automation bias: Users may over-rely on TLDRs and stop reading abstracts

Recommendations

  • Always label outputs as machine-generated
  • Use as a screening tool only, not as a substitute for reading
  • Verify key claims against the original abstract
  • Exercise extra caution when applying to non-CS domains

Environmental Impact

  • Hardware: NVIDIA Tesla T4 (15GB VRAM)
  • Training time: ~30-35 minutes
  • Cloud provider: Google Colab (free tier)
  • Estimated emissions: Minimal (~0.005 kg CO2eq based on T4 power consumption of ~70W)
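
The emissions estimate follows from energy multiplied by grid carbon intensity. The sketch below computes the energy from the figures above. The carbon intensity is not stated in this card, so the code takes it as a parameter and also backs out the value implied by the ~0.005 kg estimate. It assumes the GPU draws its full ~70W for the whole run and ignores CPU and datacenter overhead.

```python
def energy_kwh(power_watts, minutes):
    """Energy used at constant power over the given duration."""
    return power_watts / 1000.0 * minutes / 60.0


def emissions_kg(power_watts, minutes, kg_co2_per_kwh):
    """Emissions for a given grid carbon intensity (caller-supplied assumption)."""
    return energy_kwh(power_watts, minutes) * kg_co2_per_kwh


T4_POWER_WATTS = 70.0
REPORTED_EMISSIONS_KG = 0.005

for minutes in (30, 35):
    energy = energy_kwh(T4_POWER_WATTS, minutes)
    implied_intensity = REPORTED_EMISSIONS_KG / energy
    print(
        f"{minutes} min: {energy:.4f} kWh, "
        f"implied intensity = {implied_intensity:.3f} kg CO2eq/kWh"
    )
```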

Technical Specifications

Model Architecture

  • Architecture: Encoder-decoder (T5)
  • Base model parameters: 223M (frozen)
  • Adapter parameters: 3.5M (trainable)
  • Total adapter size: ~14MB

Software

  • Python 3.10+
  • transformers >= 4.36.0
  • peft >= 0.7.0
  • torch >= 2.0.0

Citation

If you use this model, please cite the underlying dataset and techniques:

@inproceedings{cachola2020tldr,
  title={TLDR: Extreme Summarization of Scientific Documents},
  author={Cachola, Isabel and Lo, Kyle and Cohan, Arman and Weld, Daniel},
  booktitle={Findings of EMNLP},
  year={2020}
}

@inproceedings{hu2022lora,
  title={LoRA: Low-Rank Adaptation of Large Language Models},
  author={Hu, Edward and Shen, Yelong and Wallis, Phillip and Allen-Zhu, Zeyuan and Li, Yuanzhi and Wang, Shean and Wang, Lu and Chen, Weizhu},
  booktitle={ICLR},
  year={2022}
}

@article{raffel2020t5,
  title={Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},
  author={Raffel, Colin and Shazeer, Noam and Roberts, Adam and Lee, Katherine and Narang, Sharan and Matena, Michael and Zhou, Yanqi and Li, Wei and Liu, Peter},
  journal={JMLR},
  year={2020}
}

Model Card Author

ArenaRune
