Qwen3-1.7B PubMed Summarization (QLoRA)
Model Description and Intended Use
This is a QLoRA-fine-tuned version of Qwen/Qwen3-1.7B optimized for scientific article summarization, specifically for generating concise, factual summaries of biomedical research papers from the PubMed corpus.
The model is intended for:
- Automated summarization of biomedical literature sections (e.g., Methods, Results)
- Assisting researchers in rapid literature review and knowledge extraction
- Serving as a domain-adapted base for further fine-tuning on clinical or life-science tasks
This is a causal language model adapted via supervised instruction fine-tuning (SFT), not a chat model; use the prescribed prompt format for optimal results.
Training Data
- Source: Subset of `ccdv/pubmed-summarization`
- Format: Article→abstract pairs (scientific paper section → expert-written summary)
- Preprocessing: Raw text, no tokenization; paragraphs preserved with `\n`
- Splits:
  - Train: 10,000 samples
  - Validation: 1,000 samples
  - Test: 1,000 samples
Each sample contains:
- `article`: Full text of a PubMed paper section (mean ~3,000 tokens)
- `abstract`: Expert-written summary (mean ~215 tokens)
💡 Data was split and saved in Apache Parquet format for efficient loading.
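For reference, a minimal sketch of how such splits could be produced and stored (split sizes from above; the contiguous selection and file names are assumptions, not the exact preparation script):

```python
from datasets import load_dataset

# Load the source corpus from the Hugging Face Hub (default config assumed)
ds = load_dataset("ccdv/pubmed-summarization")

# Subsample to the split sizes listed above
# (contiguous selection is an assumption; the actual subset may differ)
train = ds["train"].select(range(10_000))
val = ds["validation"].select(range(1_000))
test = ds["test"].select(range(1_000))

# Persist as Apache Parquet for efficient loading
train.to_parquet("train.parquet")
val.to_parquet("validation.parquet")
test.to_parquet("test.parquet")
```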
Training Procedure and Hyperparameters
Methodology
- Approach: QLoRA (4-bit Quantized Low-Rank Adaptation)
- Base Model: `Qwen/Qwen3-1.7B` (1.7B parameters)
- PEFT: LoRA adapters (`r=8`, `α=16`, `dropout=0.05`) applied only to `q_proj` and `v_proj`, as sketched below
- Trainable Parameters: ~1.2M (<0.1% of base)
- Hardware: 1× NVIDIA RTX 3090 (24 GB VRAM)
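The adapter setup above corresponds to roughly the following PEFT configuration (a sketch; the `bias` and `task_type` values are assumptions):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,                                  # low-rank dimension
    lora_alpha=16,                        # scaling factor α
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention query/value projections only
    bias="none",                          # assumption: no bias adaptation
    task_type="CAUSAL_LM",
)
```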
Key Configurations
| Component | Setting |
|---|---|
| Quantization | 4-bit NF4 + double quantization |
| Compute dtype | bfloat16 |
| Max sequence length | 1,024 tokens (768 article + 256 summary) |
| Prompt format | Instruction-style with explicit "Summary:" separator |
| Loss masking | Prompt tokens masked with -100; loss computed only on summary |
| Batch size | 1 (gradient accumulation not used) |
| Optimizer | paged_adamw_8bit |
| Learning rate | 2e-4 (cosine decay) |
| Warmup | 250 steps |
| Early stopping | Patience = 10, Δ = 1e-4 |
| Eval frequency | Every 200 steps |
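To illustrate the loss-masking row, a minimal sketch of how prompt tokens can be masked with `-100` so the cross-entropy loss covers only the summary (variable names and placeholder texts are illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")

prompt_text = "...instruction + article + 'Summary:'..."  # illustrative placeholder
summary_text = "...reference abstract..."                 # illustrative placeholder

# Tokenize prompt and target separately so the prompt span is known
prompt_ids = tokenizer(prompt_text, truncation=True, max_length=768)["input_ids"]
summary_ids = tokenizer(summary_text + tokenizer.eos_token,
                        truncation=True, max_length=256)["input_ids"]

input_ids = prompt_ids + summary_ids
labels = [-100] * len(prompt_ids) + summary_ids  # loss computed only on the summary
```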
Training completed after 1 epoch (~200 steps); early stopping was not triggered.
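Under these settings, the trainer configuration would look roughly like this (a sketch; `output_dir`, the save strategy, and the best-model metric are assumptions):

```python
from transformers import TrainingArguments, EarlyStoppingCallback

training_args = TrainingArguments(
    output_dir="qwen3-pubmed-qlora",    # assumption
    per_device_train_batch_size=1,
    num_train_epochs=1,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=250,
    optim="paged_adamw_8bit",
    bf16=True,
    eval_strategy="steps",              # `evaluation_strategy` in older transformers
    eval_steps=200,
    save_strategy="steps",
    save_steps=200,
    load_best_model_at_end=True,        # required by EarlyStoppingCallback
    metric_for_best_model="eval_loss",  # assumption
)

early_stopping = EarlyStoppingCallback(
    early_stopping_patience=10,
    early_stopping_threshold=1e-4,
)
```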
Evaluation Results
ROUGE Metrics (Test Set, n=1,000)
| Metric | Zero-shot (Base) | After QLoRA | Δ |
|---|---|---|---|
| ROUGE-1 | 38.03 | 39.75 | +1.72 |
| ROUGE-2 | 12.26 | 15.37 | +3.11 |
| ROUGE-L | 21.35 | 22.21 | +0.86 |
| ROUGE-Lsum | 31.45 | 36.53 | +5.08 |
✅ ROUGE-Lsum (summary-level ROUGE-L with sentence splitting) shows the largest gain, indicating improved structural alignment with the reference summaries.
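Scores like these can be computed with the `evaluate` library (a sketch; the placeholder lists stand in for real model outputs and gold abstracts):

```python
import evaluate

# Generated and reference summaries (placeholders; substitute real test-set outputs)
predictions = ["SGLT2 inhibitors reduce heart failure hospitalizations in type 2 diabetes."]
references = ["SGLT2 inhibitors significantly reduce hospitalization for heart failure."]

rouge = evaluate.load("rouge")
scores = rouge.compute(predictions=predictions, references=references, use_stemmer=True)
print({k: round(v * 100, 2) for k, v in scores.items()})  # rouge1, rouge2, rougeL, rougeLsum
```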
HellaSwag (Zero-shot Commonsense Reasoning)
| Setting | Accuracy |
|---|---|
| Before QLoRA | 47.04% |
| After QLoRA | 46.36% |
| Δ | −0.68 pp |
💡 Minimal degradation (<1 pp) suggests no catastrophic forgetting: the model retains strong general language competence while gaining domain-specific summarization skills.
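One way to run such a check is EleutherAI's lm-evaluation-harness (a sketch assuming its Python API; the exact tooling used for the numbers above is not specified):

```python
import lm_eval

# Zero-shot HellaSwag with the LoRA adapter applied on top of the base model
# (the `peft=` model argument is an assumption about the harness's HF backend)
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Qwen/Qwen3-1.7B,peft=GermanovDev/qwen3-pubmed-summarization",
    tasks=["hellaswag"],
    num_fewshot=0,
)
print(results["results"]["hellaswag"]["acc,none"])
```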
Limitations and Known Issues
- Scope: Trained on section-level summaries from PubMed; may underperform on full papers or non-biomedical texts
- Technical depth: May omit highly specialized terminology or nuanced statistical findings
- Hallucination risk: Like all generative models, may produce plausible but inaccurate statements on rare entities
- Language: English-only; not evaluated on multilingual inputs
- Bias: Inherits biases from PubMed corpus (e.g., publication bias, Western-centric studies)
- Safety: Not aligned for safety or refusal behavior; unsuitable for direct clinical decision support
Usage
Loading and Inference
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch
# Load base model in 4-bit
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_use_double_quant=True,
bnb_4bit_compute_dtype=torch.bfloat16
)
base_model = AutoModelForCausalLM.from_pretrained(
"Qwen/Qwen3-1.7B",
quantization_config=bnb_config,
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")
# Load LoRA adapters
model = PeftModel.from_pretrained(
base_model,
"GermanovDev/qwen3-pubmed-summarization"
)
# Inference
prompt = """You are a helpful assistant who writes concise, factual summaries of articles. Summarize the following article into a few sentences.
Article:
Recent meta-analyses confirm that SGLT2 inhibitors significantly reduce hospitalization for heart failure in patients with type 2 diabetes, independent of glycemic control.
Summary:"""
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(
**inputs,
max_new_tokens=128,
do_sample=False,
pad_token_id=tokenizer.eos_token_id
)
# Extract generated summary (skip prompt)
summary = tokenizer.decode(
outputs[0][inputs.input_ids.shape[1]:],
skip_special_tokens=True
)
print(summary)
# β "SGLT2 inhibitors reduce heart failure hospitalizations in type 2 diabetes patients, regardless of blood sugar control."
Prompt Template (Required)
Always use this exact format:
```
You are a helpful assistant who writes concise, factual summaries of articles. Summarize the following article into a few sentences.
Article:
{full_article_text}
Summary:
```
⚠️ Do not append the ground-truth abstract during inference.
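A small helper keeps the format consistent across calls (the function name is illustrative):

```python
def build_prompt(article: str) -> str:
    """Assemble the required summarization prompt for this model."""
    return (
        "You are a helpful assistant who writes concise, factual summaries of articles. "
        "Summarize the following article into a few sentences.\n"
        f"Article:\n{article}\nSummary:"
    )
```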
Citation
```bibtex
@misc{germanov2025qwen3_pubmed_qlora,
author = {Andrei Germanov},
title = {{Qwen3-1.7B PubMed Summarization via QLoRA}},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/GermanovDev/qwen3-pubmed-summarization}},
doi = {10.57967/hf.00000000}
}
```
Built upon:
- `Qwen/Qwen3-1.7B` (base model)
- `ccdv/pubmed-summarization` (training data)