Qwen3-1.7B PubMed Summarization (QLoRA)

Model Description and Intended Use

This is a QLoRA-fine-tuned version of Qwen/Qwen3-1.7B optimized for scientific article summarization, specifically for generating concise, factual summaries of biomedical research papers from the PubMed corpus.

The model is intended for:

  • Automated summarization of biomedical literature sections (e.g., Methods, Results)
  • Assisting researchers in rapid literature review and knowledge extraction
  • Serving as a domain-adapted base for further fine-tuning on clinical or life-science tasks

This is a causal language model adapted via supervised instruction fine-tuning (SFT), not a chat model; use the prescribed prompt format (see Usage below) for optimal results.


Training Data

  • Source: Subset of ccdv/pubmed-summarization
  • Format: Article–abstract pairs (scientific paper section → expert-written summary)
  • Preprocessing: Raw text (no pre-tokenization); paragraph breaks preserved with \n
  • Splits:
    • Train: 10,000 samples
    • Validation: 1,000 samples
    • Test: 1,000 samples

Each sample contains:

  • article: Full text of a PubMed paper section (mean ~3,000 tokens)
  • abstract: Expert-written summary (mean ~215 tokens)

💡 Data was split and saved in Apache Parquet format for efficient loading.
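
For reference, a minimal sketch of how splits like these could be produced and saved with the datasets library; only the split sizes come from this card, while the config name, seed, and selection logic are assumptions:

from datasets import load_dataset

# Config name is an assumption; trust_remote_code may be required for
# script-based datasets on recent `datasets` versions.
ds = load_dataset("ccdv/pubmed-summarization", "section", trust_remote_code=True)

# Shuffled subsets matching the sizes reported above (seed is illustrative).
train = ds["train"].shuffle(seed=42).select(range(10_000))
val = ds["validation"].shuffle(seed=42).select(range(1_000))
test = ds["test"].shuffle(seed=42).select(range(1_000))

# Persist as Parquet for efficient reloads.
train.to_parquet("train.parquet")
val.to_parquet("validation.parquet")
test.to_parquet("test.parquet")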


Training Procedure and Hyperparameters

Methodology

  • Approach: QLoRA (4-bit Quantized Low-Rank Adaptation)
  • Base Model: Qwen/Qwen3-1.7B (1.7B parameters)
  • PEFT: LoRA adapters (r=8, α=16, dropout=0.05) applied only to q_proj and v_proj (see the configuration sketch after this list)
  • Trainable Parameters: ~1.2M (< 0.1% of base)
  • Hardware: 1× NVIDIA RTX 3090 (24 GB VRAM)
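
A sketch of the corresponding QLoRA setup using the standard transformers/peft APIs; the hyperparameter values are taken from the list above, the rest is standard boilerplate:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization with double quantization (matches the table below).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-1.7B", quantization_config=bnb_config, device_map="auto"
)
base_model = prepare_model_for_kbit_training(base_model)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention query/value projections only
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # should report roughly 1.2M trainable params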

Key Configurations

| Component | Setting |
|---|---|
| Quantization | 4-bit NF4 + double quantization |
| Compute dtype | bfloat16 |
| Max sequence length | 1,024 tokens (768 article + 256 summary) |
| Prompt format | Instruction-style with explicit "Summary:" separator |
| Loss masking | Prompt tokens masked with -100; loss computed only on the summary |
| Batch size | 1 (gradient accumulation not used) |
| Optimizer | paged_adamw_8bit |
| Learning rate | 2e-4 (cosine decay) |
| Warmup | 250 steps |
| Early stopping | Patience = 10, Δ = 1e-4 |
| Eval frequency | Every 200 steps |
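
The prompt-format and loss-masking rows can be illustrated with a short tokenization sketch; the helper below is illustrative, since the exact training code is not published with this card:

# Illustrative only: prompt tokens receive label -100 so the loss
# is computed on the summary tokens alone.
PROMPT = (
    "You are a helpful assistant who writes concise, factual summaries "
    "of articles. Summarize the following article into a few sentences.\n"
    "Article:\n{article}\nSummary:"
)

def build_example(article, summary, tokenizer, prompt_len=768, summary_len=256):
    prompt_ids = tokenizer(
        PROMPT.format(article=article),
        truncation=True, max_length=prompt_len,
    ).input_ids
    summary_ids = tokenizer(
        " " + summary + tokenizer.eos_token,
        truncation=True, max_length=summary_len, add_special_tokens=False,
    ).input_ids
    return {
        "input_ids": prompt_ids + summary_ids,
        "labels": [-100] * len(prompt_ids) + summary_ids,  # mask the prompt
    }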

Training completed after one epoch (~200 steps); early stopping was not triggered.
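
Assuming the Hugging Face Trainer was used (the card does not say), the remaining rows map onto roughly this configuration; the output path is a placeholder, and `eval_strategy` is spelled `evaluation_strategy` on older transformers versions:

from transformers import TrainingArguments, EarlyStoppingCallback

args = TrainingArguments(
    output_dir="qwen3-pubmed-qlora",  # placeholder
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    optim="paged_adamw_8bit",
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=250,
    eval_strategy="steps",
    eval_steps=200,
    save_steps=200,
    load_best_model_at_end=True,  # required by EarlyStoppingCallback
    bf16=True,
)

# Passed to Trainer(..., callbacks=[early_stop]) alongside the masked dataset.
early_stop = EarlyStoppingCallback(
    early_stopping_patience=10,
    early_stopping_threshold=1e-4,
)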


Evaluation Results

ROUGE Metrics (Test Set, n=1,000)

| Metric | Zero-shot (Base) | After QLoRA | Δ |
|---|---|---|---|
| ROUGE-1 | 38.03 | 39.75 | +1.72 |
| ROUGE-2 | 12.26 | 15.37 | +3.11 |
| ROUGE-L | 21.35 | 22.21 | +0.86 |
| ROUGE-Lsum | 31.45 | 36.53 | +5.08 |

✅ ROUGE-Lsum (sentence-level coherence) shows the largest gain, indicating improved structural and factual alignment with reference summaries.
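
Scores of this kind can be reproduced with Hugging Face's evaluate library; a minimal sketch, with the generation loop omitted and placeholder lists standing in for the 1,000 test predictions and reference abstracts:

import evaluate

rouge = evaluate.load("rouge")  # requires the rouge_score package

predictions = ["model-generated summary ..."]  # outputs on the test articles
references = ["expert-written abstract ..."]   # ground-truth abstracts

scores = rouge.compute(
    predictions=predictions,
    references=references,
    use_stemmer=True,
)
# Keys: rouge1, rouge2, rougeL, rougeLsum; fractions scaled to match the table.
print({k: round(v * 100, 2) for k, v in scores.items()})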

HellaSwag (Zero-shot Commonsense Reasoning)

| Setting | Accuracy |
|---|---|
| Before QLoRA | 47.04% |
| After QLoRA | 46.36% |
| Δ | -0.68 pp |

🟡 Minimal degradation (<1 pp) confirms no catastrophic forgetting: the model retains strong general language competence while gaining domain-specific summarization skills.
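
The card does not name the evaluation tool; assuming EleutherAI's lm-evaluation-harness (v0.4+), a zero-shot HellaSwag run over the adapter could look like this:

import lm_eval

# "peft=..." stacks the LoRA adapter on the base model inside the harness.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=Qwen/Qwen3-1.7B,"
        "peft=GermanovDev/qwen3-pubmed-summarization"
    ),
    tasks=["hellaswag"],
    num_fewshot=0,
)
print(results["results"]["hellaswag"]["acc,none"])  # key format as of v0.4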


Limitations and Known Issues

  • Scope: Trained on section-level summaries from PubMed; may underperform on full papers or non-biomedical texts
  • Technical depth: May omit highly specialized terminology or nuanced statistical findings
  • Hallucination risk: Like all generative models, may produce plausible but inaccurate statements on rare entities
  • Language: English-only; not evaluated on multilingual inputs
  • Bias: Inherits biases from PubMed corpus (e.g., publication bias, Western-centric studies)
  • Safety: Not aligned for safety or refusal behavior; unsuitable for direct clinical decision support

Usage

Loading and Inference

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load base model in 4-bit
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-1.7B",
    quantization_config=bnb_config,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")

# Load LoRA adapters
model = PeftModel.from_pretrained(
    base_model, 
    "GermanovDev/qwen3-pubmed-summarization"
)

# Inference
prompt = """You are a helpful assistant who writes concise, factual summaries of articles. Summarize the following article into a few sentences.
Article:
Recent meta-analyses confirm that SGLT2 inhibitors significantly reduce hospitalization for heart failure in patients with type 2 diabetes, independent of glycemic control.
Summary:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)  # follow device_map placement
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id
)

# Extract generated summary (skip prompt)
summary = tokenizer.decode(
    outputs[0][inputs.input_ids.shape[1]:], 
    skip_special_tokens=True
)
print(summary)
# → "SGLT2 inhibitors reduce heart failure hospitalizations in type 2 diabetes patients, regardless of blood sugar control."
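
Optionally, for deployment without a runtime peft dependency, the adapter can be folded into the base weights. A sketch, assuming the base is first reloaded un-quantized (merging directly into a 4-bit model is not supported); the output path is a placeholder:

from transformers import AutoModelForCausalLM
from peft import PeftModel
import torch

# Reload the base in full precision, then fold the LoRA deltas into its weights.
base_fp = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-1.7B", torch_dtype=torch.bfloat16
)
merged = PeftModel.from_pretrained(
    base_fp, "GermanovDev/qwen3-pubmed-summarization"
).merge_and_unload()
merged.save_pretrained("qwen3-pubmed-merged")  # placeholder output directory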

Prompt Template (Required)

Always use this exact format:

You are a helpful assistant who writes concise, factual summaries of articles. Summarize the following article into a few sentences.
Article:
{full_article_text}
Summary:

⚠️ Do not append the ground-truth abstract during inference.


Citation

@misc{germanov2025qwen3_pubmed_qlora,
  author = {Andrei Germanov},
  title = {{Qwen3-1.7B PubMed Summarization via QLoRA}},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/GermanovDev/qwen3-pubmed-summarization}},
  doi = {10.57967/hf.00000000}
}
