Qwen3-1.7B PubMed Summarization (QLoRA)
Model Description and Intended Use
This is a QLoRA-fine-tuned version of Qwen/Qwen3-1.7B optimized for scientific article summarization, specifically for generating concise, factual summaries of biomedical research papers from the PubMed corpus.
The model is intended for:
- Automated summarization of biomedical literature sections (e.g., Methods, Results)
- Assisting researchers in rapid literature review and knowledge extraction
- Serving as a domain-adapted base for further fine-tuning on clinical or life-science tasks
This is a causal language model adapted via supervised instruction fine-tuning (SFT), not a chat model; use the prescribed prompt format for optimal results.
Training Data
- Source: Subset of `ccdv/pubmed-summarization`
- Format: Article→abstract pairs (scientific paper section → expert-written summary)
- Preprocessing: Raw text, no tokenization; paragraphs preserved with `\n`
- Splits:
  - Train: 10,000 samples
  - Validation: 1,000 samples
  - Test: 1,000 samples
Each sample contains:
- `article`: Full text of a PubMed paper section (mean ~3,000 tokens)
- `abstract`: Expert-written summary (mean ~215 tokens)
💡 Data was split and saved in Apache Parquet format for efficient loading.
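For reference, a minimal sketch of how such splits could be produced and stored (split sizes from above; the contiguous selection and file names are assumptions, not the exact preparation script):

```python
from datasets import load_dataset

# Load the source corpus from the Hugging Face Hub (default config assumed)
ds = load_dataset("ccdv/pubmed-summarization")

# Subsample to the split sizes listed above
# (contiguous selection is an assumption; the actual subset may differ)
train = ds["train"].select(range(10_000))
val = ds["validation"].select(range(1_000))
test = ds["test"].select(range(1_000))

# Persist as Apache Parquet for efficient loading
train.to_parquet("train.parquet")
val.to_parquet("validation.parquet")
test.to_parquet("test.parquet")
```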
Training Procedure and Hyperparameters
Methodology
- Approach: QLoRA (4-bit Quantized Low-Rank Adaptation)
- Base Model: `Qwen/Qwen3-1.7B` (1.7B parameters)
- PEFT: LoRA adapters (`r=8`, `α=16`, `dropout=0.05`) applied only to `q_proj` and `v_proj`, as sketched below
- Trainable Parameters: ~1.2M (<0.1% of base)
- Hardware: 1× NVIDIA RTX 3090 (24 GB VRAM)
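The adapter setup above corresponds to roughly the following PEFT configuration (a sketch; the `bias` and `task_type` values are assumptions):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,                                  # low-rank dimension
    lora_alpha=16,                        # scaling factor α
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention query/value projections only
    bias="none",                          # assumption: no bias adaptation
    task_type="CAUSAL_LM",
)
```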
Key Configurations
| Component | Setting |
|---|---|
| Quantization | 4-bit NF4 + double quantization |
| Compute dtype | bfloat16 |
| Max sequence length | 1,024 tokens (768 article + 256 summary) |
| Prompt format | Instruction-style with explicit "Summary:" separator |
| Loss masking | Prompt tokens masked with -100; loss computed only on summary |
| Batch size | 1 (gradient accumulation not used) |
| Optimizer | paged_adamw_8bit |
| Learning rate | 2e-4 (cosine decay) |
| Warmup | 250 steps |
| Early stopping | Patience = 10, Δ = 1e-4 |
| Eval frequency | Every 200 steps |
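To illustrate the loss-masking row, a minimal sketch of how prompt tokens can be masked with `-100` so the cross-entropy loss covers only the summary (variable names and placeholder texts are illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")

prompt_text = "...instruction + article + 'Summary:'..."  # illustrative placeholder
summary_text = "...reference abstract..."                 # illustrative placeholder

# Tokenize prompt and target separately so the prompt span is known
prompt_ids = tokenizer(prompt_text, truncation=True, max_length=768)["input_ids"]
summary_ids = tokenizer(summary_text + tokenizer.eos_token,
                        truncation=True, max_length=256)["input_ids"]

input_ids = prompt_ids + summary_ids
labels = [-100] * len(prompt_ids) + summary_ids  # loss computed only on the summary
```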
Training completed after 1 epoch (~200 steps); early stopping was not triggered.
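Under these settings, the trainer configuration would look roughly like this (a sketch; `output_dir`, the save strategy, and the best-model metric are assumptions):

```python
from transformers import TrainingArguments, EarlyStoppingCallback

training_args = TrainingArguments(
    output_dir="qwen3-pubmed-qlora",    # assumption
    per_device_train_batch_size=1,
    num_train_epochs=1,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=250,
    optim="paged_adamw_8bit",
    bf16=True,
    eval_strategy="steps",              # `evaluation_strategy` in older transformers
    eval_steps=200,
    save_strategy="steps",
    save_steps=200,
    load_best_model_at_end=True,        # required by EarlyStoppingCallback
    metric_for_best_model="eval_loss",  # assumption
)

early_stopping = EarlyStoppingCallback(
    early_stopping_patience=10,
    early_stopping_threshold=1e-4,
)
```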
Evaluation Results
ROUGE Metrics (Test Set, n=1,000)
| Metric | Zero-shot (Base) | After QLoRA | Δ |
|---|---|---|---|
| ROUGE-1 | 38.03 | 39.75 | +1.72 |
| ROUGE-2 | 12.26 | 15.37 | +3.11 |
| ROUGE-L | 21.35 | 22.21 | +0.86 |
| ROUGE-Lsum | 31.45 | 36.53 | +5.08 |
✅ ROUGE-Lsum (summary-level ROUGE-L with sentence splitting) shows the largest gain, indicating improved structural alignment with the reference summaries.
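Scores like these can be computed with the `evaluate` library (a sketch; the placeholder lists stand in for real model outputs and gold abstracts):

```python
import evaluate

# Generated and reference summaries (placeholders; substitute real test-set outputs)
predictions = ["SGLT2 inhibitors reduce heart failure hospitalizations in type 2 diabetes."]
references = ["SGLT2 inhibitors significantly reduce hospitalization for heart failure."]

rouge = evaluate.load("rouge")
scores = rouge.compute(predictions=predictions, references=references, use_stemmer=True)
print({k: round(v * 100, 2) for k, v in scores.items()})  # rouge1, rouge2, rougeL, rougeLsum
```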
HellaSwag (Zero-shot Commonsense Reasoning)
| Setting | Accuracy |
|---|---|
| Before QLoRA | 47.04% |
| After QLoRA | 46.36% |
| Δ | −0.68 pp |
💡 Minimal degradation (<1 pp) suggests no catastrophic forgetting: the model retains strong general language competence while gaining domain-specific summarization skills.
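One way to run such a check is EleutherAI's lm-evaluation-harness (a sketch assuming its Python API; the exact tooling used for the numbers above is not specified):

```python
import lm_eval

# Zero-shot HellaSwag with the LoRA adapter applied on top of the base model
# (the `peft=` model argument is an assumption about the harness's HF backend)
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Qwen/Qwen3-1.7B,peft=GermanovDev/qwen3-pubmed-summarization",
    tasks=["hellaswag"],
    num_fewshot=0,
)
print(results["results"]["hellaswag"]["acc,none"])
```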
Limitations and Known Issues
- Scope: Trained on section-level summaries from PubMed; may underperform on full papers or non-biomedical texts
- Technical depth: May omit highly specialized terminology or nuanced statistical findings
- Hallucination risk: Like all generative models, may produce plausible but inaccurate statements on rare entities
- Language: English-only; not evaluated on multilingual inputs
- Bias: Inherits biases from PubMed corpus (e.g., publication bias, Western-centric studies)
- Safety: Not aligned for safety or refusal behavior; unsuitable for direct clinical decision support
Usage
Loading and Inference
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch
# Load base model in 4-bit
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_use_double_quant=True,
bnb_4bit_compute_dtype=torch.bfloat16
)
base_model = AutoModelForCausalLM.from_pretrained(
"Qwen/Qwen3-1.7B",
quantization_config=bnb_config,
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")
# Load LoRA adapters
model = PeftModel.from_pretrained(
base_model,
"GermanovDev/qwen3-pubmed-summarization"
)
# Inference
prompt = """You are a helpful assistant who writes concise, factual summaries of articles. Summarize the following article into a few sentences.
Article:
Recent meta-analyses confirm that SGLT2 inhibitors significantly reduce hospitalization for heart failure in patients with type 2 diabetes, independent of glycemic control.
Summary:"""
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(
**inputs,
max_new_tokens=128,
do_sample=False,
pad_token_id=tokenizer.eos_token_id
)
# Extract generated summary (skip prompt)
summary = tokenizer.decode(
outputs[0][inputs.input_ids.shape[1]:],
skip_special_tokens=True
)
print(summary)
# β "SGLT2 inhibitors reduce heart failure hospitalizations in type 2 diabetes patients, regardless of blood sugar control."
Prompt Template (Required)
Always use this exact format:
```
You are a helpful assistant who writes concise, factual summaries of articles. Summarize the following article into a few sentences.
Article:
{full_article_text}
Summary:
```
⚠️ Do not append the ground-truth abstract during inference.
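A small helper keeps the format consistent across calls (the function name is illustrative):

```python
def build_prompt(article: str) -> str:
    """Assemble the required summarization prompt for this model."""
    return (
        "You are a helpful assistant who writes concise, factual summaries of articles. "
        "Summarize the following article into a few sentences.\n"
        f"Article:\n{article}\nSummary:"
    )
```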
Citation
```bibtex
@misc{germanov2025qwen3_pubmed_qlora,
author = {Andrei Germanov},
title = {{Qwen3-1.7B PubMed Summarization via QLoRA}},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/GermanovDev/qwen3-pubmed-summarization}},
doi = {10.57967/hf.00000000}
}
```
Built upon:
- `Qwen/Qwen3-1.7B` (base model)
- `ccdv/pubmed-summarization` (training data)