# Expert-Specific Large Language Models for Radiology

This repository contains the model weights for "Expert-Specific Large Language Models for Radiology: Achieving Clinical-Grade Performance with Small Datasets".
## 🎯 Key Findings

- Expert-specific models trained on 2,175–9,016 reports achieve performance comparable to benchmark models trained on 520,442 reports
- Up to 95.05% time-efficiency gains in prospective clinical deployment
- 97.8% concordance with radiologist-finalised impressions (BERTScore F1: 0.95–1.00)
- Significantly outperform public LLMs (GPT-4, Baidu Qianfan) on all CHARM dimensions
## 📦 Available Models

### Expert-Specific Models (BLOOMZ-7B based)

| Model | Base | Training Data | Description |
|---|---|---|---|
| 7b_radiologist1 | BLOOMZ-7B | ~3,000 reports | Expert-specific model for Radiologist 1 |
| 7b_radiologist4 | BLOOMZ-7B | ~5,000 reports | Expert-specific model for Radiologist 4 |
| 7b_radiologist5 | BLOOMZ-7B | ~2,175 reports | Expert-specific model for Radiologist 5 |
### Expert-Specific Models (BLOOMZ-3B based)

| Model | Base | Training Data | Description |
|---|---|---|---|
| 3b_radiologist1 | BLOOMZ-3B | ~3,000 reports | Compact expert-specific model for Radiologist 1 |
| 3b_radiologist4 | BLOOMZ-3B | ~5,000 reports | Compact expert-specific model for Radiologist 4 |
| 3b_radiologist5 | BLOOMZ-3B | ~2,175 reports | Compact expert-specific model for Radiologist 5 |
### Benchmark SFT Models (trained on 520,442 reports)

| Model | Base | Epochs | Description |
|---|---|---|---|
| bloom_1b1_3 | BLOOMZ-1B | 3 | Benchmark SFT model (1B params, 3 epochs) |
| bloom_1b1_16 | BLOOMZ-1B | 16 | Benchmark SFT model (1B params, 16 epochs) |
| bloom_3b_3 | BLOOMZ-3B | 3 | Benchmark SFT model (3B params, 3 epochs) |
| bloom_3b_16 | BLOOMZ-3B | 16 | Benchmark SFT model (3B params, 16 epochs) |
### RLHF Models (refined with human feedback)

| Model | Base | PPO Steps | Description |
|---|---|---|---|
| rlhf_checkpoint-80 | BLOOMZ-3B | 80 | RLHF-refined model (early checkpoint) |
| rlhf_checkpoint-120 | BLOOMZ-3B | 120 | RLHF-refined model (optimal checkpoint) |
## 🚀 Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "your-org/7b_radiologist1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Example CT findings in Chinese: "The liver is normal in size and shape, with no
# definite abnormal density seen in the parenchyma. The gallbladder is normal in
# size with a non-thickened wall; no definite abnormal density is seen in the lumen."
findings = "肝脏大小形态正常，实质内未见明确异常密度影。胆囊大小正常，壁不厚，腔内未见明确异常密度影。"

prompt = f"According to the following medical imaging description: {findings} Generate a corresponding CT image impression:"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

impression = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(impression)
```
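Note that a decoder-only model echoes the prompt, so the decoded string contains the instruction followed by the generated impression. A small pair of helpers (function names hypothetical, template copied from the example above) can build the prompt and strip it back out:

```python
PROMPT_TEMPLATE = (
    "According to the following medical imaging description: {findings} "
    "Generate a corresponding CT image impression:"
)

def build_prompt(findings: str) -> str:
    # Wrap a findings paragraph in the instruction format used in the Quick Start.
    return PROMPT_TEMPLATE.format(findings=findings.strip())

def extract_impression(decoded: str, prompt: str) -> str:
    # The decoded output begins with the prompt; keep only the generated tail.
    if decoded.startswith(prompt):
        return decoded[len(prompt):].strip()
    return decoded.strip()
```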
## 📋 Training Details

### Hyperparameters

| Parameter | SFT (Benchmark/Expert) | RLHF (PPO) |
|---|---|---|
| Base Model | BLOOMZ-1B/3B/7B | BLOOMZ-3B |
| Learning Rate | 2×10⁻⁵ | 1.41×10⁻⁵ |
| LR Schedule | Cosine decay | Constant |
| Batch Size | 8 | 256 |
| Gradient Accumulation | 16 | 1 |
| Max Sequence Length | 2048 | 2048 |
| Weight Decay | 0.01 | 0 |
| Dropout | 0.1 | 0.1 |
| Epochs | 16 | N/A |
| PPO Epochs | N/A | 4 |
| PPO Clip Range | N/A | 0.2 |
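The SFT column specifies cosine decay from a base learning rate of 2×10⁻⁵. A minimal sketch of that schedule in pure Python (no warmup phase, which the table does not specify; `min_lr` is an assumed floor of 0):

```python
import math

def cosine_lr(step: int, total_steps: int, base_lr: float = 2e-5, min_lr: float = 0.0) -> float:
    """Cosine-decay schedule: base_lr at step 0, decaying smoothly to min_lr."""
    progress = step / max(1, total_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```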
### Hardware

- 8× NVIDIA A100 (40GB) GPUs
- DeepSpeed ZeRO-3 for distributed training
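For orientation, a minimal DeepSpeed ZeRO-3 JSON of the kind this setup implies (illustrative values only, not the authors' exact configuration; the gradient-accumulation value mirrors the hyperparameter table, and bf16 assumes A100-class hardware):

```json
{
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "gradient_accumulation_steps": 16,
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_clipping": 1.0
}
```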
## 📈 Performance

### NLP Metrics (31,434 test reports)

| Model Type | BLEU-4 | ROUGE-L F1 | BERTScore F1 |
|---|---|---|---|
| Expert-Specific (7B) | 0.58–0.65 | 0.71–0.77 | 0.69 |
| Benchmark SFT (3B) | 0.68–0.70 | 0.81–0.82 | 0.69–0.70 |
| RLHF (3B) | 0.65–0.68 | 0.78–0.80 | 0.97 |
| GPT-4 | 0.03 | 0.13 | 0.74 |
| Baidu Qianfan | 0.05 | 0.20 | 0.69 |
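The scores above were presumably computed with standard evaluation toolkits; purely as an illustration of what the ROUGE-L column measures, here is a self-contained ROUGE-L F1 over whitespace tokens (Chinese reports would need character- or word-level segmentation first):

```python
def rouge_l_f1(reference: str, candidate: str) -> float:
    """Token-level ROUGE-L F1: harmonic mean of LCS-based recall and precision."""
    ref, cand = reference.split(), candidate.split()
    # Dynamic-programming longest-common-subsequence length.
    dp = [[0] * (len(cand) + 1) for _ in range(len(ref) + 1)]
    for i, r in enumerate(ref, 1):
        for j, c in enumerate(cand, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if r == c else max(dp[i - 1][j], dp[i][j - 1])
    lcs = dp[-1][-1]
    if lcs == 0:
        return 0.0
    recall, precision = lcs / len(ref), lcs / len(cand)
    return 2 * recall * precision / (recall + precision)
```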
### CHARM Clinical Evaluation

Expert-specific models achieved up to 95.0% performance overlap with benchmark RLHF models on CHARM metrics (Clarity, Helpfulness, Accuracy, Redundancy, Misleading) despite using 58–239× less training data.
## ⚠️ Intended Use & Limitations

### Intended Use
- Research purposes in medical AI and radiology NLP
- Educational demonstrations of expert-specific fine-tuning approaches
- Baseline comparisons for radiology report generation systems
## 📄 License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
## 🙏 Acknowledgements
This work was supported by:
- National Natural Science Foundation of China (82472065, W2432049)
- National Key Research and Development Program of China (2022YFC2409501)
- National Center for Translational Medicine Shanghai (NRCTM(SH)-2025-11)
- Shanghai Explorer Program (24TS1414900)
- Shanghai Pujiang Program (2023PJD053)