
# Expert-Specific Large Language Models for Radiology


This repository contains the model weights for "Expert-Specific Large Language Models for Radiology: Achieving Clinical-Grade Performance with Small Datasets".

## 🎯 Key Findings

- Expert-specific models trained on 2,175–9,016 reports achieve performance comparable to benchmark models trained on 520,442 reports
- Up to 95.05% time-efficiency gains in prospective clinical deployment
- 97.8% concordance with radiologist-finalised impressions (BERTScore F1: 0.95–1.00)
- Significantly outperforms public LLMs (GPT-4, Baidu Qianfan) on all CHARM dimensions

## 📦 Available Models

### Expert-Specific Models (BLOOMZ-7B based)

| Model | Base | Training Data | Description |
|---|---|---|---|
| `7b_radiologist1` | BLOOMZ-7B | ~3,000 reports | Expert-specific model for Radiologist 1 |
| `7b_radiologist4` | BLOOMZ-7B | ~5,000 reports | Expert-specific model for Radiologist 4 |
| `7b_radiologist5` | BLOOMZ-7B | ~2,175 reports | Expert-specific model for Radiologist 5 |

### Expert-Specific Models (BLOOMZ-3B based)

| Model | Base | Training Data | Description |
|---|---|---|---|
| `3b_radiologist1` | BLOOMZ-3B | ~3,000 reports | Compact expert-specific model for Radiologist 1 |
| `3b_radiologist4` | BLOOMZ-3B | ~5,000 reports | Compact expert-specific model for Radiologist 4 |
| `3b_radiologist5` | BLOOMZ-3B | ~2,175 reports | Compact expert-specific model for Radiologist 5 |

### Benchmark SFT Models (trained on 520,442 reports)

| Model | Base | Epochs | Description |
|---|---|---|---|
| `bloom_1b1_3` | BLOOMZ-1B | 3 | Benchmark SFT model (1B params, 3 epochs) |
| `bloom_1b1_16` | BLOOMZ-1B | 16 | Benchmark SFT model (1B params, 16 epochs) |
| `bloom_3b_3` | BLOOMZ-3B | 3 | Benchmark SFT model (3B params, 3 epochs) |
| `bloom_3b_16` | BLOOMZ-3B | 16 | Benchmark SFT model (3B params, 16 epochs) |

### RLHF Models (refined with human feedback)

| Model | Base | PPO Steps | Description |
|---|---|---|---|
| `rlhf_checkpoint-80` | BLOOMZ-3B | 80 | RLHF-refined model (early checkpoint) |
| `rlhf_checkpoint-120` | BLOOMZ-3B | 120 | RLHF-refined model (optimal checkpoint) |

## 🚀 Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model_name = "your-org/7b_radiologist1"  # Replace with actual path
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Prepare input with the prompt template used at training time.
# Findings (Chinese): "The liver is normal in size and shape; no definite
# abnormal density is seen in the parenchyma. The gallbladder is normal in
# size, the wall is not thickened, and no definite abnormal density is seen
# in the lumen."
findings = "肝脏大小形态正常，实质内未见明确异常密度影。胆囊大小正常，壁不厚，腔内未见明确异常密度影。"

prompt = f"According to the following medical imaging description: {findings} Generate a corresponding CT image impression:"

# Generate the impression
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
impression = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(impression)
```
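When generating impressions for many reports, the template from the example above can be factored into a small helper. This is a minimal sketch: the function name is ours; only the template string comes from the Quick Start example.

```python
def build_prompt(findings: str) -> str:
    """Wrap radiology findings in the prompt template shown in Quick Start."""
    return (
        "According to the following medical imaging description: "
        f"{findings} Generate a corresponding CT image impression:"
    )
```

Keeping the template in one place ensures inference-time prompts stay byte-identical to the formulation the models were fine-tuned on.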

## 📋 Training Details

### Hyperparameters

| Parameter | SFT (Benchmark/Expert) | RLHF (PPO) |
|---|---|---|
| Base Model | BLOOMZ-1B/3B/7B | BLOOMZ-3B |
| Learning Rate | 2×10⁻⁵ | 1.41×10⁻⁵ |
| LR Schedule | Cosine decay | Constant |
| Batch Size | 8 | 256 |
| Gradient Accumulation | 16 | 1 |
| Max Sequence Length | 2048 | 2048 |
| Weight Decay | 0.01 | 0 |
| Dropout | 0.1 | 0.1 |
| Epochs | 16 | N/A |
| PPO Epochs | N/A | 4 |
| PPO Clip Range | N/A | 0.2 |

### Hardware

- 8× NVIDIA A100 (40 GB) GPUs
- DeepSpeed ZeRO-3 for distributed training
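Putting the hyperparameter table and the hardware setup together, the effective global SFT batch size works out as follows. This is a hedged sketch: the dict keys and model id are illustrative, not taken from the repository's actual training scripts.

```python
# Illustrative SFT configuration, transcribed from the tables above.
sft_config = {
    "base_model": "bigscience/bloomz-7b1",  # 1B/3B variants also used
    "learning_rate": 2e-5,
    "lr_schedule": "cosine",
    "per_device_batch_size": 8,
    "gradient_accumulation_steps": 16,
    "max_seq_length": 2048,
    "weight_decay": 0.01,
    "dropout": 0.1,
    "num_epochs": 16,
    "num_gpus": 8,  # 8x A100 40 GB with DeepSpeed ZeRO-3
}

# Effective global batch = per-device batch x grad accumulation x GPUs
effective_batch = (
    sft_config["per_device_batch_size"]
    * sft_config["gradient_accumulation_steps"]
    * sft_config["num_gpus"]
)
print(effective_batch)  # 1024 sequences per optimizer step
```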

## 📊 Performance

### NLP Metrics (31,434 test reports)

| Model Type | BLEU-4 | ROUGE-L F1 | BERTScore F1 |
|---|---|---|---|
| Expert-Specific (7B) | 0.58–0.65 | 0.71–0.77 | 0.69 |
| Benchmark SFT (3B) | 0.68–0.70 | 0.81–0.82 | 0.69–0.70 |
| RLHF (3B) | 0.65–0.68 | 0.78–0.80 | 0.97 |
| GPT-4 | 0.03 | 0.13 | 0.74 |
| Baidu Qianfan | 0.05 | 0.20 | 0.69 |
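For readers unfamiliar with the BLEU-4 column, the metric can be approximated with a short self-contained function. This is a minimal sketch using whitespace tokenisation and add-1 smoothing; reported scores are typically computed at corpus level with a standard toolkit, so exact numbers will differ.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bleu4(candidate: str, reference: str) -> float:
    """Smoothed sentence-level BLEU-4 (uniform 1/4 weights, add-1 smoothing)."""
    cand, ref = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, 5):
        c_counts = Counter(ngrams(cand, n))
        r_counts = Counter(ngrams(ref, n))
        # Clipped n-gram matches: each reference n-gram credits at most its count
        overlap = sum(min(c, r_counts[g]) for g, c in c_counts.items())
        total = max(len(cand) - n + 1, 0)
        # Add-1 smoothing keeps the score defined when an order has no match
        log_prec += math.log((overlap + 1) / (total + 1)) / 4
    # Brevity penalty discourages overly short candidates
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_prec)
```

A perfect match scores 1.0, and any missing or substituted n-gram lowers the geometric mean of the four precisions, which is why abstractive public LLMs score low here despite high BERTScore.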

### CHARM Clinical Evaluation

Expert-specific models achieved up to 95.0% performance overlap with benchmark RLHF models on CHARM metrics (Clarity, Helpfulness, Accuracy, Redundancy, Misleading) despite using 58–239× less training data.

โš ๏ธ Intended Use & Limitations

Intended Use

  • Research purposes in medical AI and radiology NLP
  • Educational demonstrations of expert-specific fine-tuning approaches
  • Baseline comparisons for radiology report generation systems

## 📜 License

This project is licensed under the Apache License 2.0; see the LICENSE file for details.

๐Ÿ™ Acknowledgements

This work was supported by:

  • National Natural Science Foundation of China (82472065, W2432049)
  • National Key Research and Development Program of China (2022YFC2409501)
  • National Center for Translational Medicine Shanghai (NRCTM(SH)-2025-11)
  • Shanghai Explorer Program (24TS1414900)
  • Shanghai Pujiang Program (2023PJD053)