Benchmarking and Adapting On-Device Large Language Models for Clinical Decision Support

Authors

Alif Munim* 1, Jun Ma* 1,2, Omar Ibrahim* 1, Alhusain Abdalla* 1, Shuolin Yin3, Leo Chen4, Bo Wang† 1,5,6,7,8

* Equal contribution     † Corresponding author

1AI Collaborative Centre, University Health Network, Toronto, Canada
2Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
3Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada
4Division of Urology, Department of Surgery, St. Michael's Hospital, Unity Health Toronto and University of Toronto, Toronto, Canada
5Peter Munk Cardiac Centre, University Health Network, Toronto, Canada
6Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Canada
7Department of Computer Science, University of Toronto, Toronto, Canada
8Vector Institute for Artificial Intelligence, Toronto, Canada

Highlights

  • LoRA fine-tuned GPT-OSS 20B for structured radiology differential diagnosis
  • Trained on 1,894 EuroRad medical cases spanning diverse imaging modalities and specialties
  • Generates systematic chain-of-thought reasoning: symptom mapping → differential analysis → diagnosis
  • Lightweight adapter (2.27 GB) compatible with 4-bit quantization for on-device deployment
  • Part of a broader benchmark study comparing on-device LLMs across medical tasks

Model Overview

This model is a LoRA fine-tuned version of unsloth/gpt-oss-20b for medical radiology diagnosis, developed as part of a study benchmarking and adapting on-device large language models for clinical decision support. Trained on EuroRad clinical cases, it generates step-by-step diagnostic reasoning from patient history and imaging findings, mapping symptoms to differentials and converging on a final diagnosis with supporting evidence.

The model employs a systematic diagnostic framework: (1) relating clinical history to imaging findings, (2) mapping findings to each differential, (3) systematic elimination of alternatives, and (4) converging on a final diagnosis with confidence reasoning.

Model Details

Base Model: unsloth/gpt-oss-20b
Fine-tuning Method: LoRA (Low-Rank Adaptation)
Training Framework: Unsloth
Task: Medical diagnosis from radiology reports
Training Dataset: wanglab/eurorad-gpt-oss-training-data
Sequence Length: 4,096 tokens
Quantization: 4-bit
Adapter Size: 2.27 GB
License: Apache-2.0 (research use only)
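
For deployment without the PEFT runtime, the adapter can optionally be folded into the base weights. Below is a minimal sketch using PEFT's merge_and_unload; it assumes the base model is loaded in 16-bit (roughly 40 GB of weights for a 20B model), since merging into 4-bit-quantized weights is lossy, and the output directory name is illustrative:

from unsloth import FastLanguageModel
from peft import PeftModel

# Load the base model in 16-bit so the merge is exact
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gpt-oss-20b",
    max_seq_length=4096,
    load_in_4bit=False,
)
model = PeftModel.from_pretrained(model, "wanglab/on-device-LLM-gpt-oss-20b")

# Fold the LoRA weights into the base layers and drop the PEFT wrappers
merged = model.merge_and_unload()
merged.save_pretrained("gpt-oss-20b-eurorad-merged")   # illustrative path
tokenizer.save_pretrained("gpt-oss-20b-eurorad-merged")

The merged checkpoint can then be re-quantized for on-device use with standard tooling, at the cost of no longer being able to swap adapters.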

Installation

pip install unsloth peft transformers accelerate bitsandbytes

Usage

from unsloth import FastLanguageModel
from peft import PeftModel

# Load base model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gpt-oss-20b",
    dtype=None,
    max_seq_length=4096,
    load_in_4bit=True,
    full_finetuning=False,
)

# Load LoRA adapter
model = PeftModel.from_pretrained(
    model,
    "wanglab/on-device-LLM-gpt-oss-20b",
    is_trainable=False
)

# Enable inference mode
FastLanguageModel.for_inference(model)

# Example inference
prompt = """You are an expert radiologist demonstrating step-by-step diagnostic reasoning.

Case presentation:
{combined_description}

Differential diagnoses to consider:
{dd_formatted}

Generate systematic Chain-of-Thought reasoning that shows how clinicians think through cases:
1. **Connect symptoms to findings**: Link clinical presentation with imaging observations
2. **Map to differentials**: Show how findings support or contradict each differential diagnosis
3. **Systematic elimination**: Explicitly rule out less likely options with reasoning
4. **Converge to answer**: Demonstrate the logical path to the correct diagnosis"""

inputs = tokenizer(prompt.format(
    combined_description="...",  # clinical history + imaging findings
    dd_formatted="Diagnosis A, Diagnosis B, Diagnosis C"
), return_tensors="pt").to("cuda")

# do_sample=True is required for the temperature setting to take effect
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
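
GPT-OSS checkpoints ship with a chat template, and depending on how the adapter's training data was formatted, routing the prompt through the template may match the training distribution more closely than raw tokenization. A sketch of this alternative (the single-user-turn structure here is an assumption, not the documented training format):

# Alternative: wrap the same prompt in the tokenizer's chat template
messages = [
    {"role": "user", "content": prompt.format(
        combined_description="...",  # clinical history + imaging findings
        dd_formatted="Diagnosis A, Diagnosis B, Diagnosis C",
    )},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,   # append the assistant-turn header
    return_dict=True,             # return attention_mask alongside input_ids
    return_tensors="pt",
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))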

Training Details

Framework: Unsloth
Dataset: wanglab/eurorad-gpt-oss-training-data
Cases: 1,894 EuroRad radiology cases
Input: Text only (clinical history + imaging report)
Optimization: 4-bit quantization + LoRA
Sequence Length: 4,096 tokens
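
The card does not list the LoRA hyperparameters. For readers reproducing a comparable run, a minimal Unsloth configuration sketch is shown below; the rank, alpha, and target modules are illustrative assumptions, not the published values:

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gpt-oss-20b",
    max_seq_length=4096,
    load_in_4bit=True,   # matches the 4-bit setting listed above
)

# Attach LoRA adapters (r / alpha / target modules are illustrative)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)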

Citation

Citation will be updated upon arXiv submission and journal publication.

@article{munim2025ondevice,
    title={Benchmarking and Adapting On-Device Large Language Models for Clinical Decision Support},
    author={Munim, Alif and Ma, Jun and Ibrahim, Omar and Abdalla, Alhusain and Yin, Shuolin and Chen, Leo and Wang, Bo},
    journal={},
    year={2025}
}

Limitations

  • Not clinically validated: intended for research in medical AI and diagnostic systems only, and must not be used for actual patient diagnosis or direct patient care
  • May reflect biases present in the EuroRad training data
  • Performance may vary across imaging modalities and medical specialties
  • Like all LLMs, may generate plausible but incorrect information ("hallucinations")

Contact

For issues and questions, please open a discussion in this repository.
Corresponding author: Bo Wang — bowang@vectorinstitute.ai

Disclaimer: This model is for research purposes only and has not been approved for clinical use. Always consult qualified healthcare professionals for medical decisions.
