Model Card for medgemma1.5-CXR

medgemma1.5-CXR is my second attempt at fine-tuning an open-weights vision-language model for structured chest X-ray report generation (for the first attempt, please see Llama-3.2-11B-CXR). The model has been fine-tuned to generate radiological reports in a structured JSON format, for example:

{
  "Support devices": "None.",
  "Cardiomediastinum": "Within normal limits.",
  "Lungs": "Lungs are clear.",
  "Pleura": "No pleural effusion or pneumothorax.",
  "Skeleton": "No acute findings.",
  "Upper abdomen": "No acute findings."
}
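Because the output is meant to follow a fixed schema, it can be checked programmatically. The following is a minimal sketch, assuming the report has already been isolated as a JSON string; `check_report` and `REQUIRED_FIELDS` are hypothetical names introduced here for illustration, and the default strings are the ones shown in the example report.

```python
import json

# Field names and their "no findings" defaults, copied from the example report.
REQUIRED_FIELDS = {
    "Support devices": "None.",
    "Cardiomediastinum": "Within normal limits.",
    "Lungs": "Lungs are clear.",
    "Pleura": "No pleural effusion or pneumothorax.",
    "Skeleton": "No acute findings.",
    "Upper abdomen": "No acute findings.",
}


def check_report(report_json: str) -> dict:
    """Hypothetical helper: list missing fields and fields that differ from the default."""
    report = json.loads(report_json)
    if not isinstance(report, dict):
        raise ValueError("report must be a JSON object")
    # Fields the model failed to generate.
    missing = [name for name in REQUIRED_FIELDS if name not in report]
    # Fields whose text is not the "no findings" default, i.e. a finding was reported.
    abnormal = [
        name
        for name, default in REQUIRED_FIELDS.items()
        if name in report and str(report[name]).strip() != default
    ]
    return {"missing": missing, "abnormal": abnormal}
```

A report with an empty `missing` list has all six fields; the `abnormal` list only flags text that differs from the defaults and says nothing about whether a finding is correct.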

Model Details

Model Description

These are adapters for google/medgemma-1.5-4b-it, obtained through supervised fine-tuning (SFT) with low-rank adapters (LoRA) using a custom subset of publicly available frontal chest x-rays from the romprr/CXR_BioXAi_Hackathon_2024 dataset.

Uses

This model is SOLELY intended for research and development purposes. It is by no means ready or meant for clinical use, nor has it been validated in a clinical setting.

Out-of-Scope Use

This model has NOT been validated for clinical use or evaluated by any regulatory body, and it may hallucinate findings as well as miss findings. It is intended for research and development use ONLY. The model's outputs are not intended to directly inform clinical diagnosis, patient management decisions, treatment recommendations, or any other direct clinical practice applications. All model outputs require independent verification and further investigation through established scientific research and development methodologies.

Bias, Risks, and Limitations

Results and model outputs depend heavily on the specific prompt/instruction as well as on inference settings (temperature, top_p, min_p, etc.). The model has been optimized only for single-turn, single-image evaluation. The model may also suffer from data contamination/leakage: it may have been exposed to evaluation data during pre-training or fine-tuning, which may lead to overestimation of its true capabilities. Therefore, the model requires validation on datasets specific to each individual's/institution's use case.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

How to Get Started with the Model

Use the code below to get started with the model.

import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

base_model = "google/medgemma-1.5-4b-it"
adapter_id = "DeepRadiology/medgemma1.5-CXR"

model = AutoModelForVision2Seq.from_pretrained(
    base_model,
    device_map='auto',
    torch_dtype=torch.bfloat16,
)

adapter_name = model.load_adapter(adapter_id)
model.active_adapters = adapter_name
processor = AutoProcessor.from_pretrained(base_model)
image = Image.open("cxr.jpeg") # replace with your own example image

instruction = """You are an expert chest radiologist. Describe accurately what you see in this image. Use a \
structured report template with fields for: Support devices, Cardiomediastinum, Lungs, Pleura, Skeleton, and Upper \
abdomen. If there are no support devices, then report "None." for that field, if there are no pertinent \
Cardiomediastinal findings, report "Within normal limits." for that field. If there are no abnormal lung findings \
report "Lungs are clear." If there are no pertinent pleural findings, report "No pleural effusion or pneumothorax." \
For all other fields, if there are no pertinent findings, report "No acute findings." You must always generate a report\
 with the required fields."""

messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": instruction}
    ]}
]
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(
    image,
    input_text,
    add_special_tokens=False,
    return_tensors="pt"
).to(model.device)

# do_sample=True is set explicitly so that temperature and min_p take effect
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, min_p=0.1)
print(processor.decode(output[0]))
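The decoded string contains the prompt and chat-template markers as well as the generated report, so the JSON usually has to be pulled out before it can be used. The following is a minimal sketch, assuming the prompt itself contains no `{` character (true for the instruction above) and that the report is a single JSON object; `extract_report` is a hypothetical helper, not part of transformers.

```python
import json


def extract_report(decoded: str) -> dict:
    """Hypothetical helper: return the first JSON object found in decoded model output."""
    decoder = json.JSONDecoder()
    start = decoded.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text.
            obj, _ = decoder.raw_decode(decoded, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        # This "{" did not begin a valid object; try the next one.
        start = decoded.find("{", start + 1)
    raise ValueError("no JSON object found in model output")
```

For example, `report = extract_report(processor.decode(output[0]))` would give a plain dictionary keyed by report field. A `ValueError` here typically means the generation was cut off by `max_new_tokens` or the model did not follow the template.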

Training Details

Training Data

romprr/CXR_BioXAi_Hackathon_2024.

Training Procedure

Preprocessing

The dataset was filtered using meta-llama/Llama-3.3-70B-Instruct to remove reports that reference prior studies (although this was not 100% successful). The remaining free-text reports were then converted into a structured report format, again using meta-llama/Llama-3.3-70B-Instruct. The final training set contained approximately 33k x-rays.
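Since the LLM-based filter let some references to priors through, a simple keyword check can serve as a second pass on similar data. The following is a minimal sketch and is not the method used to build this training set; the phrase list and the names `PRIOR_PATTERN` and `mentions_prior` are assumptions chosen for illustration.

```python
import re

# Illustrative phrase list (an assumption, not taken from the training pipeline).
PRIOR_PATTERN = re.compile(
    r"\b(prior|previous|previously|compared (?:to|with)|comparison|"
    r"unchanged|interval|again (?:seen|noted))\b",
    re.IGNORECASE,
)


def mentions_prior(report_text: str) -> bool:
    """Hypothetical helper: True if the report text appears to reference an earlier study."""
    return PRIOR_PATTERN.search(report_text) is not None
```

A keyword check like this is cheap but crude: it will miss unusual phrasings and can flag reports that use these words in another sense, so flagged reports are better reviewed than discarded blindly.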

Training Hyperparameters

  • Training regime: [More Information Needed]

Speeds, Sizes, Times [optional]

[More Information Needed]

Evaluation

Testing Data, Factors & Metrics

Testing Data

Evaluation was performed on the publicly available IU-Xray and MIMIC-CXR datasets, using the 'test' splits as defined by ReXrank and frontal x-rays only.

Metrics

[More Information Needed]

Results

[More Information Needed]

Summary

Presenting medgemma1.5-CXR, a multimodal open-weights vision-language model (VLM) fine-tuned for chest x-ray report generation! The primary goal of this exercise was to demonstrate the potential for general-purpose VLMs to be repurposed for medical imaging tasks on consumer-grade hardware with publicly available datasets.

Data Citations

MIMIC-CXR:

Johnson, A., Pollard, T., Mark, R., Berkowitz, S., & Horng, S. (2019). MIMIC-CXR Database (version 2.0.0). PhysioNet. RRID:SCR_007345. https://doi.org/10.13026/C2JT1Q

Johnson, A.E.W., Pollard, T.J., Berkowitz, S.J. et al. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Sci Data 6, 317 (2019). https://doi.org/10.1038/s41597-019-0322-0

IU-Xray:

Demner-Fushman D, Kohli MD, Rosenman MB, Shooshan SE, Rodriguez L, Antani S, Thoma GR, McDonald CJ. Preparing a collection of radiology examinations for distribution and retrieval. J Am Med Inform Assoc. 2016 Mar;23(2):304-10. doi: 10.1093/jamia/ocv080. Epub 2015 Jul 1. PMID: 26133894; PMCID: PMC5009925.

Model Card Contact

Nakul Gupta
