# MedGemma-TI: Temporal Intelligence for Chest X-Ray Progression
MedGemma-TI is a QLoRA adapter for google/medgemma-4b-it that adds temporal progression assessment to a model that already speaks the language of medicine. Given two or more chronologically ordered chest X-rays, it compares PRIOR to CURRENT anatomy and outputs a structured report concluding with a single assessment: IMPROVED / STABLE / WORSENED / MIXED.
> ⚠️ **Research prototype. Not validated for clinical use.** See Limitations.
## What It Does
Base MedGemma-4B-IT was not trained to reason across time. When we tested it on sequential chest X-ray comparison:
| Metric | Base MedGemma | MedGemma-TI |
|---|---|---|
| Accuracy (17,802 samples) | 23.7% | 44.0% (+20.3pp) |
| Macro F1 | 0.140 | 0.343 (2.45×) |
| Worsening Recall | 32.8% | 54.8% (+22.0pp) |
| Missed Worsening | 67.2% | 45.2% (1.49× fewer misses) |
| Temporal Coherence (flip test) | 1.3% | 26.4% (19.7×) |
*Temporal coherence*: the rate at which swapping the PRIOR and CURRENT images flips the model's assessment (IMPROVED ↔ WORSENED) as expected.
## Loading the Model
This repository contains LoRA adapter weights only. You must load the base model first, then apply the adapter.
```python
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor, BitsAndBytesConfig
from peft import PeftModel

# 1. Load base model (4-bit quantization keeps inference under ~4 GB VRAM)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForImageTextToText.from_pretrained(
    "google/medgemma-4b-it",
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# 2. Apply the LoRA adapter
model = PeftModel.from_pretrained(base_model, "sh3hryarkhan/MedGemma-TI")
model.eval()

# 3. Load the processor from the base model
processor = AutoProcessor.from_pretrained("google/medgemma-4b-it")
```
> **Note:** Use `AutoModelForImageTextToText`, not `AutoModelForCausalLM`; the latter lacks the vision encoder.
## Inference - Structured Temporal Report
The model was trained with a specific prompt format. Use this format exactly to get structured temporal reports. Deviating from it (e.g. using lowercase headers or a different section order) will reduce output quality.
### Prompt Structure
Images must be passed as actual image inputs. The `<image_N>` tags inside IMAGING TIMELINE are literal text markers the model associates with image positions; they are not image tokens.

```text
PATIENT CONTEXT:
Age: {age} years | Sex: {sex}
IMAGING TIMELINE:
[IMAGE_1 | Date: {YYYY-MM-DD} | Role: Baseline]
<image_1>
[IMAGE_2 | Date: {YYYY-MM-DD} | Role: Current]
<image_2>
CLINICAL REQUEST:
{physician's question or "Compare these studies and assess interval change."}
TASK: Analyze the current study compared to the prior study and identify any interval changes. Conclude your analysis with a clear overall assessment using one of these terms: IMPROVED, STABLE, WORSENED, or MIXED.
```
Role labels follow this convention:

- 2 images: `Baseline`, `Current`
- 3+ images: `Baseline`, `Intermediate`, …, `Current`
- 1 image: `Current` (falls back to single-image description; temporal comparison is not meaningful)
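The convention above can be sketched as a small helper (`role_labels` is a hypothetical name, not part of this repository):

```python
def role_labels(n_images: int) -> list[str]:
    """Return role labels for a chronologically ordered image list,
    following the Baseline / Intermediate / Current convention."""
    if n_images < 1:
        raise ValueError("need at least one image")
    if n_images == 1:
        # Single image: temporal comparison is not meaningful
        return ["Current"]
    return ["Baseline"] + ["Intermediate"] * (n_images - 2) + ["Current"]
```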
Optional sections (insert after IMAGING TIMELINE, before CLINICAL REQUEST):

```text
CLINICAL ALERT:
{free-text alert, e.g. "Patient on anticoagulation therapy"}
PATIENT NOTES:
({date})
{note text}
```
Previous findings (insert after CLINICAL REQUEST, before TASK):

```text
PREVIOUS FINDINGS:
{prior radiology read for the baseline image}
```
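Putting the section ordering together, a minimal prompt builder might look like this (hypothetical helper; only the section headers and their ordering come from the format above):

```python
TASK_LINE = (
    "TASK: Analyze the current study compared to the prior study and identify any "
    "interval changes. Conclude your analysis with a clear overall assessment using "
    "one of these terms: IMPROVED, STABLE, WORSENED, or MIXED."
)

def build_prompt(age, sex, timeline, request,
                 alert=None, notes=None, previous_findings=None):
    """Assemble the prompt sections in the order the model was trained on.
    timeline: list of (date, role) tuples, chronologically ordered."""
    lines = ["PATIENT CONTEXT:", f"Age: {age} years | Sex: {sex}", "IMAGING TIMELINE:"]
    for i, (date, role) in enumerate(timeline, start=1):
        lines.append(f"[IMAGE_{i} | Date: {date} | Role: {role}]")
        lines.append(f"<image_{i}>")
    if alert:
        lines += ["CLINICAL ALERT:", alert]
    if notes:
        lines.append("PATIENT NOTES:")
        for date, text in notes:
            lines += [f"({date})", text]
    lines += ["CLINICAL REQUEST:", request]
    if previous_findings:
        lines += ["PREVIOUS FINDINGS:", previous_findings]
    lines.append(TASK_LINE)
    return "\n".join(lines)
```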
### Python Example (Two Images)
```python
from PIL import Image
import torch

prior_image = Image.open("prior.jpg").convert("RGB")
current_image = Image.open("current.jpg").convert("RGB")

prompt_text = """PATIENT CONTEXT:
Age: 58 years | Sex: Female
IMAGING TIMELINE:
[IMAGE_1 | Date: 2024-11-01 | Role: Baseline]
<image_1>
[IMAGE_2 | Date: 2025-01-15 | Role: Current]
<image_2>
CLINICAL REQUEST:
Compare these two chest X-rays and assess interval change.
TASK: Analyze the current study compared to the prior study and identify any interval changes. Conclude your analysis with a clear overall assessment using one of these terms: IMPROVED, STABLE, WORSENED, or MIXED."""

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": prior_image},
            {"type": "image", "image": current_image},
            {"type": "text", "text": prompt_text},
        ],
    }
]

# Two-step processing (required for multi-image inputs)
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(
    text=[text],
    images=[prior_image, current_image],
    return_tensors="pt",
    padding=True,
).to(model.device)

with torch.inference_mode():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.2,
        do_sample=True,
    )

# Decode only the newly generated tokens
input_length = inputs["input_ids"].shape[1]
generated = output_ids[0][input_length:]
report = processor.tokenizer.decode(generated, skip_special_tokens=True)
print(report)
```
### Expected Output Format
The model produces a structured report in four sections:
```text
PRIOR:
Moderate left pleural effusion. Mild cardiomegaly. Patchy bilateral opacities
consistent with pulmonary edema. No pneumothorax.

CURRENT:
Small residual left pleural effusion, markedly reduced from prior. Cardiac
silhouette improved. Bilateral opacities substantially decreased. Lung fields
otherwise clearer.

CHANGES:
- Left pleural effusion: markedly decreased
- Pulmonary edema: substantially improved
- Cardiomegaly: mild improvement
- No new findings

IMPRESSION: IMPROVED
```
The final line of the IMPRESSION section will contain exactly one of: IMPROVED, STABLE, WORSENED, or MIXED.
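Since the final assessment is constrained to four labels, downstream code can pull it out with a simple regex (illustrative sketch; `extract_assessment` is not part of the adapter):

```python
import re
from typing import Optional

def extract_assessment(report: str) -> Optional[str]:
    """Return the last IMPRESSION label found in a generated report, or None."""
    matches = re.findall(r"IMPRESSION:\s*(IMPROVED|STABLE|WORSENED|MIXED)\b", report)
    return matches[-1] if matches else None
```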
## Training Details
| Component | Detail |
|---|---|
| Base model | google/medgemma-4b-it |
| Method | QLoRA - 4-bit quantization, r=16, α=16 |
| Target modules | All linear layers + lm_head, embed_tokens |
| Loss | Response-only cross-entropy (prompt tokens masked to −100) |
| Epochs | 4 |
| Learning rate | 2e-4 with 5% warmup |
| Gradient clip | 0.3 |
| Adapter size | ~2.6 GB |
| Inference VRAM | <4 GB (with 4-bit quantization) |
| Hardware | Virginia Tech ARC cluster (multi-GPU, torchrun) |
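The response-only loss in the table works by masking prompt positions in the label sequence so cross-entropy is computed only on the model's answer. A minimal sketch over plain lists (real training code operates on batched tensors):

```python
IGNORE_INDEX = -100  # positions with this label are excluded from cross-entropy

def mask_prompt_tokens(input_ids: list[int], prompt_len: int) -> list[int]:
    """Copy input_ids into labels, masking the first prompt_len positions
    so the loss covers only the response tokens."""
    labels = list(input_ids)
    for i in range(min(prompt_len, len(labels))):
        labels[i] = IGNORE_INDEX
    return labels
```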
## Training Data
| Source | Training Examples | Findings / Population |
|---|---|---|
| CheXpert (Stanford) | ~56,320 temporal pairs | Effusion, pneumonia, cardiomegaly, nodules, atelectasis |
| RICORD-1C | 963 (321 pairs × 3) | COVID-19 ICU, viral pneumonia, ARDS |
| Total | 57,283 | - |
- Patient-level train/val/test splits with audited zero cross-split leakage
- Test set (17,802 samples): evaluated without oversampling or augmentation
- IMPROVED class underrepresentation corrected by 1.61× upsampling in train split only
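The 1.61× upsampling can be implemented as one whole copy of the class plus a 0.61 fractional random sample (illustrative sketch only; the actual pipeline may differ):

```python
import random

def upsample(examples: list, factor: float, seed: int = 0) -> list:
    """Repeat a minority class's examples by a fractional factor, e.g. 1.61x:
    whole copies plus a random sample covering the fractional remainder."""
    rng = random.Random(seed)
    whole, frac = int(factor), factor - int(factor)
    return examples * whole + rng.sample(examples, round(frac * len(examples)))
```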
## Evaluation
Test set: 17,802 samples (17,679 CheXpert + 123 RICORD), natural class distribution.
### Per-Class Results (MedGemma-TI)
| Class | Precision | Recall | F1 |
|---|---|---|---|
| IMPROVED | 0.380 | 0.326 | 0.351 |
| STABLE | 0.328 | 0.653 | 0.436 |
| WORSENED | 0.484 | 0.548 | 0.514 |
| MIXED | 0.688 | 0.295 | 0.413 |
| Macro avg | - | - | 0.343 |
WORSENED recall (54.8%) is the clinically critical metric: a missed WORSENED case is a deteriorating patient who goes unflagged.
### Temporal Coherence - Flip Test
A novel evaluation measuring whether the model genuinely reasons about image order. Paired test cases swap PRIOR ↔ CURRENT images while updating all text metadata to match. A model reading only text labels would produce identical outputs (0% coherence). Only genuine visual comparison yields non-zero coherence.
| Model | Coherence Rate |
|---|---|
| Base MedGemma-4B-IT | 1.3% |
| MedGemma-TI | 26.4% (19.7×) |
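One plausible way to score the flip test is below. Note this is our reconstruction: the exact scoring rule (in particular how STABLE and MIXED originals are handled) is an assumption of this sketch, which scores only IMPROVED/WORSENED originals.

```python
FLIP = {"IMPROVED": "WORSENED", "WORSENED": "IMPROVED"}

def coherence_rate(pairs: list[tuple[str, str]]) -> float:
    """pairs: (prediction on original order, prediction after swapping
    PRIOR/CURRENT). A pair is coherent when the flipped prediction is the
    mirror of the original; only IMPROVED/WORSENED originals are scored."""
    scored = [(a, b) for a, b in pairs if a in FLIP]
    if not scored:
        return 0.0
    return sum(b == FLIP[a] for a, b in scored) / len(scored)
```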
## Limitations
- Not validated externally. Results are from same-distribution test splits (CheXpert/RICORD). Performance on other hospital systems (MIMIC-CXR, UK Biobank, etc.) is unknown.
- Chest X-rays only. CT, MRI, ultrasound, and other modalities are not supported.
- Temporal coherence is a proof-of-concept. 26.4% is not a deployable threshold.
- MIXED class is inherently ambiguous. Simultaneous regional improvement and worsening is harder to label consistently.
- Not for clinical use. This is a research prototype and has not been validated for clinical decision-making.
## Citation
```bibtex
@misc{khan2025medgemma_ti,
  title        = {MedGemma-TI: Teaching Temporal Reasoning to Medical Vision-Language Models},
  author       = {Muhammad Shehryar Khan and Abdullah Al Muhit},
  year         = {2025},
  institution  = {Virginia Tech},
  howpublished = {\url{https://huggingface.co/sh3hryarkhan/MedGemma-TI}},
}
```
## Acknowledgments
Computational resources provided by Advanced Research Computing at Virginia Tech.
Base model: `google/medgemma-4b-it`. Training datasets: CheXpert (Stanford ML Group; the compressed release was used for training) and RICORD-1C (RSNA). The original CheXpert dataset is distributed by the Stanford ML Group.