| | --- |
| | license: other |
| | license_name: health-ai-developer-foundations |
| | license_link: https://developers.google.com/health-ai-developer-foundations/terms |
| | library_name: transformers |
| | pipeline_tag: image-text-to-text |
| | language: en |
| | extra_gated_heading: Access MedGemma on Hugging Face |
| | extra_gated_prompt: >- |
| | To access MedGemma on Hugging Face, you're required to review and |
| | agree to [Health AI Developer Foundation's terms of use](https://developers.google.com/health-ai-developer-foundations/terms). |
| | To do this, please ensure you're logged in to Hugging Face and click below. |
| | Requests are processed immediately. |
| | extra_gated_button_content: Acknowledge license |
| | tags: |
| | - medical |
| | - x-ray |
| | - pathology |
| | - dermatology |
| | - fundus |
| | - radiology report generation |
| | - chest-x-ray |
| | - medical-embeddings |
| | - image-classification |
| | - zero-shot-image-classification |
| | - image-feature-extraction |
| | - image-text-to-text |
| | base_model: google/gemma-3-27b-pt |
| | --- |
| | |
| | # MedGemma model card |
| |
|
| | **Model documentation:** [MedGemma](https://developers.google.com/health-ai-developer-foundations/medgemma) |
| |
|
| | **Resources:** |
| |
|
| | * Model on Google Cloud Model Garden: [MedGemma](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/medgemma) |
| | * Models on Hugging Face: [Collection](https://huggingface.co/collections/google/medgemma-release-680aade845f90bec6a3f60c4) |
| | * GitHub repository (supporting code, Colab notebooks, discussions, and |
| | issues): [MedGemma](https://github.com/google-health/medgemma) |
| | * Quick start notebook: [GitHub](https://github.com/google-health/medgemma/blob/main/notebooks/quick_start_with_hugging_face.ipynb) |
| | * Fine-tuning notebook: [GitHub](https://github.com/google-health/medgemma/blob/main/notebooks/fine_tune_with_hugging_face.ipynb) |
| | * Concept applications built using MedGemma: [Collection](https://huggingface.co/collections/google/medgemma-concept-apps-686ea036adb6d51416b0928a) |
| | * Support: See [Contact](https://developers.google.com/health-ai-developer-foundations/medgemma/get-started.md#contact) |
| | * License: The use of MedGemma is governed by the [Health AI Developer |
| | Foundations terms of |
| | use](https://developers.google.com/health-ai-developer-foundations/terms). |
| | |
| | **Author:** Google |
| |
|
| | ## Model information |
| |
|
| | This section describes the MedGemma model and how to use it. |
| |
|
| | ### Description |
| |
|
| | MedGemma is a collection of [Gemma 3](https://ai.google.dev/gemma/docs/core) |
| | variants that are trained for performance on medical text and image |
| | comprehension. Developers can use MedGemma to accelerate building |
| | healthcare-based AI applications. MedGemma currently comes in three variants: a |
| | 4B multimodal version and 27B text-only and multimodal versions. |
| |
|
| | Both MedGemma multimodal versions utilize a |
| | [SigLIP](https://arxiv.org/abs/2303.15343) image encoder that has been |
| | specifically pre-trained on a variety of de-identified medical data, including |
| | chest X-rays, dermatology images, ophthalmology images, and histopathology |
| | slides. Their LLM components are trained on a diverse set of medical data, |
| | including medical text, medical question-answer pairs, FHIR-based electronic |
| | health record data (27B multimodal only), radiology images, histopathology |
| | patches, ophthalmology images, and dermatology images. |
| |
|
| | MedGemma 4B is available in both pre-trained (suffix: `-pt`) and |
| | instruction-tuned (suffix `-it`) versions. The instruction-tuned version is a |
| | better starting point for most applications. The pre-trained version is |
| | available for those who want to experiment more deeply with the models. |
| |
|
| | MedGemma 27B multimodal has pre-training on medical image, medical record and |
| | medical record comprehension tasks. MedGemma 27B text-only has been trained |
| | exclusively on medical text. Both models have been optimized for inference-time |
| | computation on medical reasoning. This means it has slightly higher performance |
| | on some text benchmarks than MedGemma 27B multimodal. Users who want to work |
| | with a single model for both medical text, medical record and medical image |
| | tasks are better suited for MedGemma 27B multimodal. Those that only need text |
| | use-cases may be better served with the text-only variant. Both MedGemma 27B |
| | variants are only available in instruction-tuned versions. |
| |
|
| | MedGemma variants have been evaluated on a range of clinically relevant |
| | benchmarks to illustrate their baseline performance. These evaluations are based |
| | on both open benchmark datasets and curated datasets. Developers can fine-tune |
| | MedGemma variants for improved performance. Consult the [Intended |
| | use](#intended-use) section below for more details. |
| |
|
| | MedGemma is optimized for medical applications that involve a text generation |
| | component. For medical image-based applications that do not involve text |
| | generation, such as data-efficient classification, zero-shot classification, or |
| | content-based or semantic image retrieval, the [MedSigLIP image |
| | encoder](https://developers.google.com/health-ai-developer-foundations/medsiglip/model-card) |
| | is recommended. MedSigLIP is based on the same image encoder that powers |
| | MedGemma. |
| |
|
| | Please consult the [MedGemma Technical Report](https://arxiv.org/abs/2507.05201) |
| | for more details. |
| |
|
| | ### How to use |
| |
|
| | Below are some example code snippets to help you quickly get started running the |
| | model locally on GPU. If you want to use the model at scale, we recommend that |
| | you create a production version using [Model |
| | Garden](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/medgemma). |
| |
|
| | First, install the Transformers library. Gemma 3 is supported starting from transformers 4.50.0. |
| |
|
| | ```shell |
| | $ pip install -U transformers |
| | ``` |
| |
|
| | **Run model with the `pipeline` API** |
| |
|
| | ```py |
| | from transformers import pipeline |
| | from PIL import Image |
| | import requests |
| | import torch |
| | |
| | pipe = pipeline( |
| | "image-text-to-text", |
| | model="google/medgemma-27b-it", |
| | torch_dtype=torch.bfloat16, |
| | device="cuda", |
| | ) |
| | |
| | messages = [ |
| | { |
| | "role": "system", |
| | "content": [{"type": "text", "text": "You are a helpful medical assistant."}] |
| | }, |
| | { |
| | "role": "user", |
| | "content": [{"type": "text", "text": "How do you differentiate bacterial from viral pneumonia?"}] |
| | } |
| | ] |
| | |
| | output = pipe(text=messages, max_new_tokens=200) |
| | print(output[0]["generated_text"][-1]["content"]) |
| | |
| | # Image attribution: Stillwaterising, CC0, via Wikimedia Commons |
| | image_url = "[https://upload.wikimedia.org/wikipedia/commons/c/c8/Chest_Xray_PA_3-8-2010.png](https://upload.wikimedia.org/wikipedia/commons/c/c8/Chest_Xray_PA_3-8-2010.png)" |
| | image = Image.open(requests.get(image_url, headers={"User-Agent": "example"}, stream=True).raw) |
| | |
| | messages = [ |
| | { |
| | "role": "system", |
| | "content": [{"type": "text", "text": "You are an expert radiologist."}] |
| | }, |
| | { |
| | "role": "user", |
| | "content": [ |
| | {"type": "text", "text": "Describe this X-ray"}, |
| | {"type": "image", "image": image} |
| | ] |
| | } |
| | ] |
| | |
| | output = pipe(text=messages, max_new_tokens=200) |
| | print(output[0]["generated_text"][-1]["content"]) |
| | ``` |
| |
|
| | **Run the model directly** |
| |
|
| | ```py |
| | # pip install accelerate |
| | from transformers import AutoProcessor, AutoModelForImageTextToText |
| | from PIL import Image |
| | import requests |
| | import torch |
| | |
| | model_id = "google/medgemma-27b-it" |
| | |
| | model = AutoModelForImageTextToText.from_pretrained( |
| | model_id, |
| | torch_dtype=torch.bfloat16, |
| | device_map="auto", |
| | ) |
| | processor = AutoProcessor.from_pretrained(model_id) |
| | |
| | messages = [ |
| | { |
| | "role": "system", |
| | "content": [{"type": "text", "text": "You are a helpful medical assistant."}] |
| | }, |
| | { |
| | "role": "user", |
| | "content": [{"type": "text", "text": "How do you differentiate bacterial from viral pneumonia?"}] |
| | } |
| | ] |
| | |
| | inputs = processor.apply_chat_template( |
| | messages, add_generation_prompt=True, tokenize=True, |
| | return_dict=True, return_tensors="pt" |
| | ).to(model.device, dtype=torch.bfloat16) |
| | |
| | input_len = inputs["input_ids"].shape[-1] |
| | |
| | with torch.inference_mode(): |
| | generation = model.generate(**inputs, max_new_tokens=200, do_sample=False) |
| | generation = generation[0][input_len:] |
| | |
| | decoded = processor.decode(generation, skip_special_tokens=True) |
| | print(decoded) |
| | |
| | # Image attribution: Stillwaterising, CC0, via Wikimedia Commons |
| | image_url = "[https://upload.wikimedia.org/wikipedia/commons/c/c8/Chest_Xray_PA_3-8-2010.png](https://upload.wikimedia.org/wikipedia/commons/c/c8/Chest_Xray_PA_3-8-2010.png)" |
| | image = Image.open(requests.get(image_url, headers={"User-Agent": "example"}, stream=True).raw) |
| | |
| | messages = [ |
| | { |
| | "role": "system", |
| | "content": [{"type": "text", "text": "You are an expert radiologist."}] |
| | }, |
| | { |
| | "role": "user", |
| | "content": [ |
| | {"type": "text", "text": "Describe this X-ray"}, |
| | {"type": "image", "image": image} |
| | ] |
| | } |
| | ] |
| | |
| | inputs = processor.apply_chat_template( |
| | messages, add_generation_prompt=True, tokenize=True, |
| | return_dict=True, return_tensors="pt" |
| | ).to(model.device, dtype=torch.bfloat16) |
| | |
| | input_len = inputs["input_ids"].shape[-1] |
| | |
| | with torch.inference_mode(): |
| | generation = model.generate(**inputs, max_new_tokens=200, do_sample=False) |
| | generation = generation[0][input_len:] |
| | |
| | decoded = processor.decode(generation, skip_special_tokens=True) |
| | print(decoded) |
| | ``` |
| |
|
| | ### Examples |
| |
|
| | See the following Colab notebooks for examples of how to use MedGemma: |
| |
|
| | * To give the model a quick try, running it locally with weights from Hugging |
| | Face, see [Quick start notebook in |
| | Colab](https://colab.research.google.com/github/google-health/medgemma/blob/main/notebooks/quick_start_with_hugging_face.ipynb). |
| | Note that you will need to use Colab Enterprise to obtain adequate GPU |
| | resources to run either 27B model without quantization. |
| | |
| | * For an example of fine-tuning the 4B model, see the [Fine-tuning notebook in |
| | Colab](https://colab.research.google.com/github/google-health/medgemma/blob/main/notebooks/fine_tune_with_hugging_face.ipynb). |
| | The 27B models can be fine tuned in a similar manner but will require more |
| | time and compute resources than the 4B model. |
| | |
| | ### Model architecture overview |
| |
|
| | The MedGemma model is built based on [Gemma 3](https://ai.google.dev/gemma/) and |
| | uses the same decoder-only transformer architecture as Gemma 3. To read more |
| | about the architecture, consult the Gemma 3 [model |
| | card](https://ai.google.dev/gemma/docs/core/model_card_3). |
| |
|
| | ### Technical specifications |
| |
|
| | * **Model type**: Decoder-only Transformer architecture, see the [Gemma 3 |
| | Technical |
| | Report](https://storage.googleapis.com/deepmind-media/gemma/Gemma3Report.pdf) |
| | * **Input modalities**: **4B and 27B multimodal**: Text, vision; **27B text**: Text only |
| | * **Output modality:** Text only (all models) |
| | * **Attention mechanism**: Grouped-query attention (GQA) |
| | * **Context length**: Supports long context, at least 128K tokens |
| | * **Key publication**: [https://arxiv.org/abs/2507.05201](https://arxiv.org/abs/2507.05201) |
| | * **Model created**: July 9, 2025 |
| | * **Model version**: 1.0.0 |
| | |
| | ### Citation |
| |
|
| | When using this model, please cite: Sellergren et al. "MedGemma Technical |
| | Report." *arXiv preprint arXiv:2507.05201* (2025). |
| |
|
| | ```none |
| | @article{sellergren2025medgemma, |
| | title={MedGemma Technical Report}, |
| | author={Sellergren, Andrew and Kazemzadeh, Sahar and Jaroensri, Tiam and Kiraly, Atilla and Traverse, Madeleine and Kohlberger, Timo and Xu, Shawn and Jamil, Fayaz and Hughes, Cían and Lau, Charles and others}, |
| | journal={arXiv preprint arXiv:2507.05201}, |
| | year={2025} |
| | } |
| | ``` |
| |
|
| | ### Inputs and outputs |
| |
|
| | **Input**: |
| |
|
| | * Text string, such as a question or prompt |
| | * Images, normalized to 896 x 896 resolution and encoded to 256 tokens each |
| | * Total input length of 128K tokens |
| |
|
| | **Output**: |
| |
|
| | * Generated text in response to the input, such as an answer to a question, |
| | analysis of image content, or a summary of a document |
| | * Total output length of 8192 tokens |
| | |
| | ### Performance and validation |
| |
|
| | MedGemma was evaluated across a range of different multimodal classification, |
| | report generation, visual question answering, and text-based tasks. |
| |
|
| | ### Key performance metrics |
| |
|
| | #### Imaging evaluations |
| |
|
| | The multimodal performance of MedGemma 4B and 27B multimodal was evaluated |
| | across a range of benchmarks, focusing on radiology, dermatology, |
| | histopathology, ophthalmology, and multimodal clinical reasoning. |
| |
|
| | MedGemma 4B outperforms the base Gemma 3 4B model across all tested multimodal |
| | health benchmarks. |
| |
|
| | | Task and metric | Gemma 3 4B | MedGemma 4B | Gemma 3 27B | MedGemma 27B multimodal | |
| | | :---- | :---- | :---- | :---- | :---- | |
| | | **Medical image classification** | | | | | |
| | | MIMIC CXR** - macro F1 for top 5 conditions | 81.2 | 88.9 | 71.7 | 90.0 | |
| | | CheXpert CXR - macro F1 for top 5 conditions | 32.6 | 48.1 | 26.2 | 49.9 | |
| | | CXR14 - macro F1 for 3 conditions | 32.0 | 50.1 | 31.4 | 45.3 | |
| | | PathMCQA* (histopathology, internal**) - Accuracy | 37.1 | 69.8 | 42.2 | 71.6 | |
| | | US-DermMCQA* - Accuracy | 52.5 | 71.8 | 66.9 | 71.7 | |
| | | EyePACS* (fundus, internal) - Accuracy | 14.4 | 64.9 | 20.3 | 75.3 | |
| | | **Visual question answering** | | | | | |
| | | SLAKE (radiology) - Tokenized F1 | 40.2 | 72.3 | 42.5 | 70.0 | |
| | | VQA-RAD*** (radiology) - Tokenized F1 | 33.6 | 49.9 | 42.7 | 46.7 | |
| | | **Knowledge and reasoning** | | | | | |
| | | MedXpertQA (text + multimodal questions) - Accuracy | 16.4 | 18.8 | 22.0 | 26.8 | |
| |
|
| | *Internal datasets. US-DermMCQA is described in [Liu (2020, Nature |
| | medicine)](https://www.nature.com/articles/s41591-020-0842-3), presented as a |
| | 4-way MCQ per example for skin condition classification. PathMCQA is based on |
| | multiple datasets, presented as 3-9 way MCQ per example for identification, |
| | grading, and subtype for breast, cervical, and prostate cancer. EyePACS is a |
| | dataset of fundus images with classification labels based on 5-level diabetic |
| | retinopathy severity (None, Mild, Moderate, Severe, Proliferative). More details |
| | in the [MedGemma Technical Report](https://arxiv.org/abs/2507.05201). |
| | |
| | **Based on radiologist adjudicated labels, described in [Yang (2024, |
| | arXiv)](https://arxiv.org/pdf/2405.03162) Section A.1.1. |
| | |
| | ***Based on "balanced split," described in [Yang (2024, |
| | arXiv)](https://arxiv.org/pdf/2405.03162). |
| |
|
| | #### Chest X-ray report generation |
| |
|
| | MedGemma chest X-ray (CXR) report generation performance was evaluated on |
| | [MIMIC-CXR](https://physionet.org/content/mimic-cxr/2.1.0/) using the [RadGraph |
| | F1 metric](https://arxiv.org/abs/2106.14463). We compare the MedGemma |
| | pre-trained checkpoint with our previous best model for CXR report generation, |
| | [PaliGemma 2](https://arxiv.org/abs/2412.03555). |
| |
|
| | | Metric | MedGemma 4B (pre-trained) | MedGemma 4B (tuned for CXR) | MedGemma 27B multimodal (pre-trained)* | PaliGemma 2 3B (tuned for CXR) | PaliGemma 2 10B (tuned for CXR) | |
| | | :---- | :---- | :---- | :---- | :---- | :---- | |
| | | **Chest X-ray report generation** | | | | | | |
| | | MIMIC CXR - RadGraph F1 | 29.5 | 30.3 | 27.0 | 28.8 | 29.5 | |
| |
|
| | *Not released |
| | |
| | The instruction-tuned versions of MedGemma 4B and MedGemma 27B achieve lower |
| | scores (21.9 and 21.3, respectively) due to the differences in reporting style |
| | compared to the MIMIC ground truth reports. Further fine-tuning on MIMIC reports |
| | enables users to achieve improved performance, as shown by the improved |
| | performance of the MedGemma 4B model that was tuned for CXR. |
| | |
| | #### Text evaluations |
| | |
| | MedGemma 4B and text-only MedGemma 27B were evaluated across a range of |
| | text-only benchmarks for medical knowledge and reasoning. |
| | |
| | The MedGemma models outperform their respective base Gemma models across all |
| | tested text-only health benchmarks. |
| | |
| | | Metric | Gemma 3 4B | MedGemma 4B | Gemma 3 27B | MedGemma 27B text-only | MedGemma 27B multimodal | |
| | | :---- | :---- | :---- | :---- | :---- | :---- | |
| | | MedQA (4-op) | 50.7 | 64.4 | 74.9 | 89.8 (best-of-5) 87.7 (0-shot) | 87.0 (best-of-5) 85.3 (0-shot) | |
| | | MedMCQA | 45.4 | 55.7 | 62.6 | 74.2 | 70.2 | |
| | | PubMedQA | 68.4 | 73.4 | 73.4 | 76.8 | 77.2 | |
| | | MMLU Med | 67.2 | 70.0 | 83.3 | 87.0 | 86.2 | |
| | | MedXpertQA (text only) | 11.6 | 14.2 | 15.7 | 25.7 | 23.7 | |
| | | AfriMed-QA (25 question test set) | 48.0 | 52.0 | 72.0 | 84.0 | 72.0 | |
| | |
| | For all MedGemma 27B results, [test-time |
| | scaling](https://arxiv.org/abs/2501.19393) is used to improve performance. |
| | |
| | #### Medical record evaluations |
| | |
| | All models were evaluated on a question answer dataset from synthetic FHIR data |
| | to answer questions about patient records. MedGemma 27B multimodal's |
| | FHIR-specific training gives it significant improvement over other MedGemma and |
| | Gemma models. |
| | |
| | | Metric | Gemma 3 4B | MedGemma 4B | Gemma 3 27B | MedGemma 27B text-only | MedGemma 27B multimodal | |
| | | :---- | :---- | :---- | :---- | :---- | :---- | |
| | | EHRQA | 70.9 | 67.6 | 84.2 | 86.3 | 90.5 | |
| | |
| | ### Ethics and safety evaluation |
| | |
| | #### Evaluation approach |
| | |
| | Our evaluation methods include structured evaluations and internal red-teaming |
| | testing of relevant content policies. Red-teaming was conducted by a number of |
| | different teams, each with different goals and human evaluation metrics. These |
| | models were evaluated against a number of different categories relevant to |
| | ethics and safety, including: |
| | |
| | * **Child safety**: Evaluation of text-to-text and image-to-text prompts |
| | covering child safety policies, including child sexual abuse and |
| | exploitation. |
| | * **Content safety**: Evaluation of text-to-text and image-to-text prompts |
| | covering safety policies, including harassment, violence and gore, and hate |
| | speech. |
| | * **Representational harms**: Evaluation of text-to-text and image-to-text |
| | prompts covering safety policies, including bias, stereotyping, and harmful |
| | associations or inaccuracies. |
| | * **General medical harms**: Evaluation of text-to-text and image-to-text |
| | prompts covering safety policies, including information quality and harmful |
| | associations or inaccuracies. |
| | |
| | In addition to development level evaluations, we conduct "assurance evaluations" |
| | which are our "arms-length" internal evaluations for responsibility governance |
| | decision making. They are conducted separately from the model development team, |
| | to inform decision making about release. High-level findings are fed back to the |
| | model team, but prompt sets are held out to prevent overfitting and preserve the |
| | results' ability to inform decision making. Notable assurance evaluation results |
| | are reported to our Responsibility & Safety Council as part of release review. |
| |
|
| | #### Evaluation results |
| |
|
| | For all areas of safety testing, we saw safe levels of performance across the |
| | categories of child safety, content safety, and representational harms. All |
| | testing was conducted without safety filters to evaluate the model capabilities |
| | and behaviors. For text-to-text, image-to-text, and audio-to-text, and across |
| | both MedGemma model sizes, the model produced minimal policy violations. A |
| | limitation of our evaluations was that they included primarily English language |
| | prompts. |
| |
|
| | ## Data card |
| |
|
| | ### Dataset overview |
| |
|
| | #### Training |
| |
|
| | The base Gemma models are pre-trained on a large corpus of text and code data. |
| | MedGemma multimodal variants utilize a |
| | [SigLIP](https://arxiv.org/abs/2303.15343) image encoder that has been |
| | specifically pre-trained on a variety of de-identified medical data, including |
| | radiology images, histopathology images, ophthalmology images, and dermatology |
| | images. Their LLM component is trained on a diverse set of medical data, |
| | including medical text, medical question-answer pairs, FHIR-based electronic |
| | health record data (27B multimodal only), radiology images, histopathology |
| | patches, ophthalmology images, and dermatology images. |
| |
|
| | #### Evaluation |
| |
|
| | MedGemma models have been evaluated on a comprehensive set of clinically |
| | relevant benchmarks, including over 22 datasets across 6 different tasks and 4 |
| | medical image modalities. These benchmarks include both open and internal |
| | datasets. |
| |
|
| | #### Source |
| |
|
| | MedGemma utilizes a combination of public and private datasets. |
| |
|
| | This model was trained on diverse public datasets including MIMIC-CXR (chest |
| | X-rays and reports), ChestImaGenome: Set of bounding boxes linking image |
| | findings with anatomical regions for MIMIC-CXR (MedGemma 27B multimodal only), |
| | SLAKE (multimodal medical images and questions), PAD-UFES-20 (skin lesion images |
| | and data), SCIN (dermatology images), TCGA (cancer genomics data), CAMELYON |
| | (lymph node histopathology images), PMC-OA (biomedical literature with images), |
| | and Mendeley Digital Knee X-Ray (knee X-rays). |
| |
|
| | Additionally, multiple diverse proprietary datasets were licensed and |
| | incorporated (described next). |
| |
|
| | ### Data ownership and documentation |
| |
|
| | * [MIMIC-CXR](https://physionet.org/content/mimic-cxr/2.1.0/): MIT Laboratory |
| | for Computational Physiology and Beth Israel Deaconess Medical Center |
| | (BIDMC). |
| | * [Slake-VQA](https://www.med-vqa.com/slake/): The Hong Kong Polytechnic |
| | University (PolyU), with collaborators including West China Hospital of |
| | Sichuan University and Sichuan Academy of Medical Sciences / Sichuan |
| | Provincial People's Hospital. |
| | * [PAD-UFES-20](https://pmc.ncbi.nlm.nih.gov/articles/PMC7479321/): Federal |
| | University of Espírito Santo (UFES), Brazil, through its Dermatological and |
| | Surgical Assistance Program (PAD). |
| | * [SCIN](https://github.com/google-research-datasets/scin): A collaboration |
| | between Google Health and Stanford Medicine. |
| | * [TCGA](https://portal.gdc.cancer.gov/) (The Cancer Genome Atlas): A joint |
| | effort of National Cancer Institute and National Human Genome Research |
| | Institute. Data from TCGA are available via the Genomic Data Commons (GDC) |
| | * [CAMELYON](https://camelyon17.grand-challenge.org/Data/): The data was |
| | collected from Radboud University Medical Center and University Medical |
| | Center Utrecht in the Netherlands. |
| | * [PMC-OA (PubMed Central Open Access |
| | Subset)](https://catalog.data.gov/dataset/pubmed-central-open-access-subset-pmc-oa): |
| | Maintained by the National Library of Medicine (NLM) and National Center for |
| | Biotechnology Information (NCBI), which are part of the NIH. |
| | * [MedQA](https://arxiv.org/pdf/2009.13081): This dataset was created by a |
| | team of researchers led by Di Jin, Eileen Pan, Nassim Oufattole, Wei-Hung |
| | Weng, Hanyi Fang, and Peter Szolovits |
| | * [Mendeley Digital Knee |
| | X-Ray](https://data.mendeley.com/datasets/t9ndx37v5h/1): This dataset is |
| | from Rani Channamma University, and is hosted on Mendeley Data. |
| | * [AfriMed-QA](https://afrimedqa.com/): This data was developed and led by |
| | multiple collaborating organizations and researchers include key |
| | contributors: Intron Health, SisonkeBiotik, BioRAMP, Georgia Institute of |
| | Technology, and MasakhaneNLP. |
| | * [VQA-RAD](https://www.nature.com/articles/sdata2018251): This dataset was |
| | created by a research team led by Jason J. Lau, Soumya Gayen, Asma Ben |
| | Abacha, and Dina Demner-Fushman and their affiliated institutions (the US |
| | National Library of Medicine and National Institutes of Health) |
| | * [Chest ImaGenome](https://physionet.org/content/chest-imagenome/1.0.0/): IBM |
| | Research. |
| | * [MedExpQA](https://www.sciencedirect.com/science/article/pii/S0933365724001805): |
| | This dataset was created by researchers at the HiTZ Center (Basque Center |
| | for Language Technology and Artificial Intelligence). |
| | * [MedXpertQA](https://huggingface.co/datasets/TsinghuaC3I/MedXpertQA): This |
| | dataset was developed by researchers at Tsinghua University (Beijing, China) |
| | and Shanghai Artificial Intelligence Laboratory (Shanghai, China). |
| | * [HealthSearchQA](https://huggingface.co/datasets/katielink/healthsearchqa): |
| | This dataset consists of consisting of 3,173 commonly searched consumer |
| | questions |
| | |
| | In addition to the public datasets listed above, MedGemma was also trained on |
| | de-identified, licensed datasets or datasets collected internally at Google from |
| | consented participants. |
| |
|
| | * **Radiology dataset 1:** De-identified dataset of different CT studies |
| | across body parts from a US-based radiology outpatient diagnostic center |
| | network. |
| | * **Ophthalmology dataset 1 (EyePACS):** De-identified dataset of fundus |
| | images from diabetic retinopathy screening. |
| | * **Dermatology dataset 1:** De-identified dataset of teledermatology skin |
| | condition images (both clinical and dermatoscopic) from Colombia. |
| | * **Dermatology dataset 2:** De-identified dataset of skin cancer images (both |
| | clinical and dermatoscopic) from Australia. |
| | * **Dermatology dataset 3:** De-identified dataset of non-diseased skin images |
| | from an internal data collection effort. |
| | * **Pathology dataset 1:** De-identified dataset of histopathology H\&E whole |
| | slide images created in collaboration with an academic research hospital and |
| | biobank in Europe. Comprises de-identified colon, prostate, and lymph nodes. |
| | * **Pathology dataset 2:** De-identified dataset of lung histopathology H\&E |
| | and IHC whole slide images created by a commercial biobank in the United |
| | States. |
| | * **Pathology dataset 3:** De-identified dataset of prostate and lymph node |
| | H\&E and IHC histopathology whole slide images created by a contract |
| | research organization in the United States. |
| | * **Pathology dataset 4:** De-identified dataset of histopathology whole slide |
| | images created in collaboration with a large, tertiary teaching hospital in |
| | the United States. Comprises a diverse set of tissue and stain types, |
| | predominantly H\&E. |
| | * **EHR dataset 1:** Question/answer dataset drawn from synthetic FHIR records |
| | created by [Synthea.](https://synthetichealth.github.io/synthea/) The test |
| | set includes 19 unique patients with 200 questions per patient divided into |
| | 10 different categories. |
| | |
| | ### Data citation |
| |
|
| | * **MIMIC-CXR:** Johnson, A., Pollard, T., Mark, R., Berkowitz, S., & Horng, |
| | S. (2024). MIMIC-CXR Database (version 2.1.0). PhysioNet. |
| | [https://physionet.org/content/mimic-cxr/2.1.0/](https://physionet.org/content/mimic-cxr/2.1.0/) |
| | *and* Johnson, Alistair E. W., Tom J. Pollard, Seth J. Berkowitz, Nathaniel |
| | R. Greenbaum, Matthew P. Lungren, Chih-Ying Deng, Roger G. Mark, and Steven |
| | Horng. 2019\. "MIMIC-CXR, a de-Identified Publicly Available Database of |
| | Chest Radiographs with Free-Text Reports." *Scientific Data 6* (1): 1–8. |
| | |
| | * **SLAKE:** Liu, Bo, Li-Ming Zhan, Li Xu, Lin Ma, Yan Yang, and Xiao-Ming Wu. |
| | 2021.SLAKE: A Semantically-Labeled Knowledge-Enhanced Dataset for Medical |
| | Visual Question Answering." |
| | [http://arxiv.org/abs/2102.09542](http://arxiv.org/abs/2102.09542). |
| | |
| | * **PAD-UEFS-20:** Pacheco, Andre GC, et al. "PAD-UFES-20: A skin lesion |
| | dataset composed of patient data and clinical images collected from |
| | smartphones." *Data in brief* 32 (2020): 106221\. |
| | |
| | * **SCIN:** Ward, Abbi, Jimmy Li, Julie Wang, Sriram Lakshminarasimhan, Ashley |
| | Carrick, Bilson Campana, Jay Hartford, et al. 2024\. "Creating an Empirical |
| | Dermatology Dataset Through Crowdsourcing With Web Search Advertisements." |
| | *JAMA Network Open 7* (11): e2446615–e2446615. |
| | |
| | * **TCGA:** The results shown here are in whole or part based upon data |
| | generated by the TCGA Research Network: |
| | [https://www.cancer.gov/tcga](https://www.cancer.gov/tcga). |
| | |
| | * **CAMELYON16:** Ehteshami Bejnordi, Babak, Mitko Veta, Paul Johannes van |
| | Diest, Bram van Ginneken, Nico Karssemeijer, Geert Litjens, Jeroen A. W. M. |
| | van der Laak, et al. 2017\. "Diagnostic Assessment of Deep Learning |
| | Algorithms for Detection of Lymph Node Metastases in Women With Breast |
| | Cancer." *JAMA 318* (22): 2199–2210. |
| | |
| | * **Mendeley Digital Knee X-Ray:** Gornale, Shivanand; Patravali, Pooja |
| | (2020), "Digital Knee X-ray Images", Mendeley Data, V1, doi: |
| | 10.17632/t9ndx37v5h.1 |
| | |
| | * **VQA-RAD:** Lau, Jason J., Soumya Gayen, Asma Ben Abacha, and Dina |
| | Demner-Fushman. 2018\. "A Dataset of Clinically Generated Visual Questions |
| | and Answers about Radiology Images." *Scientific Data 5* (1): 1–10. |
| | |
| | * **Chest ImaGenome:** Wu, J., Agu, N., Lourentzou, I., Sharma, A., Paguio, |
| | J., Yao, J. S., Dee, E. C., Mitchell, W., Kashyap, S., Giovannini, A., Celi, |
| | L. A., Syeda-Mahmood, T., & Moradi, M. (2021). Chest ImaGenome Dataset |
| | (version 1.0.0). PhysioNet. RRID:SCR\_007345. |
| | [https://doi.org/10.13026/wv01-y230](https://doi.org/10.13026/wv01-y230) |
| | |
| | * **MedQA:** Jin, Di, Eileen Pan, Nassim Oufattole, Wei-Hung Weng, Hanyi Fang, |
| | and Peter Szolovits. 2020\. "What Disease Does This Patient Have? A |
| | Large-Scale Open Domain Question Answering Dataset from Medical Exams." |
| | [http://arxiv.org/abs/2009.13081](http://arxiv.org/abs/2009.13081). |
| | |
| | * **AfrimedQA:** Olatunji, Tobi, Charles Nimo, Abraham Owodunni, Tassallah |
| | Abdullahi, Emmanuel Ayodele, Mardhiyah Sanni, Chinemelu Aka, et al. 2024\. |
| | "AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering |
| | Benchmark Dataset." |
| | [http://arxiv.org/abs/2411.15640](http://arxiv.org/abs/2411.15640). |
| | |
| | * **MedExpQA:** Alonso, I., Oronoz, M., & Agerri, R. (2024). MedExpQA: |
| | Multilingual Benchmarking of Large Language Models for Medical Question |
| | Answering. *arXiv preprint arXiv:2404.05590*. Retrieved from |
| | [https://arxiv.org/abs/2404.05590](https://arxiv.org/abs/2404.05590) |
| | |
| | * **MedXpertQA:** Zuo, Yuxin, Shang Qu, Yifei Li, Zhangren Chen, Xuekai Zhu, |
| | Ermo Hua, Kaiyan Zhang, Ning Ding, and Bowen Zhou. 2025\. "MedXpertQA: |
| | Benchmarking Expert-Level Medical Reasoning and Understanding." |
| | [http://arxiv.org/abs/2501.18362](http://arxiv.org/abs/2501.18362). |
| | |
| | ### De-identification/anonymization: |
| |
|
| | Google and its partners utilize datasets that have been rigorously anonymized or |
| | de-identified to ensure the protection of individual research participants and |
| | patient privacy. |
| |
|
| | ## Implementation information |
| |
|
| | Details about the model internals. |
| |
|
| | ### Software |
| |
|
| | Training was done using [JAX](https://github.com/jax-ml/jax). |
| |
|
| | JAX allows researchers to take advantage of the latest generation of hardware, |
| | including TPUs, for faster and more efficient training of large models. |
| |
|
| | ## Use and limitations |
| |
|
| | ### Intended use |
| |
|
| | MedGemma is an open multimodal generative AI model intended to be used as a |
| | starting point that enables more efficient development of downstream healthcare |
| | applications involving medical text and images. MedGemma is intended for |
| | developers in the life sciences and healthcare space. Developers are responsible |
| | for training, adapting and making meaningful changes to MedGemma to accomplish |
| | their specific intended use. MedGemma models can be fine-tuned by developers |
| | using their own proprietary data for their specific tasks or solutions. |
| |
|
| | MedGemma is based on Gemma 3 and has been further trained on medical images and |
| | text. MedGemma enables further development in any medical context (image and |
| | textual), however the model was pre-trained using chest X-ray, pathology, |
| | dermatology, and fundus images. Examples of tasks within MedGemma's training |
| | include visual question answering pertaining to medical images, such as |
| | radiographs, or providing answers to textual medical questions. Full details of |
| | all the tasks MedGemma has been evaluated can be found in the [MedGemma |
| | Technical Report](https://arxiv.org/abs/2507.05201). |
| |
|
| | ### Benefits |
| |
|
| | * Provides strong baseline medical image and text comprehension for models of |
| | its size. |
| | * This strong performance makes it efficient to adapt for downstream |
| | healthcare-based use cases, compared to models of similar size without |
| | medical data pre-training. |
| | * This adaptation may involve prompt engineering, grounding, agentic |
| | orchestration or fine-tuning depending on the use case, baseline validation |
| | requirements, and desired performance characteristics. |
| | |
| | ### Limitations |
| |
|
| | MedGemma is not intended to be used without appropriate validation, adaptation |
| | and/or making meaningful modification by developers for their specific use case. |
| | The outputs generated by MedGemma are not intended to directly inform clinical |
| | diagnosis, patient management decisions, treatment recommendations, or any other |
| | direct clinical practice applications. Performance benchmarks highlight baseline |
| | capabilities on relevant benchmarks, but even for image and text domains that |
| | constitute a substantial portion of training data, inaccurate model output is |
| | possible. All outputs from MedGemma should be considered preliminary and require |
| | independent verification, clinical correlation, and further investigation |
| | through established research and development methodologies. |
| |
|
| | MedGemma's multimodal capabilities have been primarily evaluated on single-image |
| | tasks. MedGemma has not been evaluated in use cases that involve comprehension |
| | of multiple images. |
| |
|
| | MedGemma has not been evaluated or optimized for multi-turn applications. |
| |
|
| | MedGemma's training may make it more sensitive to the specific prompt used than |
| | Gemma 3. |
| |
|
| | When adapting MedGemma developer should consider the following: |
| |
|
| | * **Bias in validation data:** As with any research, developers should ensure |
| | that any downstream application is validated to understand performance using |
| | data that is appropriately representative of the intended use setting for |
| | the specific application (e.g., age, sex, gender, condition, imaging device, |
| | etc). |
| | * **Data contamination concerns**: When evaluating the generalization |
| | capabilities of a large model like MedGemma in a medical context, there is a |
| | risk of data contamination, where the model might have inadvertently seen |
| | related medical information during its pre-training, potentially |
| | overestimating its true ability to generalize to novel medical concepts. |
| | Developers should validate MedGemma on datasets not publicly available or |
| | otherwise made available to non-institutional researchers to mitigate this |
| | risk. |
| | |