---
base_model: google/medgemma-1.5-4b-it
library_name: peft
tags:
- base_model:adapter:google/medgemma-1.5-4b-it
- lora
- transformers
- medical
- dermatology
- multimodal
- vision-language
- change-detection
- temporal-analysis
license: other
datasets:
- dunktra/dermacheck-temporal-pairs
language:
- en
metrics:
- f1
- precision
- accuracy
- recall
pipeline_tag: image-text-to-text
---

# MedGemma Temporal Change Detection (LoRA Adapter)

This repository provides **LoRA adapters** fine-tuned on top of **google/medgemma-1.5-4b-it** for exploring **temporal change detection in dermatoscopic image pairs**. The project investigates whether lightweight parameter-efficient fine-tuning can adapt a multimodal medical foundation model to a **novel temporal reasoning task**.

## Model Details

### Model Description

This repository contains LoRA adapters only, not a full model checkpoint.

- **Developed and shared by:** Dung Claire Tran ([@dunktra](https://huggingface.co/dunktra))
- **Base model:** [google/medgemma-1.5-4b-it](https://huggingface.co/google/medgemma-1.5-4b-it)
- **Fine-tuning method:** LoRA (Low-Rank Adaptation) via PEFT
- **Model type:** Vision–language model (VLM) adapter
- **Task:** Binary classification of temporal change in skin-lesion image pairs
- **Dataset:** [dunktra/dermacheck-temporal-pairs](https://huggingface.co/datasets/dunktra/dermacheck-temporal-pairs) (synthetic temporal pairs)
- **Language(s) (NLP):** English
- **License:** Inherits the license of google/medgemma-1.5-4b-it

### Model Sources

- **Repository:** [Kaggle notebook (training & evaluation)](https://www.kaggle.com/code/dungclairetran/dermacheck-medgemma-lora-fine-tuning)

## Uses

### Direct Use

- Research and experimentation with **temporal reasoning in medical imaging**
- Evaluation of **LoRA fine-tuning feasibility** on multimodal medical foundation models
- Educational and benchmarking purposes

### Out-of-Scope Use

- Clinical diagnosis or medical decision-making
- Deployment in real-world
  healthcare settings without clinical validation

This model is **not a medical device**.

## Limitations

- Fine-tuning effects may not surface when using **keyword-based label extraction**
- Binary classification may mask improvements in:
  - reasoning structure
  - explanatory language
  - uncertainty expression
- Synthetic temporal data limits real-world generalization
- Inherits all limitations of the base MedGemma model

Users (both direct and downstream) should be aware of the risks, biases, and limitations of the model.

## How to Get Started with the Model

Load the base model and attach the adapter:

```python
from transformers import AutoModelForVision2Seq, AutoProcessor
from peft import PeftModel
import torch

# Load the frozen base model in bfloat16
base_model = AutoModelForVision2Seq.from_pretrained(
    "google/medgemma-1.5-4b-it",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Attach the LoRA adapter weights
model = PeftModel.from_pretrained(
    base_model,
    "dunktra/medgemma-temporal-lora",
)
model.eval()

processor = AutoProcessor.from_pretrained(
    "dunktra/medgemma-temporal-lora"
)
```

## Training Details

### Training Data

- **Source:** [dunktra/dermacheck-temporal-pairs](https://huggingface.co/datasets/dunktra/dermacheck-temporal-pairs)
- **Description:** Synthetic before/after dermatoscopic image pairs labeled for temporal change
- **Splits:**
  - **Training:** ~630 pairs
  - **Validation:** ~135 pairs
  - **Test:** 135 pairs

**Note:** *The dataset consists of **synthetic temporal pairs**, not real longitudinal patient data.*

### Training Configuration

- **LoRA rank (r):** 8
- **LoRA alpha:** 16
- **Target modules:** q_proj, k_proj, v_proj, o_proj
- **LoRA dropout:** 0.05
- **Epochs:** 3
- **Effective batch size:** 16
- **Learning rate:** 2e-4
- **Precision:** bfloat16
- **Frameworks:** Transformers + PEFT

## Evaluation

### Metrics

- Precision
- Recall
- F1 score (binary classification)

### Results (Test Set: 135 temporal pairs)

| Metric | Base MedGemma | Fine-Tuned (LoRA) | Change |
|------------|---------------|-------------------|--------|
| F1 Score   | 0.8797        | 0.8797            | +0.00% |
| Precision  | 0.7852        | 0.7852            | +0.00% |
| Recall     | 1.0000        | 1.0000            | +0.00% |

LoRA fine-tuning **did not** yield measurable improvements under the current evaluation protocol.

**Note:** Although LoRA fine-tuning did not improve aggregate F1 on the held-out test set, analysis revealed that both the base and fine-tuned models collapsed to a high-recall regime, predicting "change" for every example. This indicates that the primary performance bottleneck lies in task framing and decision extraction rather than model capacity. The experiment demonstrates stable LoRA adaptation without regression and highlights the importance of evaluation design in generative medical VLMs.

### Qualitative Analysis

- No test cases were found where the fine-tuned model corrected errors made by the base model.
- Fine-tuning did not alter binary decision outcomes under the current response-parsing heuristic.

## License

This adapter inherits the license and usage restrictions of:

- **google/medgemma-1.5-4b-it**
- the underlying datasets used by the base model

Non-commercial research use only.

## Acknowledgements

- Google MedGemma team
- PEFT / Hugging Face ecosystem

*Created for the **MedGemma Impact Challenge 2026 – Novel Task Exploration**.*

## Model Card Contact

[dunktra](https://huggingface.co/dunktra)

### Framework versions

- PEFT 0.18.1
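## Appendix: Illustrative Sketches

The training configuration listed above can be expressed as a PEFT `LoraConfig`. This is a reconstruction from the reported hyperparameters, not a copy of the Kaggle notebook's code; in particular, `bias="none"` and `task_type="CAUSAL_LM"` are assumptions.

```python
from peft import LoraConfig

# Reconstructed from the hyperparameters in "Training Configuration";
# bias and task_type are assumptions, not taken from the notebook.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
```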
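The keyword-based label extraction and binary metrics discussed in the Limitations and Evaluation sections are not shipped with this adapter. A minimal sketch of how such a heuristic might work; all function names and keyword lists here are illustrative assumptions, not the notebook's actual code:

```python
import re

# Hypothetical keyword list for a "change" decision (an assumption,
# not the notebook's actual heuristic).
CHANGE_KEYWORDS = ("change", "changed", "evolved", "progression")

def extract_change_label(response: str) -> bool:
    """Map a free-text model response to a binary 'change' decision."""
    text = response.lower()
    # Check negations first, otherwise every mention of "change"
    # (including "no change" / "unchanged") counts as positive.
    if re.search(r"\b(no (significant )?change|unchanged)\b", text):
        return False
    return any(kw in text for kw in CHANGE_KEYWORDS)

def binary_metrics(y_true, y_pred):
    """Precision, recall and F1 for the positive ('change') class."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Under a heuristic like this, a model that answers "change" for every pair trivially achieves recall 1.0 with precision equal to the test set's positive rate; the reported precision of 0.7852 is consistent with roughly 106 of the 135 test pairs being positive, matching the collapse described in the Note above.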