---
license: apache-2.0
base_model: google/medgemma-4b-it
tags:
- medical
- ecg
- cardiology
- vision-language
- medgemma
datasets:
- PULSE-ECG/ECGInstruct
language:
- en
metrics:
- accuracy
library_name: transformers
pipeline_tag: image-text-to-text
---

# MedGemma-4B ECGInstruct

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/19VGxD03skunSLLRe7gIMs_zHMj9_TolQ?usp=sharing)

Fine-tuned version of Google's MedGemma-4B-it model on the ECGInstruct dataset for automated ECG interpretation.

## Model Description

This is a fully merged fine-tuned version of [google/medgemma-4b-it](https://huggingface.co/google/medgemma-4b-it) trained on the [PULSE-ECG/ECGInstruct](https://huggingface.co/datasets/PULSE-ECG/ECGInstruct) dataset containing 1.15M ECG instruction-following examples. The LoRA adapter has been merged into the base model for easier deployment.

**Developed by:** ConvAI Innovations  
**Base Model:** google/medgemma-4b-it  
**Training Infrastructure:** AIRAWAT (C-DAC) - 8x NVIDIA A100 40GB GPUs  
**Training Duration:** 72 hours (3 days)  
**Final Token Accuracy:** 86.83%  
**Final Training Loss:** 0.6188  
**GPU-Hours:** 576  
**Model Size:** ~8.5 GB

## Training Details

### Training Data
- **Dataset:** PULSE-ECG/ECGInstruct (1.15M samples)
- **Samples:** 1,156,110 ECG image-text pairs
- **Image Sources:** MIMIC-IV-ECG (~800K), PTB-XL (22K), CODE-15% (346K), ChapmanShaoxing
- **Task:** Vision-language instruction following for ECG interpretation
- **Demographics:** Age range 0-95 years, 52% male / 48% female
- **Disease Classes:** 5 superclasses (NORM, MI, STTC, CD, HYP), 24 subclasses

### Training Procedure

**Hardware:**
- 8x NVIDIA A100 40GB GPUs (AIRAWAT supercomputer)
- Distributed training with PyTorch DDP

**Training Configuration:**
- Fine-tuning method: LoRA (r=32, alpha=64, dropout=0.05)
- Target modules: all-linear (including vision encoder)
- Learning rate: 1.2e-5 with cosine decay
- Batch size: 192 effective (4 per GPU × 8 GPUs × 6 gradient accumulation)
- Optimizer: AdamW (fused)
- Precision: bfloat16
- Gradient checkpointing: Enabled
- Max sequence length: 2048 tokens
- Max new tokens: 512

**Training Metrics:**
- Final training loss: 0.6188
- Mean token accuracy: 86.83%
- Training throughput: ~9.67 samples/sec
- Total tokens processed: 103M+

## Usage

### Installation

```bash
pip install transformers pillow torch
```

### Loading the Model

```python
from transformers import AutoModelForImageTextToText, AutoProcessor
from PIL import Image

# Load model and processor
model_id = "convaiinnovations/medgemma-4b-ecginstruct"
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)
```

### Inference Example

```python
# Load ECG image
image = Image.open("ecg_image.png").convert("RGB")

# Prepare prompt using chat template
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Analyze this ECG and provide a detailed interpretation."}
        ]
    }
]

# Process input
text = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = processor(text=[text], images=[[image]], return_tensors="pt", padding=True)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

# Generate interpretation
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=False
)

# Decode and print
response = processor.decode(outputs[0], skip_special_tokens=True)
print(response)
```

### Example Prompts

```python
# Detailed interpretation
"Analyze this ECG and provide a detailed interpretation."

# Specific abnormality detection
"What abnormalities are present in this ECG?"

# Diagnosis suggestion
"Based on this ECG, what is the most likely diagnosis?"

# Question answering
"Does this ECG show signs of atrial fibrillation?"

# Rate and rhythm
"What is the heart rate and rhythm in this ECG?"
```

## Model Capabilities

This model can:
- ✅ Interpret 12-lead ECG images
- ✅ Identify cardiac abnormalities (arrhythmias, ischemia, hypertrophy, conduction blocks, etc.)
- ✅ Generate detailed clinical reports
- ✅ Answer specific questions about ECG findings
- ✅ Provide diagnostic suggestions
- ✅ Assess heart rate, rhythm, and axis
- ✅ Detect ST-segment changes and T-wave abnormalities

## Performance

**Training Metrics:**
| Metric | Value |
|--------|-------|
| Token Accuracy | 86.83% |
| Final Loss | 0.6188 |
| Training Time | 72 hours |
| GPU-Hours | 576 |

**Inference Metrics (A100 GPU):**
| Metric | Value |
|--------|-------|
| TTFT (Time to First Token) | ~150ms |
| ISL (Input Sequence Length) | 2048 tokens |
| OSL (Output Sequence Length) | 512 tokens |
| End-to-End Latency | 2-3 seconds |
| Throughput | ~45 tokens/sec |

## Limitations

- Trained primarily on adult ECG data
- Performance may vary on pediatric ECGs
- Should not replace professional medical diagnosis
- Requires high-quality ECG images for optimal results
- May struggle with very rare or unusual ECG patterns
- Limited to English language outputs

## Ethical Considerations

> ⚠️ **MEDICAL DISCLAIMER**
> 
> **This model is for RESEARCH AND EDUCATIONAL PURPOSES ONLY.**
> 
> - ❌ NOT validated for clinical use
> - ❌ NOT FDA/CE approved
> - ❌ NOT a substitute for professional medical diagnosis
> - ❌ Should NOT be used for patient care decisions
> 
> **Always consult qualified healthcare professionals for medical decisions.**

**Important Notes:**
- This is an AI model and can make mistakes
- ECG interpretation requires clinical context
- Model outputs should be verified by trained clinicians
- Not approved for clinical use or diagnostic purposes
- Use responsibly and within appropriate medical oversight
- Has not been tested on external clinical datasets

## Intended Use

**Appropriate Uses:**
- Research in medical AI and computer vision
- Educational demonstrations of ECG interpretation
- Development of clinical decision support prototypes
- Benchmarking ECG analysis algorithms

**Inappropriate Uses:**
- Direct patient diagnosis without physician review
- Replacement of trained medical professionals
- Use in emergency or critical care settings without oversight
- Commercial deployment without proper validation

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{medgemma-ecginstruct,
  author = {convaiinnovations},
  title = {MedGemma-4B ECGInstruct: Fine-tuned ECG Interpretation Model},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/convaiinnovations/medgemma-4b-ecginstruct}}
}
```

## Acknowledgments

- **Base Model:** Google's MedGemma team for the foundation model
- **Dataset:** PULSE-ECG team for the ECGInstruct dataset
- **Infrastructure:** AIRAWAT AI Innovation Challenge by C-DAC (Centre for Development of Advanced Computing)
- **Frameworks:** HuggingFace Transformers, PEFT, TRL, PyTorch

## Related Resources

- **LoRA Adapter Version:** [convaiinnovations/medgemma-4b-ecginstruct-lora](https://huggingface.co/convaiinnovations/medgemma-4b-ecginstruct-lora)
- **Base Model:** [google/medgemma-4b-it](https://huggingface.co/google/medgemma-4b-it)
- **Training Dataset:** [PULSE-ECG/ECGInstruct](https://huggingface.co/datasets/PULSE-ECG/ECGInstruct)

## License

Apache 2.0 (following base model license)

## Contact

For questions or issues, please open an issue on the model repository or contact the maintainers.