MedGemma-4B ECGInstruct

A fine-tuned version of Google's MedGemma-4B-it model, trained on the ECGInstruct dataset for automated ECG interpretation.

Model Description

This is a fully merged, fine-tuned version of google/medgemma-4b-it, trained on the PULSE-ECG/ECGInstruct dataset of about 1.15M ECG instruction-following examples. The LoRA adapter has been merged into the base model for easier deployment.

Developed by: ConvAI Innovations
Base Model: google/medgemma-4b-it
Training Infrastructure: AIRAWAT (C-DAC) - 8x NVIDIA A100 40GB GPUs
Training Duration: 72 hours (3 days)
Final Token Accuracy: 86.83%
Final Training Loss: 0.6188
GPU-Hours: 576
Model Size: ~8.5 GB

Training Details

Training Data

  • Dataset: PULSE-ECG/ECGInstruct (1.15M samples)
  • Samples: 1,156,110 ECG image-text pairs
  • Image Sources: MIMIC-IV-ECG (~800K), PTB-XL (22K), CODE-15% (346K), Chapman-Shaoxing
  • Task: Vision-language instruction following for ECG interpretation
  • Demographics: Age range 0-95 years, 52% male / 48% female
  • Disease Classes: 5 superclasses (NORM, MI, STTC, CD, HYP), 24 subclasses
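
The five superclass codes match the PTB-XL diagnostic superclass scheme. As a quick reference, the sketch below spells them out. The names `SUPERCLASSES` and `describe` are illustrative only and are not part of the model or dataset API.

```python
# Expansions follow the PTB-XL diagnostic superclass naming.
# SUPERCLASSES and describe() are hypothetical helpers for illustration.
SUPERCLASSES = {
    "NORM": "Normal ECG",
    "MI": "Myocardial Infarction",
    "STTC": "ST/T Change",
    "CD": "Conduction Disturbance",
    "HYP": "Hypertrophy",
}


def describe(code: str) -> str:
    """Return 'CODE (expansion)' for a superclass code, case-insensitively."""
    key = code.strip().upper()
    if key not in SUPERCLASSES:
        raise KeyError(f"unknown superclass code: {code!r}")
    return f"{key} ({SUPERCLASSES[key]})"


print(", ".join(describe(c) for c in SUPERCLASSES))
```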

Training Procedure

Hardware:

  • 8x NVIDIA A100 40GB GPUs (AIRAWAT supercomputer)
  • Distributed training with PyTorch DDP

Training Configuration:

  • Fine-tuning method: LoRA (r=32, alpha=64, dropout=0.05)
  • Target modules: all-linear (including vision encoder)
  • Learning rate: 1.2e-5 with cosine decay
  • Batch size: 192 effective (4 per GPU × 8 GPUs × 6 gradient accumulation)
  • Optimizer: AdamW (fused)
  • Precision: bfloat16
  • Gradient checkpointing: Enabled
  • Max sequence length: 2048 tokens
  • Max new tokens (at generation time): 512
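
The batch-size arithmetic and LoRA settings above can be written down as a small sketch. The keys in `lora_kwargs` mirror the argument names of PEFT's `LoraConfig`, but the training script itself is not shown in this card, so how the values were actually wired together is an assumption. The steps-per-epoch figure likewise assumes every one of the 1,156,110 samples is used once per epoch.

```python
import math

# LoRA settings from the list above. The key names mirror peft.LoraConfig
# arguments (assumption: the training script used PEFT in this way).
lora_kwargs = {
    "r": 32,
    "lora_alpha": 64,
    "lora_dropout": 0.05,
    "target_modules": "all-linear",
}

per_gpu_batch = 4
num_gpus = 8
grad_accum_steps = 6
num_samples = 1_156_110  # dataset size reported under Training Data

# Effective batch size = per-GPU batch x GPUs x accumulation steps.
effective_batch = per_gpu_batch * num_gpus * grad_accum_steps
# Optimizer steps needed for one pass over the data (assumed, not reported).
steps_per_epoch = math.ceil(num_samples / effective_batch)
# Standard LoRA scaling factor applied to the adapter update: alpha / r.
lora_scale = lora_kwargs["lora_alpha"] / lora_kwargs["r"]

print(effective_batch)  # 192
print(steps_per_epoch)  # 6022
print(lora_scale)       # 2.0
```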

Training Metrics:

  • Final training loss: 0.6188
  • Mean token accuracy: 86.83%
  • Training throughput: ~9.67 samples/sec
  • Total tokens processed: 103M+
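
As a rough cross-check of these figures, the sketch below assumes the reported throughput held steady for the full 72-hour run. The card does not state the number of epochs, so the last value is only an estimate under that assumption.

```python
num_gpus = 8
training_hours = 72
throughput = 9.67        # samples/sec, as reported above
num_samples = 1_156_110  # dataset size reported under Training Data

gpu_hours = num_gpus * training_hours
# Samples seen if throughput were constant for the whole run (assumption).
samples_seen = throughput * training_hours * 3600
approx_epochs = samples_seen / num_samples

print(gpu_hours)                # 576
print(round(samples_seen))      # 2506464
print(round(approx_epochs, 2))  # 2.17
```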

Usage

Installation

pip install transformers accelerate pillow torch

Loading the Model

from transformers import AutoModelForImageTextToText, AutoProcessor
from PIL import Image

# Load model and processor
model_id = "convaiinnovations/medgemma-4b-ecginstruct"
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

Inference Example

# Load ECG image
image = Image.open("ecg_image.png").convert("RGB")

# Prepare prompt using chat template
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Analyze this ECG and provide a detailed interpretation."}
        ]
    }
]

# Process input
text = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = processor(text=[text], images=[[image]], return_tensors="pt", padding=True)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

# Generate interpretation
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=False
)

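# Decode only the newly generated tokens; generate() returns the prompt
# tokens followed by the completion, so slice the prompt off first
prompt_len = inputs["input_ids"].shape[-1]
response = processor.decode(outputs[0][prompt_len:], skip_special_tokens=True)
print(response)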
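
The user turn built in the inference example can be wrapped in a small helper so that different prompts reuse the same structure. This is a minimal sketch: `build_messages` and `EXAMPLE_PROMPTS` are hypothetical names introduced here for illustration, and the prompt strings are the ones suggested in this card.

```python
# Prompt strings taken from this model card; the dictionary keys are
# illustrative labels, not identifiers the model knows about.
EXAMPLE_PROMPTS = {
    "interpretation": "Analyze this ECG and provide a detailed interpretation.",
    "abnormalities": "What abnormalities are present in this ECG?",
    "diagnosis": "Based on this ECG, what is the most likely diagnosis?",
    "afib": "Does this ECG show signs of atrial fibrillation?",
    "rhythm": "What is the heart rate and rhythm in this ECG?",
}


def build_messages(prompt: str) -> list:
    """Wrap one image placeholder and a text prompt in a single user turn."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text", "text": prompt},
            ],
        }
    ]


messages = build_messages(EXAMPLE_PROMPTS["rhythm"])
print(messages[0]["content"][1]["text"])
```

The returned list has the same shape as `messages` in the inference example, so it can be passed to `processor.apply_chat_template` in the same way.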

Example Prompts

# Detailed interpretation
"Analyze this ECG and provide a detailed interpretation."

# Specific abnormality detection
"What abnormalities are present in this ECG?"

# Diagnosis suggestion
"Based on this ECG, what is the most likely diagnosis?"

# Question answering
"Does this ECG show signs of atrial fibrillation?"

# Rate and rhythm
"What is the heart rate and rhythm in this ECG?"

Model Capabilities

This model can:

  • ✅ Interpret 12-lead ECG images
  • ✅ Identify cardiac abnormalities (arrhythmias, ischemia, hypertrophy, conduction blocks, etc.)
  • ✅ Generate detailed clinical reports
  • ✅ Answer specific questions about ECG findings
  • ✅ Provide diagnostic suggestions
  • ✅ Assess heart rate, rhythm, and axis
  • ✅ Detect ST-segment changes and T-wave abnormalities

Performance

Training Metrics:

| Metric | Value |
|---|---|
| Token Accuracy | 86.83% |
| Final Loss | 0.6188 |
| Training Time | 72 hours |
| GPU-Hours | 576 |

Inference Metrics (A100 GPU):

| Metric | Value |
|---|---|
| TTFT (Time to First Token) | ~150 ms |
| ISL (Input Sequence Length) | 2048 tokens |
| OSL (Output Sequence Length) | 512 tokens |
| End-to-End Latency | 2-3 seconds |
| Throughput | ~45 tokens/sec |
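
These figures can be related with a simple model, assuming end-to-end latency is roughly TTFT plus the number of generated tokens divided by decode throughput. This is a back-of-the-envelope sketch rather than a measurement, and the function names are illustrative.

```python
TTFT_S = 0.150         # time to first token, in seconds (as reported)
THROUGHPUT_TPS = 45.0  # decode throughput, in tokens/sec (as reported)


def estimated_latency(output_tokens: int) -> float:
    """Approximate end-to-end latency in seconds for a given output length."""
    return TTFT_S + output_tokens / THROUGHPUT_TPS


def tokens_within(latency_s: float) -> int:
    """Approximate number of tokens that fit in a latency budget."""
    return int((latency_s - TTFT_S) * THROUGHPUT_TPS)


print(round(estimated_latency(512), 1))         # 11.5
print(tokens_within(2.0), tokens_within(3.0))   # 83 128
```

Under this model, the 2-3 second end-to-end figure corresponds to responses of roughly 80 to 130 tokens, while a full 512-token output would take closer to 11.5 seconds at the stated throughput.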

Limitations

  • Trained primarily on adult ECG data
  • Performance may vary on pediatric ECGs
  • Should not replace professional medical diagnosis
  • Requires high-quality ECG images for optimal results
  • May struggle with very rare or unusual ECG patterns
  • Limited to English-language outputs

Ethical Considerations

⚠️ MEDICAL DISCLAIMER

This model is for RESEARCH AND EDUCATIONAL PURPOSES ONLY.

  • ❌ NOT validated for clinical use
  • ❌ NOT FDA/CE approved
  • ❌ NOT a substitute for professional medical diagnosis
  • ❌ Should NOT be used for patient care decisions

Always consult qualified healthcare professionals for medical decisions.

Important Notes:

  • This is an AI model and can make mistakes
  • ECG interpretation requires clinical context
  • Model outputs should be verified by trained clinicians
  • Not approved for clinical use or diagnostic purposes
  • Use responsibly and under appropriate medical oversight
  • Has not been tested on external clinical datasets

Intended Use

Appropriate Uses:

  • Research in medical AI and computer vision
  • Educational demonstrations of ECG interpretation
  • Development of clinical decision support prototypes
  • Benchmarking ECG analysis algorithms

Inappropriate Uses:

  • Direct patient diagnosis without physician review
  • Replacement of trained medical professionals
  • Use in emergency or critical care settings without oversight
  • Commercial deployment without proper validation

Citation

If you use this model in your research, please cite:

@misc{medgemma-ecginstruct,
  author = {convaiinnovations},
  title = {MedGemma-4B ECGInstruct: Fine-tuned ECG Interpretation Model},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/convaiinnovations/medgemma-4b-ecginstruct}}
}

Acknowledgments

  • Base Model: Google's MedGemma team for the foundation model
  • Dataset: PULSE-ECG team for the ECGInstruct dataset
  • Infrastructure: AIRAWAT AI Innovation Challenge by C-DAC (Centre for Development of Advanced Computing)
  • Frameworks: HuggingFace Transformers, PEFT, TRL, PyTorch

License

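Apache 2.0, as declared for this repository. Note that the base model, google/medgemma-4b-it, is distributed under Google's Health AI Developer Foundations terms of use rather than Apache 2.0, so those terms may also govern this derivative.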

Contact

For questions or issues, please open an issue on the model repository or contact the maintainers.
