|
|
--- |
|
|
base_model: |
|
|
- google/gemma-3n-E2B-it |
|
|
tags: |
|
|
- text-generation-inference |
|
|
- transformers |
|
|
- unsloth |
|
|
- gemma3n |
|
|
- medical |
|
|
- vision-language |
|
|
- gemma |
|
|
- ecg |
|
|
- cardiology |
|
|
- healthcare |
|
|
license: cc-by-4.0 |
|
|
datasets: |
|
|
- yasserrmd/pulse-ecg-instruct-subset |
|
|
language: |
|
|
- en |
|
|
--- |
|
|
|
|
|
|
|
|
|
|
|
# GemmaECG-Vision |
|
|
|
|
|
<img src="GemmaECG Vision_ Future of Health.png" width="800" /> |
|
|
|
|
|
`GemmaECG-Vision` is a fine-tuned vision-language model built on `google/gemma-3n-E2B-it`, designed for ECG image interpretation. The model accepts an ECG image together with a clinical instruction prompt and generates a structured analysis suitable for triage or documentation use cases.
|
|
|
|
|
This model was developed using **Unsloth** for efficient fine-tuning and supports **image + text** inputs with medical task-specific prompt formatting. It is designed to run in **offline or edge environments**, enabling healthcare triage in resource-constrained settings. |
|
|
|
|
|
## Model Objective |
|
|
|
|
|
To assist healthcare professionals and emergency responders by providing AI-generated ECG analysis directly from medical images, without requiring internet access or cloud resources. |
|
|
|
|
|
## Usage |
|
|
|
|
|
This model expects: |
|
|
- An ECG image (`PIL.Image`) |
|
|
- A textual instruction such as: |
|
|
|
|
|
``` |
|
|
|
|
|
You are a clinical assistant specialized in ECG interpretation. Given an ECG image, generate a concise, structured, and medically accurate report. |
|
|
|
|
|
Use this exact format: |
|
|
|
|
|
Rhythm: |
|
|
PR Interval: |
|
|
QRS Duration: |
|
|
Axis: |
|
|
Bundle Branch Blocks: |
|
|
Atrial Abnormalities: |
|
|
Ventricular Hypertrophy: |
|
|
Q Wave or QS Complexes: |
|
|
T Wave Abnormalities: |
|
|
ST Segment Changes: |
|
|
Final Impression: |
|
|
|
|
|
```
|
|
|
|
|
### Inference Example (Python) |
|
|
|
|
|
```python
from transformers import AutoProcessor, Gemma3nForConditionalGeneration
from PIL import Image
import torch

model_id = "yasserrmd/GemmaECG-Vision"

# Load the fine-tuned model in bfloat16 and move it to the GPU
model = Gemma3nForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.bfloat16).eval().to("cuda")
processor = AutoProcessor.from_pretrained(model_id)

# Load the ECG image to be interpreted
image = Image.open("example_ecg.png").convert("RGB")

# Chat-style prompt: an image placeholder followed by the clinical instruction
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Interpret this ECG and provide a structured triage report."}
        ]
    }
]

prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

# Combine the image and the rendered prompt into model inputs
inputs = processor(image, prompt, return_tensors="pt").to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=1.0,
    top_p=0.95,
    top_k=64,
    use_cache=True
)

result = processor.decode(outputs[0], skip_special_tokens=True)
print(result)
```
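
Because the model targets offline and edge deployment, one convenient pattern is to download the weights once while connected and then load them strictly from local files. The sketch below uses the standard `huggingface_hub` and `transformers` APIs; the local directory name is arbitrary and chosen only for illustration:

```python
from huggingface_hub import snapshot_download
from transformers import AutoProcessor, Gemma3nForConditionalGeneration
import torch

# One-time download while online (any local directory works)
local_dir = snapshot_download("yasserrmd/GemmaECG-Vision", local_dir="gemmaecg-vision")

# Later, load strictly from disk, e.g. on an air-gapped triage workstation
model = Gemma3nForConditionalGeneration.from_pretrained(
    local_dir, torch_dtype=torch.bfloat16, local_files_only=True
)
processor = AutoProcessor.from_pretrained(local_dir, local_files_only=True)
```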
|
|
|
|
|
## Training Details |
|
|
|
|
|
* **Framework**: Unsloth + TRL `SFTTrainer` (see the configuration sketch after this list)
|
|
* **Hardware**: Google Colab Pro (L4) |
|
|
* **Batch Size**: 2 |
|
|
* **Epochs**: 1 |
|
|
* **Learning Rate**: 2e-4 |
|
|
* **Scheduler**: Cosine |
|
|
* **Loss**: CrossEntropy |
|
|
* **Precision**: bfloat16 |
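
The hyperparameters above map onto TRL's `SFTConfig` roughly as follows. This is an illustrative sketch, not the exact training script: the Unsloth model preparation and the vision data collator are omitted, and the argument names assume the current TRL API.

```python
from trl import SFTConfig

# Illustrative SFT configuration mirroring the hyperparameters listed above
training_args = SFTConfig(
    output_dir="gemmaecg-vision-sft",
    per_device_train_batch_size=2,   # Batch Size: 2
    num_train_epochs=1,              # Epochs: 1
    learning_rate=2e-4,              # Learning Rate: 2e-4
    lr_scheduler_type="cosine",      # Scheduler: Cosine
    bf16=True,                       # Precision: bfloat16
    logging_steps=10,
)
```

This configuration would then be passed to `SFTTrainer` together with the Unsloth-prepared model, the processor, and the formatted dataset described in the next section.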
|
|
|
|
|
## Dataset |
|
|
|
|
|
The training dataset is a curated subset of the [PULSE-ECG/ECGInstruct](https://huggingface.co/datasets/PULSE-ECG/ECGInstruct) dataset, reformatted for VLM instruction tuning. |
|
|
|
|
|
* 3,272 samples, each pairing an ECG image with a structured instruction and a clinical output
|
|
* Focused on realistic and medically relevant triage cases |
|
|
|
|
|
Dataset link: [`yasserrmd/pulse-ecg-instruct-subset`](https://huggingface.co/datasets/yasserrmd/pulse-ecg-instruct-subset) |
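
To inspect the subset or rebuild the conversational format used for fine-tuning, it can be loaded with the `datasets` library. The sketch below assumes the default `train` split and column names of `image`, `instruction`, and `output`; these are assumptions, so check the printed features for the actual schema:

```python
from datasets import load_dataset

ds = load_dataset("yasserrmd/pulse-ecg-instruct-subset", split="train")
print(ds)  # prints the actual column names and sample count

def to_conversation(sample):
    # Assumed column names; adjust them to match the schema printed above
    return {
        "image": sample["image"],
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image"},
                    {"type": "text", "text": sample["instruction"]},
                ],
            },
            {
                "role": "assistant",
                "content": [{"type": "text", "text": sample["output"]}],
            },
        ],
    }
```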
|
|
|
|
|
|
|
|
|
|
|
## Training Loss Summary
|
|
|
|
|
<img src="tl.png" > |
|
|
|
|
|
The model was fine-tuned over 409 steps on the `pulse-ecg-instruct-subset` dataset. The training loss started above **9.5** and steadily declined to below **0.5**, showing consistent convergence throughout the single epoch, with no overfitting spikes. The chart above visualizes this progression and highlights how quickly the model adapted to the ECG image-to-text task.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Intended Use |
|
|
|
|
|
* Emergency triage in offline settings |
|
|
* On-device ECG assessment |
|
|
* Integration with medical edge devices (Jetson, Pi, Android); see the quantized-loading sketch after this list
|
|
* Rapid analysis during disaster response |
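
For memory-constrained CUDA devices in the list above, one possible way to shrink the footprint is 4-bit quantization via `bitsandbytes`. This is a hedged sketch, not a tested deployment path for this model; targets without CUDA (Pi, Android) would need a different runtime and format, which is outside this example.

```python
from transformers import AutoProcessor, BitsAndBytesConfig, Gemma3nForConditionalGeneration
import torch

model_id = "yasserrmd/GemmaECG-Vision"

# 4-bit NF4 quantization to reduce GPU memory use (assumed, untested for this model)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = Gemma3nForConditionalGeneration.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)
```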
|
|
|
|
|
## Limitations |
|
|
|
|
|
* Not intended to replace licensed medical professionals |
|
|
* Accuracy may vary depending on image quality |
|
|
* Model outputs should be reviewed by a clinician before action |
|
|
|
|
|
## License |
|
|
|
|
|
This model is licensed under **CC BY 4.0**. You are free to use, modify, and distribute it with attribution. |
|
|
|
|
|
## Author |
|
|
|
|
|
Mohamed Yasser |
|
|
[Hugging Face Profile](https://huggingface.co/yasserrmd) |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth) |