|
|
--- |
|
|
base_model: |
|
|
- google/gemma-3n-E2B-it |
|
|
tags: |
|
|
- text-generation-inference |
|
|
- transformers |
|
|
- unsloth |
|
|
- gemma3n |
|
|
- medical |
|
|
- vision-language |
|
|
- gemma |
|
|
- ecg |
|
|
- cardiology |
|
|
- healthcare |
|
|
license: cc-by-4.0 |
|
|
datasets: |
|
|
- yasserrmd/pulse-ecg-instruct-subset |
|
|
language: |
|
|
- en |
|
|
--- |
|
|
|
|
|
|
|
|
|
|
|
# GemmaECG-Vision |
|
|
|
|
|
<img src="GemmaECG Vision_ Future of Health.png" width="800" /> |
|
|
|
|
|
`GemmaECG-Vision` is a fine-tuned vision-language model built on `google/gemma-3n-E2B-it`, designed for ECG image interpretation. The model accepts an ECG image together with a clinical instruction prompt and generates a structured analysis suitable for triage or documentation use cases.
|
|
|
|
|
This model was developed using **Unsloth** for efficient fine-tuning and supports **image + text** inputs with medical task-specific prompt formatting. It is designed to run in **offline or edge environments**, enabling healthcare triage in resource-constrained settings. |
|
|
|
|
|
## Model Objective |
|
|
|
|
|
To assist healthcare professionals and emergency responders by providing AI-generated ECG analysis directly from medical images, without requiring internet access or cloud resources. |
|
|
|
|
|
## Usage |
|
|
|
|
|
This model expects: |
|
|
- An ECG image (`PIL.Image`) |
|
|
- A textual instruction such as: |
|
|
|
|
|
``` |
|
|
|
|
|
You are a clinical assistant specialized in ECG interpretation. Given an ECG image, generate a concise, structured, and medically accurate report. |
|
|
|
|
|
Use this exact format: |
|
|
|
|
|
Rhythm: |
|
|
PR Interval: |
|
|
QRS Duration: |
|
|
Axis: |
|
|
Bundle Branch Blocks: |
|
|
Atrial Abnormalities: |
|
|
Ventricular Hypertrophy: |
|
|
Q Wave or QS Complexes: |
|
|
T Wave Abnormalities: |
|
|
ST Segment Changes: |
|
|
Final Impression: |
|
|
|
|
|
```
|
|
|
|
|
### Inference Example (Python) |
|
|
|
|
|
```python
from transformers import AutoProcessor, Gemma3nForConditionalGeneration
from PIL import Image
import torch

model_id = "yasserrmd/GemmaECG-Vision"

# Load the fine-tuned model in bfloat16 and move it to the GPU
model = Gemma3nForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.bfloat16).eval().to("cuda")
processor = AutoProcessor.from_pretrained(model_id)

# Load the ECG image to be interpreted
image = Image.open("example_ecg.png").convert("RGB")

# Chat-style prompt: an image placeholder followed by the clinical instruction
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Interpret this ECG and provide a structured triage report."}
        ]
    }
]

prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

# Combine the image and the rendered prompt into model inputs
inputs = processor(image, prompt, return_tensors="pt").to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=1.0,
    top_p=0.95,
    top_k=64,
    use_cache=True
)

result = processor.decode(outputs[0], skip_special_tokens=True)
print(result)
```
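
Because the model targets offline and edge deployment, one convenient pattern is to download the weights once while connected and then load them strictly from local files. The sketch below uses the standard `huggingface_hub` and `transformers` APIs; the local directory name is arbitrary and chosen only for illustration:

```python
from huggingface_hub import snapshot_download
from transformers import AutoProcessor, Gemma3nForConditionalGeneration
import torch

# One-time download while online (any local directory works)
local_dir = snapshot_download("yasserrmd/GemmaECG-Vision", local_dir="gemmaecg-vision")

# Later, load strictly from disk, e.g. on an air-gapped triage workstation
model = Gemma3nForConditionalGeneration.from_pretrained(
    local_dir, torch_dtype=torch.bfloat16, local_files_only=True
)
processor = AutoProcessor.from_pretrained(local_dir, local_files_only=True)
```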
|
|
|
|
|
## Training Details |
|
|
|
|
|
* **Framework**: Unsloth + TRL `SFTTrainer` (see the configuration sketch after this list)
|
|
* **Hardware**: Google Colab Pro (L4) |
|
|
* **Batch Size**: 2 |
|
|
* **Epochs**: 1 |
|
|
* **Learning Rate**: 2e-4 |
|
|
* **Scheduler**: Cosine |
|
|
* **Loss**: CrossEntropy |
|
|
* **Precision**: bfloat16 |
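
The hyperparameters above map onto TRL's `SFTConfig` roughly as follows. This is an illustrative sketch, not the exact training script: the Unsloth model preparation and the vision data collator are omitted, and the argument names assume the current TRL API.

```python
from trl import SFTConfig

# Illustrative SFT configuration mirroring the hyperparameters listed above
training_args = SFTConfig(
    output_dir="gemmaecg-vision-sft",
    per_device_train_batch_size=2,   # Batch Size: 2
    num_train_epochs=1,              # Epochs: 1
    learning_rate=2e-4,              # Learning Rate: 2e-4
    lr_scheduler_type="cosine",      # Scheduler: Cosine
    bf16=True,                       # Precision: bfloat16
    logging_steps=10,
)
```

This configuration would then be passed to `SFTTrainer` together with the Unsloth-prepared model, the processor, and the formatted dataset described in the next section.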
|
|
|
|
|
## Dataset |
|
|
|
|
|
The training dataset is a curated subset of the [PULSE-ECG/ECGInstruct](https://huggingface.co/datasets/PULSE-ECG/ECGInstruct) dataset, reformatted for VLM instruction tuning. |
|
|
|
|
|
* 3,272 samples, each pairing an ECG image with a structured instruction and a clinical output
|
|
* Focused on realistic and medically relevant triage cases |
|
|
|
|
|
Dataset link: [`yasserrmd/pulse-ecg-instruct-subset`](https://huggingface.co/datasets/yasserrmd/pulse-ecg-instruct-subset) |
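
To inspect the subset or rebuild the conversational format used for fine-tuning, it can be loaded with the `datasets` library. The sketch below assumes the default `train` split and column names of `image`, `instruction`, and `output`; these are assumptions, so check the printed features for the actual schema:

```python
from datasets import load_dataset

ds = load_dataset("yasserrmd/pulse-ecg-instruct-subset", split="train")
print(ds)  # prints the actual column names and sample count

def to_conversation(sample):
    # Assumed column names; adjust them to match the schema printed above
    return {
        "image": sample["image"],
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image"},
                    {"type": "text", "text": sample["instruction"]},
                ],
            },
            {
                "role": "assistant",
                "content": [{"type": "text", "text": sample["output"]}],
            },
        ],
    }
```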
|
|
|
|
|
|
|
|
|
|
|
## Training Loss Summary
|
|
|
|
|
<img src="tl.png" > |
|
|
|
|
|
The model was fine-tuned over 409 steps on the `pulse-ecg-instruct-subset` dataset. The training loss started above **9.5** and steadily declined to below **0.5**, showing consistent convergence throughout the single epoch, with no overfitting spikes. The chart above visualizes this progression and highlights how quickly the model adapted to the ECG image-to-text task.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Intended Use |
|
|
|
|
|
* Emergency triage in offline settings |
|
|
* On-device ECG assessment |
|
|
* Integration with medical edge devices (Jetson, Pi, Android); see the quantized-loading sketch after this list
|
|
* Rapid analysis during disaster response |
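
For memory-constrained CUDA devices in the list above, one possible way to shrink the footprint is 4-bit quantization via `bitsandbytes`. This is a hedged sketch, not a tested deployment path for this model; targets without CUDA (Pi, Android) would need a different runtime and format, which is outside this example.

```python
from transformers import AutoProcessor, BitsAndBytesConfig, Gemma3nForConditionalGeneration
import torch

model_id = "yasserrmd/GemmaECG-Vision"

# 4-bit NF4 quantization to reduce GPU memory use (assumed, untested for this model)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = Gemma3nForConditionalGeneration.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)
```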
|
|
|
|
|
## Limitations |
|
|
|
|
|
* Not intended to replace licensed medical professionals |
|
|
* Accuracy may vary depending on image quality |
|
|
* Model outputs should be reviewed by a clinician before action |
|
|
|
|
|
## License |
|
|
|
|
|
This model is licensed under **CC BY 4.0**. You are free to use, modify, and distribute it with attribution. |
|
|
|
|
|
## Author |
|
|
|
|
|
Mohamed Yasser |
|
|
[Hugging Face Profile](https://huggingface.co/yasserrmd) |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth) |