# MedGemma TB IT LoRA (Vision Fine-Tuned)

## Model Details

### Model Description
`mnbvcxzz4869/medgemma-tb-it-lora-tb` is a LoRA adapter fine-tuned on the vision encoder of MedGemma-4B-IT, designed to improve the model's ability to analyze chest X-ray images for tuberculosis (TB)-related radiographic patterns.
Fine-tuning is applied exclusively to the vision component of the multimodal model; the language model and its general medical knowledge remain unchanged. Multimodal reasoning (text + image) is achieved at inference time through prompt engineering rather than multimodal retraining.
This model is intended for research and clinical decision support system (CDSS) prototyping, particularly for TB screening workflows.
This model is derived from MedGemma and is subject to the Health AI Developer Foundations terms of use governing the base model.
- Developed by: mnbvcxzz4869
- Model type: Vision-Language Model (LoRA adapter for vision encoder)
- Base model: google/medgemma-4b-it
- License: Apache 2.0 (adapter weights); base model terms apply
- Fine-tuning method: LoRA (vision-only)
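
The exact training hyperparameters for this adapter are not published. As an illustration only, a vision-only LoRA setup with PEFT might look like the sketch below — the rank, alpha, dropout, and target-module pattern are assumptions, not the values used for this adapter:

```python
from peft import LoraConfig

# Hypothetical vision-only LoRA configuration (illustrative values only;
# the actual hyperparameters for this adapter are not published).
lora_config = LoraConfig(
    r=16,             # low-rank dimension (assumption)
    lora_alpha=32,    # scaling factor (assumption)
    lora_dropout=0.05,
    # Restrict the adapter to attention projections inside the vision tower,
    # leaving the language model untouched (regex pattern is an assumption).
    target_modules=r".*vision_tower.*(q_proj|k_proj|v_proj|o_proj)$",
)
```

Restricting `target_modules` to the vision tower is what keeps the language model's weights, and hence its general medical knowledge, frozen during fine-tuning.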
## Uses

### Direct Use
- Analysis of chest X-ray images with a focus on identifying visual patterns associated with pulmonary tuberculosis
- Research and prototyping of AI-assisted TB screening tools
- Integration into chatbot-based clinical decision support systems using prompt-based multimodal reasoning
### Downstream Use
- Clinical decision support research
- Educational tools for radiology and infectious disease
- Integration into web-based medical AI applications (e.g., Streamlit-based systems)
### Out-of-Scope Use
- Automated or standalone medical diagnosis
- Treatment recommendation or prescription
- Use for diseases outside tuberculosis without further validation
- Clinical deployment or patient management without regulatory approval
## Bias, Risks, and Limitations
- The model is trained on public chest X-ray datasets, which may not represent all populations, imaging devices, or clinical environments.
- Radiographic findings alone are not sufficient for definitive TB diagnosis and must be interpreted alongside clinical and laboratory data.
- The model may generate inaccurate or incomplete interpretations, especially for low-quality or out-of-distribution images.
- As this model builds upon MedGemma, it inherits the base model’s limitations, including sensitivity to prompt formulation and lack of clinical validation. The model has not been evaluated for multi-image or multi-turn clinical reasoning scenarios.
## Evaluation Results
| Metric | Value |
|---|---|
| Eval loss | 2.2045 |
| Model preparation time (s) | 0.0197 |
| Eval runtime (s) | 94.6703 |
| Eval samples per second | 8.8730 |
| Eval steps per second | 1.1090 |
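
As a quick sanity check, the throughput figures above are mutually consistent: runtime × samples/s gives the evaluation set size, and samples/s ÷ steps/s recovers the per-step batch size. Note that the ~840 samples and batch size of ~8 are derived here, not separately reported:

```python
# Derive implied quantities from the reported evaluation metrics.
eval_runtime_s = 94.6703
samples_per_second = 8.8730
steps_per_second = 1.1090

approx_samples = eval_runtime_s * samples_per_second  # implied eval set size
approx_steps = eval_runtime_s * steps_per_second      # implied number of eval steps
approx_batch = samples_per_second / steps_per_second  # implied samples per step

print(round(approx_samples), round(approx_steps), round(approx_batch))
# → 840 105 8
```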
## How to Get Started with the Model
```python
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText
from peft import PeftModel

base_model_id = "google/medgemma-4b-it"
adapter_id = "mnbvcxzz4869/medgemma-tb-it-lora-tb"

# Load the processor and the base model, then attach the LoRA adapter
processor = AutoProcessor.from_pretrained(base_model_id)
model = AutoModelForImageTextToText.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()
```
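
Once the adapter is attached, inference follows the standard chat-template flow for MedGemma. The sketch below assumes the loading code above has run; the image path and prompt wording are illustrative placeholders:

```python
from PIL import Image

# Hypothetical local chest X-ray; replace with your own image path.
image = Image.open("chest_xray.png")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Describe any TB-related radiographic findings."},
        ],
    }
]

# Tokenize the prompt and image with the processor's chat template
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device, dtype=torch.bfloat16)

with torch.inference_mode():
    generated = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt
response = processor.decode(
    generated[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(response)
```

Greedy decoding (`do_sample=False`) is used here for reproducibility; for screening-style prompts, deterministic output is usually preferable to sampled variation.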