MedVision Edge v4: Chest X-ray Screening (LoRA Adapter)

Fine-tuned Gemma 4 E4B-it (8B parameters) for automated chest X-ray pathology detection. It screens for 5 conditions in a single pass and has been evaluated on held-out NIH ChestX-ray14 images and on the radiologist-annotated CheXpert test set. It pairs findings with WHO-compliant treatment protocols and produces output natively in 140+ languages.

This repo contains the LoRA adapter weights (660MB). For the merged full-precision model ready for direct inference, see j4rias/medvision-edge-v4-merged.

| Resource | Link |
|---|---|
| Live Demo | HuggingFace Space |
| Merged Model | j4rias/medvision-edge-v4-merged |
| Source Code | GitHub |
| Video | YouTube (3 min) |

Model Details

Model Description

MedVision Edge is an AI-powered chest X-ray screening system designed for underserved communities. An estimated 2.2 billion people lack access to medical imaging (WHO, 2023). A community health worker photographs a chest X-ray with any smartphone and receives:

  1. Pathology detection for 5 conditions screened simultaneously
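  2. WHO IMCI (Integrated Management of Childhood Illness) clinical protocols with evidence-based treatment guidelines, served deterministically rather than generated by the model, so the protocol text itself cannot be hallucinated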
  3. Weight-based drug dosing from verified lookup tables
  4. Referral urgency assessment with color-coded triage
  5. Native language output in 140+ languages via Gemma 4's built-in multilingual capability

The model was fine-tuned using Unsloth QLoRA on ~23,000 training examples derived from the NIH ChestX-ray14 dataset (112,120 images, 30,805 patients), with oversampling and augmentation for rare pathologies.

  • Developed by: Joel Arias (@j4rias)
  • Model type: Vision-Language Model (LoRA adapter for Gemma 4 E4B-it)
  • Language(s): 140+ languages (Gemma 4 native multilingual)
  • License: Apache 2.0
  • Fine-tuned from: unsloth/gemma-4-e4b-it-unsloth-bnb-4bit (Google Gemma 4 E4B-it)

Model Sources

Uses

Direct Use

Load the adapter with Unsloth for local inference on chest X-ray images:

from unsloth import FastVisionModel
from PIL import Image

# Load model + adapter
model, processor = FastVisionModel.from_pretrained(
    "j4rias/medvision-edge-v4",
    load_in_4bit=True,
)
FastVisionModel.for_inference(model)

# Prepare input
image = Image.open("chest_xray.jpg").convert("RGB")
messages = [
    {"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Analyze this chest X-ray for: Pneumonia, Consolidation, Cardiomegaly, Pleural Effusion, Pulmonary Edema. For each: state YES or NO, then describe findings."},
    ]}
]

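# tokenize=True + return_dict=True yield model-ready tensors
# (input_ids, attention_mask, pixel_values) instead of a prompt string
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to("cuda")

# temperature only takes effect when sampling is enabled
output = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.1)

# Decode only the newly generated tokens, not the echoed prompt
prompt_len = inputs["input_ids"].shape[-1]
print(processor.decode(output[0][prompt_len:], skip_special_tokens=True))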

Important: To load this adapter, use FastVisionModel.from_pretrained() from Unsloth. Do not use PeftModel.from_pretrained(); it is incompatible with Gemma 4's Gemma4ClippableLinear layers.

For inference without Unsloth (e.g., on HuggingFace Spaces with ZeroGPU), use the merged model instead: j4rias/medvision-edge-v4-merged.
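The screening prompt asks for a YES or NO per pathology, followed by a description. Any downstream code therefore has to turn the free text back into flags. What follows is a minimal sketch under two assumptions:

- The model writes each verdict as "Pathology: YES" or "Pathology: NO", possibly wrapped in markdown bold. This layout is an assumption, not a documented output format.
- The helper name parse_findings is hypothetical.

```python
import re
from typing import Dict, Optional

# The five pathologies named in the screening prompt.
PATHOLOGIES = ["Pneumonia", "Consolidation", "Cardiomegaly", "Pleural Effusion", "Pulmonary Edema"]


def parse_findings(text: str) -> Dict[str, Optional[bool]]:
    """Map each pathology to True (YES), False (NO) or None (no verdict found).

    Hypothetical helper; assumes lines such as "Cardiomegaly: YES - enlarged heart".
    """
    findings: Dict[str, Optional[bool]] = {}
    for name in PATHOLOGIES:
        # Name, optional ':' or '-', optional markdown bold, then a whole-word YES/NO.
        match = re.search(
            rf"{re.escape(name)}\s*[:\-]?\s*\**\s*(YES|NO)\b", text, flags=re.IGNORECASE
        )
        findings[name] = None if match is None else match.group(1).upper() == "YES"
    return findings
```

A None entry means the model did not answer for that pathology. It is safer to surface this as "unreadable" than to treat it as a negative.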

Downstream Use

  • Offline clinics: Deploy via Ollama or llama.cpp on consumer hardware for text reasoning only. Image analysis requires the transformers library, because the GGUF export omits the vision encoder.
  • Telemedicine platforms: Integrate via Gradio API or transformers pipeline
  • Research: Baseline for chest X-ray screening in low-resource settings

Out-of-Scope Use

  • Not a diagnostic tool. This is an AI screening assistant. All findings must be confirmed by a qualified medical professional.
  • Not validated for: CT scans, MRI, ultrasound, or non-chest radiographs.
  • Not intended for: Autonomous clinical decision-making without human oversight.

Bias, Risks, and Limitations

  • Dataset bias: Trained on NIH ChestX-ray14, which over-represents US hospital populations. Performance may vary on radiographs from different demographics, equipment, or imaging protocols.
  • Label noise: NIH labels are NLP-extracted from radiology reports (~15-20% estimated error rate), not radiologist-annotated. This limits ceiling performance, especially for Pneumonia and Consolidation.
  • False positives: The model tends to over-detect Pneumonia (382 false positives among 1,103 test images) and Consolidation (375 false positives among 1,103). In clinical use, this error mode costs unnecessary referrals rather than missed diagnoses. Missed cases still occur at the sensitivities reported under Results.
  • Pneumonia detection is weak: AUC 0.617 on NIH and 0.501 on CheXpert, where only 11 positives give insufficient statistical power. Improving Pneumonia detection is under active development.
  • Single-view only: Trained on frontal (PA/AP) chest X-rays. Lateral views not supported.
  • Vision via GGUF not supported: The GGUF export does not include the vision encoder (mmproj). Image analysis requires the transformers library.

Recommendations

  • Always use with clinical oversight: this is a screening aid, not a replacement for radiologists.
  • Review false positives carefully before clinical action.
  • For Pneumonia specifically, treat model output as low-confidence and prioritize clinical judgment.
  • Validate on your target population before deployment.

Training Details

Training Data

  • Source: NIH ChestX-ray14 (112,120 frontal chest X-rays, 30,805 patients, CC0/Public Domain)
  • Pathologies trained: Pneumonia, Consolidation, Cardiomegaly, Pleural Effusion, Pulmonary Edema
  • Training split: ~23,000 examples (from 8,821 base images with oversampling + augmentation)
    • 5x oversampling for Pneumonia and Consolidation (rare positives)
    • 3x oversampling for Cardiomegaly
    • Augmentation: brightness, contrast, rotation
  • Label format: Conversation-style (image + structured YES/NO per pathology with radiological descriptions)
  • Response length: Short (~80-120 tokens per response)
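The oversampling scheme above can be sketched as follows. Several details are assumptions, not the author's pipeline:

- An image that is positive for several pathologies gets the largest factor among them.
- Each image is kept once unaugmented, plus (factor - 1) augmented copies.
- The brightness, contrast and rotation ranges are hypothetical, because the card does not state them.

```python
import random
from typing import Dict, List, Tuple

# Oversampling factors listed in the card; all other images are kept once.
OVERSAMPLE = {"Pneumonia": 5, "Consolidation": 5, "Cardiomegaly": 3}


def oversample_factor(labels: Dict[str, bool]) -> int:
    """Copies to keep for one image: the largest factor among its positive labels."""
    return max((OVERSAMPLE.get(name, 1) for name, pos in labels.items() if pos), default=1)


def sample_augmentation(rng: random.Random) -> Dict[str, float]:
    """Hypothetical augmentation ranges (not stated in the card)."""
    return {
        "brightness": rng.uniform(0.9, 1.1),     # e.g. PIL ImageEnhance.Brightness(img).enhance(x)
        "contrast": rng.uniform(0.9, 1.1),       # e.g. PIL ImageEnhance.Contrast(img).enhance(x)
        "rotation_deg": rng.uniform(-5.0, 5.0),  # e.g. img.rotate(x)
    }


def expand(records: List[Tuple[str, Dict[str, bool]]], seed: int = 0) -> List[dict]:
    """Keep each image once unaugmented, plus (factor - 1) augmented copies."""
    rng = random.Random(seed)
    out = []
    for path, labels in records:
        out.append({"path": path, "labels": labels, "augment": None})
        for _ in range(oversample_factor(labels) - 1):
            out.append({"path": path, "labels": labels, "augment": sample_augmentation(rng)})
    return out
```

Storing augmentation parameters rather than augmented pixels keeps the expanded index small. It also makes each copy reproducible from the seed.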

Training Procedure

Preprocessing

  • Images resized and normalized per Gemma 4 processor defaults
  • Conversation format with 5 varied prompt templates per pathology
  • Dataset v5: oversampled + augmented, balanced for rare positives
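A conversation-style training example in the same message format as the inference code might be assembled like this. It is a sketch with these assumptions:

- The five prompt templates are hypothetical stand-ins; the card says 5 varied templates were used but does not list them.
- The "Name: YES/NO - description" answer layout is assumed.
- build_example is a hypothetical helper.

```python
import random
from typing import Dict, List

PATHOLOGIES = ["Pneumonia", "Consolidation", "Cardiomegaly", "Pleural Effusion", "Pulmonary Edema"]

# Hypothetical prompt variants (the card's actual templates are not published here).
PROMPT_TEMPLATES = [
    "Analyze this chest X-ray for: {p}. For each: state YES or NO, then describe findings.",
    "Screen this radiograph for {p}. Answer YES or NO for each with a brief description.",
    "Which of these are present in the image: {p}? Give YES or NO per condition.",
    "Review this frontal chest X-ray for {p}. Report YES or NO and the key findings.",
    "Evaluate the chest film for {p}. For each condition reply YES or NO and explain.",
]


def build_example(image, labels: Dict[str, bool], descriptions: Dict[str, str],
                  rng: random.Random) -> List[dict]:
    """One SFT conversation: image + prompt from the user, short YES/NO answer from the assistant."""
    prompt = rng.choice(PROMPT_TEMPLATES).format(p=", ".join(PATHOLOGIES))
    lines = []
    for name in PATHOLOGIES:
        verdict = "YES" if labels.get(name, False) else "NO"
        desc = descriptions.get(name, "")
        lines.append(f"{name}: {verdict}" + (f" - {desc}" if desc else ""))
    return [
        {"role": "user", "content": [{"type": "image", "image": image},
                                     {"type": "text", "text": prompt}]},
        {"role": "assistant", "content": [{"type": "text", "text": "\n".join(lines)}]},
    ]
```

Keeping one short line per pathology matches the card's note that short responses (about 80 to 120 tokens) trained better than long ones.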

Training Hyperparameters

| Parameter | Value |
|---|---|
| LoRA rank (r) | 64 |
| LoRA alpha | 64 |
| LoRA dropout | 0 |
| Target modules | all-linear |
| Vision layers fine-tuned | Yes |
| Language layers fine-tuned | Yes |
| Epochs | 2 |
| Learning rate | 1e-4 |
| LR scheduler | cosine |
| Warmup ratio | 0.1 |
| Batch size | 1 |
| Gradient accumulation | 8 |
| Max sequence length | 1024 |
| Optimizer | adamw_8bit |
| Weight decay | 0.01 |
| Max grad norm | 0.3 |
| Precision | 4-bit (QLoRA via Unsloth) |
| Training regime | bf16 mixed precision |
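The table can be collected into keyword arguments for training. Below is a minimal sketch, assuming these keyword names:

- peft_kwargs uses the names from Unsloth's FastVisionModel.get_peft_model as they appear in Unsloth's vision notebooks.
- training_kwargs uses standard transformers TrainingArguments / TRL SFTConfig fields.

Check both against your installed versions. The sequence-length field is left out because its name has differed across TRL versions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinetuneConfig:
    # Values copied from the hyperparameter table.
    r: int = 64
    lora_alpha: int = 64
    lora_dropout: float = 0.0
    epochs: int = 2
    learning_rate: float = 1e-4
    warmup_ratio: float = 0.1
    batch_size: int = 1
    grad_accum: int = 8
    weight_decay: float = 0.01
    max_grad_norm: float = 0.3

    @property
    def lora_scale(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / r; here 64 / 64 = 1.0.
        return self.lora_alpha / self.r

    @property
    def effective_batch(self) -> int:
        # Samples per optimizer step.
        return self.batch_size * self.grad_accum

    def peft_kwargs(self) -> dict:
        # Intended for FastVisionModel.get_peft_model(model, **kwargs) (assumed names).
        return {
            "finetune_vision_layers": True,
            "finetune_language_layers": True,
            "finetune_attention_modules": True,
            "finetune_mlp_modules": True,
            "r": self.r,
            "lora_alpha": self.lora_alpha,
            "lora_dropout": self.lora_dropout,
            "bias": "none",
            "target_modules": "all-linear",
        }

    def training_kwargs(self) -> dict:
        # Intended for SFTConfig(output_dir=..., **kwargs).
        return {
            "num_train_epochs": self.epochs,
            "learning_rate": self.learning_rate,
            "lr_scheduler_type": "cosine",
            "warmup_ratio": self.warmup_ratio,
            "per_device_train_batch_size": self.batch_size,
            "gradient_accumulation_steps": self.grad_accum,
            "optim": "adamw_8bit",
            "weight_decay": self.weight_decay,
            "max_grad_norm": self.max_grad_norm,
            "bf16": True,
        }
```

Setting alpha equal to r keeps the LoRA scale at 1.0. Raising the rank from earlier runs therefore added capacity without also amplifying the update.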

Speeds, Sizes, Times

  • Training time: 4 hours 27 minutes (~16,000 seconds)
  • Steps: ~5,800
  • Speed: ~2.9 samples/sec
  • Hardware: NVIDIA RTX 5070 Ti (16GB VRAM)
  • Peak VRAM: ~10.7 GB
  • Final loss: ~0.089 (avg 0.2009)
  • Trainable parameters: ~82M / 8B total (1.02%)
  • Adapter size: 660 MB
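The step count and throughput follow from the dataset size and the hyperparameters. The short calculation below uses only figures stated in the card. It shows that "~5,800 steps" are optimizer steps (one per 8 accumulated samples) and that ~2.9 samples/sec matches 4 h 27 min of training.

```python
import math

# Figures from the card.
examples = 23_000                   # approximate training examples after oversampling
epochs = 2
batch_size, grad_accum = 1, 8
train_seconds = 4 * 3600 + 27 * 60  # 4 h 27 min = 16,020 s

samples_seen = examples * epochs
optimizer_steps = math.ceil(samples_seen / (batch_size * grad_accum))
warmup_steps = round(0.1 * optimizer_steps)  # warmup ratio 0.1
throughput = samples_seen / train_seconds

print(optimizer_steps)       # 5750, consistent with "~5,800"
print(warmup_steps)          # 575
print(round(throughput, 2))  # 2.87, consistent with "~2.9 samples/sec"
```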

Evaluation

Testing Data, Factors & Metrics

Testing Data

  1. NIH ChestX-ray14 held-out test set: 1,103 images with NLP-extracted ground truth labels
  2. CheXpert gold standard: 500 images annotated by 5 board-certified radiologists (Stanford)

Metrics

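  • AUC (Area Under ROC Curve): Primary metric. Every AUC reported in the Results tables matches (sensitivity + specificity) / 2 for its row to within rounding. That is the AUC of a single hard YES/NO operating point (balanced accuracy), not a threshold-independent ROC sweep.
  • Sensitivity (Recall): Proportion of actual positive cases the model flags as YES, TP / (TP + FN)
  • Specificity: Proportion of actual negative cases the model answers NO, TN / (TN + FP)
  • Accuracy: Overall correct classification rate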
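These metrics can be computed per pathology from parsed YES/NO verdicts and ground-truth labels. The following is a minimal sketch; the function names are illustrative, not the project's evaluation code.

- AUC is computed as the probability that a random positive outscores a random negative, with ties counted as one half.
- With hard YES/NO scores, this reduces exactly to (sensitivity + specificity) / 2.
- A full threshold sweep would need a continuous score per image, for example the probability the model assigns to the YES token. That scoring choice is an assumption.

```python
from typing import Dict, Sequence, Tuple


def confusion(preds: Sequence[bool], labels: Sequence[bool]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for one pathology."""
    pairs = list(zip(preds, labels))
    tp = sum(1 for p, y in pairs if p and y)
    tn = sum(1 for p, y in pairs if not p and not y)
    fp = sum(1 for p, y in pairs if p and not y)
    fn = sum(1 for p, y in pairs if not p and y)
    return tp, tn, fp, fn


def screening_metrics(preds: Sequence[bool], labels: Sequence[bool]) -> Dict[str, float]:
    tp, tn, fp, fn = confusion(preds, labels)
    return {
        "sensitivity": tp / (tp + fn),          # share of actual positives answered YES
        "specificity": tn / (tn + fp),          # share of actual negatives answered NO
        "accuracy": (tp + tn) / len(labels),
    }


def auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """ROC AUC: P(random positive scores above random negative), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [True, True, True, False, False, False, False]
preds = [True, True, False, True, False, False, False]
m = screening_metrics(preds, labels)
binary_auc = auc([float(p) for p in preds], labels)  # equals (sens + spec) / 2
```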

Results

NIH Test Set (N=1,103 held-out images)

| Pathology | Base AUC | Fine-tuned AUC | Relative improvement | Sensitivity | Specificity |
|---|---|---|---|---|---|
| Cardiomegaly | 0.490 | 0.832 | +70% | 0.826 | 0.838 |
| Pulm. Edema | 0.688 | 0.753 | +9% | 0.833 | 0.673 |
| Pleural Effusion | 0.605 | 0.703 | +16% | 0.680 | 0.725 |
| Pneumonia | 0.519 | 0.617 | +19% | 0.636 | 0.599 |
| Consolidation | 0.599 | 0.627 | +5% | 0.684 | 0.570 |

3/5 pathologies exceed AUC 0.70. All 5 improved vs. baseline Gemma 4.
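The improvement column is relative to the base AUC, not an absolute gain in AUC points. The short check below recomputes it from the NIH table. It also shows that each fine-tuned AUC coincides with (sensitivity + specificity) / 2 to within rounding, which is what a single hard YES/NO operating point produces.

```python
# NIH rows from the table: (base AUC, fine-tuned AUC, sensitivity, specificity)
NIH = {
    "Cardiomegaly": (0.490, 0.832, 0.826, 0.838),
    "Pulm. Edema": (0.688, 0.753, 0.833, 0.673),
    "Pleural Effusion": (0.605, 0.703, 0.680, 0.725),
    "Pneumonia": (0.519, 0.617, 0.636, 0.599),
    "Consolidation": (0.599, 0.627, 0.684, 0.570),
}

summary = {}
for name, (base, tuned, sens, spec) in NIH.items():
    absolute = tuned - base        # gain in AUC points
    relative = absolute / base     # what the "+70%" style column reports
    balanced = (sens + spec) / 2   # AUC of a single YES/NO operating point
    summary[name] = (absolute, relative, balanced)
    print(f"{name:16s} +{absolute:.3f} AUC  ({relative:+.0%})  (sens+spec)/2 = {balanced:.4f}")
```

Cardiomegaly's +70% corresponds to +0.342 AUC points. Consolidation's +5% corresponds to only +0.028.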

CheXpert Gold Standard (N=500, 5-radiologist consensus, Stanford)

| Pathology | AUC | Sensitivity | Specificity |
|---|---|---|---|
| Pleural Effusion | 0.797 | 0.952 | 0.641 |
| Cardiomegaly | 0.723 | 0.656 | 0.791 |
| Consolidation | 0.667 | 0.897 | 0.437 |
| Pulm. Edema | 0.668 | 0.500 | 0.837 |
| Pneumonia* | 0.501 | 0.636 | 0.366 |

*Pneumonia: only 11 positives (2.2% prevalence) in the CheXpert test set, which gives insufficient statistical power.
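To make "insufficient statistical power" concrete: a sensitivity of 0.636 with 11 positives corresponds to 7 of 11 cases detected. The sketch below puts a 95% confidence interval on that proportion using the Wilson score interval. That interval is a standard choice for small binomial samples, swapped in here for illustration; it is not an analysis from the original evaluation.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - margin, center + margin


lo, hi = wilson_interval(7, 11)  # CheXpert Pneumonia: 7 of 11 positives flagged
print(f"sensitivity {7 / 11:.3f}, 95% CI {lo:.2f} to {hi:.2f}")  # roughly 0.35 to 0.85
```

An interval spanning roughly 0.35 to 0.85 is too wide to distinguish a weak detector from a reasonable one.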

Highlight: Pleural Effusion sensitivity of 95.2% on CheXpert, i.e. about 95 of every 100 positive cases are caught (at a specificity of 0.641).

Summary

This model demonstrates that fine-tuning Gemma 4 E4B on real clinical images yields image-dependent predictions rather than text memorization. The base model scored near-random on Cardiomegaly (AUC 0.490). After fine-tuning, it reaches 0.832 on the NIH held-out test set, a 70% relative improvement, and 0.723 on the independent CheXpert gold-standard set.

Environmental Impact

  • Hardware: 1x NVIDIA RTX 5070 Ti (16GB, consumer GPU)
  • Total GPU hours: ~43 hours (training 18.7h + evaluation 22.4h + misc 2h)
  • Training-only hours: 4.4 hours (v4 final run)
  • Cloud Provider: None (local workstation)
  • Total project cost: < $25 (electricity only)
  • Carbon Emitted: Estimated ~4.3 kg CO2eq (based on Colombia grid factor ~0.1 kg CO2/kWh, RTX 5070 Ti TDP 300W)
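The emissions estimate is energy (kWh) multiplied by grid carbon intensity (kg CO2eq/kWh). The sketch below plugs in the card's own inputs.

- GPU power at its 300 W TDP for all ~43 GPU hours gives about 1.3 kg.
- The stated ~4.3 kg corresponds to an average draw near 1 kW.
- So the estimate presumably covers more than the GPU alone, for example the whole workstation. That reading is an assumption.

```python
def co2_kg(hours: float, avg_power_w: float, grid_kg_per_kwh: float) -> float:
    """Emissions = energy in kWh x grid carbon intensity in kg CO2eq per kWh."""
    energy_kwh = hours * avg_power_w / 1000
    return energy_kwh * grid_kg_per_kwh


gpu_only = co2_kg(43, 300, 0.1)        # 300 W TDP for ~43 GPU hours at ~0.1 kg/kWh
implied_w = 4.3 / 0.1 / 43 * 1000      # average draw needed to reach ~4.3 kg
print(round(gpu_only, 2))              # 1.29
print(round(implied_w))                # 1000
```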

Technical Specifications

Model Architecture and Objective

  • Base model: Google Gemma 4 E4B-it (8B parameters with 4.5B effective, vision-language)
  • Fine-tuning method: QLoRA via Unsloth (4-bit quantized base + low-rank adapters)
  • LoRA rank: 64 on all linear layers (vision + language + attention + MLP)
  • Context length: 128K tokens (inherited from Gemma 4)
  • Objective: Supervised fine-tuning (SFT) on chest X-ray analysis conversations

Compute Infrastructure

Hardware

  • NVIDIA RTX 5070 Ti 16GB (local workstation)
  • 64GB system RAM
  • Arch Linux

Software

  • Python 3.14
  • PyTorch 2.10.0+cu128
  • Unsloth (latest)
  • Transformers >= 4.45.0
  • TRL (SFTTrainer)
  • PEFT 0.19.1

Training Iterations

This model is the result of 6 training iterations:

| Version | Key change | Best AUC | Outcome |
|---|---|---|---|
| v1 | Simple labels, 1 epoch | ~0.50 | Random: text memorization |
| v2 | Rich labels, 1 epoch | ~0.50 | Parser broken, same problem |
| v3 | Short responses, 3 epochs, 3x oversample | 0.787 | First real learning |
| v4 | +2 epochs from v3 | 0.807 | Overfit, worse overall |
| v5 | r=64, 5x oversample, augmentation | 0.832 | Best model |
| v6 | RSNA clean labels | 0.823 | Did not improve; v5 locked |

Each failure taught us something: long responses dilute gradient signal, low LoRA rank lacks capacity, and clean labels from a different distribution can hurt rather than help.

Citation

BibTeX:

@misc{arias2026medvisionedge,
  title={MedVision Edge: AI Radiology for Everyone},
  author={Arias, Joel},
  year={2026},
  howpublished={\url{https://huggingface.co/j4rias/medvision-edge-v4}},
  note={Fine-tuned Gemma 4 E4B for chest X-ray screening. Gemma 4 Good Hackathon submission.}
}

Acknowledgements

  • Google for the Gemma 4 model family and the Gemma 4 Good Hackathon
  • Unsloth for efficient QLoRA fine-tuning of vision-language models
  • NIH Clinical Center for the ChestX-ray14 dataset (CC0)
  • Stanford AIMI for the CheXpert gold-standard test set
  • WHO for the IMCI clinical protocols

Framework Versions

  • PEFT 0.19.1
  • Transformers >= 4.45.0
  • TRL (latest)
  • Unsloth (latest)
  • PyTorch 2.10.0+cu128