Libraries

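How to use j4rias/medvision-edge-v4 with PEFT (the Direct Use section of this card advises loading the adapter with Unsloth's FastVisionModel instead, because PeftModel.from_pretrained() is reported as incompatible with Gemma 4's Gemma4ClippableLinear layers):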

from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("unsloth/gemma-4-e4b-it-unsloth-bnb-4bit")
model = PeftModel.from_pretrained(base_model, "j4rias/medvision-edge-v4")

Transformers

How to use j4rias/medvision-edge-v4 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="j4rias/medvision-edge-v4")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("j4rias/medvision-edge-v4", device_map="auto")

Local Apps

vLLM

How to use j4rias/medvision-edge-v4 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "j4rias/medvision-edge-v4"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "j4rias/medvision-edge-v4",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/j4rias/medvision-edge-v4

SGLang

How to use j4rias/medvision-edge-v4 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "j4rias/medvision-edge-v4" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "j4rias/medvision-edge-v4",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "j4rias/medvision-edge-v4" \
        --host 0.0.0.0 \
        --port 30000

Unsloth Studio

How to use j4rias/medvision-edge-v4 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for j4rias/medvision-edge-v4 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for j4rias/medvision-edge-v4 to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for j4rias/medvision-edge-v4 to start chatting

Load model with FastModel

# Install first: pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="j4rias/medvision-edge-v4",
    max_seq_length=2048,
)

Docker Model Runner
How to use j4rias/medvision-edge-v4 with Docker Model Runner:
```
docker model run hf.co/j4rias/medvision-edge-v4
```

MedVision Edge v4 — Chest X-ray Screening (LoRA Adapter)

Fine-tuned Gemma 4 E4B-it (8B params) for automated chest X-ray pathology detection. Screens 5 conditions simultaneously (evaluated on NIH ChestX-ray14 and CheXpert test sets; see Evaluation), generates WHO-compliant treatment protocols, and outputs in 140+ languages natively.

This repo contains the LoRA adapter weights (660MB). For the merged full-precision model ready for direct inference, see j4rias/medvision-edge-v4-merged.

Resource	Link
Live Demo	HuggingFace Space (huggingface.co/spaces/j4rias/MedVision-Edge)
Merged Model	j4rias/medvision-edge-v4-merged
Source Code	GitHub (github.com/J4rias/medvision-edge)
Video	YouTube (3 min)

Model Details

Model Description

MedVision Edge is an AI-powered chest X-ray screening system designed for underserved communities where 2.2 billion people lack access to medical imaging (WHO, 2023). A community health worker photographs a chest X-ray with any smartphone and receives:

Pathology detection for 5 conditions screened simultaneously
WHO IMCI clinical protocols with evidence-based treatment guidelines (deterministic, zero hallucination)
Weight-based drug dosing from verified lookup tables
Referral urgency assessment with color-coded triage
Native language output in 140+ languages via Gemma 4's built-in multilingual capability

The model was fine-tuned using Unsloth QLoRA on ~23,000 training examples derived from the NIH ChestX-ray14 dataset (112,120 images, 30,805 patients), with oversampling and augmentation for rare pathologies.

Developed by: Joel Arias (@j4rias)
Model type: Vision-Language Model (LoRA adapter for Gemma 4 E4B-it)
Language(s): 140+ languages (Gemma 4 native multilingual)
License: Apache 2.0
Fine-tuned from: unsloth/gemma-4-e4b-it-unsloth-bnb-4bit (Google Gemma 4 E4B-it)

Model Sources

Repository: github.com/J4rias/medvision-edge
Demo: huggingface.co/spaces/j4rias/MedVision-Edge

Uses

Direct Use

Load the adapter with Unsloth for local inference on chest X-ray images:

from unsloth import FastVisionModel
from PIL import Image

# Load model + adapter
model, processor = FastVisionModel.from_pretrained(
    "j4rias/medvision-edge-v4",
    load_in_4bit=True,
)
FastVisionModel.for_inference(model)

# Prepare input
image = Image.open("chest_xray.jpg").convert("RGB")
messages = [
    {"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Analyze this chest X-ray for: Pneumonia, Consolidation, Cardiomegaly, Pleural Effusion, Pulmonary Edema. For each: state YES or NO, then describe findings."},
    ]}
]

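inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,     # return token IDs and image tensors instead of a prompt string
    return_dict=True,  # dict of tensors, so it can be unpacked into generate()
    return_tensors="pt",
).to("cuda")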

output = model.generate(**inputs, max_new_tokens=512, temperature=0.1)
print(processor.decode(output[0], skip_special_tokens=True))

Important: To load this adapter, use FastVisionModel.from_pretrained() from Unsloth. Do not use PeftModel.from_pretrained() — it is incompatible with Gemma 4's Gemma4ClippableLinear layers.

For inference without Unsloth (e.g., on HuggingFace Spaces with ZeroGPU), use the merged model instead: j4rias/medvision-edge-v4-merged.
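
The prompt above asks for a YES or NO per pathology followed by findings. A minimal sketch of turning such a response into structured flags is shown below. It assumes each pathology name is followed closely by YES or NO; the card does not document the exact output layout, so `parse_findings` and the sample text are hypothetical.

```python
import re

PATHOLOGIES = [
    "Pneumonia",
    "Consolidation",
    "Cardiomegaly",
    "Pleural Effusion",
    "Pulmonary Edema",
]

def parse_findings(text):
    """Map each pathology to True (YES), False (NO), or None (no answer found)."""
    results = {}
    for name in PATHOLOGIES:
        # Assumed layout: "<Pathology>: YES ..." or "<Pathology> - NO ..."
        pattern = re.escape(name) + r"\W{0,5}(YES|NO)\b"
        match = re.search(pattern, text, flags=re.IGNORECASE)
        results[name] = None if match is None else match.group(1).upper() == "YES"
    return results

# Hypothetical model response, used only to exercise the parser
sample = """Pneumonia: NO. Lungs are clear.
Consolidation: NO.
Cardiomegaly: YES. Enlarged cardiac silhouette.
Pleural Effusion: NO.
"""
flags = parse_findings(sample)
print(flags)
```

Returning None for a missing pathology, rather than defaulting to NO, keeps a truncated or malformed response from being read as a negative screen.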

Downstream Use

Offline clinics: Deploy via Ollama or llama.cpp on consumer hardware (text reasoning; vision requires transformers)
Telemedicine platforms: Integrate via Gradio API or transformers pipeline
Research: Baseline for chest X-ray screening in low-resource settings

Out-of-Scope Use

Not a diagnostic tool. This is an AI screening assistant. All findings must be confirmed by a qualified medical professional.
Not validated for: CT scans, MRI, ultrasound, or non-chest radiographs.
Not intended for: Autonomous clinical decision-making without human oversight.

Bias, Risks, and Limitations

Dataset bias: Trained on NIH ChestX-ray14, which over-represents US hospital populations. Performance may vary on radiographs from different demographics, equipment, or imaging protocols.
Label noise: NIH labels are NLP-extracted from radiology reports (~15-20% estimated error rate), not radiologist-annotated. This limits ceiling performance, especially for Pneumonia and Consolidation.
False positives: The model tends to over-detect Pneumonia (382 false positives on the 1,103-image NIH test set) and Consolidation (375 false positives on the same set). In clinical use, these errors lead to unnecessary referrals; NIH sensitivity below 0.70 for both conditions (see Results) means some cases are also missed.
Pneumonia detection is weak: AUC 0.617 on NIH and 0.501 on CheXpert, where only 11 positives give insufficient statistical power. Under active development.
Single-view only: Trained on frontal (PA/AP) chest X-rays. Lateral views not supported.
Vision via GGUF not supported: The GGUF export does not include the vision encoder (mmproj). Image analysis requires the transformers library.
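
A back-of-envelope sketch of what the Pneumonia false-positive count implies on the NIH test set. It combines the 382 false positives with the NIH sensitivity (0.636) and specificity (0.599) reported in the Results section. The derived class counts and positive predictive value are estimates from rounded figures, not numbers reported by the author.

```python
# Figures taken from this card (NIH test set, Pneumonia)
total_images = 1103
false_positives = 382
sensitivity = 0.636
specificity = 0.599

# specificity = TN / negatives, so FP = (1 - specificity) * negatives
negatives = false_positives / (1 - specificity)
positives = total_images - negatives

# Estimated true positives and the resulting positive predictive value
true_positives = sensitivity * positives
ppv = true_positives / (true_positives + false_positives)

print(f"estimated negatives: {negatives:.0f}, positives: {positives:.0f}")
print(f"estimated PPV: {ppv:.2f}")
```

Under these assumptions, roughly one in five Pneumonia flags would be a true positive, which is consistent with treating Pneumonia output as low-confidence.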

Recommendations

Always use with clinical oversight — this is a screening aid, not a replacement for radiologists.
Review false positives carefully before clinical action.
For Pneumonia specifically, treat model output as low-confidence and prioritize clinical judgment.
Validate on your target population before deployment.

Training Details

Training Data

Source: NIH ChestX-ray14 (112,120 frontal chest X-rays, 30,805 patients, CC0/Public Domain)
Pathologies trained: Pneumonia, Consolidation, Cardiomegaly, Pleural Effusion, Pulmonary Edema
Training split: ~23,000 examples (from 8,821 base images with oversampling + augmentation)
- 5x oversampling for Pneumonia and Consolidation (rare positives)
- 3x oversampling for Cardiomegaly
- Augmentation: brightness, contrast, rotation
Label format: Conversation-style (image + structured YES/NO per pathology with radiological descriptions)
Response length: Short (~80-120 tokens per response)
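
The oversampling factors above can be sketched as a simple replication step. The factors come from this list. How images with several positive labels are handled, and whether each copy is augmented differently, is not stated, so the max-factor rule and the `oversample` helper are assumptions.

```python
# Replication factors from the training data description
OVERSAMPLE = {"Pneumonia": 5, "Consolidation": 5, "Cardiomegaly": 3}

def oversample(records):
    """Repeat each record according to its rarest positive label."""
    out = []
    for rec in records:
        # Assumption: an image with several positives uses the largest factor
        factor = max((OVERSAMPLE.get(label, 1) for label in rec["positives"]), default=1)
        out.extend([rec] * factor)
    return out

# Hypothetical records for illustration
records = [
    {"image": "a.png", "positives": ["Pneumonia"]},
    {"image": "b.png", "positives": ["Cardiomegaly", "Pleural Effusion"]},
    {"image": "c.png", "positives": []},
]
expanded = oversample(records)
print(len(expanded))
```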

Training Procedure

Preprocessing

Images resized and normalized per Gemma 4 processor defaults
Conversation format with 5 varied prompt templates per pathology
Dataset v5: oversampled + augmented, balanced for rare positives
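
A minimal sketch of the conversation format described above, using the same message structure as the Direct Use example. The template wording, the `build_example` helper, and the assistant answer layout are hypothetical; the card does not list the actual templates.

```python
import random

# Hypothetical templates; the real wording is not published in this card
TEMPLATES = [
    "Does this chest X-ray show {p}? Answer YES or NO, then describe findings.",
    "Screen this chest X-ray for {p}. State YES or NO and explain.",
]

def build_example(image_path, pathology, is_positive, description, rng):
    """Build one image + text training conversation with a short answer."""
    prompt = rng.choice(TEMPLATES).format(p=pathology)
    answer = f"{pathology}: {'YES' if is_positive else 'NO'}. {description}"
    return {
        "messages": [
            {"role": "user", "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": prompt},
            ]},
            {"role": "assistant", "content": [{"type": "text", "text": answer}]},
        ]
    }

rng = random.Random(0)  # fixed seed so template choice is reproducible
example = build_example("xray_001.png", "Cardiomegaly", True,
                        "Enlarged cardiac silhouette.", rng)
print(example["messages"][1]["content"][0]["text"])
```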

Training Hyperparameters

Parameter	Value
LoRA rank (r)	64
LoRA alpha	64
LoRA dropout	0
Target modules	all-linear
Vision layers fine-tuned	Yes
Language layers fine-tuned	Yes
Epochs	2
Learning rate	1e-4
LR scheduler	cosine
Warmup ratio	0.1
Batch size	1
Gradient accumulation	8
Max sequence length	1024
Optimizer	adamw_8bit
Weight decay	0.01
Max grad norm	0.3
Precision	4-bit (QLoRA via Unsloth)
Training regime	bf16 mixed precision
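
As a rough sketch, the table can be mirrored as plain keyword dictionaries. The key names follow PEFT `LoraConfig` and Transformers `TrainingArguments` fields, but the card does not show the author's training script, so this mapping is an assumption rather than the actual configuration code.

```python
# LoRA settings from the table (key names follow PEFT's LoraConfig)
lora_kwargs = {
    "r": 64,
    "lora_alpha": 64,
    "lora_dropout": 0.0,
    "target_modules": "all-linear",
}

# Optimizer and schedule settings (key names follow TrainingArguments)
training_kwargs = {
    "num_train_epochs": 2,
    "learning_rate": 1e-4,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.1,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 8,
    "optim": "adamw_8bit",
    "weight_decay": 0.01,
    "max_grad_norm": 0.3,
    "bf16": True,
}

# Examples consumed per optimizer step
effective_batch = (training_kwargs["per_device_train_batch_size"]
                   * training_kwargs["gradient_accumulation_steps"])

# In standard LoRA the adapter update is scaled by alpha / r
lora_scale = lora_kwargs["lora_alpha"] / lora_kwargs["r"]

print(effective_batch, lora_scale)
```

With batch size 1 and 8 accumulation steps, each optimizer update sees 8 examples, and alpha equal to rank gives a scaling factor of 1.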

Speeds, Sizes, Times

Training time: 4 hours 27 minutes (~16,000 seconds)
Steps: ~5,800
Speed: ~2.9 samples/sec
Hardware: NVIDIA RTX 5070 Ti (16GB VRAM)
Peak VRAM: ~10.7 GB
Final loss: ~0.089 (avg 0.2009)
Trainable parameters: ~82M / 8B total (1.02%)
Adapter size: 660 MB
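
The figures above can be cross-checked against the hyperparameters with a few lines of arithmetic. This is a consistency sketch using the rounded values from this card (about 23,000 examples, 2 epochs, batch size 1 with 8 accumulation steps), not a record of the actual run.

```python
examples = 23_000          # approximate training examples
epochs = 2
effective_batch = 1 * 8    # batch size x gradient accumulation
train_seconds = 4 * 3600 + 27 * 60  # 4 h 27 min

# Optimizer steps implied by dataset size, epochs, and effective batch
expected_steps = examples * epochs / effective_batch

# Throughput implied by those steps over the reported wall-clock time
samples_per_sec = expected_steps * effective_batch / train_seconds

# Share of parameters that are trainable
trainable_pct = 82e6 / 8e9 * 100

print(f"steps: {expected_steps:.0f}")
print(f"samples/sec: {samples_per_sec:.2f}")
print(f"trainable: {trainable_pct:.3f}%")
```

The implied 5,750 steps and roughly 2.9 samples/sec agree with the reported ~5,800 steps and ~2.9 samples/sec.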

Evaluation

Testing Data, Factors & Metrics

Testing Data

NIH ChestX-ray14 held-out test set: 1,103 images with NLP-extracted ground truth labels
CheXpert gold standard: 500 images annotated by 5 board-certified radiologists (Stanford)

Metrics

AUC (Area Under ROC Curve): Primary metric, threshold-independent discrimination ability
Sensitivity (Recall): Proportion of true positives correctly identified
Specificity: Proportion of true negatives correctly identified
Accuracy: Overall correct classification rate
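
A minimal pure-Python sketch of the four metrics, using their standard definitions. AUC is computed as the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as half. How a per-pathology score is obtained from the model's YES/NO output is not described in this card, so the labels and scores below are made-up illustration data.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def auc(y_true, scores):
    """Rank-based AUC: P(score of positive > score of negative)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustration data only
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / len(y_true)
auc_value = auc(y_true, scores)

print(sensitivity, specificity, accuracy, auc_value)
```

Sensitivity, specificity, and accuracy depend on the chosen threshold, while AUC depends only on how the scores rank positives against negatives.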

Results

NIH Test Set (N=1,103 held-out images)

Pathology	Base AUC	Fine-tuned AUC	Improvement	Sensitivity	Specificity
Cardiomegaly	0.490	0.832	+70%	0.826	0.838
Pulm. Edema	0.688	0.753	+9%	0.833	0.673
Pleural Effusion	0.605	0.703	+16%	0.680	0.725
Pneumonia	0.519	0.617	+19%	0.636	0.599
Consolidation	0.599	0.627	+5%	0.684	0.570

3/5 pathologies exceed AUC 0.70. All 5 improved vs. baseline Gemma 4.
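
The Improvement column is a relative change in AUC, (fine-tuned - base) / base. A short sketch that recomputes it from the table values:

```python
# (base AUC, fine-tuned AUC) from the NIH results table
nih_auc = {
    "Cardiomegaly": (0.490, 0.832),
    "Pulm. Edema": (0.688, 0.753),
    "Pleural Effusion": (0.605, 0.703),
    "Pneumonia": (0.519, 0.617),
    "Consolidation": (0.599, 0.627),
}

def relative_improvement(base, tuned):
    """Relative AUC change in percent."""
    return (tuned - base) / base * 100

improvements = {
    name: round(relative_improvement(base, tuned))
    for name, (base, tuned) in nih_auc.items()
}
above_070 = [name for name, (_, tuned) in nih_auc.items() if tuned > 0.70]

print(improvements)
print(above_070)
```

Because the change is relative to a baseline near 0.5, a 70% gain for Cardiomegaly corresponds to an absolute AUC increase of 0.342.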

CheXpert Gold Standard (N=500, 5-radiologist consensus, Stanford)

Pathology	AUC	Sensitivity	Specificity
Pleural Effusion	0.797	0.952	0.641
Cardiomegaly	0.723	0.656	0.791
Consolidation	0.667	0.897	0.437
Pulm. Edema	0.668	0.500	0.837
Pneumonia*	0.501	0.636	0.366

*Pneumonia: only 11 positives (2.2% prevalence) in CheXpert test set — insufficient statistical power.
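
To illustrate why 11 positives are too few, the sketch below computes a 95% Wilson score interval for the Pneumonia sensitivity. It assumes the reported 0.636 corresponds to 7 of 11 positives detected (7/11 = 0.636); that count is inferred, not stated in the card.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Inferred count: 7 detected out of 11 Pneumonia positives in CheXpert
low, high = wilson_interval(7, 11)
print(f"sensitivity 95% CI: {low:.2f} to {high:.2f}")
```

The interval spans roughly 0.35 to 0.85, so the CheXpert Pneumonia figures cannot distinguish a weak detector from a reasonably good one.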

Highlight: Pleural Effusion sensitivity of 95.2% on CheXpert — catches 95 out of 100 cases.

Summary

This model demonstrates that fine-tuning Gemma 4 E4B on real clinical images produces genuine visual understanding (not text memorization). The base model scored near-random on Cardiomegaly (AUC 0.490); after fine-tuning it reaches 0.832 on the NIH held-out test set, a 70% relative improvement, and 0.723 on the radiologist-annotated CheXpert set.

Environmental Impact

Hardware: 1x NVIDIA RTX 5070 Ti (16GB, consumer GPU)
Total GPU hours: ~43 hours (training 18.7h + evaluation 22.4h + misc 2h)
Training-only hours: 4.4 hours (v4 final run)
Cloud Provider: None (local workstation)
Total project cost: < $25 (electricity only)
Carbon Emitted: Estimated ~1.3 kg CO2eq (~43 GPU hours × 0.3 kW RTX 5070 Ti TDP ≈ 12.9 kWh, at a Colombia grid factor of ~0.1 kg CO2/kWh)

Technical Specifications

Model Architecture and Objective

Base model: Google Gemma 4 E4B-it (8B parameters with 4.5B effective, vision-language)
Fine-tuning method: QLoRA via Unsloth (4-bit quantized base + low-rank adapters)
LoRA rank: 64 on all linear layers (vision + language + attention + MLP)
Context length: 128K tokens (inherited from Gemma 4)
Objective: Supervised fine-tuning (SFT) on chest X-ray analysis conversations
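
For intuition on what rank 64 means in parameter terms: a LoRA adapter on a linear layer with input size d_in and output size d_out adds two low-rank matrices with r × (d_in + d_out) parameters in total. The sketch below uses a hypothetical 2048 × 2048 layer; it is not one of Gemma 4's actual layer shapes.

```python
def lora_params(d_in, d_out, r):
    """Parameters added by LoRA: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)

# Hypothetical layer shape, for illustration only
d_in, d_out, rank = 2048, 2048, 64

added = lora_params(d_in, d_out, rank)
full = d_in * d_out  # weights in the frozen base layer (bias ignored)

print(f"LoRA params: {added:,}")
print(f"fraction of full layer: {added / full:.4f}")
```

Adapter size grows linearly with rank, so doubling the rank doubles the trainable parameters on every adapted layer.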

Compute Infrastructure

Hardware

NVIDIA RTX 5070 Ti 16GB (local workstation)
64GB system RAM
Arch Linux

Software

Python 3.14
PyTorch 2.10.0+cu128
Unsloth (latest)
Transformers >= 4.45.0
TRL (SFTTrainer)
PEFT 0.19.1

Training Iterations

This model is the result of 6 training iterations:

Version	Key Change	Best AUC	Outcome
v1	Simple labels, 1 epoch	~0.50	Random — text memorization
v2	Rich labels, 1 epoch	~0.50	Parser broken, same problem
v3	Short responses, 3 epochs, 3x oversample	0.787	First real learning
v4	+2 epochs from v3	0.807	Overfit, worse overall
v5	r=64, 5x oversample, augmentation	0.832	Best model
v6	RSNA clean labels	0.823	Did not improve — locked v5

Each failure taught us something: long responses dilute gradient signal, low LoRA rank lacks capacity, and clean labels from a different distribution can hurt rather than help.

Citation

BibTeX:

@misc{arias2026medvisionedge,
  title={MedVision Edge: AI Radiology for Everyone},
  author={Arias, Joel},
  year={2026},
  howpublished={\url{https://huggingface.co/j4rias/medvision-edge-v4}},
  note={Fine-tuned Gemma 4 E4B for chest X-ray screening. Gemma 4 Good Hackathon submission.}
}

Acknowledgements

Google for the Gemma 4 model family and the Gemma 4 Good Hackathon
Unsloth for efficient QLoRA fine-tuning of vision-language models
NIH Clinical Center for the ChestX-ray14 dataset (CC0)
Stanford AIMI for the CheXpert gold-standard test set
WHO for the IMCI clinical protocols

Framework Versions

PEFT 0.19.1
Transformers >= 4.45.0
TRL (latest)
Unsloth (latest)
PyTorch 2.10.0+cu128
