MedVision Edge v4: Chest X-ray Screening (LoRA Adapter)
Fine-tuned Gemma 4 E4B-it (8B params) for automated chest X-ray pathology detection. Screens for 5 conditions simultaneously (AUC 0.62 to 0.83 on the NIH held-out test set, reported under Evaluation), generates WHO-compliant treatment protocols, and produces output in 140+ languages natively.
This repo contains the LoRA adapter weights (660MB). For the merged full-precision model ready for direct inference, see j4rias/medvision-edge-v4-merged.
| Resource | Link |
|---|---|
| Live Demo | HuggingFace Space |
| Merged Model | j4rias/medvision-edge-v4-merged |
| Source Code | GitHub |
| Video | YouTube (3 min) |
Model Details
Model Description
MedVision Edge is an AI-powered chest X-ray screening system designed for underserved communities where 2.2 billion people lack access to medical imaging (WHO, 2023). A community health worker photographs a chest X-ray with any smartphone and receives:
- Pathology detection for 5 conditions screened simultaneously
- WHO IMCI clinical protocols with evidence-based treatment guidelines (deterministic, zero hallucination)
- Weight-based drug dosing from verified lookup tables
- Referral urgency assessment with color-coded triage
- Native language output in 140+ languages via Gemma 4's built-in multilingual capability
The model was fine-tuned using Unsloth QLoRA on ~23,000 training examples derived from the NIH ChestX-ray14 dataset (112,120 images, 30,805 patients), with oversampling and augmentation for rare pathologies.
- Developed by: Joel Arias (@j4rias)
- Model type: Vision-Language Model (LoRA adapter for Gemma 4 E4B-it)
- Language(s): 140+ languages (Gemma 4 native multilingual)
- License: Apache 2.0
- Fine-tuned from: unsloth/gemma-4-e4b-it-unsloth-bnb-4bit (Google Gemma 4 E4B-it)
Model Sources
- Repository: github.com/J4rias/medvision-edge
- Demo: huggingface.co/spaces/j4rias/MedVision-Edge
Uses
Direct Use
Load the adapter with Unsloth for local inference on chest X-ray images:
from unsloth import FastVisionModel
from PIL import Image

# Load the base model with the LoRA adapter applied
model, processor = FastVisionModel.from_pretrained(
    "j4rias/medvision-edge-v4",
    load_in_4bit=True,
)
FastVisionModel.for_inference(model)  # switch to inference mode

# Prepare input
image = Image.open("chest_xray.jpg").convert("RGB")
messages = [
    {"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Analyze this chest X-ray for: Pneumonia, Consolidation, Cardiomegaly, Pleural Effusion, Pulmonary Edema. For each: state YES or NO, then describe findings."},
    ]}
]

# tokenize=True and return_dict=True make apply_chat_template return a
# dict of tensors (rather than a prompt string) that can be moved to the
# GPU and unpacked into generate()
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to("cuda")

output = model.generate(**inputs, max_new_tokens=512, temperature=0.1)
print(processor.decode(output[0], skip_special_tokens=True))
Important: To load this adapter, use FastVisionModel.from_pretrained() from Unsloth. Do not use PeftModel.from_pretrained(): it is incompatible with Gemma 4's Gemma4ClippableLinear layers.
For inference without Unsloth (e.g., on HuggingFace Spaces with ZeroGPU), use the merged model instead: j4rias/medvision-edge-v4-merged.
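The prompt used above asks for a YES or NO per pathology, so downstream code typically needs to turn the free-text reply into structured flags. The following is a minimal sketch, assuming the reply mentions each pathology name followed shortly by YES or NO on the same line. The exact response format is not specified in this card, and `parse_screening` and the sample reply are hypothetical.

```python
import re

PATHOLOGIES = [
    "Pneumonia",
    "Consolidation",
    "Cardiomegaly",
    "Pleural Effusion",
    "Pulmonary Edema",
]

def parse_screening(text: str) -> dict:
    """Map each pathology to True (YES), False (NO), or None (not found)."""
    results = {}
    for name in PATHOLOGIES:
        # Assumed format: pathology name, then YES or NO within the next
        # few characters of the same line, e.g. "Cardiomegaly: YES - ...".
        pattern = re.escape(name) + r"[^\n]{0,20}?\b(YES|NO)\b"
        match = re.search(pattern, text, flags=re.IGNORECASE)
        results[name] = None if match is None else match.group(1).upper() == "YES"
    return results

# Hypothetical model reply, for illustration only.
sample = (
    "Pneumonia: NO - lungs are clear.\n"
    "Consolidation: NO - no focal opacity.\n"
    "Cardiomegaly: YES - enlarged cardiac silhouette.\n"
    "Pleural Effusion: NO - costophrenic angles are sharp.\n"
)
print(parse_screening(sample))
```

Returning `None` for a pathology that is missing from the reply keeps "not mentioned" distinct from an explicit NO, which matters when a truncated generation should trigger a retry rather than be read as a negative finding.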
Downstream Use
- Offline clinics: Deploy via Ollama or llama.cpp on consumer hardware (text-only reasoning; image analysis requires the transformers library)
- Telemedicine platforms: Integrate via Gradio API or transformers pipeline
- Research: Baseline for chest X-ray screening in low-resource settings
Out-of-Scope Use
- Not a diagnostic tool. This is an AI screening assistant. All findings must be confirmed by a qualified medical professional.
- Not validated for: CT scans, MRI, ultrasound, or non-chest radiographs.
- Not intended for: Autonomous clinical decision-making without human oversight.
Bias, Risks, and Limitations
- Dataset bias: Trained on NIH ChestX-ray14, which over-represents US hospital populations. Performance may vary on radiographs from different demographics, equipment, or imaging protocols.
- Label noise: NIH labels are NLP-extracted from radiology reports (~15-20% estimated error rate), not radiologist-annotated. This limits ceiling performance, especially for Pneumonia and Consolidation.
- False positives: The model tends to over-detect Pneumonia (382 false positives on the 1,103-image test set) and Consolidation (375 false positives on the same set). In clinical use, these errors lead to unnecessary referrals rather than missed diagnoses.
- Pneumonia detection is weak: AUC 0.617 on NIH, 0.501 on CheXpert (only 11 positives, which is insufficient statistical power). Under active development.
- Single-view only: Trained on frontal (PA/AP) chest X-rays. Lateral views not supported.
- Vision via GGUF not supported: The GGUF export does not include the vision encoder (mmproj). Image analysis requires the transformers library.
Recommendations
- Always use with clinical oversight: this is a screening aid, not a replacement for radiologists.
- Review false positives carefully before clinical action.
- For Pneumonia specifically, treat model output as low-confidence and prioritize clinical judgment.
- Validate on your target population before deployment.
Training Details
Training Data
- Source: NIH ChestX-ray14 (112,120 frontal chest X-rays, 30,805 patients, CC0/Public Domain)
- Pathologies trained: Pneumonia, Consolidation, Cardiomegaly, Pleural Effusion, Pulmonary Edema
- Training split: ~23,000 examples (from 8,821 base images with oversampling + augmentation)
- 5x oversampling for Pneumonia and Consolidation (rare positives)
- 3x oversampling for Cardiomegaly
- Augmentation: brightness, contrast, rotation
- Label format: Conversation-style (image + structured YES/NO per pathology with radiological descriptions)
- Response length: Short (~80-120 tokens per response)
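The oversampling factors listed above can be sketched as simple record duplication. This is an illustrative sketch, not the author's pipeline: the record layout and the `oversample` helper are hypothetical, and using the largest factor among an image's positive labels is an assumption, since the card does not say how multi-label images are handled.

```python
import random

# Oversampling factors stated in the list above; all other labels are 1x.
FACTORS = {"Pneumonia": 5, "Consolidation": 5, "Cardiomegaly": 3}

def oversample(records, factors=FACTORS, seed=0):
    """Repeat each record by the largest factor among its positive labels."""
    out = []
    for rec in records:
        k = max((factors.get(label, 1) for label in rec["positives"]), default=1)
        out.extend([rec] * k)
    # Shuffle so that duplicates are not adjacent in the training order.
    random.Random(seed).shuffle(out)
    return out

# Hypothetical records: image path plus the pathologies labeled positive.
records = [
    {"image": "a.png", "positives": ["Pneumonia"]},
    {"image": "b.png", "positives": ["Cardiomegaly", "Pleural Effusion"]},
    {"image": "c.png", "positives": []},
]
balanced = oversample(records)
print(len(balanced))  # 5 + 3 + 1 = 9
```

In practice each duplicate would usually be paired with a different augmentation so that the repeated images are not pixel-identical.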
Training Procedure
Preprocessing
- Images resized and normalized per Gemma 4 processor defaults
- Conversation format with 5 varied prompt templates per pathology
- Dataset v5: oversampled + augmented, balanced for rare positives
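A conversation-style training record with varied prompts might be assembled as in the sketch below. The five templates, the answer wording, and `build_example` are all hypothetical: the card states that 5 templates per pathology were used but does not list them, and the message layout simply mirrors the one used in the inference example.

```python
import random

# Hypothetical prompt templates; the actual ones are not given in the card.
TEMPLATES = [
    "Is there evidence of {p} on this chest X-ray?",
    "Does this chest X-ray show {p}?",
    "Check this radiograph for {p}.",
    "Assess this image for signs of {p}.",
    "Screen this chest X-ray for {p}.",
]

def build_example(image_path, pathology, present, description, rng):
    """Return one conversation-style record: user prompt plus short answer."""
    prompt = rng.choice(TEMPLATES).format(p=pathology)
    answer = f"{pathology}: {'YES' if present else 'NO'}. {description}"
    return {
        "messages": [
            {"role": "user", "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": prompt},
            ]},
            {"role": "assistant", "content": [
                {"type": "text", "text": answer},
            ]},
        ]
    }

rng = random.Random(0)
example = build_example(
    "xray_0001.png", "Cardiomegaly", True, "Enlarged cardiac silhouette.", rng
)
print(example["messages"][1]["content"][0]["text"])
```

Putting the YES/NO token at the start of a short answer keeps the label close to the image tokens, in line with the card's observation that long responses dilute the gradient signal.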
Training Hyperparameters
| Parameter | Value |
|---|---|
| LoRA rank (r) | 64 |
| LoRA alpha | 64 |
| LoRA dropout | 0 |
| Target modules | all-linear |
| Vision layers fine-tuned | Yes |
| Language layers fine-tuned | Yes |
| Epochs | 2 |
| Learning rate | 1e-4 |
| LR scheduler | cosine |
| Warmup ratio | 0.1 |
| Batch size | 1 |
| Gradient accumulation | 8 |
| Max sequence length | 1024 |
| Optimizer | adamw_8bit |
| Weight decay | 0.01 |
| Max grad norm | 0.3 |
| Precision | 4-bit (QLoRA via Unsloth) |
| Training regime | bf16 mixed precision |
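A few quantities follow directly from the table. The sketch below derives them in plain Python; the dictionary keys are illustrative names rather than a specific library's config fields, and the LoRA scale assumes the standard `alpha / r` scaling (the card does not mention rank-stabilized LoRA).

```python
# Values copied from the hyperparameter table above.
hparams = {
    "lora_r": 64,
    "lora_alpha": 64,
    "per_device_batch_size": 1,
    "gradient_accumulation_steps": 8,
    "warmup_ratio": 0.1,
}

# Standard LoRA multiplies the low-rank update by alpha / r.
lora_scale = hparams["lora_alpha"] / hparams["lora_r"]

# Samples contributing to each optimizer step on a single GPU.
effective_batch = (
    hparams["per_device_batch_size"] * hparams["gradient_accumulation_steps"]
)

def warmup_steps(total_steps: int, ratio: float = hparams["warmup_ratio"]) -> int:
    """Optimizer steps spent in warmup for a run of the given length."""
    return int(total_steps * ratio)

print(lora_scale, effective_batch, warmup_steps(1000))
```

With alpha equal to r the adapter update is applied at scale 1.0, and a batch size of 1 with 8 accumulation steps behaves like a batch of 8 for the optimizer while keeping peak memory close to that of a single sample.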
Speeds, Sizes, Times
- Training time: 4 hours 27 minutes (~16,000 seconds)
- Steps: ~5,800
- Speed: ~2.9 samples/sec
- Hardware: NVIDIA RTX 5070 Ti (16GB VRAM)
- Peak VRAM: ~10.7 GB
- Final loss: ~0.089 (avg 0.2009)
- Trainable parameters: ~82M / 8B total (1.02%)
- Adapter size: 660 MB
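The step count, throughput, and trainable fraction above are mutually consistent with the dataset size and hyperparameters reported elsewhere in this card. A quick arithmetic check, using the approximate figures as given:

```python
examples = 23_000                    # approximate training set size
epochs = 2
effective_batch = 1 * 8              # batch size x gradient accumulation
train_seconds = 4 * 3600 + 27 * 60   # 4 h 27 min

# One optimizer step consumes `effective_batch` samples.
steps = examples * epochs / effective_batch
samples_per_sec = examples * epochs / train_seconds

# Trainable share of the model: ~82M adapter parameters out of ~8B.
trainable_pct = 82e6 / 8e9 * 100

print(round(steps), round(samples_per_sec, 2), f"{trainable_pct:.3f}%")
```

This gives about 5,750 steps and about 2.87 samples per second, matching the rounded values in the list.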
Evaluation
Testing Data, Factors & Metrics
Testing Data
- NIH ChestX-ray14 held-out test set: 1,103 images with NLP-extracted ground truth labels
- CheXpert gold standard: 500 images annotated by 5 board-certified radiologists (Stanford)
Metrics
- AUC (Area Under ROC Curve): Primary metric, threshold-independent discrimination ability
- Sensitivity (Recall): Proportion of true positives correctly identified
- Specificity: Proportion of true negatives correctly identified
- Accuracy: Overall correct classification rate
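For reference, the metrics above can be computed as in the following sketch. The counts and scores are made-up illustration values, not results from this model, and the card does not state how per-pathology scores were extracted from the generated text; the AUC here uses the pairwise-ranking definition on whatever scores are available.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

def auc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This O(n*m) form is fine for small sets.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical counts and scores, for illustration only.
m = confusion_metrics(tp=8, fp=3, tn=7, fn=2)
print(m)
print(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))
```

An AUC of 0.5 corresponds to random ranking, which is why base-model values near 0.50 are described as near-random in this card.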
Results
NIH Test Set (N=1,103 held-out images)
| Pathology | Base AUC | Fine-tuned AUC | Improvement | Sensitivity | Specificity |
|---|---|---|---|---|---|
| Cardiomegaly | 0.490 | 0.832 | +70% | 0.826 | 0.838 |
| Pulm. Edema | 0.688 | 0.753 | +9% | 0.833 | 0.673 |
| Pleural Effusion | 0.605 | 0.703 | +16% | 0.680 | 0.725 |
| Pneumonia | 0.519 | 0.617 | +19% | 0.636 | 0.599 |
| Consolidation | 0.599 | 0.627 | +5% | 0.684 | 0.570 |
3/5 pathologies exceed AUC 0.70. All 5 improved vs. baseline Gemma 4.
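The Improvement column is the relative change in AUC over the base model, which can be reproduced from the other two AUC columns:

```python
# (base AUC, fine-tuned AUC) copied from the NIH table above.
nih_auc = {
    "Cardiomegaly": (0.490, 0.832),
    "Pulm. Edema": (0.688, 0.753),
    "Pleural Effusion": (0.605, 0.703),
    "Pneumonia": (0.519, 0.617),
    "Consolidation": (0.599, 0.627),
}

def relative_improvement(base: float, tuned: float) -> float:
    """Relative AUC change in percent."""
    return (tuned - base) / base * 100

improvements = {
    name: relative_improvement(base, tuned)
    for name, (base, tuned) in nih_auc.items()
}
for name, pct in improvements.items():
    print(f"{name}: {pct:+.0f}%")

# Number of pathologies whose fine-tuned AUC exceeds 0.70.
above_070 = sum(1 for _, tuned in nih_auc.values() if tuned > 0.70)
print(above_070)
```

Note that these are relative percentages: for Cardiomegaly the absolute AUC gain is 0.342, which is about 70% of the base value of 0.490.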
CheXpert Gold Standard (N=500, 5-radiologist consensus, Stanford)
| Pathology | AUC | Sensitivity | Specificity |
|---|---|---|---|
| Pleural Effusion | 0.797 | 0.952 | 0.641 |
| Cardiomegaly | 0.723 | 0.656 | 0.791 |
| Consolidation | 0.667 | 0.897 | 0.437 |
| Pulm. Edema | 0.668 | 0.500 | 0.837 |
| Pneumonia* | 0.501 | 0.636 | 0.366 |
*Pneumonia: only 11 positives (2.2% prevalence) in the CheXpert test set, which gives insufficient statistical power.
Highlight: Pleural Effusion sensitivity of 95.2% on CheXpert, which catches about 95 out of 100 cases.
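To make the statistical-power caveat for Pneumonia concrete, the sketch below computes a 95% Wilson score interval for a sensitivity estimated from 11 positives. The count of 7 detected cases is an assumption inferred from 0.636 x 11; the card reports only the rate.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Assumed: 7 of the 11 CheXpert Pneumonia positives were detected.
low, high = wilson_interval(7, 11)
print(f"{low:.2f} to {high:.2f}")
```

The interval spans roughly 0.35 to 0.85, so with this few positives the measured sensitivity is compatible with both poor and good detection.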
Summary
This model demonstrates that fine-tuning Gemma 4 E4B on real clinical images produces genuine visual understanding (not text memorization). The base model scored near-random (AUC ~0.50) on Cardiomegaly; after fine-tuning, it achieves 0.832 on the NIH held-out test set (0.723 on CheXpert), a 70% relative improvement over the base model's 0.490.
Environmental Impact
- Hardware: 1x NVIDIA RTX 5070 Ti (16GB, consumer GPU)
- Total GPU hours: ~43 hours (training 18.7h + evaluation 22.4h + misc 2h)
- Training-only hours: 4.4 hours (v4 final run)
- Cloud Provider: None (local workstation)
- Total project cost: < $25 (electricity only)
- Carbon Emitted: Estimated ~4.3 kg CO2eq (based on Colombia grid factor ~0.1 kg CO2/kWh, RTX 5070 Ti TDP 300W)
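The GPU-only part of such an estimate can be reproduced from the inputs listed above. This sketch assumes the GPU runs at its full TDP for every GPU-hour and ignores the rest of the workstation (CPU, RAM, cooling, power-supply losses).

```python
# Inputs taken from the list above.
gpu_hours = 18.7 + 22.4 + 2.0   # training + evaluation + misc
tdp_kw = 0.300                  # RTX 5070 Ti TDP, 300 W
grid_kg_per_kwh = 0.1           # approximate Colombia grid factor

energy_kwh = gpu_hours * tdp_kw
co2_kg = energy_kwh * grid_kg_per_kwh

print(f"{gpu_hours:.1f} h, {energy_kwh:.2f} kWh, {co2_kg:.2f} kg CO2eq")
```

Under these inputs the GPU-only figure comes to about 1.3 kg CO2eq, so a larger estimate implies contributions beyond GPU draw at TDP (for example whole-system power), which the card does not itemize.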
Technical Specifications
Model Architecture and Objective
- Base model: Google Gemma 4 E4B-it (8B parameters with 4.5B effective, vision-language)
- Fine-tuning method: QLoRA via Unsloth (4-bit quantized base + low-rank adapters)
- LoRA rank: 64 on all linear layers (vision + language + attention + MLP)
- Context length: 128K tokens (inherited from Gemma 4)
- Objective: Supervised fine-tuning (SFT) on chest X-ray analysis conversations
Compute Infrastructure
Hardware
- NVIDIA RTX 5070 Ti 16GB (local workstation)
- 64GB system RAM
- Arch Linux
Software
- Python 3.14
- PyTorch 2.10.0+cu128
- Unsloth (latest)
- Transformers >= 4.45.0
- TRL (SFTTrainer)
- PEFT 0.19.1
Training Iterations
This model is the result of 6 training iterations:
| Version | Key Change | Best AUC | Outcome |
|---|---|---|---|
| v1 | Simple labels, 1 epoch | ~0.50 | Random: text memorization |
| v2 | Rich labels, 1 epoch | ~0.50 | Parser broken, same problem |
| v3 | Short responses, 3 epochs, 3x oversample | 0.787 | First real learning |
| v4 | +2 epochs from v3 | 0.807 | Overfit, worse overall |
| v5 | r=64, 5x oversample, augmentation | 0.832 | Best model |
| v6 | RSNA clean labels | 0.823 | Did not improve; locked v5 |
Each failure taught us something: long responses dilute gradient signal, low LoRA rank lacks capacity, and clean labels from a different distribution can hurt rather than help.
Citation
BibTeX:
@misc{arias2026medvisionedge,
  title={MedVision Edge: AI Radiology for Everyone},
  author={Arias, Joel},
  year={2026},
  howpublished={\url{https://huggingface.co/j4rias/medvision-edge-v4}},
  note={Fine-tuned Gemma 4 E4B for chest X-ray screening. Gemma 4 Good Hackathon submission.}
}
Acknowledgements
- Google for the Gemma 4 model family and the Gemma 4 Good Hackathon
- Unsloth for efficient QLoRA fine-tuning of vision-language models
- NIH Clinical Center for the ChestX-ray14 dataset (CC0)
- Stanford AIMI for the CheXpert gold-standard test set
- WHO for the IMCI clinical protocols
Framework Versions
- PEFT 0.19.1
- Transformers >= 4.45.0
- TRL (latest)
- Unsloth (latest)
- PyTorch 2.10.0+cu128