# Asset from the SCALEMED Framework
This model/dataset is an asset released as part of the **SCALEMED** framework, a project focused on developing scalable and resource-efficient medical AI assistants.
## Project Overview
The models, known as **DermatoLlama**, were trained on versions of the **DermaSynth** dataset, which was also generated using the SCALEMED pipeline.
For a complete overview of the project, including all related models, datasets, and the source code, please visit our main Hugging Face organization page and GitHub repositories: <br>
**[https://huggingface.co/DermaVLM](https://huggingface.co/DermaVLM)** <br>
**[https://github.com/DermaVLM](https://github.com/DermaVLM)** <br>
## Requirements and Our Test System
transformers==4.57.1 <br>
accelerate==1.8.1 <br>
pillow==11.0.0 <br>
peft==0.16.0 <br>
torch==2.7.1+cu126 <br>
torchaudio==2.7.1+cu126 <br>
torchvision==0.22.1+cu126 <br>
python==3.11.13 <br>
CUDA: 12.6 <br>
Driver version: 560.94 <br>
GPU: 1× RTX 4090 <br>
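For convenience, the pinned versions above can be captured in a `requirements.txt` (a sketch; note that the CUDA 12.6 torch wheels are served from the PyTorch index, so pip must be pointed at it):

```text
# requirements.txt (sketch)
# install with: pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu126
transformers==4.57.1
accelerate==1.8.1
pillow==11.0.0
peft==0.16.0
torch==2.7.1+cu126
torchaudio==2.7.1+cu126
torchvision==0.22.1+cu126
```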
## Usage
```python
from transformers import MllamaForConditionalGeneration, AutoProcessor
from peft import PeftModel
import torch
from PIL import Image

# Load base model
base_model_name = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    base_model_name, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(base_model_name)

# Load LoRA adapter
adapter_path = "DermaVLM/DermatoLLama-full"
model = PeftModel.from_pretrained(model, adapter_path)

# Load image using Pillow
image_path = r"IMAGE_LOCATION"  # Replace with your image path
image = Image.open(image_path)

prompt_text = "Analyze the dermatological condition shown in the image and provide a detailed report including body location."

# Build the chat message: the image placeholder first, then the text prompt
content_list = [{"type": "image"}, {"type": "text", "text": prompt_text}]
messages = [{"role": "user", "content": content_list}]

input_text = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False,
)

# Prepare final inputs with the loaded image
inputs = processor(
    images=image,
    text=input_text,
    add_special_tokens=False,
    return_tensors="pt",
).to(model.device)

generation_config = {
    "max_new_tokens": 512,  # be careful: large values can cause very long inference times
    "do_sample": True,
    "temperature": 0.4,
    "top_p": 0.95,
}

input_length = inputs.input_ids.shape[1]
print(f"Processing image: {image_path}")
print(f"Image size: {image.size}")
print("Generating response...")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        **generation_config,
        pad_token_id=(
            processor.tokenizer.pad_token_id
            if processor.tokenizer.pad_token_id is not None
            else processor.tokenizer.eos_token_id
        ),
    )

# Decode only the newly generated tokens, not the prompt
generated_tokens = outputs[0][input_length:]
raw_output = processor.decode(generated_tokens, skip_special_tokens=True)

print("\n" + "=" * 50)
print("DERMATOLOGY ANALYSIS:")
print("=" * 50)
print(raw_output)
print("=" * 50)
```
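The single-turn message layout consumed by `apply_chat_template` above can be factored into a small helper, which is handy when looping over many images (a sketch; `build_messages` is a name introduced here, not part of the released code):

```python
def build_messages(prompt_text: str, include_image: bool = True) -> list:
    """Build the chat structure expected by the Mllama processor:
    an optional image placeholder followed by the text prompt."""
    content = []
    if include_image:
        content.append({"type": "image"})
    content.append({"type": "text", "text": prompt_text})
    return [{"role": "user", "content": content}]
```

The returned list is what gets passed to `processor.apply_chat_template(...)` as `messages`; one `{"type": "image"}` entry is needed per image supplied to the processor.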
## Citation
If you use this model, dataset, or any other asset from our work in your research, we kindly ask that you cite our preprint:
```bibtex
@article{Yilmaz2025-DermatoLlama-VLM,
author = {Yilmaz, Abdurrahim and Yuceyalcin, Furkan and Varol, Rahmetullah and Gokyayla, Ece and Erdem, Ozan and Choi, Donghee and Demircali, Ali Anil and Gencoglan, Gulsum and Posma, Joram M. and Temelkuran, Burak},
title = {Resource-efficient medical vision language model for dermatology via a synthetic data generation framework},
year = {2025},
doi = {10.1101/2025.05.17.25327785},
url = {https://www.medrxiv.org/content/early/2025/07/30/2025.05.17.25327785},
journal = {medRxiv}
}
```