# Asset from the SCALEMED Framework

This model/dataset is an asset released as part of the **SCALEMED** framework, a project focused on developing scalable and resource-efficient medical AI assistants.

## Project Overview

The models, known as **DermatoLlama**, were trained on versions of the **DermaSynth** dataset, which was also generated using the SCALEMED pipeline.

For a complete overview of the project, including all related models, datasets, and the source code, please visit our main Hugging Face organization page and GitHub repositories: <br>
**[https://huggingface.co/DermaVLM](https://huggingface.co/DermaVLM)** <br>
**[https://github.com/DermaVLM](https://github.com/DermaVLM)** <br>

## Requirements and Our Test System
transformers==4.57.1 <br>
accelerate==1.8.1 <br>
pillow==11.0.0 <br>
peft==0.16.0 <br>
torch==2.7.1+cu126 <br>
torchaudio==2.7.1+cu126 <br>
torchvision==0.22.1+cu126 <br>
python==3.11.13 <br>

CUDA: 12.6 <br>
Driver version: 560.94 <br>
GPU: 1x RTX 4090 <br>
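
To compare a local environment against the pins above, something like the following sketch may help. `PINNED` and `find_mismatches` are hypothetical names used only for illustration, and the comparison deliberately ignores local build tags such as `+cu126`, which is an assumption about what counts as a match.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the list above. The "+cu126" build tag is dropped so that
# other builds of the same release are treated as matching (an assumption).
PINNED = {
    "transformers": "4.57.1",
    "accelerate": "1.8.1",
    "pillow": "11.0.0",
    "peft": "0.16.0",
    "torch": "2.7.1",
    "torchaudio": "2.7.1",
    "torchvision": "0.22.1",
}


def find_mismatches(pinned):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            # Strip any local version tag, e.g. "2.7.1+cu126" -> "2.7.1".
            installed = version(name).split("+")[0]
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

A mismatch does not necessarily mean the usage code will fail; the versions listed are only those of our test system.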

## Usage

```python
# %%
from transformers import MllamaForConditionalGeneration, AutoProcessor
from peft import PeftModel
import torch
from PIL import Image

# Load base model
base_model_name = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    base_model_name, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(base_model_name)

# Load LoRA adapter
adapter_path = "DermaVLM/DermatoLLama-full"
model = PeftModel.from_pretrained(model, adapter_path)
# %%
# Load the image with Pillow and normalize it to RGB
image_path = r"IMAGE_LOCATION"  # Replace with your image path
image = Image.open(image_path).convert("RGB")

prompt_text = "Analyze the dermatological condition shown in the image and provide a detailed report including body location."
messages = []
content_list = []

# Add the image to the content
if image:
    content_list.append({"type": "image"})

# Add the text part of the prompt
content_list.append({"type": "text", "text": prompt_text})
messages.append({"role": "user", "content": content_list})

input_text = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False,
)

# Prepare final inputs with the loaded image
inputs = processor(
    images=image,
    text=input_text,
    add_special_tokens=False,
    return_tensors="pt",
).to(model.device)

generation_config = {
    "max_new_tokens": 512,  # larger values can make inference much slower
    "do_sample": True,
    "temperature": 0.4,
    "top_p": 0.95,
}

input_length = inputs.input_ids.shape[1]

print(f"Processing image: {image_path}")
print(f"Image size: {image.size}")
print("Generating response...")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        **generation_config,
        pad_token_id=(
            processor.tokenizer.pad_token_id
            if processor.tokenizer.pad_token_id is not None
            else processor.tokenizer.eos_token_id
        ),
    )
    generated_tokens = outputs[0][input_length:]
    raw_output = processor.decode(generated_tokens, skip_special_tokens=True)

print("\n" + "="*50)
print("DERMATOLOGY ANALYSIS:")
print("="*50)
print(raw_output)
print("="*50)
```
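
The message-building step in the script above can be factored into a small helper if you plan to run several prompts. This is only a sketch: `build_messages` and its `has_image` parameter are hypothetical names, not part of `transformers` or of this repository.

```python
def build_messages(prompt_text, has_image=True):
    """Build a single-turn chat message list in the format used above.

    The image entry is a placeholder only; the actual pixels are passed
    separately to the processor via its `images` argument.
    """
    content = []
    if has_image:
        content.append({"type": "image"})
    content.append({"type": "text", "text": prompt_text})
    return [{"role": "user", "content": content}]
```

The returned list can be passed to `processor.apply_chat_template` in place of the hand-built `messages` variable.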

## Citation

If you use this model, dataset, or any other asset from our work in your research, please cite our preprint:

```bibtex
@article {Yilmaz2025-DermatoLlama-VLM,
	author = {Yilmaz, Abdurrahim and Yuceyalcin, Furkan and Varol, Rahmetullah and Gokyayla, Ece and Erdem, Ozan and Choi, Donghee and Demircali, Ali Anil and Gencoglan, Gulsum and Posma, Joram M. and Temelkuran, Burak},
	title = {Resource-efficient medical vision language model for dermatology via a synthetic data generation framework},
	year = {2025},
	doi = {10.1101/2025.05.17.25327785},
	url = {https://www.medrxiv.org/content/early/2025/07/30/2025.05.17.25327785},
	journal = {medRxiv}
}
```