# Asset from the SCALEMED Framework

This model/dataset is an asset released as part of the **SCALEMED** framework, a project focused on developing scalable and resource-efficient medical AI assistants.

## Project Overview

The models, known as **DermatoLlama**, were trained on versions of the **DermaSynth** dataset, which was also generated using the SCALEMED pipeline.

For a complete overview of the project, including all related models, datasets, and the source code, please visit our main Hugging Face organization page and GitHub repositories: <br>
**[https://huggingface.co/DermaVLM](https://huggingface.co/DermaVLM)** <br>
**[https://github.com/DermaVLM](https://github.com/DermaVLM)** <br>

## Requirements and Our Test System

Software and hardware used on our test system:

transformers==4.57.1 <br>
accelerate==1.8.1 <br>
pillow==11.0.0 <br>
peft==0.16.0 <br>
torch==2.7.1+cu126 <br>
torchaudio==2.7.1+cu126 <br>
torchvision==0.22.1+cu126 <br>
python==3.11.13 <br>
CUDA: 12.6 <br>
NVIDIA driver version: 560.94 <br>
GPU: 1x NVIDIA RTX 4090 <br>
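
To compare a local environment against the versions above, you can use the minimal sketch below, which relies on the standard-library `importlib.metadata` module. The `TESTED_VERSIONS` dictionary and the `check_versions` helper are hypothetical names introduced for illustration and are not part of the SCALEMED code. Local build suffixes such as `+cu126` are ignored during the comparison. A mismatch does not necessarily mean the model will fail. It only means that the combination was not tested.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical helper: versions copied from the list above (build suffixes dropped).
TESTED_VERSIONS = {
    "transformers": "4.57.1",
    "accelerate": "1.8.1",
    "pillow": "11.0.0",
    "peft": "0.16.0",
    "torch": "2.7.1",
    "torchaudio": "2.7.1",
    "torchvision": "0.22.1",
}


def check_versions(expected=TESTED_VERSIONS):
    """Return {package: (installed_or_None, tested_version, matches)}."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        # Drop a local version label such as "+cu126" before comparing.
        base = installed.split("+")[0] if installed else None
        report[package] = (installed, wanted, base == wanted)
    return report


if __name__ == "__main__":
    for package, (installed, wanted, ok) in check_versions().items():
        status = "OK" if ok else "differs"
        print(f"{package:<14} installed={installed} tested={wanted} [{status}]")
```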

## Usage

```python
# %%
from transformers import MllamaForConditionalGeneration, AutoProcessor
from peft import PeftModel
import torch
from PIL import Image

# Load the base model and its processor
base_model_name = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    base_model_name, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(base_model_name)

# Load the DermatoLlama LoRA adapter on top of the base model
adapter_path = "DermaVLM/DermatoLLama-full"
model = PeftModel.from_pretrained(model, adapter_path)

# %%
# Load the image with Pillow and convert it to RGB (handles RGBA/grayscale files)
image_path = r"IMAGE_LOCATION"  # Replace with the path to your image
image = Image.open(image_path).convert("RGB")
prompt_text = "Analyze the dermatological condition shown in the image and provide a detailed report including body location."

# Build a single-turn chat message: an image placeholder followed by the text prompt
content_list = []
if image is not None:
    content_list.append({"type": "image"})
content_list.append({"type": "text", "text": prompt_text})
messages = [{"role": "user", "content": content_list}]

input_text = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False,
)

# Prepare the final inputs with the loaded image. The chat template already
# inserts the special tokens, so add_special_tokens=False avoids duplicating them.
inputs = processor(
    images=image,
    text=input_text,
    add_special_tokens=False,
    return_tensors="pt",
).to(model.device)

generation_config = {
    "max_new_tokens": 512,  # upper bound on report length; larger values increase inference time
    "do_sample": True,
    "temperature": 0.4,
    "top_p": 0.95,
}

input_length = inputs.input_ids.shape[1]
print(f"Processing image: {image_path}")
print(f"Image size: {image.size}")
print("Generating response...")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        **generation_config,
        pad_token_id=(
            processor.tokenizer.pad_token_id
            if processor.tokenizer.pad_token_id is not None
            else processor.tokenizer.eos_token_id
        ),
    )

# Keep only the newly generated tokens (drop the prompt) and decode them
generated_tokens = outputs[0][input_length:]
raw_output = processor.decode(generated_tokens, skip_special_tokens=True)

print("\n" + "=" * 50)
print("DERMATOLOGY ANALYSIS:")
print("=" * 50)
print(raw_output)
print("=" * 50)
```
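
The message structure used in the usage example wraps an `{"type": "image"}` placeholder and a text entry in a single `user` turn. It can be factored into a small helper, which is useful when the same prompt format is reused across several images. The sketch below assumes that reuse scenario. `build_messages` is a hypothetical name and is not part of the SCALEMED code. This card only shows the image-plus-text format, so the `with_image=False` option is untested with DermatoLlama.

```python
from typing import Any, Dict, List


def build_messages(prompt_text: str, with_image: bool = True) -> List[Dict[str, Any]]:
    """Build a single-turn user message in the chat format used by the usage example.

    The {"type": "image"} entry is only a placeholder that the chat template turns
    into an image token; the actual PIL image is passed to the processor separately.
    """
    content: List[Dict[str, Any]] = []
    if with_image:
        # Image placeholder first, then the text prompt (same order as the example).
        content.append({"type": "image"})
    content.append({"type": "text", "text": prompt_text})
    return [{"role": "user", "content": content}]


if __name__ == "__main__":
    prompt = (
        "Analyze the dermatological condition shown in the image "
        "and provide a detailed report including body location."
    )
    print(build_messages(prompt))
```

You can pass the returned list to `processor.apply_chat_template(..., add_generation_prompt=True, tokenize=False)` in place of the hand-built `messages`. The PIL image itself is still supplied separately through the processor's `images` argument.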

## Citation

If you use this model, dataset, or any other asset from our work in your research, please cite our preprint:

```bibtex
@article{Yilmaz2025-DermatoLlama-VLM,
  author = {Yilmaz, Abdurrahim and Yuceyalcin, Furkan and Varol, Rahmetullah and Gokyayla, Ece and Erdem, Ozan and Choi, Donghee and Demircali, Ali Anil and Gencoglan, Gulsum and Posma, Joram M. and Temelkuran, Burak},
  title = {Resource-efficient medical vision language model for dermatology via a synthetic data generation framework},
  year = {2025},
  doi = {10.1101/2025.05.17.25327785},
  url = {https://www.medrxiv.org/content/early/2025/07/30/2025.05.17.25327785},
  journal = {medRxiv}
}
```