	# Asset from the SCALEMED Framework

	This model/dataset is an asset released as part of the SCALEMED framework, a project focused on developing scalable and resource-efficient medical AI assistants.

	## Project Overview

	The models, known as DermatoLlama, were trained on versions of the DermaSynth dataset, which was also generated using the SCALEMED pipeline.

	For a complete overview of the project, including all related models, datasets, and the source code, please visit our main Hugging Face organization page and GitHub repositories: <br>
	[https://huggingface.co/DermaVLM](https://huggingface.co/DermaVLM) <br>
	[https://github.com/DermaVLM](https://github.com/DermaVLM) <br>

	## Requirements and Our Test System
	transformers==4.57.1 <br>
	accelerate==1.8.1 <br>
	pillow==11.0.0 <br>
	peft==0.16.0 <br>
	torch==2.7.1+cu126 <br>
	torchaudio==2.7.1+cu126 <br>
	torchvision==0.22.1+cu126 <br>
	python==3.11.13 <br>

	CUDA: 12.6 <br>
	Driver version: 560.94 <br>
	GPU: 1xRTX4090 <br>
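
Before running the usage example, it can help to compare the local environment against the versions listed above. The following is a minimal sketch using only the standard library; the names `PINNED`, `base_version`, and `check_environment` are illustrative and not part of the SCALEMED code, and local build tags such as `+cu126` are ignored in the comparison.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above; local build tags (+cu126) are omitted.
PINNED = {
    "transformers": "4.57.1",
    "accelerate": "1.8.1",
    "pillow": "11.0.0",
    "peft": "0.16.0",
    "torch": "2.7.1",
    "torchaudio": "2.7.1",
    "torchvision": "0.22.1",
}


def base_version(v):
    """Drop a local build suffix such as '+cu126'."""
    return v.split("+", 1)[0]


def check_environment(pinned=PINNED):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = base_version(version(name))
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_environment().items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

A mismatch does not necessarily mean the example will fail; the list only records the configuration that was tested.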

	## Usage

	```python
	# %%
	from transformers import MllamaForConditionalGeneration, AutoProcessor
	from peft import PeftModel
	import torch
	from PIL import Image

	# Load base model
	base_model_name = "meta-llama/Llama-3.2-11B-Vision-Instruct"
	model = MllamaForConditionalGeneration.from_pretrained(
	    base_model_name, torch_dtype=torch.bfloat16, device_map="auto"
	)
	processor = AutoProcessor.from_pretrained(base_model_name)

	# Load LoRA adapter
	adapter_path = "DermaVLM/DermatoLLama-full"
	model = PeftModel.from_pretrained(model, adapter_path)
	# %%
	# Load image using Pillow
	image_path = r"IMAGE_LOCATION"  # Replace with your image path
	image = Image.open(image_path)

	prompt_text = "Analyze the dermatological condition shown in the image and provide a detailed report including body location."
	messages = []
	content_list = []

	# Add the image to the content
	if image:
	    content_list.append({"type": "image"})

	# Add the text part of the prompt
	content_list.append({"type": "text", "text": prompt_text})
	messages.append({"role": "user", "content": content_list})

	input_text = processor.apply_chat_template(
	    messages,
	    add_generation_prompt=True,
	    tokenize=False,
	)

	# Prepare final inputs with the loaded image
	inputs = processor(
	    images=image,
	    text=input_text,
	    add_special_tokens=False,
	    return_tensors="pt",
	).to(model.device)

	generation_config = {
	    "max_new_tokens": 512,  # be careful with this, it can cause very long inference times
	    "do_sample": True,
	    "temperature": 0.4,
	    "top_p": 0.95,
	}

	input_length = inputs.input_ids.shape[1]

	print(f"Processing image: {image_path}")
	print(f"Image size: {image.size}")
	print("Generating response...")

	with torch.no_grad():
	    outputs = model.generate(
	        **inputs,
	        **generation_config,
	        pad_token_id=(
	            processor.tokenizer.pad_token_id
	            if processor.tokenizer.pad_token_id is not None
	            else processor.tokenizer.eos_token_id
	        ),
	    )
	generated_tokens = outputs[0][input_length:]
	raw_output = processor.decode(generated_tokens, skip_special_tokens=True)

	print("\n" + "="*50)
	print("DERMATOLOGY ANALYSIS:")
	print("="*50)
	print(raw_output)
	print("="*50)
	```
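
The message-building step in the example above can be wrapped in a small function when several prompts need to be sent. This is only a sketch: `build_messages` and its `has_image` parameter are hypothetical names introduced here, and the function reproduces the single-turn structure from the example rather than any official helper.

```python
def build_messages(prompt_text, has_image=True):
    """Build a single-turn user message in the structure used in the example above."""
    content = []
    if has_image:
        # Placeholder entry only; the image itself is passed to the processor separately.
        content.append({"type": "image"})
    content.append({"type": "text", "text": prompt_text})
    return [{"role": "user", "content": content}]
```

The result can be passed to `processor.apply_chat_template(...)` in place of the hand-built `messages` list, with the same `add_generation_prompt=True` and `tokenize=False` arguments.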

	## Citation

	If you use this model, dataset, or any other asset from our work in your research, please cite our preprint:

	```bibtex
	@article{Yilmaz2025-DermatoLlama-VLM,
	  author = {Yilmaz, Abdurrahim and Yuceyalcin, Furkan and Varol, Rahmetullah and Gokyayla, Ece and Erdem, Ozan and Choi, Donghee and Demircali, Ali Anil and Gencoglan, Gulsum and Posma, Joram M. and Temelkuran, Burak},
	  title = {Resource-efficient medical vision language model for dermatology via a synthetic data generation framework},
	  year = {2025},
	  doi = {10.1101/2025.05.17.25327785},
	  url = {https://www.medrxiv.org/content/early/2025/07/30/2025.05.17.25327785},
	  journal = {medRxiv}
	}
	```