---
base_model: unsloth/llama-3.2-11b-vision-instruct-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- mllama
license: apache-2.0
language:
- en
---

# Fine-tuned Vision-Language Model for Radiology Report Generation

This repository contains a vision-language model fine-tuned to generate radiology reports. It was fine-tuned with the [Unsloth](https://github.com/unslothai/unsloth) library, using Llama-3.2-11B-Vision-Instruct as the base model.

## Model Description

This model is fine-tuned on [Radiology_mini](https://huggingface.co/datasets/unsloth/Radiology_mini), a sampled subset of the ROCO radiography dataset. It is intended to assist medical professionals by drafting descriptions of medical images such as X-rays, CT scans, and ultrasounds.
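
Training a chat-style vision model on an image-caption dataset requires turning each record into a conversation. The sketch below shows one way to do that. It assumes each Radiology_mini record exposes `image` and `caption` fields and reuses the instruction from the Usage section as the training prompt; the helper name `convert_to_conversation` is hypothetical, and the exact format used to train this checkpoint is not documented in this card.

```python
from typing import Any, Dict, List

# Assumption: the same prompt as in the Usage example was used during training.
INSTRUCTION = "You are an expert radiographer. Describe accurately what you see in this image."


def convert_to_conversation(sample: Dict[str, Any]) -> Dict[str, List[Dict[str, Any]]]:
    """Turn one image-caption record into a user/assistant chat (hypothetical helper).

    Assumes `sample` has an "image" field (e.g. a PIL image) and a "caption" field (str).
    """
    conversation = [
        {
            # The user turn carries the image plus the instruction text
            "role": "user",
            "content": [
                {"type": "image", "image": sample["image"]},
                {"type": "text", "text": INSTRUCTION},
            ],
        },
        {
            # The assistant turn is the reference caption the model learns to produce
            "role": "assistant",
            "content": [{"type": "text", "text": sample["caption"]}],
        },
    ]
    return {"messages": conversation}
```

With the Hugging Face `datasets` library, this could be applied to every record, for example `[convert_to_conversation(s) for s in load_dataset("unsloth/Radiology_mini", split="train")]`.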

The fine-tuning process uses Low-Rank Adaptation (LoRA): small trainable adapter matrices are added to the language layers, while the vision layers and all original weights stay frozen. This keeps the compute and memory cost of fine-tuning low while still adapting the model's outputs to the radiology domain.
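
The core idea of LoRA can be sketched in a few lines of NumPy: a frozen weight matrix `W` is augmented with a trainable low-rank product `B @ A`, scaled by `alpha / r`. This is a simplified illustration, not Unsloth's implementation, and the defaults `r=16` and `alpha=16` are assumed values, since the card does not state the LoRA hyperparameters.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (illustrative sketch)."""

    def __init__(self, weight: np.ndarray, r: int = 16, alpha: float = 16.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                            # frozen base weight, never updated
        self.A = rng.normal(0.0, 0.01, size=(r, d_in))  # trainable, small random init
        self.B = np.zeros((d_out, r))                   # trainable, zero init so the update starts at 0
        self.scale = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x has shape (batch, d_in); the output has shape (batch, d_out)
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self) -> np.ndarray:
        # Folding the adapter into W yields an ordinary dense layer for inference
        return self.weight + self.scale * self.B @ self.A

    def num_trainable(self) -> int:
        # Only A and B are trained; W is excluded
        return self.A.size + self.B.size
```

For an example 4096 x 4096 projection, full fine-tuning would update about 16.8M weights, while a rank-16 adapter trains 2 x 16 x 4096 = 131,072, under 1% of that.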

## Usage

To use this model, you'll need the Unsloth library:

```bash
pip install unsloth
```

Then load the model and tokenizer:

```python
from unsloth import FastVisionModel

# Load the fine-tuned model in 4-bit precision to reduce GPU memory use
model, tokenizer = FastVisionModel.from_pretrained("awaliuddin/unsloth_finetune", load_in_4bit=True)
# Switch the model to Unsloth's inference mode before calling generate()
FastVisionModel.for_inference(model)
```

Then generate a description for an image:

```python
from PIL import Image
from transformers import TextStreamer

image = Image.open("path/to/your/image.jpg")  # Replace with your image path
instruction = "You are an expert radiographer. Describe accurately what you see in this image."
messages = [
    {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": instruction}]}
]

# Render the chat prompt, then tokenize it together with the image
input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
inputs = tokenizer(image, input_text, add_special_tokens=False, return_tensors="pt").to("cuda")

# Stream the generated report to stdout as tokens are produced
text_streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=128,
                   use_cache=True, temperature=1.5, min_p=0.1)
```
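
The `generate` call above samples with `temperature=1.5` and `min_p=0.1`. The sketch below illustrates what these two settings do to a single step's next-token distribution: logits are divided by the temperature, then any token whose probability falls below `min_p` times the top token's probability is discarded before sampling. It is a simplified NumPy illustration; the exact order in which `transformers` applies its logits processors may differ.

```python
import numpy as np


def temperature_min_p(logits, temperature: float = 1.5, min_p: float = 0.1) -> np.ndarray:
    """Return the sampling distribution after temperature scaling and min-p filtering."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()                  # subtract the max for numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()                    # softmax
    threshold = min_p * probs.max()         # cutoff relative to the most likely token
    kept = np.where(probs >= threshold, probs, 0.0)
    return kept / kept.sum()                # renormalize the surviving tokens


def sample_next_token(logits, rng: np.random.Generator,
                      temperature: float = 1.5, min_p: float = 0.1) -> int:
    probs = temperature_min_p(logits, temperature, min_p)
    return int(rng.choice(len(probs), p=probs))
```

With logits `[2.0, 1.0, 0.0, -5.0]`, the last token's scaled probability (about 0.005) falls below the cutoff (0.1 x about 0.56), so it can never be sampled, while the high temperature flattens the distribution over the remaining three. This pairing lets a high temperature add variety without letting very unlikely tokens through.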

## Training Details

* Base Model: Llama-3.2-11B-Vision-Instruct
* Dataset: Radiology_mini (sampled from ROCO radiography dataset)
* Fine-tuning Method: LoRA (language layers only)
* Optimizer: AdamW 8-bit
* Learning Rate: 2e-4
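
For reference, a single AdamW update with the learning rate listed above can be written as follows. This is a plain full-precision NumPy sketch: the 8-bit variant used for training additionally stores the moment estimates `m` and `v` in quantized 8-bit form to save memory, which is not modeled here. The `beta1`, `beta2`, `eps`, and `weight_decay` values are assumed common defaults, not values reported in this card.

```python
import numpy as np


def adamw_step(param, grad, m, v, t, lr=2e-4, beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=0.01):
    """One full-precision AdamW step. `t` is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad       # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction for zero-initialized moments
    v_hat = v / (1 - beta2 ** t)
    # Decoupled weight decay: applied to the weights directly, not folded into the gradient
    param = param - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * param)
    return param, m, v
```

On the first step the bias-corrected ratio `m_hat / sqrt(v_hat)` is close to `sign(grad)`, so each weight moves by roughly `lr` (2e-4) regardless of the gradient's magnitude.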

## Limitations

* This model is trained on a limited dataset and might not generalize well to all types of medical images.
* The generated reports should be reviewed by qualified medical professionals before being used for diagnostic purposes.

## Acknowledgements

* The Unsloth library for efficient fine-tuning of vision-language models.
* The Hugging Face team for providing the platform and tools for model sharing.
* The authors of the ROCO radiography dataset.

## License

This model is released under the [Apache-2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

## Uploaded Fine-tuned Model

- Developed by: Awaliuddin
- License: apache-2.0
- Fine-tuned from model: unsloth/llama-3.2-11b-vision-instruct-unsloth-bnb-4bit

This mllama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)