---
base_model: unsloth/llama-3.2-11b-vision-instruct-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- mllama
license: apache-2.0
language:
- en
---
# Fine-tuned Vision-Language Model for Radiology Report Generation
This repository contains a vision-language model fine-tuned to generate radiology reports. It was fine-tuned with the [Unsloth](https://github.com/unslothai/unsloth) library, using Llama-3.2-11B-Vision-Instruct as the base model.
## Model Description
This model is fine-tuned on a sampled version of the ROCO radiography dataset ([Radiology_mini](https://huggingface.co/datasets/unsloth/Radiology_mini)). It's designed to assist medical professionals by providing accurate descriptions of medical images, such as X-rays, CT scans, and ultrasounds.
The fine-tuning process uses Low-Rank Adaptation (LoRA) to efficiently train the model, focusing on the language layers while keeping the vision layers frozen. This approach minimizes the computational resources required for fine-tuning while achieving significant performance improvements.
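To see why this is efficient, consider the LoRA update itself: instead of training a full weight matrix `W` (shape `d_out × d_in`), LoRA learns two small matrices `A` (`r × d_in`) and `B` (`d_out × r`) and applies `W' = W + (alpha / r) · B @ A`, so only `r · (d_in + d_out)` parameters are trainable. The following is a toy, dependency-free sketch of that arithmetic (all names and numbers here are illustrative, not Unsloth internals):

```python
def lora_update(W, A, B, alpha, r):
    """Apply a rank-r LoRA update W' = W + (alpha / r) * B @ A.

    W: d_out x d_in base weights (frozen during training)
    A: r x d_in, B: d_out x r (the small trainable matrices)
    """
    scale = alpha / r
    d_out, d_in = len(W), len(W[0])
    delta = [[scale * sum(B[i][k] * A[k][j] for k in range(r)) for j in range(d_in)]
             for i in range(d_out)]
    return [[W[i][j] + delta[i][j] for j in range(d_in)] for i in range(d_out)]

# Toy 2x2 example with rank r = 1
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]             # r x d_in
B = [[0.5], [0.25]]          # d_out x r
W_new = lora_update(W, A, B, alpha=2, r=1)
# W_new == [[2.0, 2.0], [0.5, 2.0]]
```

At scale the savings are large: for a 4096×4096 layer, full fine-tuning updates ~16.8M parameters, while a rank-16 LoRA adapter trains only `16 * (4096 + 4096)` ≈ 131K.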
## Usage
To use this model, you'll need the Unsloth library:
```bash
pip install unsloth
```
Then, you can load the model and tokenizer:
```python
from unsloth import FastVisionModel
model, tokenizer = FastVisionModel.from_pretrained("awaliuddin/unsloth_finetune", load_in_4bit=True)
FastVisionModel.for_inference(model)
```
Then run inference on a medical image:
```python
from PIL import Image
image = Image.open("path/to/your/image.jpg") # Replace with your image path
instruction = "You are an expert radiographer. Describe accurately what you see in this image."
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": instruction},
    ]}
]
input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
inputs = tokenizer(image, input_text, add_special_tokens=False, return_tensors="pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=128,
                   use_cache=True, temperature=1.5, min_p=0.1)
```
## Training Details
* **Base Model:** Llama-3.2-11B-Vision-Instruct
* **Dataset:** Radiology_mini (sampled from ROCO radiography dataset)
* **Fine-tuning Method:** LoRA (language layers only)
* **Optimizer:** AdamW 8-bit
* **Learning Rate:** 2e-4
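The setup above can be sketched roughly as follows, along the lines of Unsloth's public vision fine-tuning notebooks. Hyperparameters not listed above (batch size, gradient accumulation, step count, LoRA rank) are illustrative assumptions, not the exact values used for this checkpoint; training requires a CUDA GPU:

```python
from unsloth import FastVisionModel, is_bf16_supported
from unsloth.trainer import UnslothVisionDataCollator
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset

dataset = load_dataset("unsloth/Radiology_mini", split="train")

model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/llama-3.2-11b-vision-instruct-unsloth-bnb-4bit", load_in_4bit=True
)

# Attach LoRA adapters to the language layers only; vision layers stay frozen.
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=False,
    finetune_language_layers=True,
    r=16, lora_alpha=16, lora_dropout=0, bias="none",
)

FastVisionModel.for_training(model)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    data_collator=UnslothVisionDataCollator(model, tokenizer),
    train_dataset=dataset,  # examples must first be converted to chat-message format
    args=SFTConfig(
        per_device_train_batch_size=2,   # assumption
        gradient_accumulation_steps=4,   # assumption
        max_steps=30,                    # assumption
        learning_rate=2e-4,
        optim="adamw_8bit",
        fp16=not is_bf16_supported(),
        bf16=is_bf16_supported(),
        remove_unused_columns=False,
        dataset_text_field="",
        dataset_kwargs={"skip_prepare_dataset": True},
        output_dir="outputs",
    ),
)
trainer.train()
```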
## Limitations
* This model is trained on a limited dataset and might not generalize well to all types of medical images.
* The generated reports should be reviewed by qualified medical professionals before being used for diagnostic purposes.
## Acknowledgements
* The Unsloth library for efficient fine-tuning of vision-language models.
* The Hugging Face team for providing the platform and tools for model sharing.
* The authors of the ROCO radiography dataset.
## License
This model is released under the Apache-2.0 License.
# Uploaded finetuned model
- **Developed by:** Awaliuddin
- **License:** apache-2.0
- **Finetuned from model:** unsloth/llama-3.2-11b-vision-instruct-unsloth-bnb-4bit

This mllama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)