Cosmobillian committed (verified) commit 78e36a9 · Parent(s): ee35039

Update README.md

Files changed (1): README.md (+128 −21)

README.md CHANGED
@@ -1,21 +1,128 @@
- ---
- base_model: unsloth/llama-3.2-11b-vision-instruct-unsloth-bnb-4bit
- tags:
- - text-generation-inference
- - transformers
- - unsloth
- - mllama
- license: apache-2.0
- language:
- - en
- ---
-
- # Uploaded finetuned model
-
- - **Developed by:** Cosmobillian
- - **License:** apache-2.0
- - **Finetuned from model :** unsloth/llama-3.2-11b-vision-instruct-unsloth-bnb-4bit
-
- This mllama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
-
- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

# Radiologist Llama (`Cosmobillian/radiologist_llama`)

`Radiologist Llama` is a multimodal large language model based on `unsloth/Llama-3.2-11B-Vision-Instruct`, fine-tuned to generate radiology reports from chest X-ray (CXR) images. Given an X-ray image, it produces findings and impressions in text form, emulating the reporting style of a radiologist.

The training process was accelerated with the **Unsloth** library, which enabled training to complete **2x faster** and with significantly less VRAM consumption than standard fine-tuning methods.

## 🚀 Key Features

- **Specialization:** Radiology, specifically the analysis and reporting of chest X-rays.
- **Base Model:** Built on `Llama-3.2-11B-Vision-Instruct`.
- **Dataset:** Fine-tuned on tens of thousands of images and reports from the `itsanmolgupta/mimic-cxr-dataset` available on Hugging Face.
- **Efficient Training:** Used the 4-bit QLoRA (Quantized Low-Rank Adaptation) technique with Unsloth to fine-tune both the vision and language layers of the model efficiently.
- **Ready to Use:** The model is saved with its LoRA adapters merged into `float16`, allowing direct, high-performance inference with libraries such as vLLM (a minimal loading sketch follows this list).

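Because the adapters are already merged, the checkpoint can also be loaded like any other `mllama` model with plain `transformers`. A minimal loading sketch, assuming the repository contains the full merged weights and the processor configuration:

```python
import torch
from transformers import MllamaForConditionalGeneration, AutoProcessor

# Load the merged float16 checkpoint directly; no Unsloth or PEFT required.
model = MllamaForConditionalGeneration.from_pretrained(
    "Cosmobillian/radiologist_llama",
    torch_dtype=torch.float16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("Cosmobillian/radiologist_llama")
```

The Unsloth-based path in the "How to Use" section below mirrors the training setup and can also load the model in 4-bit for smaller GPUs.
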
## 🔧 Model Architecture and Training Details

The development of this model followed these steps:

1. **Model Loading:** The `unsloth/Llama-3.2-11B-Vision-Instruct` model was loaded in **4-bit** precision to significantly reduce memory usage.
2. **PEFT (LoRA) Integration:** **LoRA (Low-Rank Adaptation)** adapters were added to both the vision encoder and the language decoder layers. Instead of training all of the model's parameters, only the small, manageable adapters are updated, which speeds up training and improves resource efficiency (see the setup sketch after this list).
   - `r = 16`
   - `lora_alpha = 32`
   - `lora_dropout = 0.05`
3. **Dataset Preparation:** Each sample from the `mimic-cxr-dataset` was converted into a conversational format (a conversion sketch also follows this list):
   - **User:** the X-ray image plus the instruction `"You are an expert radiographer. Describe accurately what you see in this image."`
   - **Assistant:** the text of the `impression` or `findings` section of the corresponding radiology report.
4. **Training:** The model was trained for 1 epoch on 30,633 prepared samples using the `SFTTrainer` from the `trl` library, with the data pipeline handled by Unsloth's custom `UnslothVisionDataCollator` (a trainer sketch follows the hyperparameter table below).
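
A minimal sketch of steps 1 and 2 with Unsloth's `FastVisionModel` API. The `r`, `lora_alpha`, and `lora_dropout` values are the ones listed above; the remaining flags (attention/MLP module selection, gradient checkpointing, random seed) are illustrative assumptions:

```python
from unsloth import FastVisionModel

# Step 1: load the base vision-instruct model in 4-bit to reduce memory usage.
model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Llama-3.2-11B-Vision-Instruct",
    load_in_4bit=True,
    use_gradient_checkpointing="unsloth",
)

# Step 2: attach LoRA adapters to both the vision and the language layers.
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=True,      # adapt the vision encoder
    finetune_language_layers=True,    # adapt the language decoder
    finetune_attention_modules=True,  # assumption: adapt attention and MLP blocks
    finetune_mlp_modules=True,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    random_state=3407,                # assumption: any fixed seed
)
```
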
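A sketch of the step 3 conversion into the chat format expected by the vision collator. The column names `image`, `findings`, and `impression` are assumptions about the `itsanmolgupta/mimic-cxr-dataset` schema and may need adjusting:

```python
from datasets import load_dataset

instruction = "You are an expert radiographer. Describe accurately what you see in this image."

def convert_to_conversation(sample):
    # Prefer the impression section; fall back to findings (assumed column names).
    report_text = sample.get("impression") or sample.get("findings")
    return {
        "messages": [
            {"role": "user", "content": [
                {"type": "image", "image": sample["image"]},
                {"type": "text", "text": instruction},
            ]},
            {"role": "assistant", "content": [
                {"type": "text", "text": report_text},
            ]},
        ]
    }

dataset = load_dataset("itsanmolgupta/mimic-cxr-dataset", split="train")
converted_dataset = [convert_to_conversation(sample) for sample in dataset]
```
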
### Training Hyperparameters

| Parameter                       | Value        |
| :------------------------------ | :----------- |
| **Learning Rate**               | `1e-4`       |
| **Number of Epochs**            | `1`          |
| **Batch Size (per device)**     | `2`          |
| **Gradient Accumulation Steps** | `8`          |
| **Effective Batch Size**        | `16`         |
| **Optimizer**                   | `adamw_8bit` |
| **LR Scheduler**                | `linear`     |
| **Warmup Steps**                | `5`          |
| **Weight Decay**                | `0.01`       |
| **Max Sequence Length**         | `2048`       |

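Putting step 4 together, a hedged sketch of the trainer setup, continuing from the loading and conversion sketches above. The `SFTConfig` values mirror the table; `output_dir`, the precision flags, and the exact import path of `UnslothVisionDataCollator` are assumptions that may vary across Unsloth/TRL versions:

```python
from trl import SFTConfig, SFTTrainer
from unsloth import FastVisionModel, is_bfloat16_supported
from unsloth.trainer import UnslothVisionDataCollator  # import path may differ by version

FastVisionModel.for_training(model)  # switch the PEFT model into training mode

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    data_collator=UnslothVisionDataCollator(model, tokenizer),  # handles image + text batching
    train_dataset=converted_dataset,
    args=SFTConfig(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,   # effective batch size 2 x 8 = 16
        num_train_epochs=1,
        learning_rate=1e-4,
        warmup_steps=5,
        weight_decay=0.01,
        optim="adamw_8bit",
        lr_scheduler_type="linear",
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        output_dir="outputs",
        report_to="none",
        # Required so TRL leaves the multimodal batch preparation to the collator.
        remove_unused_columns=False,
        dataset_text_field="",
        dataset_kwargs={"skip_prepare_dataset": True},
        max_seq_length=2048,
    ),
)

trainer.train()
```
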
## 👨‍💻 How to Use (Inference)

Generating a report for a chest X-ray image with this model is straightforward.

### 1. Install the Necessary Libraries

```bash
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install --no-deps transformers trl peft accelerate bitsandbytes
pip install Pillow  # for image processing
```

### 2. Run Inference with Python

The following code snippet demonstrates how to load the model and generate a report from an image.

```python
from unsloth import FastVisionModel
from transformers import TextStreamer
from PIL import Image
import torch

# Load the model and tokenizer in 16-bit (float16).
# If you have less VRAM, you can set load_in_4bit=True instead.
model, tokenizer = FastVisionModel.from_pretrained(
    "Cosmobillian/radiologist_llama",
    dtype=torch.float16,
    load_in_4bit=False,  # False is ideal since the model was saved in 16-bit
)

# Prepare the model for inference
FastVisionModel.for_inference(model)

# Load your image (specify the path to your own X-ray image)
try:
    image = Image.open("path/to/your/xray.jpg")
except FileNotFoundError:
    print("Please provide a valid file path instead of 'path/to/your/xray.jpg'.")
    # Create a blank image as a placeholder
    image = Image.new("RGB", (512, 512), "black")

# The instruction format the model was trained on
instruction = "You are an expert radiographer. Describe accurately what you see in this image."

# Format the messages according to the chat template
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": instruction},
    ]}
]

# Prepare the inputs with the tokenizer
input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
inputs = tokenizer(
    image,
    input_text,
    add_special_tokens=False,  # already present in the template
    return_tensors="pt",
).to("cuda")

# Use TextStreamer for real-time output
text_streamer = TextStreamer(tokenizer, skip_prompt=True)

print("Model is generating the report...\n---")

# Run the model and stream the output
_ = model.generate(
    **inputs,
    streamer=text_streamer,
    max_new_tokens=256,  # maximum number of tokens to generate
)
```
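
If you prefer the finished report as a Python string rather than streamed console output, here is a small variant that reuses `inputs`, `model`, and `tokenizer` from the snippet above (slicing off the prompt tokens is an assumption about how you want to post-process the output):

```python
# Generate without streaming, then decode only the newly generated tokens.
output_ids = model.generate(**inputs, max_new_tokens=256, use_cache=True)
prompt_length = inputs["input_ids"].shape[1]
report = tokenizer.batch_decode(
    output_ids[:, prompt_length:], skip_special_tokens=True
)[0]
print(report)
```
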

## ⚠️ Disclaimer and Limitations

- **Not Medical Advice:** This model was developed for **research and experimental purposes only**. The text it generates **MUST NOT** be considered a real medical diagnosis or a substitute for the professional judgment of a qualified radiologist.
- **Not for Clinical Use:** The model's outputs should not be used as a basis for patient diagnosis, treatment, or any clinical decision-making process. It may produce incorrect or incomplete information.
- **Dataset Limitations:** The model's knowledge is limited to the information contained in the `MIMIC-CXR` dataset. It may not be able to accurately report on rare conditions, artifacts, or imaging protocols not present in the dataset. Furthermore, the model may have inherited biases present in the training data.
- **No Guarantees:** No guarantees are made regarding the accuracy, consistency, or reliability of the model's outputs.

## Author

This model was developed by **Cosmobillian** using the Unsloth and Hugging Face ecosystems.