---
license: apache-2.0
base_model: Qwen/Qwen3-VL-8B-Instruct
tags:
- qwen3-vl
- vision-language
- lora
- fine-tuned
library_name: peft
---

# qwen3vl-8b-lora

This is a LoRA adapter fine-tuned on top of [Qwen/Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct).

## Model Description

This model is a fine-tuned version of Qwen3-VL-8B-Instruct using LoRA (Low-Rank Adaptation) for parameter-efficient training.
The adapter weights can be merged into the base model for inference.

## Training Details

### Base Model
- **Model:** Qwen/Qwen3-VL-8B-Instruct
- **Architecture:** Vision-Language Model (VLM)

### LoRA Configuration
- **Rank (r):** 64
- **Alpha:** 128
- **Dropout:** 0.05
- **Target Modules:** q_proj, k_proj, v_proj, o_proj
- **Task Type:** Causal Language Modeling

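As a quick back-of-the-envelope sketch of what this configuration costs: LoRA with rank *r* inserts two matrices per targeted projection, A (r × d_in) and B (d_out × r), so each adapted layer gains r · (d_in + d_out) trainable parameters, and the update is scaled by alpha / r at inference. The 4096-dim projection below is an illustrative assumption, not a shape read from this checkpoint:

```python
# Back-of-envelope estimate of LoRA adapter size for this configuration.
# A LoRA adapter for a d_out x d_in weight adds two low-rank matrices,
# A (r x d_in) and B (d_out x r), i.e. r * (d_in + d_out) parameters.

def lora_params(r: int, d_in: int, d_out: int) -> int:
    """Trainable parameters LoRA adds to one linear layer."""
    return r * (d_in + d_out)

r, alpha = 64, 128      # from the configuration above
scaling = alpha / r     # factor applied to (B @ A) at inference
print(scaling)          # -> 2.0

# Illustrative only: a hypothetical 4096-dim square projection (the real
# Qwen3-VL-8B shapes differ per module under grouped-query attention)
print(lora_params(r, 4096, 4096))  # -> 524288
```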
### Training Hyperparameters
- **Learning Rate:** 1e-5
- **Batch Size:** 4 (per device)
- **Gradient Accumulation Steps:** 4
- **Epochs:** 2
- **Optimizer:** AdamW
- **Weight Decay:** 0
- **Warmup Ratio:** 0.03
- **LR Scheduler:** Cosine
- **Max Gradient Norm:** 1.0
- **Model Max Length:** 40960
- **Max Pixels:** 250880
- **Min Pixels:** 784

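With a per-device batch of 4 and 4 accumulation steps, the effective batch size is 16 per device. The cosine scheduler with a 0.03 warmup ratio can be sketched in plain Python; this illustrates the schedule's shape, not the trainer's actual implementation:

```python
import math

def lr_at(step: int, total_steps: int, peak_lr: float = 1e-5,
          warmup_ratio: float = 0.03) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total = 1000                 # hypothetical total optimizer steps
print(lr_at(30, total))      # end of warmup: peak LR -> 1e-05
print(lr_at(1000, total))    # end of training -> 0.0
```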
### Training Infrastructure
- **Framework:** PyTorch + DeepSpeed (ZeRO Stage 2)
- **Precision:** BF16
- **Gradient Checkpointing:** Enabled

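The actual DeepSpeed config was not published with this adapter, so the fragment below is an assumed minimal ZeRO Stage 2 + BF16 configuration of the kind that would match this setup (the `"auto"` values defer to the HF Trainer's own arguments):

```python
import json

# Hypothetical DeepSpeed config consistent with the setup above
# (ZeRO Stage 2, BF16, gradient clipping at 1.0) -- an assumption,
# not the training run's actual file.
ds_config = {
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "bf16": {"enabled": True},
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_clipping": 1.0,
}
print(json.dumps(ds_config, indent=2))
```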
## Usage

### Requirements

```bash
pip install transformers peft torch pillow qwen-vl-utils
```

### Loading the Model

```python
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from peft import PeftModel
import torch

# Load the base model (requires a transformers release with
# Qwen3-VL support; the Qwen2-VL classes will not load this checkpoint)
base_model = Qwen3VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen3-VL-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(
    base_model,
    "openhay/qwen3vl-8b-lora",
    torch_dtype=torch.bfloat16,
)

# Load the processor (handles both text and images)
processor = AutoProcessor.from_pretrained("Qwen/Qwen3-VL-8B-Instruct")
```

### Inference Example

```python
from qwen_vl_utils import process_vision_info

# Prepare a multimodal chat message (image + text)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "path/to/image.jpg"},
            {"type": "text", "text": "Describe this image in detail."},
        ],
    }
]

# Build the prompt and collect the vision inputs
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device)

# Generate
with torch.no_grad():
    generated_ids = model.generate(**inputs, max_new_tokens=512)

# Strip the prompt tokens from the output before decoding
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)

print(output_text[0])
```

### Merging LoRA Weights (Optional)

If you want to merge the LoRA weights into the base model for faster inference:

```python
import torch
from transformers import Qwen3VLForConditionalGeneration
from peft import PeftModel

# Load the base model and the adapter
base_model = Qwen3VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen3-VL-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "openhay/qwen3vl-8b-lora")

# Merge the adapter into the base weights and save a standalone model
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./merged_model")
```

## Limitations

- This model inherits all limitations of the base Qwen3-VL-8B-Instruct model
- Performance depends on the quality and domain of the fine-tuning dataset
- LoRA adapters may not capture all of the nuances that full fine-tuning would

## Citation

If you use this model, please cite:

```bibtex
@misc{qwen3vl_8b_lora,
  author = {OpenHay},
  title = {qwen3vl-8b-lora},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/openhay/qwen3vl-8b-lora}}
}
```

## Acknowledgements

- Base model: [Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) by Alibaba Cloud
- Training framework: [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) or a similar fine-tuning stack
- LoRA implementation: [PEFT](https://github.com/huggingface/peft) by Hugging Face