DaveKevin committed b42a7ea (verified) · parent: 6a9b1ac

Update README.md

Files changed (1): README.md (+116 −3)

---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
pipeline_tag: image-text-to-text
library_name: adapter-transformers
---

# Inference with RZNV-1.5-3B-Instruct (PEFT Adapter)

This repository contains only the **Parameter-Efficient Fine-Tuning (PEFT) adapter weights** for the Qwen2.5-VL-3B-Instruct model, not the full model. Shipping just the adapter keeps the repository lightweight and easy to share.
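
You can see what "adapter-only" means concretely by listing the repository's files. This is an optional sketch using the `huggingface_hub` API; PEFT adapter repos typically hold just a config and the adapter weights, and exact filenames may vary.

```python
from huggingface_hub import list_repo_files

# Expect PEFT artifacts such as adapter_config.json and adapter_model.safetensors,
# rather than full model weight shards.
print(list_repo_files("phronetic-ai/RZNV-1.5-3B-Instruct"))
```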

## Important Note: Adapter Loading Required

During development, we found that merging the adapter with the standard `merge_and_unload()` function caused the model to revert to the base model's original performance.

**Therefore, to access the fine-tuned performance, you MUST load the original base model first and then explicitly attach these adapter weights using the `peft` library, as demonstrated in the setup steps below.**
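
For reference, the problematic pattern is sketched below; here `model` is the `PeftModel` built in the setup steps. Keep the wrapper rather than merging.

```python
# AVOID for this adapter: folding the adapter weights into the base model.
# merged_model = model.merge_and_unload()  # reverted to base-model behavior in our tests

# USE instead: keep the PeftModel wrapper and run generation through it directly.
# outputs = model.generate(**inputs)
```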

---

## Model and Adapter Details

| Detail | Value |
| :--- | :--- |
| **Base Model ID** | `Qwen/Qwen2.5-VL-3B-Instruct` |
| **Adapter Type** | PEFT (e.g., LoRA) |
| **Adapter Repository ID** | `phronetic-ai/RZNV-1.5-3B-Instruct` |
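
You can confirm these details programmatically, before downloading the full base model, by reading the adapter's config. This sketch uses the standard `peft` config API; the fields shown are recorded in the adapter's `adapter_config.json`.

```python
from peft import PeftConfig

config = PeftConfig.from_pretrained("phronetic-ai/RZNV-1.5-3B-Instruct")
print(config.base_model_name_or_path)  # expected: Qwen/Qwen2.5-VL-3B-Instruct
print(config.peft_type)                # adapter type, e.g. PeftType.LORA
```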

---

## Running Inference

### Step 1: Installation

Ensure you have the necessary libraries installed, including `peft` and `transformers`.

```bash
pip install transformers peft accelerate torch
# qwen-vl-utils provides the Qwen-VL-specific vision preprocessing used below
pip install qwen-vl-utils
```

### Step 2: Load the Base Model and Attach the Adapter

```python
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from peft import PeftModel
from qwen_vl_utils import process_vision_info  # Required for Qwen-VL multi-modal processing

# --- Define Paths ---
BASE_MODEL_ID = "Qwen/Qwen2.5-VL-3B-Instruct"
ADAPTER_REPO_ID = "phronetic-ai/RZNV-1.5-3B-Instruct"

# 1. Load the base model (use the same precision/device_map as during training)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    BASE_MODEL_ID,
    torch_dtype="auto",
    device_map="auto",
)

# Optional: enable flash_attention_2 for better speed/memory if your hardware supports it
# model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
#     BASE_MODEL_ID,
#     torch_dtype=torch.bfloat16,
#     attn_implementation="flash_attention_2",
#     device_map="auto",
# )

# 2. Load the processor (tokenizer + image processor) from the base model
processor = AutoProcessor.from_pretrained(BASE_MODEL_ID)

# 3. Load and attach the PEFT adapter weights. This is the most important step:
# from_pretrained returns a PeftModel that wraps the base model with the fine-tuned weights.
model = PeftModel.from_pretrained(model, ADAPTER_REPO_ID)
```
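
As a quick sanity check that the adapter is attached rather than merged away, you can inspect the wrapper. A minimal sketch; `active_adapter` is a standard `peft` attribute, and the adapter name is typically `default`:

```python
from peft import PeftModel

assert isinstance(model, PeftModel)  # the adapter wrapper is in place
print(model.active_adapter)          # name of the loaded adapter, usually "default"
model.eval()                         # switch to inference mode
```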

## Run Generation

```python
# Example multi-modal input
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)  # Qwen-VL specific
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device)  # Move inputs to the model's device

# Inference: generate the output, then trim the prompt tokens from each sequence
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)

print(output_text)
```
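
`batch_decode` returns one string per batch element, so with this single-image example the description is in `output_text[0]`.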