---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
pipeline_tag: image-text-to-text
library_name: adapter-transformers
---

# Inference with RZNV-1.5-3B-Instruct (PEFT Adapter)

This repository contains only the **Parameter-Efficient Fine-Tuning (PEFT) adapter weights** for the Qwen2.5-VL-3B-Instruct model. Storing only the adapter keeps the repository small, portable, and easy to share.

## Important Note: Adapter Loading Required

During development we found that merging the adapter with the standard `merge_and_unload()` function caused the resulting model to revert to the base model's original performance.

**Therefore, to get the fine-tuned performance, you MUST load the original base model first and then explicitly attach these adapter weights with the `peft` library, as demonstrated in the setup steps below.**
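
For reference, the two loading styles differ roughly as sketched below. This is only an illustration: `base_model` stands for an already-loaded Qwen2.5-VL-3B-Instruct model, and the full runnable setup follows under Running Inference.

```python
from peft import PeftModel

# Not recommended for this adapter: in our tests, merging caused the model to
# fall back to the base model's original behaviour.
# merged = PeftModel.from_pretrained(base_model, "phronetic-ai/RZNV-1.5-3B-Instruct").merge_and_unload()

# Recommended: keep the adapter attached and run inference through the PeftModel wrapper.
model = PeftModel.from_pretrained(base_model, "phronetic-ai/RZNV-1.5-3B-Instruct")
```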

---

## Model and Adapter Details

| Detail | Value |
| :--- | :--- |
| **Base Model ID** | `Qwen/Qwen2.5-VL-3B-Instruct` |
| **Adapter Type** | PEFT (e.g., LoRA) |
| **Adapter Repository ID** | `phronetic-ai/RZNV-1.5-3B-Instruct` |
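
If you want to confirm these details programmatically before downloading anything large, you can inspect the adapter's stored configuration on its own. This is a small sketch using the standard `peft` config API; the printed values in the comments are expectations, not hard-coded guarantees.

```python
from peft import PeftConfig

# Fetches only the adapter configuration (a few KB), not the weights.
config = PeftConfig.from_pretrained("phronetic-ai/RZNV-1.5-3B-Instruct")

print(config.peft_type)                # adapter type, e.g. PeftType.LORA
print(config.base_model_name_or_path)  # expected: Qwen/Qwen2.5-VL-3B-Instruct
```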

---

## Running Inference

### Step 1: Installation

Ensure you have the necessary libraries installed, including `peft` and `transformers`.

```bash
pip install transformers peft accelerate torch
# The Qwen-VL-specific vision helpers used below ship as a separate package:
pip install qwen-vl-utils
```

### Step 2: Load the Base Model and Attach the Adapter

```python
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoTokenizer, AutoProcessor
from peft import PeftModel
from qwen_vl_utils import process_vision_info  # Required for Qwen-VL multi-modal processing

# --- Define Paths ---
BASE_MODEL_ID = "Qwen/Qwen2.5-VL-3B-Instruct"
ADAPTER_REPO_ID = "phronetic-ai/RZNV-1.5-3B-Instruct"

# 1. Load the base model (use the same precision/device_map as during training)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    BASE_MODEL_ID,
    torch_dtype="auto",
    device_map="auto",
)

# Optional: enable flash_attention_2 for better speed and memory use if your hardware supports it
# model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
#     BASE_MODEL_ID,
#     torch_dtype=torch.bfloat16,
#     attn_implementation="flash_attention_2",
#     device_map="auto",
# )

# 2. Load the processor (tokenizer + image processor) from the base model
processor = AutoProcessor.from_pretrained(BASE_MODEL_ID)

# 3. Load and attach the PEFT adapter weights. This is the most important step:
# the returned PeftModel wraps the base model with the fine-tuned weights attached.
model = PeftModel.from_pretrained(model, ADAPTER_REPO_ID)
```
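
As an optional sanity check (a short sketch, assuming the loading code above has run), you can confirm that the adapter is actually attached before generating:

```python
# The PeftModel wrapper exposes the attached adapter configuration(s).
print(model.peft_config)     # dict of adapter name -> config; should not be empty
print(model.active_adapter)  # adapter used at inference time, typically "default"
```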

### Step 3: Run Generation

```python
# Example multi-modal input
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)  # Qwen-VL specific
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device)  # Move inputs to the model's device

# Inference: generate the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)

print(output_text)
```
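
The same generation code also works with local images; only the `messages` content changes. The sketch below uses a placeholder file path and relies on `qwen_vl_utils` resolving `file://` URIs (it also accepts PIL images), as in the upstream Qwen2.5-VL examples.

```python
# Local image instead of a URL; replace the placeholder path with a real file.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "file:///path/to/your/image.jpg"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
# Then re-run the preparation and generation steps from Step 3 with this `messages` list.
```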