---
license: apache-2.0
base_model: Qwen/Qwen3-VL-8B-Instruct
tags:
- qwen3-vl
- vision-language
- lora
- fine-tuned
library_name: peft
---

# qwen3vl-8b-lora

This is a LoRA adapter fine-tuned on top of [Qwen/Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct).

## Model Description

This model is a fine-tuned version of Qwen3-VL-8B-Instruct using LoRA (Low-Rank Adaptation) for parameter-efficient training.
The adapter weights can be merged into the base model for inference.

## Training Details

### Base Model
- **Model:** Qwen/Qwen3-VL-8B-Instruct
- **Architecture:** Vision-Language Model (VLM)

### LoRA Configuration
- **Rank (r):** 64
- **Alpha:** 128
- **Dropout:** 0.05
- **Target Modules:** q_proj, k_proj, v_proj, o_proj
- **Task Type:** Causal Language Modeling

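As a quick back-of-the-envelope sketch of what this configuration costs: LoRA with rank *r* inserts two matrices per targeted projection, A (r × d_in) and B (d_out × r), so each adapted layer gains r · (d_in + d_out) trainable parameters, and the update is scaled by alpha / r at inference. The 4096-dim projection below is an illustrative assumption, not a shape read from this checkpoint:

```python
# Back-of-envelope estimate of LoRA adapter size for this configuration.
# A LoRA adapter for a d_out x d_in weight adds two low-rank matrices,
# A (r x d_in) and B (d_out x r), i.e. r * (d_in + d_out) parameters.

def lora_params(r: int, d_in: int, d_out: int) -> int:
    """Trainable parameters LoRA adds to one linear layer."""
    return r * (d_in + d_out)

r, alpha = 64, 128      # from the configuration above
scaling = alpha / r     # factor applied to (B @ A) at inference
print(scaling)          # -> 2.0

# Illustrative only: a hypothetical 4096-dim square projection (the real
# Qwen3-VL-8B shapes differ per module under grouped-query attention)
print(lora_params(r, 4096, 4096))  # -> 524288
```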
### Training Hyperparameters
- **Learning Rate:** 1e-5
- **Batch Size:** 4 (per device)
- **Gradient Accumulation Steps:** 4
- **Epochs:** 2
- **Optimizer:** AdamW
- **Weight Decay:** 0
- **Warmup Ratio:** 0.03
- **LR Scheduler:** Cosine
- **Max Gradient Norm:** 1.0
- **Model Max Length:** 40960
- **Max Pixels:** 250880
- **Min Pixels:** 784

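With a per-device batch of 4 and 4 accumulation steps, the effective batch size is 16 per device. The cosine scheduler with a 0.03 warmup ratio can be sketched in plain Python; this illustrates the schedule's shape, not the trainer's actual implementation:

```python
import math

def lr_at(step: int, total_steps: int, peak_lr: float = 1e-5,
          warmup_ratio: float = 0.03) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total = 1000                 # hypothetical total optimizer steps
print(lr_at(30, total))      # end of warmup: peak LR -> 1e-05
print(lr_at(1000, total))    # end of training -> 0.0
```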
### Training Infrastructure
- **Framework:** PyTorch + DeepSpeed (ZeRO Stage 2)
- **Precision:** BF16
- **Gradient Checkpointing:** Enabled

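The actual DeepSpeed config was not published with this adapter, so the fragment below is an assumed minimal ZeRO Stage 2 + BF16 configuration of the kind that would match this setup (the `"auto"` values defer to the HF Trainer's own arguments):

```python
import json

# Hypothetical DeepSpeed config consistent with the setup above
# (ZeRO Stage 2, BF16, gradient clipping at 1.0) -- an assumption,
# not the training run's actual file.
ds_config = {
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "bf16": {"enabled": True},
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_clipping": 1.0,
}
print(json.dumps(ds_config, indent=2))
```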
## Usage

### Requirements

```bash
pip install transformers peft torch pillow qwen-vl-utils
```

### Loading the Model

```python
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from peft import PeftModel
import torch

# Load the base model (requires a transformers release with
# Qwen3-VL support; the Qwen2-VL classes will not load this checkpoint)
base_model = Qwen3VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen3-VL-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(
    base_model,
    "openhay/qwen3vl-8b-lora",
    torch_dtype=torch.bfloat16,
)

# Load the processor (handles both text and images)
processor = AutoProcessor.from_pretrained("Qwen/Qwen3-VL-8B-Instruct")
```

### Inference Example

```python
from qwen_vl_utils import process_vision_info

# Prepare a multimodal chat message (image + text)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "path/to/image.jpg"},
            {"type": "text", "text": "Describe this image in detail."},
        ],
    }
]

# Build the prompt and collect the vision inputs
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device)

# Generate
with torch.no_grad():
    generated_ids = model.generate(**inputs, max_new_tokens=512)

# Strip the prompt tokens from the output before decoding
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)

print(output_text[0])
```

### Merging LoRA Weights (Optional)

If you want to merge the LoRA weights into the base model for faster inference:

```python
import torch
from transformers import Qwen3VLForConditionalGeneration
from peft import PeftModel

# Load the base model and the adapter
base_model = Qwen3VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen3-VL-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "openhay/qwen3vl-8b-lora")

# Merge the adapter into the base weights and save a standalone model
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./merged_model")
```

## Limitations

- This model inherits all limitations of the base Qwen3-VL-8B-Instruct model
- Performance depends on the quality and domain of the fine-tuning dataset
- LoRA adapters may not capture all of the nuances that full fine-tuning would

## Citation

If you use this model, please cite:

```bibtex
@misc{qwen3vl_8b_lora,
  author = {OpenHay},
  title = {qwen3vl-8b-lora},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/openhay/qwen3vl-8b-lora}}
}
```

## Acknowledgements

- Base model: [Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) by Alibaba Cloud
- Training framework: [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) or a similar fine-tuning stack
- LoRA implementation: [PEFT](https://github.com/huggingface/peft) by Hugging Face