---
license: apache-2.0
base_model: Qwen/Qwen3-VL-8B-Instruct
tags:
- qwen3-vl
- vision-language
- lora
- fine-tuned
library_name: peft
---

# qwen3vl-8b-lora

This is a LoRA adapter fine-tuned on top of [Qwen/Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct).

## Model Description

This model is a fine-tuned version of Qwen3-VL-8B-Instruct, trained with LoRA (Low-Rank Adaptation) for parameter-efficient training. The adapter weights can be merged into the base model for standalone inference (see the merging example below).

## Training Details

### Base Model

- **Model:** Qwen/Qwen3-VL-8B-Instruct
- **Architecture:** Vision-Language Model (VLM)

### LoRA Configuration

- **Rank (r):** 64
- **Alpha:** 128
- **Dropout:** 0.05
- **Target Modules:** q_proj, k_proj, v_proj, o_proj
- **Task Type:** Causal Language Modeling
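
For reference, this configuration corresponds to a PEFT `LoraConfig` along the following lines. This is a minimal sketch reconstructed from the values above, not an exported training config:

```python
from peft import LoraConfig

# Sketch of the LoRA setup described above, reconstructed from this
# card's hyperparameters (not the actual file used in training).
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```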

### Training Hyperparameters

- **Learning Rate:** 1e-5
- **Batch Size:** 4 (per device)
- **Gradient Accumulation Steps:** 4
- **Epochs:** 2
- **Optimizer:** AdamW
- **Weight Decay:** 0
- **Warmup Ratio:** 0.03
- **LR Scheduler:** Cosine
- **Max Gradient Norm:** 1.0
- **Model Max Length:** 40960
- **Max Pixels:** 250880
- **Min Pixels:** 784
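
As a rough guide, these map onto Hugging Face `TrainingArguments` as sketched below (a hypothetical reconstruction, with `output_dir` as a placeholder; the effective batch size is 4 per device × 4 accumulation steps × the number of GPUs):

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="./output",
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=2,
    weight_decay=0.0,
    warmup_ratio=0.03,
    lr_scheduler_type="cosine",
    max_grad_norm=1.0,
    bf16=True,
    gradient_checkpointing=True,
)
```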

### Training Infrastructure

- **Framework:** PyTorch + DeepSpeed (ZeRO Stage 2)
- **Precision:** BF16
- **Gradient Checkpointing:** Enabled
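
A minimal ZeRO Stage 2 configuration consistent with this setup might look like the following sketch (not the actual file used in training; `"auto"` values defer to the `Trainer`):

```python
# Minimal DeepSpeed ZeRO-2 sketch; pass it to the Trainer via
# TrainingArguments(deepspeed=ds_config, ...).
ds_config = {
    "zero_optimization": {"stage": 2},
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
}
```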

## Usage

### Requirements

```bash
pip install transformers peft torch pillow qwen-vl-utils
```

Note that Qwen3-VL support landed in recent versions of `transformers`; if loading fails with an unrecognized-architecture error, upgrade to the latest release.

### Loading the Model

```python
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from peft import PeftModel
import torch

# Load the base model
base_model = Qwen3VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen3-VL-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Load the LoRA adapter on top of it
model = PeftModel.from_pretrained(
    base_model,
    "openhay/qwen3vl-8b-lora",
    torch_dtype=torch.bfloat16
)

# Load the processor (tokenizer + image preprocessing)
processor = AutoProcessor.from_pretrained("Qwen/Qwen3-VL-8B-Instruct")
```
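
Optionally, the pixel budget used during training (`Min Pixels`/`Max Pixels` above) can be mirrored at inference time. Qwen-VL processors have historically accepted `min_pixels`/`max_pixels` overrides; treat this as an optional, unverified sketch for this model:

```python
# Optionally match the image pixel budget used during training
# (assumes the processor accepts these overrides, as earlier Qwen-VL
# processors do).
processor = AutoProcessor.from_pretrained(
    "Qwen/Qwen3-VL-8B-Instruct",
    min_pixels=784,
    max_pixels=250880,
)
```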

### Inference Example

```python
from qwen_vl_utils import process_vision_info

# Prepare messages
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "path/to/image.jpg"},
            {"type": "text", "text": "Describe this image in detail."},
        ],
    }
]

# Prepare inputs for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device)

# Generate
with torch.no_grad():
    generated_ids = model.generate(**inputs, max_new_tokens=512)

# Strip the prompt tokens so only the newly generated text is decoded
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)

print(output_text[0])
```

### Merging LoRA Weights (Optional)

If you want to merge the LoRA weights into the base model for faster inference:

```python
from transformers import Qwen3VLForConditionalGeneration
from peft import PeftModel
import torch

# Load the base model and the adapter
base_model = Qwen3VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen3-VL-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "openhay/qwen3vl-8b-lora")

# Merge the adapter into the base weights and save the result
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./merged_model")
```
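
If you intend to load the merged model directly from `./merged_model`, it is convenient to save the processor into the same directory so the folder is self-contained (an optional step, not part of the recipe above):

```python
from transformers import AutoProcessor

# Save the processor alongside the merged weights so the directory
# can be loaded standalone with from_pretrained().
processor = AutoProcessor.from_pretrained("Qwen/Qwen3-VL-8B-Instruct")
processor.save_pretrained("./merged_model")
```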

## Limitations

- This model inherits all limitations from the base Qwen3-VL-8B-Instruct model
- Performance depends on the quality and domain of the fine-tuning dataset
- LoRA adapters may not capture all nuances that full fine-tuning would achieve

## Citation

If you use this model, please cite:

```bibtex
@misc{qwen3vl_8b_lora,
  author = {OpenHay},
  title = {qwen3vl-8b-lora},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/openhay/qwen3vl-8b-lora}}
}
```

## Acknowledgements

- Base model: [Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) by Alibaba Cloud
- Training framework: [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) or a similar LoRA fine-tuning toolkit
- LoRA implementation: [PEFT](https://github.com/huggingface/peft) by Hugging Face