--- library_name: peft license: mit base_model: Qwen/Qwen2.5-VL-7B-Instruct datasets: - SinaLab/ImageEval2025Task2TrainDataset tags: - arabic - image-captioning - vision-language - lora - qwen2.5-vl - cultural-heritage language: - ar model-index: - name: arabic-image-captioning-qwen2.5vl results: [] --- # Arabic Image Captioning - Qwen2.5-VL Fine-tuned This model is a LoRA fine-tuned version of [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) for generating Arabic captions for images. ## Model Description This model was developed as part of the [Arabic Image Captioning Shared Task 2025](https://sina.birzeit.edu/image_eval2025/index.html). It generates natural Arabic captions for images with focus on historical and cultural content related to Palestinian heritage. please refer to the [training dataset](https://huggingface.co/datasets/SinaLab/ImageEval2025Task2TrainDataset) for more details. ## Usage ```python from transformers import Qwen2VLForConditionalGeneration, AutoProcessor from peft import PeftModel import torch from PIL import Image # Load base model and processor base_model = Qwen2VLForConditionalGeneration.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct") processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct") # Load LoRA adapter model = PeftModel.from_pretrained(base_model, "your-username/arabic-image-captioning-qwen2.5vl") # Process image and generate caption image = Image.open("your_image.jpg") prompt = "اكتب وصفاً مختصراً لهذه الصورة باللغة العربية" inputs = processor(images=image, text=prompt, return_tensors="pt") with torch.no_grad(): outputs = model.generate(**inputs, max_new_tokens=128) caption = processor.decode(outputs[0], skip_special_tokens=True) print(caption) ``` ## Training Details ### Dataset - **Training data**: Arabic image captions dataset from the shared task - **Languages**: Arabic (ar) - **Dataset size**: ~2,700 training images with Arabic captions ### Training Procedure - **Fine-tuning method**: LoRA (Low-Rank Adaptation) - **Training epochs**: 15 - **Learning rate**: 2e-05 - **Batch size**: 1 with gradient accumulation (effective batch size: 16) - **Optimizer**: AdamW with cosine learning rate scheduling - **Hardware**: NVIDIA A100 GPU - **Training time**: ~6 hours ### Framework Versions - PEFT 0.15.2 - Transformers 4.49.0 - PyTorch 2.4.1+cu121 ## Contact For questions or support: - abashiti@birzeit.edu - aaljabari@birzeit.edu - hhamoud@dohainstitute.edu.qa