---
license: apache-2.0
datasets:
- Dauka-transformers/Compact_VLM_filter_data
language:
- en
base_model:
- Qwen/Qwen2-VL-2B-Instruct
---

# Qwen2VL Fine-Tuned for Filtration Tasks

This model is a fine-tuned version of [Qwen/Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct), trained on our custom dataset to perform filtration-oriented image-text evaluation.

## 🔍 Intended Use

The model is designed to:

- Evaluate image-caption alignment
- Provide justification scores for noisy web-scale data
- Support local deployment for cost-efficient filtering

## 🏋️ Training Details

- Base model: `Qwen/Qwen2-VL-2B-Instruct`
- Fine-tuning objective: in-context scoring + justification
- Dataset: ~4.8K samples, each with a score, justification, text, and image

## 📁 Files

- `model.safetensors` – fine-tuned weights
- `processor` – image and text processor
- `README.md` – this card

## 🤝 Acknowledgements

Thanks to the [Qwen team](https://huggingface.co/Qwen/Qwen-VL) for open-sourcing their VLM models, which serve as the foundation for our filtration-oriented model.

## 📜 License

Licensed under the Apache License 2.0.
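## 🚀 Example Usage

A minimal sketch of how the model might be queried for image-caption alignment. The exact scoring prompt used during fine-tuning is not published in this card, so the wording below (including the 1–5 scale and the `build_filter_messages` helper) is an illustrative assumption, not the training prompt.

```python
# Hypothetical helper that builds a Qwen2-VL chat message asking the model
# for an alignment score plus a justification. The prompt wording is an
# assumption -- adapt it to the prompt your pipeline actually uses.
from typing import Any, Dict, List


def build_filter_messages(image_path: str, caption: str) -> List[Dict[str, Any]]:
    """Build a Qwen2-VL-style chat message for image-caption scoring."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {
                    "type": "text",
                    "text": (
                        "Rate how well this caption matches the image on a "
                        "scale of 1-5 and justify your score.\n"
                        f"Caption: {caption}"
                    ),
                },
            ],
        }
    ]


msgs = build_filter_messages("sample.jpg", "A dog playing in the snow.")
print(msgs[0]["role"])  # user
```

With the `transformers` library, messages in this shape can then be rendered via `AutoProcessor.apply_chat_template(...)` and passed to `Qwen2VLForConditionalGeneration.generate(...)` to obtain the score and justification text.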