---
license: apache-2.0
datasets:
- Dauka-transformers/Compact_VLM_filter_data
language:
- en
base_model:
- Qwen/Qwen2-VL-2B-Instruct
---
# Compact VLM Filter: Image-caption filtration-oriented Qwen2VL model

This model is a fine-tuned version of [Qwen/Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen-VL) trained to perform filtration-oriented image-text evaluation, based on our custom dataset.

## 🔍 Intended Use

The model is designed to:

- Evaluate alignment of image and caption
- Provide image/caption alignment scores and textual justification for noisy web-scale data
- Supports local deployment for cost-efficient training data filtration

## 🏋️ Training Details

- Base model: `Qwen/Qwen2-VL-2B-Instruct`
- Fine-tuning objective: in-context evaluation of aligment, quality and safety
- Dataset: ~4.8K samples with score, justification, caption, and image


## 🤝 Acknowledgements

Thanks to the [Qwen team](https://huggingface.co/Qwen/Qwen-VL) for open-sourcing their VLM models, which serve as the foundation for our filtration-oriented model.


## 📜 License

Licensed under the Apache License 2.0.