---
license: apache-2.0
datasets:
- Dauka-transformers/Compact_VLM_filter_data
language:
- en
base_model:
- Qwen/Qwen2-VL-2B-Instruct
---
# Qwen2-VL Fine-Tuned for Filtration Tasks

This model is a fine-tuned version of Qwen/Qwen2-VL-2B-Instruct, trained on our custom dataset to perform filtration-oriented image-text evaluation.
## Intended Use
The model is designed to:
- Evaluate the alignment between an image and its caption
- Produce scores with justifications for noisy web-scale data
- Support local deployment for cost-efficient filtering
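A local filtering loop needs to turn the model's free-form replies into keep/drop decisions. Below is a minimal sketch assuming the model emits lines like `Score: 4.5` and `Justification: ...`; this output format, the `parse_filter_output` helper, and the threshold in `keep_sample` are illustrative assumptions, not a documented interface, so adapt them to the model's actual responses.

```python
import re
from typing import Optional, Tuple


def parse_filter_output(text: str) -> Tuple[Optional[float], str]:
    """Extract a numeric score and a justification from a model reply.

    Assumes (hypothetically) that replies contain a "Score: <number>" line
    and a "Justification: ..." line; returns (None, full text) if no score
    is found, so malformed replies can be handled explicitly.
    """
    score_match = re.search(r"Score:\s*([0-9]+(?:\.[0-9]+)?)", text)
    score = float(score_match.group(1)) if score_match else None
    just_match = re.search(r"Justification:\s*(.*)", text, re.DOTALL)
    justification = just_match.group(1).strip() if just_match else text.strip()
    return score, justification


def keep_sample(reply: str, threshold: float = 3.0) -> bool:
    """Keep an image-caption pair if its alignment score meets the threshold.

    The threshold value is an example, not a recommended setting.
    """
    score, _ = parse_filter_output(reply)
    return score is not None and score >= threshold
```

In a deployment, each reply generated by the model for an image-caption pair would be passed through `keep_sample` to decide whether the pair survives filtering.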
## Training Details
- Base model: Qwen/Qwen2-VL-2B-Instruct
- Fine-tuning objective: in-context scoring + justification
- Dataset: ~4.8K samples, each with a score, justification, text, and image
## Files
- `model.safetensors`: fine-tuned weights
- `processor`: image and text processor
- `README.md`: this card
## Acknowledgements
Thanks to the Qwen team for open-sourcing their VLM models, which serve as the foundation for our filtration-oriented model.
## License
Licensed under the Apache License 2.0.