# On the Perception Bottleneck of VLMs for Chart Understanding
This repository contains the CLIP model implementation from our paper "On the Perception Bottleneck of VLMs for Chart Understanding".

To load the model directly:

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("Junteng/Chart_CLIP", dtype="auto")
```
Code: https://github.com/hkust-nlp/Vision4Chart
This CLIP model is trained specifically to address the perception bottleneck of Vision Language Models (VLMs) when processing and understanding charts and visualizations. Our work examines how the CLIP vision encoder affects the LVLMs built on top of it, and aims to improve that perception.
If you find this model useful in your research, please consider citing our paper:
```bibtex
@misc{liu2025perceptionbottleneckvlmschart,
      title={On the Perception Bottleneck of VLMs for Chart Understanding},
      author={Junteng Liu and Weihao Zeng and Xiwen Zhang and Yijun Wang and Zifei Shan and Junxian He},
      year={2025},
      eprint={2503.18435},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.18435},
}
```
Alternatively, the model can be used through a pipeline as a high-level helper:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-feature-extraction", model="Junteng/Chart_CLIP")
```
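As a usage sketch, the pipeline's per-patch features can be pooled into a single embedding and compared across chart images with cosine similarity. The file names, the mean-pooling step, and the output shape handling below are assumptions for illustration, not part of the released code; check the repository linked above for the exact preprocessing.

```python
import numpy as np


def cosine_sim(a, b):
    """Cosine similarity between two 1-D embedding vectors."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


if __name__ == "__main__":
    # Hypothetical end-to-end use (requires network access to download the
    # model, and two local chart images; both are assumptions for this sketch).
    from transformers import pipeline

    pipe = pipeline("image-feature-extraction", model="Junteng/Chart_CLIP")
    embeddings = []
    for path in ["chart_a.png", "chart_b.png"]:
        feats = np.array(pipe(path)[0])       # (num_patches, hidden_dim)
        embeddings.append(feats.mean(axis=0))  # mean-pool over patches
    print("similarity:", cosine_sim(embeddings[0], embeddings[1]))
```

Identical embeddings give a similarity of 1.0 and orthogonal ones give 0.0, which makes the score easy to eyeball when checking whether the encoder separates visually distinct charts.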