# On the Perception Bottleneck of VLMs for Chart Understanding
This repository contains the CLIP model implementation from our paper "On the Perception Bottleneck of VLMs for Chart Understanding".

To load the model directly:

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("Junteng/Chart_CLIP", dtype="auto")
```
Code: https://github.com/hkust-nlp/Vision4Chart
This CLIP model is trained specifically to address the perception bottleneck of Vision Language Models (VLMs) when processing and understanding charts and visualizations. Our work examines how the CLIP vision encoder affects the LVLMs built on top of it, and aims to improve that perception.
If you find this model useful in your research, please consider citing our paper:
```bibtex
@misc{liu2025perceptionbottleneckvlmschart,
      title={On the Perception Bottleneck of VLMs for Chart Understanding},
      author={Junteng Liu and Weihao Zeng and Xiwen Zhang and Yijun Wang and Zifei Shan and Junxian He},
      year={2025},
      eprint={2503.18435},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.18435},
}
```
Alternatively, the model can be used through a pipeline as a high-level helper:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-feature-extraction", model="Junteng/Chart_CLIP")
```
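As a usage sketch, the pipeline's per-patch features can be pooled into a single embedding and compared across chart images with cosine similarity. The file names, the mean-pooling step, and the output shape handling below are assumptions for illustration, not part of the released code; check the repository linked above for the exact preprocessing.

```python
import numpy as np


def cosine_sim(a, b):
    """Cosine similarity between two 1-D embedding vectors."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


if __name__ == "__main__":
    # Hypothetical end-to-end use (requires network access to download the
    # model, and two local chart images; both are assumptions for this sketch).
    from transformers import pipeline

    pipe = pipeline("image-feature-extraction", model="Junteng/Chart_CLIP")
    embeddings = []
    for path in ["chart_a.png", "chart_b.png"]:
        feats = np.array(pipe(path)[0])       # (num_patches, hidden_dim)
        embeddings.append(feats.mean(axis=0))  # mean-pool over patches
    print("similarity:", cosine_sim(embeddings[0], embeddings[1]))
```

Identical embeddings give a similarity of 1.0 and orthogonal ones give 0.0, which makes the score easy to eyeball when checking whether the encoder separates visually distinct charts.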