Welcome! This repository accompanies the paper On the Perception Bottleneck of VLMs for Chart Understanding.
This repository provides implementations for training and evaluating CLIP and LLaVA models on chart understanding tasks. Specifically, it includes:
- CLIP Training: Training scripts for CLIP with and without hard negative captions.
- CLIP Evaluation: Code for evaluating CLIP on various chart-related datasets.
- LLaVA Training: Training scripts for LLaVA-v1.5-13B and LLaVA-Phi.
- LLaVA Evaluation: Evaluation scripts for LLaVA on multiple chart benchmarks.
- CLIP Learning Data: Data for contrastive learning of CLIP on chart tasks.
Detailed instructions for setting up the environment are provided in `config_env.md`.
We use the open_clip repository for CLIP training. The source code is available in the `open_clip` directory.

Example training script: `example_scripts/train_openclip.sh`.
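As a reference for the data format, open_clip's CSV loader reads one image path and one caption per row (its default column names are `filepath` and `title`, tab-separated). The sketch below shows one way to convert chart image–caption pairs into that format; the input file `data/chart_captions.json` and its schema are hypothetical placeholders, not files shipped with this repository.

```python
# Minimal sketch: write a tab-separated (image path, caption) file in the
# column layout expected by open_clip's CSV dataset loader.
# "data/chart_captions.json" and its record schema are hypothetical.
import csv
import json

with open("data/chart_captions.json") as f:        # hypothetical input file
    records = json.load(f)                          # e.g. [{"image": ..., "caption": ...}, ...]

with open("data/chart_train.csv", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(["filepath", "title"])          # open_clip's default CSV column names
    for r in records:
        writer.writerow([r["image"], r["caption"]])
```

The resulting file can then be passed to the training entry point via `--train-data`; see `example_scripts/train_openclip.sh` for the full set of arguments we use.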
For NegCLIP training, we build upon the neg_clip repository, modifying it to support multi-GPU training. The modified code is in the `neg_clip` directory.

Example NegCLIP training script: `example_scripts/train_negclip.sh`.
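For intuition, the NegCLIP recipe augments the standard CLIP contrastive loss with hard-negative captions that are appended to the text batch as extra, never-matching candidates. The snippet below is an illustrative sketch of that idea in plain PyTorch, not necessarily the exact loss implemented in `neg_clip`; tensor names and shapes are ours.

```python
# Illustrative NegCLIP-style loss: each image has one positive caption plus
# hard-negative captions that enlarge the image->text candidate set.
import torch
import torch.nn.functional as F

def negclip_loss(image_feats, text_feats, hard_neg_feats, logit_scale):
    """image_feats: (B, D); text_feats: (B, D) positive captions;
    hard_neg_feats: (B*K, D) hard-negative captions; all L2-normalized."""
    all_text = torch.cat([text_feats, hard_neg_feats], dim=0)     # (B + B*K, D)
    logits_per_image = logit_scale * image_feats @ all_text.t()   # (B, B + B*K)
    logits_per_text = logit_scale * text_feats @ image_feats.t()  # (B, B)
    labels = torch.arange(image_feats.size(0), device=image_feats.device)
    # Image->text is computed over the enlarged candidate set (positives +
    # hard negatives); text->image uses only the positive captions.
    loss_i2t = F.cross_entropy(logits_per_image, labels)
    loss_t2i = F.cross_entropy(logits_per_text, labels)
    return 0.5 * (loss_i2t + loss_t2i)
```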
The evaluation code for CLIP is located in the `eval_clip` directory.

Example evaluation script: `example_scripts/eval_clip.sh`.
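The actual benchmarks and metrics live in `eval_clip`; the snippet below is only a minimal sketch of how a (fine-tuned) CLIP checkpoint can be scored against candidate chart captions with the open_clip API. The model name, checkpoint, image path, and captions are placeholders.

```python
# Minimal sketch: rank candidate captions for a chart image with open_clip.
import torch
import open_clip
from PIL import Image

# "ViT-L-14" / "openai" are placeholders; a fine-tuned checkpoint path can be
# passed to `pretrained` instead.
model, _, preprocess = open_clip.create_model_and_transforms("ViT-L-14", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-L-14")
model.eval()

image = preprocess(Image.open("example_chart.png")).unsqueeze(0)   # placeholder image
texts = tokenizer([
    "The blue bar is the tallest.",                                # placeholder captions
    "The red bar is the tallest.",
])

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(texts)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_feat @ txt_feat.T).softmax(dim=-1)

print(probs)  # probability assigned to each candidate caption
```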
We train two types of LLaVA models:
- LLaVA-v1.5-13B: Uses Vicuna-13B as the language model.
- LLaVA-Phi: Uses Phi-3-mini-4k-instruct as the language model.
LLaVA-v1.5-13B training is based on the LLaVA repository, while LLaVA-Phi training is based on the LLaVA-pp repository. In addition, we modify the training code so that the vision encoder can be unfrozen and tuned.
Example script for full LLaVA training: `example_scripts/train_full_llava.sh`.
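The vision-encoder unfreezing itself happens inside the modified training code; the sketch below only illustrates the general idea in plain PyTorch, with `model` standing in for whichever LLaVA model object the trainer builds, and the `"vision_tower"` parameter-name filter being an assumption based on LLaVA's usual naming.

```python
# Conceptual sketch of "unfreezing the vision encoder": mark the vision-tower
# parameters as trainable so the optimizer updates them together with the
# projector and the language model.
def unfreeze_vision_tower(model):
    n_unfrozen = 0
    for name, param in model.named_parameters():
        if "vision_tower" in name:      # CLIP vision-encoder weights (assumed naming)
            param.requires_grad = True
            n_unfrozen += 1
    print(f"Unfroze {n_unfrozen} vision-tower parameter tensors")
```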
LLaVA is evaluated on multiple chart-related benchmarks.
For FigureQA, DVQA, PlotQA, ChartQA, ChartBench, and ChartX, evaluation scripts are provided in `example_scripts/eval_llava.sh`.
For MathVista, evaluation scripts are provided in `example_scripts/eval_mathvista.sh`.
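For reference, chart-QA benchmarks such as ChartQA and PlotQA are commonly scored with a "relaxed accuracy" metric (numeric answers count as correct within a 5% relative tolerance, other answers require an exact match). The sketch below is a generic implementation of that metric; the repository's evaluation scripts may apply additional answer normalization.

```python
# Generic sketch of the relaxed-accuracy metric commonly used for
# ChartQA/PlotQA-style answers; not necessarily identical to the scoring
# implemented in the evaluation scripts of this repository.
def relaxed_match(prediction: str, target: str, tolerance: float = 0.05) -> bool:
    prediction, target = prediction.strip(), target.strip()
    try:
        pred_f = float(prediction.rstrip("%"))
        tgt_f = float(target.rstrip("%"))
        if tgt_f == 0.0:
            return pred_f == 0.0
        return abs(pred_f - tgt_f) / abs(tgt_f) <= tolerance
    except ValueError:
        # Non-numeric answers fall back to a case-insensitive exact match.
        return prediction.lower() == target.lower()

def relaxed_accuracy(predictions, targets):
    assert len(predictions) == len(targets)
    hits = sum(relaxed_match(p, t) for p, t in zip(predictions, targets))
    return hits / max(len(targets), 1)
```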
| Model | Link |
|---|---|
| ChartCLIP | 🤗 |

| Dataset | Link |
|---|---|
| Vision4Chart | 🤗 |
If you find this work helpful, please cite it as:
@misc{liu2025perceptionbottleneckvlmschart,
      title={On the Perception Bottleneck of VLMs for Chart Understanding},
      author={Junteng Liu and Weihao Zeng and Xiwen Zhang and Yijun Wang and Zifei Shan and Junxian He},
      year={2025},
      eprint={2503.18435},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.18435},
}