On the Perception Bottleneck of VLMs for Chart Understanding

Welcome! This repository accompanies the paper On the Perception Bottleneck of VLMs for Chart Understanding.

What is included in this project?

This repository provides implementations for training and evaluating CLIP and LLaVA models on chart understanding tasks. Specifically, it includes:

  • CLIP Training: Training scripts for CLIP with and without hard negative captions.
  • CLIP Evaluation: Code for evaluating CLIP on various chart-related datasets.
  • LLaVA Training: Training scripts for LLaVA-13B and LLaVA-Phi.
  • LLaVA Evaluation: Evaluation scripts for LLaVA on multiple chart benchmarks.
  • CLIP Learning Data: Data used for CLIP contrastive learning on chart tasks.

Environment Setup

Detailed instructions for setting up the environment are provided in config_env.md.

CLIP Training

We utilize the open_clip repository for CLIP training. The source code is available in the open_clip directory.

Example training script: example_scripts/train_openclip.sh.
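
For reference, below is a minimal sketch of the contrastive objective these scripts optimize, written against the open_clip API. The backbone, learning rate, and captions are illustrative placeholders, not the settings from example_scripts/train_openclip.sh:

import torch
import open_clip
from open_clip.loss import ClipLoss

# Illustrative backbone/initialization; see example_scripts/train_openclip.sh
# for the configuration actually used in this repo.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-L-14", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-L-14")
loss_fn = ClipLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# A stand-in batch: random pixels plus placeholder chart captions.
images = torch.randn(4, 3, 224, 224)
texts = tokenizer(["a bar chart", "a line chart",
                   "a pie chart", "a scatter plot"])

# One contrastive step: matched image-caption pairs are pulled together,
# while every other pairing in the batch serves as a negative.
optimizer.zero_grad()
image_features, text_features, logit_scale = model(images, texts)
loss = loss_fn(image_features, text_features, logit_scale)
loss.backward()
optimizer.step()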

For NegCLIP training, we build upon the neg_clip repository, modifying it to support multi-GPU training. The modified code is in the neg_clip directory.

Example NegCLIP training script: example_scripts/train_negclip.sh.
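
Conceptually, the hard-negative variant enlarges the text side of the similarity matrix with perturbed captions, so each chart must also reject a near-miss description of itself. A minimal sketch of that loss follows (our naming, not the neg_clip code):

import torch
import torch.nn.functional as F

def clip_loss_with_hard_negatives(img_feats, txt_feats, neg_txt_feats,
                                  logit_scale):
    """img_feats, txt_feats, neg_txt_feats: (B, D), L2-normalized."""
    # Hard-negative captions are appended as extra text columns, so each
    # image must rank its true caption above both the in-batch negatives
    # and its own perturbed caption.
    all_txt = torch.cat([txt_feats, neg_txt_feats], dim=0)    # (2B, D)
    logits_per_image = logit_scale * img_feats @ all_txt.t()  # (B, 2B)
    logits_per_text = logit_scale * txt_feats @ img_feats.t() # (B, B)
    labels = torch.arange(img_feats.size(0), device=img_feats.device)
    return (F.cross_entropy(logits_per_image, labels)
            + F.cross_entropy(logits_per_text, labels)) / 2

In distributed training, features are gathered across GPUs before a loss of this form is computed, which is the part the multi-GPU modification concerns.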

CLIP Evaluation

The evaluation code for CLIP is located in the eval_clip directory.

Example evaluation script: example_scripts/eval_clip.sh.
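
The evaluation reduces to image-text matching: given a chart and candidate captions, score each pair and check whether the correct caption wins. A hedged sketch using the open_clip API (the file name and captions are hypothetical; the real datasets and metrics live in eval_clip):

import torch
import open_clip
from PIL import Image

# Load a checkpoint; `pretrained` also accepts a local path, e.g. a
# ChartCLIP checkpoint produced by the training scripts above.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-L-14", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-L-14")
model.eval()

image = preprocess(Image.open("chart.png")).unsqueeze(0)  # hypothetical file
candidates = tokenizer(["the blue bar is the tallest",
                        "the red bar is the tallest"])

with torch.no_grad():
    img = model.encode_image(image)
    txt = model.encode_text(candidates)
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    probs = (100.0 * img @ txt.t()).softmax(dim=-1)

print(probs)  # the higher-probability caption is the model's choice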

LLaVA Training

We train two types of LLaVA models:

  • LLaVA-v1.5-13B: Uses Vicuna-13B as the language model.
  • LLaVA-Phi: Uses Phi-3-mini-4k-instruct as the language model.

LLaVA-v1.5-13B training is based on the LLaVA repository, while LLaVA-Phi training is based on the LLaVA-pp repository. In addition, we support training with the vision encoder unfrozen (sketched below).

Example script for full LLaVA training: example_scripts/train_full_llava.sh.
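
Unfreezing the vision encoder amounts to leaving the CLIP-ViT parameters trainable rather than frozen as in the stock LLaVA recipe. A minimal sketch of the idea, assuming the LLaVA codebase's get_vision_tower() accessor (the actual switch lives in our modified training code):

from llava.model import LlavaLlamaForCausalLM

model = LlavaLlamaForCausalLM.from_pretrained("liuhaotian/llava-v1.5-13b")

# Stock LLaVA fine-tuning keeps the vision tower frozen; re-enabling
# gradients lets the perception stage be trained jointly with the LM.
for p in model.get_vision_tower().parameters():
    p.requires_grad = True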

LLaVA Evaluation

LLaVA is evaluated on multiple chart-related benchmarks.

For FigureQA, DVQA, PlotQA, ChartQA, ChartBench, and ChartX, use the evaluation script example_scripts/eval_llava.sh.

For MathVista, use example_scripts/eval_mathvista.sh.
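
Under the hood, these scripts query the model one image-question pair at a time. The LLaVA codebase exposes a convenience entry point for a single query, sketched here with hypothetical checkpoint and image paths (benchmark-specific prompting and answer parsing are handled by the scripts above):

from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

model_path = "checkpoints/llava-v1.5-13b-chart"  # hypothetical checkpoint
args = type("Args", (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": "What is the highest value in the bar chart?",
    "conv_mode": None,
    "image_file": "chart.png",  # hypothetical image
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512,
})()

eval_model(args)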

Released Resources

Model

  • ChartCLIP: 🤗

Dataset

  • Vision4Chart: 🤗

Citation

If you find this work helpful, please cite it as:

@misc{liu2025perceptionbottleneckvlmschart,
      title={On the Perception Bottleneck of VLMs for Chart Understanding}, 
      author={Junteng Liu and Weihao Zeng and Xiwen Zhang and Yijun Wang and Zifei Shan and Junxian He},
      year={2025},
      eprint={2503.18435},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.18435}, 
}
