---
pipeline_tag: image-text-to-text
library_name: transformers
license: cc-by-nc-4.0
tags:
  - code-generation
---

# ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation (ACL 2025 Main)

[![🤗 Dataset (HuggingFace)](https://img.shields.io/badge/Dataset-HuggingFace-FFD21E.svg?logo=huggingface&logoColor=yellow)](https://huggingface.co/datasets/xxxllz/Chart2Code-160k)  [![🤖 Dataset (ModelScope)](https://img.shields.io/badge/Dataset-ModelScope-00A0E9.svg)](https://modelscope.cn/datasets/Noct25/Chart2Code-160k)  [![🤗 Model (HuggingFace)](https://img.shields.io/badge/Model-HuggingFace-FFD21E.svg?logo=huggingface&logoColor=yellow)](https://huggingface.co/xxxllz/ChartCoder) [![📑 Paper (arXiv:2501.06598)](https://img.shields.io/badge/arXiv-2501.06598-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2501.06598) [![GitHub Repo](https://img.shields.io/badge/GitHub-Repo-181717.svg?logo=github)](https://github.com/thunlp/ChartCoder)

This repository is the official implementation of [ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation](https://arxiv.org/abs/2501.06598).

## About ChartCoder

ChartCoder is the first dedicated multimodal large language model (MLLM) designed for **chart-to-code generation**. It leverages Code LLMs as its language backbone to significantly enhance the executability of generated code. This model addresses two key challenges in chart interpretation:

1.  **Low executability and poor detail restoration** in generated code from existing MLLMs.
2.  **Lack of large-scale and diverse training data** for chart-to-code tasks.

To overcome these, ChartCoder introduces:
-   **Chart2Code-160k**: The first large-scale and diverse dataset for chart-to-code generation.
-   **Snippet-of-Thought (SoT)**: A method that transforms direct chart-to-code generation into a step-by-step process.

With only 7B parameters, ChartCoder surpasses existing open-source MLLMs on chart-to-code benchmarks, achieving superior chart restoration and code executability.

## Notes
1. ChartCoder is evaluated on the new version of ChartMimic, which contains 600 samples. The ICLR version of ChartMimic is available at https://huggingface.co/datasets/ChartMimic/ChartMimic/blob/main/dataset-iclr.tar.gz.
2. The evaluation code we use is from the Supplementary Material of https://openreview.net/forum?id=sGpCzsfd1K.

All results in Table 3 of the paper (both the baselines and our models) are evaluated under these two settings. Evaluating under other settings may yield different numbers; to reproduce the results reported in the paper, we recommend using the settings above.

## News

**[2025.5.17]** ChartCoder has been accepted by **ACL 2025 Main**.

**[2025.3.13]** We have uploaded our dataset [Chart2Code-160k(HF)](https://huggingface.co/datasets/xxxllz/Chart2Code-160k) to Hugging Face.

**[2025.2.19]** We have released our dataset [Chart2Code-160k](https://modelscope.cn/datasets/Noct25/Chart2Code-160k) on ModelScope.

**[2025.1.16]** We have updated our data generation code [data_generator](https://github.com/thunlp/ChartCoder/tree/main/data_generator), built on [Multi-modal-Self-instruct](https://github.com/zwq2018/Multi-modal-Self-instruct). Please follow their instructions and our code to generate the <chart, code> data pairs.

## Overview

![main](https://github.com/thunlp/ChartCoder/raw/main/fig/main.png)

## Installation

To get started with ChartCoder, clone the repository and set up the environment:

```bash
git clone https://github.com/thunlp/ChartCoder.git
cd ChartCoder
conda create -n chartcoder python=3.10 -y
conda activate chartcoder
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
```

For training, additional packages are required:

```bash
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
```

## Models

| Model | Download Link |
|---|---|
| MLP Connector | [projector](https://drive.google.com/file/d/1S_LwG65TIz_miW39rFPhuEAb5ClgopYi/view?usp=drive_link) |
| ChartCoder | [ChartCoder](https://huggingface.co/xxxllz/ChartCoder) |

The MLP Connector provides our pre-trained MLP projector weights, which you can use directly for SFT.
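If you prefer to load the connector weights manually rather than through the training scripts, here is a minimal sketch. It assumes the checkpoint is a plain PyTorch state dict whose keys match the projector module's parameter names; the `mm_projector.bin` filename and the layer sizes shown are illustrative, not the actual ChartCoder configuration.

```python
import torch
import torch.nn as nn

def load_projector(projector: nn.Module, path: str) -> nn.Module:
    """Load pre-trained MLP connector weights into a projector module.

    Assumes the checkpoint was saved with torch.save as a plain state
    dict whose keys match the projector's own parameter names.
    """
    state_dict = torch.load(path, map_location="cpu")
    projector.load_state_dict(state_dict)
    return projector

# Illustrative two-layer MLP projector (placeholder sizes):
# projector = nn.Sequential(nn.Linear(1152, 4096), nn.GELU(), nn.Linear(4096, 4096))
# projector = load_projector(projector, "mm_projector.bin")
```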

## Data

| Dataset | Download Link |
|---|---|
| Chart2Code-160k | [HuggingFace](https://huggingface.co/datasets/xxxllz/Chart2Code-160k) |
| Chart2Code-160k | [ModelScope](https://modelscope.cn/datasets/Noct25/Chart2Code-160k) |

## Training

The training process consists of two stages. Before training ChartCoder, download `siglip-so400m-patch14-384` and `deepseek-coder-6.7b-instruct`.

For **Pre-training**, run:

```bash
bash scripts/train/pretrain_siglip.sh
```

For **SFT**, run:

```bash
bash scripts/train/finetune_siglip_a4.sh
```

Please change the model paths to your local paths; see the corresponding `.sh` files for details. We also provide other training scripts, such as CLIP-based variants (`_clip`) and multi-machine variants (`_m`); see `scripts/train` for further information.

## Inference (Sample Usage)

You can easily use ChartCoder with the Hugging Face `transformers` library. Ensure you have `transformers` and `torch` installed.

```python
from transformers import AutoProcessor, AutoModelForCausalLM
import torch
from PIL import Image
import requests
from io import BytesIO

# Load model and processor
model_id = "xxxllz/ChartCoder" # The model's Hugging Face ID
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Example image (replace with a real chart image path or URL)
# For demonstration, let's use a placeholder image. In a real scenario, load your chart image.
# Example: image = Image.open("path/to/your/chart_image.png").convert("RGB")
# Or from a URL:
image_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/chart_example.png"
image = Image.open(BytesIO(requests.get(image_url).content)).convert("RGB")

# Define your prompt for chart-to-code generation
prompt = "Generate Python code to recreate the given chart. Provide only the code, no explanations."

# Prepare messages in the chat template format
messages = [
    {"role": "user", "content": f"<image>\n{prompt}"}
]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Prepare the inputs and move them to the model's device
inputs = processor(text=text, images=image, return_tensors="pt").to(model.device)

# Generate code
output_ids = model.generate(**inputs, max_new_tokens=512)
output_text = processor.batch_decode(output_ids, skip_special_tokens=True)[0].strip()

# Print the generated code
print(output_text)
```
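Depending on the prompt, the model may wrap the generated code in markdown fences. The helper below is a sketch for pulling out just the Python code so it can be saved to a `.py` file and executed; the fenced-output format is an assumption, with a fallback to the raw text when no fence is found.

```python
import re

# Build the fence marker programmatically to avoid literal backtick runs.
_FENCE = "`" * 3
_CODE_BLOCK = re.compile(_FENCE + r"(?:python)?\s*\n(.*?)" + _FENCE, re.DOTALL)

def extract_code(text: str) -> str:
    """Return the first fenced code block in model output, or the raw
    text when no fence is present."""
    match = _CODE_BLOCK.search(text)
    return match.group(1).strip() if match else text.strip()

# code = extract_code(output_text)
# with open("recreated_chart.py", "w") as f:
#     f.write(code)
```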

## Results

Please refer to our paper for detailed performance on the ChartMimic, Plot2Code, and ChartX benchmarks. We thank these benchmarks for their contributions to the chart-to-code field.
![results](https://github.com/thunlp/ChartCoder/raw/main/fig/results.png)

## Contact

For any questions, you can contact [2429527z@gmail.com](mailto:2429527z@gmail.com).

## Citation

If you find this work useful, consider giving this repository a star ⭐️ and citing 📝 our paper as follows:

```bibtex
@misc{zhao2025chartcoderadvancingmultimodallarge,
      title={ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation}, 
      author={Xuanle Zhao and Xianzhen Luo and Qi Shi and Chi Chen and Shuo Wang and Wanxiang Che and Zhiyuan Liu and Maosong Sun},
      year={2025},
      eprint={2501.06598},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2501.06598}, 
}
```

## Acknowledgement

The code is based on [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT). Thanks for this great work and for open-sourcing it!