cwbc committed on
Commit b61437b · verified · 1 Parent(s): 5d00a21

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +129 -0
README.md ADDED
@@ -0,0 +1,129 @@
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ base_model:
6
+ - Qwen/Qwen2.5-VL-7B-Instruct
7
+ pipeline_tag: image-text-to-text
8
+ library_name: transformers
9
+ tags:
10
+ - chart-to-code
11
+ - multimodal
12
+ - vision-language
13
+ - reinforcement-learning
14
+ - self-correction
15
+ - matplotlib
16
+ ---
17
+
18
+ # MM-ReCoder
19
+
20
+ <p align="center">
21
+ <a href="https://cvpr.thecvf.com/Conferences/2026"><b>CVPR 2026</b></a>
22
+ &nbsp;|&nbsp;
23
+ <a href="https://zitiantang.github.io/MM-ReCoder/">Project Page</a>
24
+ &nbsp;|&nbsp;
25
+ <a href="https://arxiv.org/abs/2604.01600">arXiv</a>
26
+ &nbsp;|&nbsp;
27
+ <a href="https://github.com/ZitianTang/MM-ReCoder">Code</a>
28
+ &nbsp;|&nbsp;
29
+ <a href="https://huggingface.co/cwbc/MM-ReCoder-SFT-Cold-Start">SFT Cold-Start</a>
30
+ </p>
31
+
32
+ **MM-ReCoder** is the 7B vision-language model from the CVPR 2026 paper
33
+ [*MM-ReCoder: Advancing Chart-to-Code Generation with Reinforcement Learning and Self-Correction*](https://arxiv.org/abs/2604.01600).
34
+ It converts a chart image into the matplotlib code that reproduces it. At
35
+ inference time the model renders its own code with a sandboxed matplotlib
36
+ tool, inspects the result, and self-corrects across multiple turns.
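The multi-turn behavior described above can be pictured as a small loop. The following is a minimal sketch, not the repository's agent: `generate` and `render` are hypothetical callables standing in for the model call and the sandboxed matplotlib tool, and stopping after a fixed number of turns is an assumption.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces, for illustration only:
#   generate(history) -> code string proposed by the model for this turn
#   render(code)      -> (image path or None, error message or None)
Generate = Callable[[List[dict]], str]
Render = Callable[[str], Tuple[Optional[str], Optional[str]]]


def self_correct(generate: Generate, render: Render, max_turns: int = 2) -> str:
    """Run `max_turns` rounds of generate -> render -> revise; return the last code."""
    history: List[dict] = []
    code = ""
    for turn in range(max_turns):
        code = generate(history)
        image_path, error = render(code)
        # Feed the rendering (or the error) back so the next turn can revise.
        history.append(
            {"turn": turn, "code": code, "image": image_path, "error": error}
        )
    return code
```

Passing the two steps in as callables keeps the loop independent of how the model is served or how rendering is isolated; the official scripts may structure this differently.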
37
+
38
+ This is the **final** RL-trained checkpoint. It is fine-tuned from
39
+ [`Qwen/Qwen2.5-VL-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)
40
+ via:
41
+
42
+ 1. **SFT cold-start** — released separately as
43
+ [`cwbc/MM-ReCoder-SFT-Cold-Start`](https://huggingface.co/cwbc/MM-ReCoder-SFT-Cold-Start).
44
+ 2. **Multi-turn RL (GRPO), stage 1** — shared-first-turn optimization.
45
+ 3. **Multi-turn RL (GRPO), stage 2** — full-trajectory optimization,
46
+ resumed from stage 1.
47
+
48
+ ## Usage
49
+
50
+ The recommended way to use MM-ReCoder is through the inference scripts in
51
+ the [official repository](https://github.com/ZitianTang/MM-ReCoder), which
52
+ wrap the model with the self-correction agent loop (render → critique →
53
+ revise):
54
+
55
+ ```bash
56
+ git clone https://github.com/ZitianTang/MM-ReCoder.git
57
+ cd MM-ReCoder
58
+ # Follow the Installation section in the repo README.
59
+
60
+ # Download the MM-ReCoder checkpoint from Hugging Face
61
+ hf download cwbc/MM-ReCoder
62
+
63
+ # Two-turn self-correction on ChartMimic.
64
+ bash examples/mmrecoder/inference/chartmimic_2turns.sh
65
+ ```
66
+
67
+ ### Direct single-turn use (no self-correction)
68
+
69
+ You can also load the model in a single-pass setting via `transformers`:
70
+
71
+ ```python
72
+ from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
73
+ from PIL import Image
74
+ import torch
75
+
76
+ model_id = "cwbc/MM-ReCoder"
77
+ processor = AutoProcessor.from_pretrained(model_id)
78
+ model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
79
+ model_id, torch_dtype=torch.bfloat16, device_map="auto"
80
+ )
81
+
82
+ image = Image.open("path/to/chart.png").convert("RGB")
83
+ messages = [{
84
+ "role": "user",
85
+ "content": [
86
+ {"type": "image", "image": image},
87
+ {"type": "text", "text": "Generate the matplotlib code that reproduces this chart."},
88
+ ],
89
+ }]
90
+ text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
91
+ inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
92
+
93
+ out = model.generate(**inputs, max_new_tokens=4096, do_sample=False)
94
+ print(processor.batch_decode(out[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0])
95
+ ```
96
+
97
+ This emits code in one shot. The full self-correction behavior requires the
98
+ agent loop in the repository.
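To check a single-pass result, the generated code has to be pulled out of the response and executed. The helpers below are a hedged sketch of one way to do that. `extract_code` and `run_code` are illustrative names, not part of the repository, and the assumption that the response wraps its code in a fenced block may not hold for every output. Running in a subprocess is not a security sandbox.

```python
import os
import re
import subprocess
import sys
import tempfile

FENCE = "`" * 3  # built at runtime to avoid a literal fence inside this example


def extract_code(response: str) -> str:
    """Return the first fenced block's body, or the whole response if none."""
    pattern = FENCE + r"(?:python)?\s*\n(.*?)" + FENCE
    match = re.search(pattern, response, flags=re.DOTALL)
    return (match.group(1) if match else response).strip()


def run_code(code: str, timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Execute code in a separate process with a headless matplotlib backend.

    Assumption: isolation here is only a temp directory and a timeout.
    Only run model output you are willing to execute on this machine.
    """
    env = dict(os.environ, MPLBACKEND="Agg")  # render without a display
    with tempfile.TemporaryDirectory() as workdir:
        script = os.path.join(workdir, "chart.py")
        with open(script, "w", encoding="utf-8") as f:
            f.write(code)
        return subprocess.run(
            [sys.executable, script],
            cwd=workdir,
            env=env,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
```

A non-zero `returncode` or text in `stderr` indicates the generated script failed. Files the script saves with relative paths land in the temporary directory and are deleted when `run_code` returns, so copy them out inside the function if the rendered image is needed.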
99
+
100
+ ## Training
101
+
102
+ - **Base model:** Qwen2.5-VL-7B-Instruct.
103
+ - **RL algorithm:** GRPO with chart-specific rule-based rewards (format,
104
+ color, text, layout, type) plus an LLM-as-a-judge model reward.
105
+ - **RL data:** [Chart2Code-160k](https://huggingface.co/datasets/xxxllz/Chart2Code-160k)
106
+ prompts.
107
+ - **Evaluation:**
108
+ [ChartMimic](https://github.com/ChartMimic/ChartMimic) (direct-600),
109
+ [Plot2Code](https://github.com/TencentARC/Plot2Code), and
110
+ [ChartX](https://github.com/InternScience/ChartVLM).
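GRPO scores each sampled response relative to the other responses drawn for the same prompt. The sketch below illustrates that group normalization together with a weighted sum over reward components. The component names mirror the list above, but the weights, the combination rule, and the `eps` term are assumptions rather than values from the paper.

```python
from statistics import mean, pstdev
from typing import Dict, List


def total_reward(components: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of reward components (the weights are hypothetical)."""
    return sum(weights[name] * value for name, value in components.items())


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Center and scale rewards within one prompt's group of samples."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every sample gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]
```

With this normalization, a response is reinforced only to the extent that it beats its siblings for the same chart; a group whose samples all receive the same reward yields zero advantage for every sample.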
111
+
112
+ See the [repository](https://github.com/ZitianTang/MM-ReCoder) for full
113
+ training scripts and configs.
114
+
115
+ ## Citation
116
+
117
+ ```bibtex
118
+ @inproceedings{tang2026mmrecoder,
119
+ title={MM-ReCoder: Advancing Chart-to-Code Generation with Reinforcement Learning and Self-Correction},
120
+ author={Zitian Tang and Xu Zhang and Jianbo Yuan and Yang Zou and Varad Gunjal and Songyao Jiang and Davide Modolo},
121
+ booktitle={CVPR},
122
+ year={2026}
123
+ }
124
+ ```
125
+
126
+ ## License
127
+
128
+ Released under the Apache 2.0 License, inheriting from the base
129
+ Qwen2.5-VL-7B-Instruct license.