cwbc commited on
Commit
44c0d3c
·
verified ·
1 Parent(s): 0712c1f

Upload README.md with huggingface_hub

Browse files
Files changed (1) hide show
  1. README.md +129 -0
README.md ADDED
@@ -0,0 +1,129 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ base_model:
6
+ - Qwen/Qwen2.5-VL-7B-Instruct
7
+ pipeline_tag: image-text-to-text
8
+ library_name: transformers
9
+ tags:
10
+ - chart-to-code
11
+ - multimodal
12
+ - vision-language
13
+ - sft
14
+ - cold-start
15
+ - matplotlib
16
+ ---
17
+
18
+ # MM-ReCoder-SFT-Cold-Start
19
+
20
+ <p align="center">
21
+ <a href="https://cvpr.thecvf.com/Conferences/2026"><b>CVPR 2026</b></a>
22
+ &nbsp;|&nbsp;
23
+ <a href="https://zitiantang.github.io/MM-ReCoder/">Project Page</a>
24
+ &nbsp;|&nbsp;
25
+ <a href="https://arxiv.org/abs/2604.01600">arXiv</a>
26
+ &nbsp;|&nbsp;
27
+ <a href="https://github.com/ZitianTang/MM-ReCoder">Code</a>
28
+ &nbsp;|&nbsp;
29
+ <a href="https://huggingface.co/cwbc/MM-ReCoder">Final RL Model</a>
30
+ </p>
31
+
32
+ **MM-ReCoder-SFT-Cold-Start** is the supervised fine-tuned cold-start
33
+ checkpoint released alongside the CVPR 2026 paper
34
+ [*MM-ReCoder: Advancing Chart-to-Code Generation with Reinforcement Learning and Self-Correction*](https://arxiv.org/abs/2604.01600).
35
+ It is fine-tuned from
36
+ [`Qwen/Qwen2.5-VL-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)
37
+ to bootstrap the chart-to-code and self-correction behaviors before the
38
+ multi-turn RL stages.
39
+
40
+ > **This is an intermediate checkpoint**, not the final MM-ReCoder model.
41
+ > If you want the best chart-to-code performance, use
42
+ > [`cwbc/MM-ReCoder`](https://huggingface.co/cwbc/MM-ReCoder) instead.
43
+ > This checkpoint is released for researchers who want to reproduce or
44
+ > ablate the RL stages of the paper.
45
+
46
+ ## Intended Use
47
+
48
+ This checkpoint is intended as the **starting point for multi-turn RL**
49
+ training. The pipeline is:
50
+
51
+ 1. **SFT cold-start** *(this checkpoint)* — Qwen2.5-VL-7B-Instruct fine-tuned
52
+ on chart-to-code demonstrations.
53
+ 2. **Multi-turn RL (GRPO), stage 1** — shared-first-turn optimization,
54
+ initialized from this checkpoint.
55
+ 3. **Multi-turn RL (GRPO), stage 2** — full-trajectory optimization, resumed
56
+ from stage 1. The result is released as
57
+ [`cwbc/MM-ReCoder`](https://huggingface.co/cwbc/MM-ReCoder).
58
+
59
+ ## Usage
60
+
61
+ To kick off RL from this cold-start checkpoint, clone the
62
+ [official repository](https://github.com/ZitianTang/MM-ReCoder) and run the
63
+ stage 1 training script (which references this checkpoint via
64
+ `REF_MODEL_PATH=cwbc/MM-ReCoder-SFT-Cold-Start`):
65
+
66
+ ```bash
67
+ git clone https://github.com/ZitianTang/MM-ReCoder.git
68
+ cd MM-ReCoder
69
+ # Follow the Installation section in the repo README, then launch the
70
+ # LLM-as-a-judge reward server (see the RL Training section).
71
+
72
+ # Stage 1: multi-turn GRPO with a shared first turn.
73
+ bash examples/mmrecoder/train/stage1-shared-first-turn.sh
74
+
75
+ # Stage 2: multi-turn GRPO on the full trajectory, resumed from stage 1.
76
+ bash examples/mmrecoder/train/stage2-full-trajectory.sh
77
+ ```
78
+
79
+ ### Multi-Turn Inference with the Cold-Start Model
80
+
81
+ This checkpoint also supports the multi-turn self-correction inference
82
+ loop from the repository — useful for measuring the RL gains over the
83
+ SFT-only baseline. Reuse the inference scripts and override the model path:
84
+
85
+ ```bash
86
+ # Download the cold-start checkpoint.
87
+ hf download cwbc/MM-ReCoder-SFT-Cold-Start
88
+
89
+ # Two-turn self-correction on ChartMimic, using the cold-start model.
90
+ bash examples/mmrecoder/inference/chartmimic_2turns.sh \
91
+ model.path=cwbc/MM-ReCoder-SFT-Cold-Start \
92
+ data.output_path=generations/coldstart_chartmimic_2turns.json
93
+ ```
94
+
95
+ The self-correction *policy* is sharpened by the RL stages, so the
96
+ cold-start model will generally underperform [`cwbc/MM-ReCoder`](https://huggingface.co/cwbc/MM-ReCoder)
97
+ on multi-turn benchmarks; this is the intended baseline comparison.
98
+
99
+ ### Direct single-turn use
100
+
101
+ You can also load the checkpoint directly with `transformers` to inspect
102
+ single-turn chart-to-code behavior:
103
+
104
+ ```python
105
+ from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
106
+ import torch
107
+
108
+ model_id = "cwbc/MM-ReCoder-SFT-Cold-Start"
109
+ processor = AutoProcessor.from_pretrained(model_id)
110
+ model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
111
+ model_id, torch_dtype=torch.bfloat16, device_map="auto"
112
+ )
113
+ ```
114
+
115
+ ## Citation
116
+
117
+ ```bibtex
118
+ @inproceedings{tang2026mmrecoder,
119
+ title={MM-ReCoder: Advancing Chart-to-Code Generation with Reinforcement Learning and Self-Correction},
120
+ author={Zitian Tang and Xu Zhang and Jianbo Yuan and Yang Zou and Varad Gunjal and Songyao Jiang and Davide Modolo},
121
+ booktitle={CVPR},
122
+ year={2026}
123
+ }
124
+ ```
125
+
126
+ ## License
127
+
128
+ Released under the Apache 2.0 License, inheriting from the base
129
+ Qwen2.5-VL-7B-Instruct license.