---
license: cc-by-nc-4.0
language:
- en
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- chart-to-code
- multimodal
- vision-language
- sft
- cold-start
- matplotlib
---

# MM-ReCoder-SFT-Cold-Start

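CVPR 2026 | Project Page | [arXiv](https://arxiv.org/abs/2604.01600) | [Code](https://github.com/ZitianTang/MM-ReCoder) | [Final RL Model](https://huggingface.co/cwbc/MM-ReCoder)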

**MM-ReCoder-SFT-Cold-Start** is the supervised fine-tuned (SFT) cold-start checkpoint released alongside the CVPR 2026 paper [*MM-ReCoder: Advancing Chart-to-Code Generation with Reinforcement Learning and Self-Correction*](https://arxiv.org/abs/2604.01600). It is fine-tuned from [`Qwen/Qwen2.5-VL-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) to bootstrap chart-to-code generation and self-correction behaviors before the multi-turn RL stages.

> **This is an intermediate checkpoint**, not the final MM-ReCoder model.
> If you want the best chart-to-code performance, use
> [`cwbc/MM-ReCoder`](https://huggingface.co/cwbc/MM-ReCoder) instead.
> This checkpoint is released for researchers who want to reproduce or
> ablate the RL stages of the paper.

## Intended Use

This checkpoint is intended as the **starting point for multi-turn RL** training. The pipeline is:

1. **SFT cold-start** *(this checkpoint)*: Qwen2.5-VL-7B-Instruct fine-tuned on chart-to-code demonstrations.
2. **Multi-turn RL (GRPO), stage 1**: shared-first-turn optimization, initialized from this checkpoint.
3. **Multi-turn RL (GRPO), stage 2**: full-trajectory optimization, resumed from stage 1. The resulting model is released as [`cwbc/MM-ReCoder`](https://huggingface.co/cwbc/MM-ReCoder).

## Usage

To kick off RL from this cold-start checkpoint, clone the [official repository](https://github.com/ZitianTang/MM-ReCoder) and run the two RL training scripts in order. The stage 1 script references this checkpoint via `REF_MODEL_PATH=cwbc/MM-ReCoder-SFT-Cold-Start`.

```bash
git clone https://github.com/ZitianTang/MM-ReCoder.git
cd MM-ReCoder

# Follow the Installation section in the repo README, then launch the
# LLM-as-a-judge reward server (see the RL Training section).

# Stage 1: multi-turn GRPO with a shared first turn.
bash examples/mmrecoder/train/stage1-shared-first-turn.sh

# Stage 2: multi-turn GRPO on the full trajectory, resumed from stage 1.
bash examples/mmrecoder/train/stage2-full-trajectory.sh
```

### Multi-Turn Inference with the Cold-Start Model

This checkpoint also supports the multi-turn self-correction inference loop from the repository, which is useful for measuring the RL gains over the SFT-only baseline. Reuse the inference scripts and override the model path:

```bash
# Download the cold-start checkpoint.
hf download cwbc/MM-ReCoder-SFT-Cold-Start

# Two-turn self-correction on ChartMimic, using the cold-start model.
bash examples/mmrecoder/inference/chartmimic_2turns.sh \
  model.path=cwbc/MM-ReCoder-SFT-Cold-Start \
  data.output_path=generations/coldstart_chartmimic_2turns.json
```

Because the self-correction *policy* is sharpened by the RL stages, the cold-start model will generally underperform [`cwbc/MM-ReCoder`](https://huggingface.co/cwbc/MM-ReCoder) on multi-turn benchmarks; this is the intended baseline comparison.

### Direct Single-Turn Use

You can also load the checkpoint directly with `transformers` to inspect its single-turn chart-to-code behavior:

```python
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "cwbc/MM-ReCoder-SFT-Cold-Start"
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
```

## Citation

```bibtex
@inproceedings{tang2026mmrecoder,
  title={MM-ReCoder: Advancing Chart-to-Code Generation with Reinforcement Learning and Self-Correction},
  author={Zitian Tang and Xu Zhang and Jianbo Yuan and Yang Zou and Varad Gunjal and Songyao Jiang and Davide Modolo},
  booktitle={CVPR},
  year={2026}
}
```

## License

Released under the Apache 2.0 License, inheriting from the base Qwen2.5-VL-7B-Instruct license.