Papers
arxiv:2603.16189

Reliable Reasoning in SVG-LLMs via Multi-Task Multi-Reward Reinforcement Learning

Published on Mar 17
· Submitted by
Haomin Wang
on Mar 18
Authors:
,
,
,
,
,

Abstract

CTRL-S framework enhances SVG generation through chain-of-thought reasoning and multi-reward optimization, achieving better structural coherence and visual fidelity.

AI-generated summary

With the rapid advancement of vision-language models, an increasing number of studies have explored their potential for SVG generation tasks. Although existing approaches improve performance by constructing large-scale SVG datasets and introducing SVG-specific tokens, they still suffer from limited generalization, redundant paths in code outputs, and a lack of explicit reasoning. In this work, we present CTRL-S (Chain-of-Thought Reinforcement Learning for SVG), a unified framework that introduces a chain-of-thought mechanism to explicitly expose the model's reasoning process during SVG generation. To support this structured reasoning, we construct SVG-Sophia, a high-quality dataset containing 145K samples across SVG code refinement, Text-to-SVG, and Image-to-SVG tasks. By training the model to generate group-level structured SVG code, CTRL-S significantly improves structural coherence and visual fidelity. Furthermore, we adopt the GRPO algorithm and design a multi-reward optimization framework, incorporating DINO, image-text similarity, format, and code efficiency rewards. Through joint multi-reward optimization and multi-task training, our approach systematically enhances overall generation capabilities. Extensive experiments show that CTRL-S outperforms existing methods, achieving higher task success rates, superior SVG code quality, and exceptional visual fidelity.

Community

In this work, we present CTRL-S (Chain-of-Thought Reinforcement Learning for SVG), a unified framework that introduces a chain-of-thought mechanism to explicitly expose the model’s reasoning process during SVG generation. To support this structured reasoning, we construct SVG-Sophia, a high-quality dataset of 145K samples across SVG code refinement, Text-to-SVG, and Image-to-SVG tasks. Furthermore, we design a robust multi-reward reinforcement learning scheme powered by the GRPO algorithm. By jointly optimizing across DINO, image-text similarity, format, and code efficiency rewards in a multi-task setting, our approach systematically boosts structural coherence and generation capabilities. Extensive experiments show that CTRL-S outperforms existing methods, achieving higher task success rates, superior code quality, and exceptional visual fidelity.

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2603.16189 in a model README.md to link it from this page.

Datasets citing this paper 1

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2603.16189 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.