VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models
Paper: arXiv:2603.24575
This repository contains the merged RL-trained checkpoint for VFIG-4B, a 4B vision-language model for converting complex scientific and technical figures into clean, editable SVG code.
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3-VL-4B-Instruct |
| Training | SFT + GRPO-based RL with rendering-aware visual rewards |
| Architecture | LoRA on language model; vision encoder and projector frozen |
| Parameters | 4B |
| Precision | BF16 |
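Because the checkpoint is described as merged, the LoRA weights are presumably already folded into the base model, so no separate adapter needs to be loaded. The sketch below illustrates what merging generally means under the standard LoRA formulation, W' = W + (alpha / r) · B · A. The function names (`matmul`, `merge_lora`) and the toy values are illustrative assumptions and are not taken from the VFIG training code.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_b, lora_a, alpha, rank):
    """Return W + (alpha / rank) * B @ A as a new matrix."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x2 weight with a rank-1 adapter; all numbers are made up.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]   # shape (2, rank)
A = [[3.0, 4.0]]     # shape (rank, 2)

merged = merge_lora(W, B, A, alpha=2.0, rank=1)
print(merged)  # [[7.0, 8.0], [12.0, 17.0]]
```

After merging, inference uses a single weight matrix per layer, which is why the usage code for this repository only needs `transformers` and no adapter library.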
Given an input image of a scientific or technical figure, the model generates SVG code that reconstructs the figure as a scalable, editable vector graphic.
pip install transformers torch accelerate pillow
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText
model_name = "XunmeiLiu/VFIG-4B"
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="cuda",
    trust_remote_code=True,
)
model.eval()
def figure_to_svg(image_path: str) -> str:
    """Convert the figure stored at image_path into SVG code."""
    img = Image.open(image_path).convert("RGB")
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text", "text": "Convert this figure into valid SVG code."},
            ],
        }
    ]
    chat_input = processor.tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = processor(text=[chat_input], images=[img], return_tensors="pt").to("cuda")
    with torch.no_grad():
        # Greedy decoding; long figures may need the full 8192-token budget.
        output_ids = model.generate(**inputs, max_new_tokens=8192, do_sample=False)
    decoded = processor.tokenizer.decode(output_ids[0], skip_special_tokens=True)
    # Keep only the span from the first "<svg" to the first "</svg>".
    if "<svg" in decoded:
        decoded = decoded[decoded.find("<svg"):]
    if "</svg>" in decoded:
        decoded = decoded[: decoded.find("</svg>") + len("</svg>")]
    return decoded.strip()

# Download the example image
import urllib.request
urllib.request.urlretrieve(
    "https://huggingface.co/XunmeiLiu/VFIG-4B/resolve/main/simple_diagram.png",
    "simple_diagram.png"
)
svg_code = figure_to_svg("simple_diagram.png")
print(svg_code)
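Generation stops at `max_new_tokens`, so a long figure can yield SVG that is cut off before the closing tag. A minimal sketch of a well-formedness check using only the standard library follows; the helper name `is_well_formed_svg` and the sample strings are hypothetical and not part of this repository.

```python
import xml.etree.ElementTree as ET


def is_well_formed_svg(svg_text: str) -> bool:
    """Return True if svg_text parses as XML with an <svg> root element."""
    try:
        root = ET.fromstring(svg_text)
    except ET.ParseError:
        return False
    # The root tag is "{namespace}svg" when xmlns is declared, plain "svg" otherwise.
    return root.tag == "svg" or root.tag.endswith("}svg")


# Hand-written stand-ins for model output.
sample = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10">'
    '<rect width="10" height="10"/></svg>'
)
truncated = '<svg xmlns="http://www.w3.org/2000/svg"><rect width="10"'

print(is_well_formed_svg(sample))     # True
print(is_well_formed_svg(truncated))  # False
```

In practice, the `svg_code` string returned by `figure_to_svg` would be passed to the helper, and the result written to disk (for example with `pathlib.Path("out.svg").write_text(svg_code, encoding="utf-8")`) only when the check passes. The check confirms XML well-formedness only; it does not verify that the SVG renders as intended.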
This repository is released under the CC BY 4.0 license. Please also make sure that any redistribution complies with the license of the base model, Qwen/Qwen3-VL-4B-Instruct.
If you use VFIG-4B in your research, please cite:
@misc{he2026vfigvectorizingcomplexfigures,
      title={VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models},
      author={Qijia He and Xunmei Liu and Hammaad Memon and Ziang Li and Zixian Ma and Jaemin Cho and Jason Ren and Daniel S Weld and Ranjay Krishna},
      year={2026},
      eprint={2603.24575},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.24575},
}