VFIG-4B

This repository contains the merged, RL-trained checkpoint for VFIG-4B, a 4B-parameter vision-language model that converts complex scientific and technical figures into clean, editable SVG code.

Model Details

| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3-VL-4B-Instruct |
| Training | SFT + GRPO-based RL with rendering-aware visual rewards |
| Architecture | LoRA on language model; vision encoder and projector frozen |
| Parameters | 4B |
| Precision | BF16 |
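
As a rough guide to hardware needs, the weight footprint can be estimated from the Parameters and Precision values listed above. The sketch below assumes the nominal 4B parameter count and 2 bytes per BF16 value. It ignores activations, the KV cache, and framework overhead, so real memory use during generation will be higher.

```python
# Back-of-the-envelope estimate of weight memory only.
# Assumption: the nominal 4B parameter count; the exact count may differ.
BYTES_PER_PARAM = {"bf16": 2, "fp16": 2, "fp32": 4}

def weight_memory_gib(num_params: float, precision: str = "bf16") -> float:
    """Return the approximate size of the model weights in GiB."""
    return num_params * BYTES_PER_PARAM[precision] / 1024**3

print(f"{weight_memory_gib(4e9):.2f} GiB")  # prints "7.45 GiB"
```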

Intended Use

Given an input image of a scientific or technical figure, the model generates SVG code that reconstructs the figure as a scalable, editable vector graphic.
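
To illustrate what "editable" means in practice, the sketch below recolors one shape in an SVG string using only the Python standard library. The SVG here is a hand-written stand-in, not actual model output, and `recolor` is a hypothetical helper introduced for this example.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize without an "ns0:" prefix

# Hand-written stand-in for model output (assumption, for illustration only).
example_svg = (
    f'<svg xmlns="{SVG_NS}" viewBox="0 0 100 50">'
    '<rect id="box" x="10" y="10" width="80" height="30" fill="#cccccc"/>'
    "</svg>"
)

def recolor(svg_code: str, element_id: str, fill: str) -> str:
    """Return svg_code with the fill of the element whose id matches replaced."""
    root = ET.fromstring(svg_code)
    for elem in root.iter():
        if elem.get("id") == element_id:
            elem.set("fill", fill)
    return ET.tostring(root, encoding="unicode")

print(recolor(example_svg, "box", "#1f77b4"))
```

Because the output is plain XML, the same kind of edit can also be made by hand in a text editor or in a vector graphics tool.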

Quick Start

Install the dependencies:

pip install transformers torch accelerate pillow

Then run inference in Python:

import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText

model_name = "XunmeiLiu/VFIG-4B"

processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="cuda",
    trust_remote_code=True,
)
model.eval()

def figure_to_svg(image_path: str) -> str:
    img = Image.open(image_path).convert("RGB")

    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text", "text": "Convert this figure into valid SVG code."},
            ],
        }
    ]

    chat_input = processor.tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = processor(text=[chat_input], images=[img], return_tensors="pt").to("cuda")

    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=8192, do_sample=False)

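    # Decode only the newly generated tokens so the echoed prompt is excluded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    decoded = processor.tokenizer.decode(new_tokens, skip_special_tokens=True)

    # Trim any text before the opening <svg tag and after the first </svg> tag.
    if "<svg" in decoded:
        decoded = decoded[decoded.find("<svg"):]
    if "</svg>" in decoded:
        decoded = decoded[: decoded.find("</svg>") + len("</svg>")]
    return decoded.strip()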

# Download the example image
import urllib.request
urllib.request.urlretrieve(
    "https://huggingface.co/XunmeiLiu/VFIG-4B/resolve/main/simple_diagram.png",
    "simple_diagram.png"
)

svg_code = figure_to_svg("simple_diagram.png")
print(svg_code)
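
Because generation stops at `max_new_tokens`, a complex figure can yield SVG that is cut off before the closing tag. The following is a minimal sketch of a well-formedness check to run before saving. `save_svg_if_valid` is a hypothetical helper, not part of this repository, and it only checks XML syntax and the root tag, not whether the drawing matches the input figure.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

def save_svg_if_valid(svg_code: str, out_path: str) -> bool:
    """Write svg_code to out_path only if it parses as XML with an <svg> root."""
    try:
        root = ET.fromstring(svg_code)
    except ET.ParseError:
        return False  # e.g. truncated output or unescaped characters
    # The tag may carry a namespace, e.g. "{http://www.w3.org/2000/svg}svg".
    if root.tag.split("}")[-1] != "svg":
        return False
    Path(out_path).write_text(svg_code, encoding="utf-8")
    return True
```

For example, `save_svg_if_valid(svg_code, "simple_diagram.svg")` returns `False` and writes nothing when the output was truncated, which is a cue to retry with a larger token budget or a simpler input.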

License

This repository is released under the CC BY 4.0 license. Before redistributing, also check that your use is compatible with the license of the underlying base model.

Citation

If you use VFIG-4B in your research, please cite:

@misc{he2026vfigvectorizingcomplexfigures,
      title={VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models}, 
      author={Qijia He and Xunmei Liu and Hammaad Memon and Ziang Li and Zixian Ma and Jaemin Cho and Jason Ren and Daniel S Weld and Ranjay Krishna},
      year={2026},
      eprint={2603.24575},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.24575}, 
}