| language: | |
| - en | |
| license: mit | |
| library_name: transformers | |
| tags: | |
| - svg | |
| - hivg | |
| - vector-graphics | |
| - text-to-svg | |
| - image-to-svg | |
| - hierarchical-tokenization | |
| - autoregressive-generation | |
| - code-generation | |
| base_model: Qwen/Qwen2.5-VL-3B | |
| pipeline_tag: image-to-text | |
| model-index: | |
| - name: HiVG-3B-Base | |
| results: [] | |
| datasets: | |
| - xingxm/SVGX-Core-250k | |
| # HiVG: Hierarchical SVG Tokenization | |
| **HiVG-3B-Base** is a 3B-parameter vision-language model for **autoregressive Scalable Vector Graphics (SVG) generation**. | |
| <p align="center"> | |
| <a href="https://arxiv.org/abs/2604.05072"><img src="https://img.shields.io/badge/arXiv-2604.05072-B31B1B?style=for-the-badge&logo=arxiv&logoColor=white" alt="arXiv"></a> | |
| <a href="https://hy-hivg.github.io/"><img src="https://img.shields.io/badge/Project-Page-green?style=for-the-badge" alt="Project Page"></a> | |
| <a href="https://huggingface.co/papers/2604.05072"><img src="https://img.shields.io/badge/%F0%9F%A4%97-Paper%20Page-yellow?style=for-the-badge" alt="HuggingFace Paper Page"></a> | |
| </p> | |
| HiVG introduces a novel **hierarchical SVG tokenization framework** that replaces generic byte-level tokenization with geometry-aware atomic and segment tokens, enabling significantly more efficient and faithful SVG code generation. | |
| ## Highlights | |
| - **Small Model, Frontier Results:** 3B parameters that beat 7/7 proprietary models, including GPT-5 and Gemini 2.5, on image-to-SVG. | |
| - **Efficient SVG Token Compression:** Hierarchical tokenization (Raw SVG → Atomic tokens → Segment tokens) with 2.76x sequence compression. | |
| - **High-Fidelity Image-to-SVG:** Convert any image into a clean, editable SVG, with structure, layout, and detail faithfully preserved. | |
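The compression figure above is a ratio of sequence lengths before and after tokenization. The toy sketch below shows how grouping fine-grained tokens into coarser ones shortens a sequence. The helpers `raw_tokens`, `atomic_tokens`, and `segment_tokens`, and the grouping rule they use, are hypothetical illustrations rather than HiVG's actual tokenizer, and the ratios they produce are unrelated to the reported 2.76x.

```python
import re

# Hypothetical toy tokenizers for illustration only; HiVG's real
# atomic/segment vocabulary is defined in the paper and code repository.


def raw_tokens(path_data: str) -> list[str]:
    """Stand-in for a generic tokenizer: one token per non-space character."""
    return [ch for ch in path_data if not ch.isspace()]


def atomic_tokens(path_data: str) -> list[str]:
    """Split path data into command letters and whole numbers."""
    return re.findall(r"[A-Za-z]|-?\d+(?:\.\d+)?", path_data)


def segment_tokens(atoms: list[str]) -> list[tuple[str, ...]]:
    """Group each command letter with the numeric arguments that follow it."""
    segments: list[list[str]] = []
    for atom in atoms:
        if atom.isalpha() or not segments:
            segments.append([atom])
        else:
            segments[-1].append(atom)
    return [tuple(seg) for seg in segments]


path = "M 10 20 L 110 20 L 110 120 Z"
raw = raw_tokens(path)
atoms = atomic_tokens(path)
segments = segment_tokens(atoms)

print(f"raw: {len(raw)}, atomic: {len(atoms)}, segment: {len(segments)}")
print(f"raw/segment ratio: {len(raw) / len(segments):.2f}")
```

In this toy setting each level trades a larger vocabulary for a shorter sequence, which is the general trade-off that hierarchical tokenization makes.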
| ## Quick Start | |
| You can use the provided inference pipeline for both image-to-SVG and text-to-SVG tasks. | |
| ```python | |
| from hivg_infer import HiSVGInferencePipeline | |
| pipeline = HiSVGInferencePipeline( | |
|     model_path="xingxm/HiVG-3B-Base", | |
|     coord_range=234, | |
|     temperature=0.7, | |
|     top_p=0.9, | |
|     max_new_tokens=4096, | |
| ) | |
| # Image-to-SVG | |
| result = pipeline.img2svg("path/to/your_image.png") | |
| if result["success"]: | |
|     print(result["svg"]) | |
| # Text-to-SVG | |
| result = pipeline.text2svg("A minimalist black phone icon with an outline style") | |
| if result["success"]: | |
|     with open("output.svg", "w") as f: | |
|         f.write(result["svg"]) | |
| ``` | |
| > Note: For detailed inference code, data preprocessing, and the hierarchical SVG tokenizer/detokenizer, please visit the [project page](https://hy-hivg.github.io/) and the associated code repository. | |
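Because generation is autoregressive and capped by `max_new_tokens`, a sampled output can in principle be cut off before the closing tags. A minimal sketch of a well-formedness check to run before writing a result to disk, using only the Python standard library. `is_well_formed_svg` is a hypothetical helper, not part of `hivg_infer`, and it assumes `result["svg"]` is a plain string as in the example above.

```python
import xml.etree.ElementTree as ET


def is_well_formed_svg(svg_text: str) -> bool:
    """Return True if svg_text parses as XML with an <svg> root element."""
    try:
        root = ET.fromstring(svg_text)
    except ET.ParseError:
        return False
    # With a declared namespace the tag looks like
    # "{http://www.w3.org/2000/svg}svg", so strip the namespace part.
    return root.tag.rsplit("}", 1)[-1] == "svg"


# Hand-written sample strings (not model outputs).
good = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">'
    '<path d="M2 2 L22 22"/></svg>'
)
truncated = '<svg xmlns="http://www.w3.org/2000/svg"><path d="M2 2 L22'

print(is_well_formed_svg(good))
print(is_well_formed_svg(truncated))
```

This check only confirms that the markup parses and has an `<svg>` root; it does not verify that the drawing matches the input image or prompt.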
| ## Intended Uses | |
| ### Primary Use Cases | |
| - **Text-to-SVG Generation:** Generate SVG vector graphics from natural language descriptions. | |
| - **Image-to-SVG Generation (Vectorization):** Convert raster images into editable SVG code. | |
| ### Out-of-Scope Uses | |
| - This is a **base model** and has not been instruction-tuned or RLHF-aligned for production deployment. | |
| - Not designed for generating arbitrary code beyond SVG. | |
| - Not suitable for safety-critical applications without additional safeguards. | |
| ## Training Details | |
| ### Training Procedure | |
| - **Backbone:** Qwen2.5-VL-3B | |
| - **Fine-tuning:** Full-parameter SFT with frozen vision encoder | |
| - **Curriculum Learning:** Training follows a curriculum that progressively increases program complexity | |
| - **Initialization:** Hierarchical mean-noise initialization strategy for new SVG token embeddings | |
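One plausible reading of mean-noise initialization is that each new token embedding starts at the mean of some existing embeddings plus small Gaussian noise. The sketch below assumes that reading. For the "hierarchical" part, it further assumes that a segment token is initialized from the atomic tokens it groups. Both are assumptions made for illustration, as are the function name, the token names, the toy vectors, and the noise scale; the paper defines the actual strategy.

```python
import random

Vector = list[float]


def mean_noise_init(sources: list[Vector], noise_std: float, rng: random.Random) -> Vector:
    """Return the element-wise mean of `sources` plus Gaussian noise."""
    dim = len(sources[0])
    mean = [sum(vec[i] for vec in sources) / len(sources) for i in range(dim)]
    return [m + rng.gauss(0.0, noise_std) for m in mean]


rng = random.Random(0)

# Toy "pretrained" embeddings (hypothetical values, 4 dimensions).
pretrained = {
    "M": [0.2, -0.1, 0.4, 0.0],
    "10": [0.1, 0.3, -0.2, 0.5],
    "20": [0.0, 0.1, 0.2, -0.3],
}

# Level 1 (assumed): new atomic tokens start near the mean of the pretrained vectors.
atomic = {
    name: mean_noise_init(list(pretrained.values()), noise_std=0.01, rng=rng)
    for name in ("<cmd_M>", "<coord_10>", "<coord_20>")  # hypothetical token names
}

# Level 2 (assumed): a new segment token starts near the mean of the atomic tokens it groups.
segment = mean_noise_init(list(atomic.values()), noise_std=0.01, rng=rng)

print(len(atomic), len(segment))
```

Starting new embeddings near the mean of existing ones keeps them in the region of the embedding space the model already handles, while the noise keeps the new tokens from being identical at the start of training.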
| ### Compute Infrastructure | |
| Please refer to the [paper](https://arxiv.org/abs/2604.05072) for detailed compute specifications. | |
| ## Citation | |
| If you find this work helpful, please cite: | |
| ```bibtex | |
| @article{xing2026hivg, | |
| title={Hierarchical SVG Tokenization: Learning Compact Visual Programs for Scalable Vector Graphics Modeling}, | |
| author={Ximing Xing and Ziteng Xue and Zhenxi Li and Weicong Liang and Linqing Wang and Zhantao Yang and Tiankai Hang and Zijin Yin and Qinglin Lu and Chunyu Wang and Qian Yu}, | |
| journal={arXiv preprint arXiv:2604.05072}, | |
| year={2026} | |
| } | |
| ``` |