xingxm committed on
Commit 330ffd9 · verified · 1 Parent(s): 370c83e

Update README.md

Files changed (1)
  1. README.md +126 -3
README.md CHANGED
@@ -1,3 +1,126 @@
1
- ---
2
- license: mit
3
- ---
1
+ ---
2
+ language:
3
+ - en
4
+ license: mit
5
+ library_name: transformers
6
+ tags:
7
+ - svg
8
+ - vector-graphics
9
+ - text-to-svg
10
+ - image-to-svg
11
+ - hierarchical-tokenization
12
+ - autoregressive-generation
13
+ - code-generation
14
+ base_model: Qwen/Qwen2.5-VL-3B
15
+ pipeline_tag: image-to-text
16
+ datasets:
17
+ - svg-stack
18
+ model-index:
19
+ - name: HiVG-3B-Base
20
+ results: []
21
+ ---
22
+
23
+ # HiVG-3B-Base
24
+
25
+ **HiVG-3B-Base** is a 3B-parameter vision-language model for **autoregressive Scalable Vector Graphics (SVG) generation**. It is the base model from the paper [**"Hierarchical SVG Tokenization: Learning Compact Visual Programs for Scalable Vector Graphics Modeling"**](https://arxiv.org/abs/2604.05072).
26
+
27
+ HiVG introduces a **hierarchical SVG tokenization framework** that replaces generic byte-level tokenization with geometry-aware atomic and segment tokens, which shortens token sequences and makes the generated SVG code more faithful to the target graphic.
28
+
29
+ | 📄 [Paper](https://arxiv.org/abs/2604.05072) | 🏠 [Project Page](https://hy-hivg.github.io/) | 🤗 [Paper Page](https://huggingface.co/papers/2604.05072) |
30
+ |---|---|---|
31
+
32
+ ## Model Description
33
+
34
+ ### Overview
35
+
36
+ Recent large language models have shifted SVG generation from differentiable rendering optimization to autoregressive program synthesis. However, existing approaches still rely on **generic byte-level tokenization** inherited from natural language processing, which poorly reflects the geometric structure of vector graphics: numerical coordinates are fragmented into discrete symbols, which destroys spatial relationships and inflates both sequence length and computational cost.
37
+
38
+ **HiVG** addresses these fundamental challenges through a hierarchical SVG tokenization framework:
39
+
40
+ 1. **Atomic Tokens (Level 1):** Raw SVG strings are decomposed into structured atomic tokens that preserve the full geometric semantics of SVG commands (structure, command type, and coordinates).
41
+ 2. **Segment Tokens (Level 2):** Executable command–parameter groups are further compressed into geometry-constrained segment tokens, substantially improving sequence efficiency while preserving syntactic validity.
42
+ 3. **Hierarchical Mean-Noise Initialization:** An embedding initialization strategy that bridges the gap between pre-trained LLM embeddings and the new SVG token space.
43
+ 4. **Curriculum Training Paradigm:** A training strategy that progressively increases SVG program complexity, enabling more stable learning of executable SVG programs.
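To make the two tokenization levels concrete, here is a minimal sketch. It is not the HiVG tokenizer: it assumes a simplified path syntax (single-letter commands followed by plain numeric parameters), and the helper names `atomic_tokens` and `segment_tokens` are hypothetical. The actual vocabulary and geometric constraints are defined in the paper and the code repository.

```python
import re

# Hypothetical illustration only; the real HiVG vocabulary is defined in the paper.
_TOKEN_RE = re.compile(r"[MLHVCSQTAZmlhvcsqtaz]|-?\d*\.?\d+")


def atomic_tokens(path_d: str) -> list[str]:
    """Level 1 (sketch): split a path string into command letters and whole numbers."""
    return _TOKEN_RE.findall(path_d)


def segment_tokens(atoms: list[str]) -> list[tuple[str, ...]]:
    """Level 2 (sketch): group each command with the parameters that follow it."""
    segments: list[list[str]] = []
    for tok in atoms:
        if tok.isalpha():
            # A command letter starts a new segment.
            segments.append([tok])
        elif segments:
            # A number belongs to the most recent command.
            segments[-1].append(tok)
    return [tuple(seg) for seg in segments]


path = "M 10 20 L 30 40 C 50 60 70 80 90 100 Z"
atoms = atomic_tokens(path)
segments = segment_tokens(atoms)
print(len(path), len(atoms), len(segments))
```

In this toy case a character-level view has one symbol per character, the atomic view keeps each coordinate whole, and the segment view has one unit per drawing command, which is the kind of length reduction the segment tokens target.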
44
+
45
+ ### Architecture
46
+
47
+ - **Parameters:** ~3B (4B total including vision encoder)
48
+ - **Training Strategy:** Full-parameter Supervised Fine-Tuning (SFT) with **frozen vision encoder**
49
+ - **Tokenization:** Hierarchical SVG tokenizer (atomic + segment tokens)
50
+
51
+ ## Intended Uses
52
+
53
+ ### Primary Use Cases
54
+
55
+ - **Text-to-SVG Generation:** Generate SVG vector graphics from natural language descriptions.
56
+ - **Image-to-SVG Generation (Vectorization):** Convert raster images into editable SVG code.
57
+
58
+ ### Out-of-Scope Uses
59
+
60
+ - This is a **base model** and has not been instruction-tuned or RLHF-aligned for production deployment.
61
+ - Not designed for generating arbitrary code beyond SVG.
62
+ - Not suitable for safety-critical applications without additional safeguards.
63
+
64
+ ## Training Details
65
+
66
+ ### Training Procedure
67
+
68
+ - **Backbone:** Qwen2.5-VL-3B
69
+ - **Fine-tuning:** Full-parameter SFT with frozen vision encoder
70
+ - **Curriculum Learning:** The model was trained with a curriculum training paradigm that progressively increases program complexity
71
+ - **Initialization:** Hierarchical mean-noise initialization strategy for new SVG token embeddings
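The last two bullets can be pictured with a small sketch. This is one plausible reading of the names, not the procedure from the paper: it assumes that a new token's embedding starts as the mean of some existing embeddings plus small Gaussian noise, and that curriculum stages are formed by sorting samples with a crude complexity proxy (source length). The functions `mean_noise_init` and `curriculum_stages`, the `noise_std` value, and the proxy are all hypothetical.

```python
import random
from statistics import fmean


def mean_noise_init(sub_embeddings, noise_std=0.02, rng=None):
    """Average the given embedding vectors, then add small Gaussian noise."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    dim = len(sub_embeddings[0])
    mean = [fmean(vec[i] for vec in sub_embeddings) for i in range(dim)]
    return [m + rng.gauss(0.0, noise_std) for m in mean]


def curriculum_stages(svgs, num_stages=3):
    """Order samples by a complexity proxy and split them into stages."""
    if not svgs:
        return []
    ordered = sorted(svgs, key=len)  # assumed proxy: longer source = more complex
    size = -(-len(ordered) // num_stages)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


# Toy 2-dimensional embeddings standing in for pre-trained token vectors.
pretrained = {"1": [0.1, 0.2], "0": [0.3, 0.4]}
new_vec = mean_noise_init([pretrained["1"], pretrained["0"]])

samples = [
    "<svg><path d='M0 0 L1 1'/></svg>",
    "<svg/>",
    "<svg><circle r='1'/></svg>",
]
stages = curriculum_stages(samples, num_stages=3)
print(new_vec, stages)
```

A training loop would then visit `stages` in order, so that short programs are seen before long ones.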
72
+
73
+ ### Compute Infrastructure
74
+
75
+ Please refer to the [paper](https://arxiv.org/abs/2604.05072) for detailed compute specifications.
76
+
77
+ ## Evaluation
78
+
79
+ ### Tasks
80
+
81
+ The model was evaluated on two tasks:
82
+ - **Text-to-SVG** generation
83
+ - **Image-to-SVG** generation (vectorization)
84
+
85
+ ### Results
86
+
87
+ According to the paper's experiments, HiVG improves:
88
+ - **Generation fidelity:** higher visual quality of rendered SVGs
89
+ - **Spatial consistency:** better preservation of geometric layouts and spatial relationships
90
+ - **Sequence efficiency:** shorter token sequences than conventional byte-level tokenization
91
+
92
+ For detailed quantitative results, tables, and comparisons with baselines (e.g., StarVector, DuetSVG), please refer to the [paper](https://arxiv.org/abs/2604.05072).
93
+
94
+ ## How to Use
95
+
96
+ ```python
97
+ from hivg_infer import HiSVGInferencePipeline
98
+
99
+ pipeline = HiSVGInferencePipeline(
100
+     model_path="/path/to/model",
101
+     coord_range=234,
102
+     temperature=0.7,
103
+     top_p=0.9,
104
+     max_new_tokens=4096,
105
+ )
106
+
107
+ # Image-to-SVG
108
+ result = pipeline.img2svg("assets/cases/w2.png")
109
+ if result["success"]:
110
+     print(result["svg"])
111
+ ```
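Sampled outputs can be checked before they are saved or rendered. The following is a minimal sketch, assuming `result["svg"]` is a plain string as in the snippet above; `is_well_formed_svg` is a hypothetical helper built on the Python standard library and is not part of `hivg_infer`.

```python
import xml.etree.ElementTree as ET


def is_well_formed_svg(svg_text: str) -> bool:
    """Return True if the text parses as XML and its root element is <svg>."""
    try:
        root = ET.fromstring(svg_text)
    except ET.ParseError:
        return False
    # The tag may carry a namespace, e.g. "{http://www.w3.org/2000/svg}svg".
    return root.tag.rsplit("}", 1)[-1] == "svg"


sample = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">'
    '<path d="M 10 20 L 30 40 Z"/></svg>'
)
print(is_well_formed_svg(sample))
print(is_well_formed_svg("<svg><path"))
```

This only checks XML well-formedness and the root tag; it does not verify that the path data is valid or that the drawing matches the input.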
112
+
113
+ > Note: For detailed inference code, data preprocessing, and the hierarchical SVG tokenizer/detokenizer, please visit the [project page](https://hy-hivg.github.io/) and the associated code repository.
114
+
115
+ ## Citation
116
+
117
+ If you find this work helpful, please cite:
118
+
119
+ ```bibtex
120
+ @article{xing2026hivg,
121
+ title={Hierarchical SVG Tokenization: Learning Compact Visual Programs for Scalable Vector Graphics Modeling},
122
+ author={Ximing Xing and Ziteng Xue and Zhenxi Li and Weicong Liang and Linqing Wang and Zhantao Yang and Tiankai Hang and Zijin Yin and Qinglin Lu and Chunyu Wang and Qian Yu},
123
+ journal={arXiv preprint arXiv:2604.05072},
124
+ year={2026}
125
+ }
126
+ ```