xingxm/HiVG-3B-Base

@@ -20,7 +20,7 @@ model-index:
   results: []
 ---
-# HiVG-3B-Base
 **HiVG-3B-Base** is a 3B-parameter vision-language model for **autoregressive Scalable Vector Graphics (SVG) generation**. It is the base model from the paper [**"Hierarchical SVG Tokenization: Learning Compact Visual Programs for Scalable Vector Graphics Modeling"**](https://arxiv.org/abs/2604.05072).
@@ -29,24 +29,40 @@ HiVG introduces a novel **hierarchical SVG tokenization framework** that replace
 | 📄 [Paper](https://arxiv.org/abs/2604.05072) | 🏠 [Project Page](https://hy-hivg.github.io/) | 🤗 [Paper Page](https://huggingface.co/papers/2604.05072) |
 |---|---|---|
-## Model Description
-### Overview
-Recent large language models have shifted SVG generation from differentiable rendering optimization to autoregressive program synthesis. However, existing approaches still rely on **generic byte-level tokenization** inherited from natural language processing, which poorly reflects the geometric structure of vector graphics — numerical coordinates are fragmented into discrete symbols, destroying spatial relationships and inflating token length and computational cost.
-**HiVG** addresses these fundamental challenges through a hierarchical SVG tokenization framework:
-1. **Atomic Tokens (Level 1):** Raw SVG strings are decomposed into structured atomic tokens that preserve the full geometric semantics of SVG commands (structure, command type, and coordinates).
-2. **Segment Tokens (Level 2):** Executable command–parameter groups are further compressed into geometry-constrained segment tokens, substantially improving sequence efficiency while preserving syntactic validity.
-3. **Hierarchical Mean-Noise Initialization:** A novel embedding initialization strategy that bridges the gap between pre-trained LLM embeddings and the new SVG token space.
-4. **Curriculum Training Paradigm:** A training strategy that progressively increases SVG program complexity, enabling more stable learning of executable SVG programs.
-### Architecture
-- **Parameters:** ~3B (4B total including vision encoder)
-- **Training Strategy:** Full-parameter Supervised Fine-Tuning (SFT) with **frozen vision encoder**
-- **Tokenization:** Hierarchical SVG tokenizer (atomic + segment tokens)
 ## Intended Uses
@@ -74,44 +90,6 @@ Recent large language models have shifted SVG generation from differentiable ren
 Please refer to the [paper](https://arxiv.org/abs/2604.05072) for detailed compute specifications.
-## Evaluation
-### Tasks
-The model was evaluated on both:
-- **Text-to-SVG** generation
-- **Image-to-SVG** generation (vectorization)
-### Results
-Extensive experiments demonstrate that HiVG improves:
-- **Generation fidelity** — higher visual quality of rendered SVGs
-- **Spatial consistency** — better preservation of geometric layouts and spatial relationships
-- **Sequence efficiency** — significantly shorter token sequences compared to conventional byte-level tokenization schemes
-For detailed quantitative results, tables, and comparisons with baselines (e.g., StarVector, DuetSVG), please refer to the [paper](https://arxiv.org/abs/2604.05072).
-## How to Use
-```python
-from hivg_infer import HiSVGInferencePipeline
-pipeline = HiSVGInferencePipeline(
-    model_path="/path/to/model",
-    coord_range=234,
-    temperature=0.7,
-    top_p=0.9,
-    max_new_tokens=4096,
-)
-# Image-to-SVG
-result = pipeline.img2svg("assets/cases/w2.png")
-if result["success"]:
-    print(result["svg"])
-```
-> Note: For detailed inference code, data preprocessing, and the hierarchical SVG tokenizer/detokenizer, please visit the [project page](https://hy-hivg.github.io/) and the associated code repository.
 ## Citation
 If you find this work helpful, please cite:

   results: []
 ---
+# HiVG: Hierarchical SVG Tokenization
 **HiVG-3B-Base** is a 3B-parameter vision-language model for **autoregressive Scalable Vector Graphics (SVG) generation**. It is the base model from the paper [**"Hierarchical SVG Tokenization: Learning Compact Visual Programs for Scalable Vector Graphics Modeling"**](https://arxiv.org/abs/2604.05072).
 | 📄 [Paper](https://arxiv.org/abs/2604.05072) | 🏠 [Project Page](https://hy-hivg.github.io/) | 🤗 [Paper Page](https://huggingface.co/papers/2604.05072) |
 |---|---|---|
+## Highlights
+- **Small Model, Frontier Results:** At 3B parameters, it outperforms all 7 proprietary models it is compared against, including GPT-5 and Gemini 2.5, on image-to-SVG.
+- **Efficient SVG Token Compression:** Hierarchical tokenization (raw SVG → atomic tokens → segment tokens) yields 2.76x sequence compression.
+- **High-Fidelity Image-to-SVG:** Converts images into clean, editable SVG while preserving structure, layout, and detail.
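To make the 2.76x compression figure concrete, the arithmetic below converts between a segment-token count and the length the same content would have without compression. This is only a sketch: it assumes the ratio is an average that applies uniformly to every sequence, which real SVGs will not follow exactly, and the baseline tokenization behind the ratio is whatever the paper defines. The helper names are illustrative and not part of any HiVG code.

```python
import math

# Sequence compression ratio quoted in the Highlights (assumed to be an average).
COMPRESSION_RATIO = 2.76


def equivalent_uncompressed_tokens(segment_tokens: int, ratio: float = COMPRESSION_RATIO) -> int:
    """Estimate the uncompressed length that a segment-token budget covers."""
    return round(segment_tokens * ratio)


def compressed_length(uncompressed_tokens: int, ratio: float = COMPRESSION_RATIO) -> int:
    """Estimate how many segment tokens an uncompressed sequence would need."""
    return math.ceil(uncompressed_tokens / ratio)


# A 4096-token generation budget, under the uniform-ratio assumption:
print(equivalent_uncompressed_tokens(4096))  # 11305
# A 10,000-token uncompressed sequence, under the same assumption:
print(compressed_length(10_000))  # 3624
```

Under these assumptions, a fixed generation budget covers roughly 2.76 times as much SVG content as it would without segment tokens.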
+## Quick Start
+You can use the provided inference pipeline for both image-to-SVG and text-to-SVG tasks.
+```python
+from hivg_infer import HiSVGInferencePipeline
+pipeline = HiSVGInferencePipeline(
+    model_path="xingxm/HiVG-3B-Base",
+    coord_range=234,
+    temperature=0.7,
+    top_p=0.9,
+    max_new_tokens=4096,
+)
+# Image-to-SVG
+result = pipeline.img2svg("path/to/your_image.png")
+if result["success"]:
+    print(result["svg"])
+# Text-to-SVG
+result = pipeline.text2svg("A minimalist black phone icon with an outline style")
+if result["success"]:
+    with open("output.svg", "w") as f:
+        f.write(result["svg"])
+```
+> Note: For detailed inference code, data preprocessing, and the hierarchical SVG tokenizer/detokenizer, please visit the [project page](https://hy-hivg.github.io/) and the associated code repository.
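Before saving or rendering a generated SVG, it can help to confirm that the string is well-formed XML with an `<svg>` root element. The sketch below uses only Python's standard library. `is_well_formed_svg` is a hypothetical helper, not part of `hivg_infer`, and it checks syntax only, not visual quality.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def is_well_formed_svg(svg_text: str) -> bool:
    """Return True if svg_text parses as XML and its root element is <svg>."""
    try:
        root = ET.fromstring(svg_text)
    except ET.ParseError:
        return False
    # ElementTree reports namespaced tags as "{namespace}svg".
    return root.tag in ("svg", f"{{{SVG_NS}}}svg")


# Hand-written sample strings (not model output):
sample = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">'
    '<path d="M4 4h16v16H4z"/></svg>'
)
print(is_well_formed_svg(sample))               # True
print(is_well_formed_svg("<svg><path></svg>"))  # False: <path> is never closed
```

With a pipeline result, the same check would be applied to `result["svg"]` before writing the file.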
 ## Intended Uses
 Please refer to the [paper](https://arxiv.org/abs/2604.05072) for detailed compute specifications.
 ## Citation
 If you find this work helpful, please cite: