Duplicated from InternSVG/InternSVG-8B

minghexx
/

InternSVG-8B

Model card Files Files and versions

InternSVG-8B / README.md

minghexx's picture

Duplicate from InternSVG/InternSVG-8B

cf16dd3 27 days ago

|

history blame contribute delete

3.61 kB

	---
	license: apache-2.0
	datasets:
	- InternSVG/SAgoge
	base_model:
	- OpenGVLab/InternVL3-8B
	---
	<div align="center">
	<h1> InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models </h1>

	<div align="center">
	<a href='https://arxiv.org/abs/2510.11341'><img src='https://img.shields.io/badge/arXiv-2510.11341-b31b1b?logo=arXiv'></a>
	<a href='https://hmwang2002.github.io/release/internsvg/'><img src='https://img.shields.io/badge/Project-Page-Green'></a>
	<a href="https://huggingface.co/datasets/InternSVG/SArena"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Benchmark%20-HF-orange"></a>
	<a href="https://huggingface.co/datasets/InternSVG/SAgoge"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Dataset%20-HF-orange"></a>
	<a href="https://huggingface.co/InternSVG/InternSVG-8B"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Model%20-HF-orange"></a>
	</div>
	</div>

	## 🤖 InternSVG Model

	The InternSVG-8B model is available at [Hugging Face](https://huggingface.co/InternSVG/InternSVG-8B). It is based on the InternVL3-8B model, incorporating SVG-specific tokens, and undergoes Supervised Fine-Tuning (SFT) under a two-stage training strategy using the massive SVG training samples from the SAgoge dataset.

	### Deploy

	We recommend using [LMDeploy](https://github.com/InternLM/lmdeploy) for deployment. An example of launching a proxy server with 8 parallel workers (one per GPU) is provided below:

	```bash
	#!/bin/bash
	model_path="MODEL_PATH"
	model_name="InternSVG"

	# proxy
	lmdeploy serve proxy --server-name 0.0.0.0 --server-port 10010 --routing-strategy "min_expected_latency" &

	worker_num=8
	for ((i = 0; i < worker_num; i++)); do
	timestamp=$(date +"%Y-%m-%d_%H-%M-%S")
	CUDA_VISIBLE_DEVICES="${i}" lmdeploy serve api_server ${model_path} --proxy-url http://0.0.0.0:10010 \
	--model-name ${model_name} \
	--tp 1 \
	--max-batch-size 512 \
	--backend pytorch \
	--server-port $((10000 + i)) \
	--session-len 16384 \
	--chat-template "internvl2_5" \
	--log-level WARNING &>> ./logs/api_${model_name}_${timestamp}_${i}.out &
	sleep 10s
	done
	```

	### Train

	If you need to train your own model, please follow these steps:

	1. Prepare the Dataset: Download the SAgoge dataset. After that, update the paths for the SAgoge-related subdatasets in `LLaMA-Factory/data/dataset_info.json` to match your local file paths.
	2. Download InternVL3-8B: Download the InternVL3-8B from [link](https://huggingface.co/OpenGVLab/InternVL3-8B-hf).
	3. Add Special Tokens: Before training, you must add SVG-specific tokens to the base model. Run the `utils/add_token.py` script, which adds these special tokens to the original model weights and initializes their embeddings based on subwords.
	4. Start Training: We provide example configuration scripts for the two-stage training process. You can find them at:
	- Stage 1: `LLaMA-Factory/examples/train_full/stage_1.yaml`
	- Stage 2: `LLaMA-Factory/examples/train_full/stage_2.yaml`

	Then use `llamafactory-cli train` to start training.

	## 📖 Citation

	```BibTex
	@article{wang2025internsvg,
	title={InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models},
	author={Wang, Haomin and Yin, Jinhui and Wei, Qi and Zeng, Wenguang and Gu, Lixin and Ye, Shenglong and Gao, Zhangwei and Wang, Yaohui and Zhang, Yanting and Li, Yuanqi and others},
	journal={arXiv preprint arXiv:2510.11341},
	year={2025}
	}
	```