KiyotakaWang committed on
Commit 089910d · verified · 1 Parent(s): 9e3b180

Update README.md

Files changed (1): README.md (+58 −2)
README.md CHANGED
@@ -1,5 +1,9 @@
  ---
  license: apache-2.0
  ---
  <div align="center">
  <h1> InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models </h1>
@@ -13,6 +17,58 @@ license: apache-2.0
  </div>
  </div>

- ## 🤖 InternSVG-8B

- A unified multimodal large language model (MLLM) for SVG understanding, editing, and generation.
  ---
  license: apache-2.0
+ datasets:
+ - InternSVG/SAgoge
+ base_model:
+ - OpenGVLab/InternVL3-8B
  ---
  <div align="center">
  <h1> InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models </h1>
 
  </div>
  </div>

+ ## **🤖 InternSVG Model**
+
+ The **InternSVG-8B** model is available on [Hugging Face](https://huggingface.co/InternSVG/InternSVG-8B). It is built on InternVL3-8B with SVG-specific tokens added to its vocabulary, and is supervised fine-tuned (SFT) with a two-stage strategy on the large-scale SVG training samples of the SAgoge dataset.
+
+ ### Deploy
+
+ We recommend using [LMDeploy](https://github.com/InternLM/lmdeploy) for deployment. An example of launching a proxy server with 8 parallel workers (one per GPU) is provided below:
+
+ ```bash
+ #!/bin/bash
+ model_path="MODEL_PATH"
+ model_name="InternSVG"
+
+ # proxy
+ lmdeploy serve proxy --server-name 0.0.0.0 --server-port 10010 --routing-strategy "min_expected_latency" &
+
+ # make sure the log directory exists before workers append to it
+ mkdir -p ./logs
+
+ worker_num=8
+ for ((i = 0; i < worker_num; i++)); do
+     timestamp=$(date +"%Y-%m-%d_%H-%M-%S")
+     CUDA_VISIBLE_DEVICES="${i}" lmdeploy serve api_server ${model_path} --proxy-url http://0.0.0.0:10010 \
+         --model-name ${model_name} \
+         --tp 1 \
+         --max-batch-size 512 \
+         --backend pytorch \
+         --server-port $((10000 + i)) \
+         --session-len 16384 \
+         --chat-template "internvl2_5" \
+         --log-level WARNING &>> ./logs/api_${model_name}_${timestamp}_${i}.out &
+     sleep 10s
+ done
+ ```
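LMDeploy's `api_server` speaks an OpenAI-compatible HTTP API, and the proxy forwards requests to the workers. A minimal Python client sketch, assuming the proxy from the script above is reachable on port 10010 and exposes `/v1/chat/completions`; the prompt, `temperature`, and `max_tokens` values here are illustrative, not values from the repo:

```python
import json
import urllib.request

PROXY_URL = "http://0.0.0.0:10010"  # proxy address from the launch script


def build_chat_payload(prompt: str, model_name: str = "InternSVG",
                       max_tokens: int = 4096) -> dict:
    """Build an OpenAI-style chat-completion request body."""
    return {
        "model": model_name,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.0,
    }


def chat(prompt: str) -> str:
    """POST the request to the proxy's /v1/chat/completions endpoint."""
    data = json.dumps(build_chat_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{PROXY_URL}/v1/chat/completions",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        out = json.loads(resp.read())
    return out["choices"][0]["message"]["content"]


# Example (only works while the proxy and workers are running):
#   print(chat("Generate an SVG icon of a red circle."))
```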
+
+ ### Train
+
+ If you need to train your own model, please follow these steps:
+
+ 1. **Prepare the Dataset:** Download the **SAgoge** dataset, then update the paths of the SAgoge-related subdatasets in `LLaMA-Factory/data/dataset_info.json` to match your local file paths.
+ 2. **Download InternVL3-8B:** Download the base model from [OpenGVLab/InternVL3-8B-hf](https://huggingface.co/OpenGVLab/InternVL3-8B-hf).
+ 3. **Add Special Tokens:** Before training, add the SVG-specific tokens to the base model by running the `utils/add_token.py` script, which inserts the special tokens into the original model weights and initializes their embeddings from their subwords.
+ 4. **Start Training:** We provide example configuration files for the two-stage training process:
+    - **Stage 1:** `LLaMA-Factory/examples/train_full/stage_1.yaml`
+    - **Stage 2:** `LLaMA-Factory/examples/train_full/stage_2.yaml`
+
+ Then start training with `llamafactory-cli train`.
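The subword-based initialization in step 3 can be sketched as a toy numpy example: each new special token gets an embedding row equal to the mean of the embeddings of the subwords its text splits into. `init_new_token_embeddings` is a hypothetical helper for illustration, not the actual `utils/add_token.py` implementation:

```python
import numpy as np


def init_new_token_embeddings(embedding: np.ndarray,
                              subword_ids_per_token: list) -> np.ndarray:
    """Append one embedding row per new special token.

    Each new row is the mean of the existing rows of the subwords that
    the token's surface form tokenizes into, so new tokens start near
    the meaning of their pieces rather than at random.
    """
    new_rows = [embedding[ids].mean(axis=0) for ids in subword_ids_per_token]
    return np.vstack([embedding, np.stack(new_rows)])


# Toy example: a vocabulary of 4 tokens with 2-d embeddings; one new
# token whose (hypothetical) subword split is token ids 0 and 2.
emb = np.array([[1.0, 0.0], [0.0, 1.0], [3.0, 0.0], [0.0, 3.0]])
new_emb = init_new_token_embeddings(emb, [[0, 2]])
```

In the real script, the subword ids for each new token would come from tokenizing its text with the base tokenizer (with special-token handling disabled), and the resized embedding matrix would be written back into the model checkpoint.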
+
+ ## 📖 Citation
+
+ ```bibtex
+ @article{wang2025internsvg,
+   title={InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models},
+   author={Wang, Haomin and Yin, Jinhui and Wei, Qi and Zeng, Wenguang and Gu, Lixin and Ye, Shenglong and Gao, Zhangwei and Wang, Yaohui and Zhang, Yanting and Li, Yuanqi and others},
+   journal={arXiv preprint arXiv:2510.11341},
+   year={2025}
+ }
+ ```