davenliu committed on
Commit 80cce53 · verified · 1 Parent(s): 985c52d

Update README.md

Files changed (1):
  1. README.md +63 -3

README.md CHANGED
---
license: apache-2.0
---
# AndesVL-2B-Instruct
AndesVL is a suite of mobile-optimized Multimodal Large Language Models (MLLMs) with **0.6B to 4B parameters**, built on Qwen3 LLMs and a choice of visual encoders. Designed for efficient edge deployment, it achieves first-tier results on a wide range of benchmarks, including text-rich, reasoning, VQA, and GUI tasks, and introduces AndesUI-Bench for mobile UI comprehension. Its 1+N LoRA architecture and QALFT framework enable efficient task adaptation and model compression: at 1.7 bits per weight, performance degrades by only about 2% while decoding reaches 200 tokens/s on mobile chips.
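
The 1+N design keeps a single shared base model resident and swaps small task-specific LoRA adapters per request. As a rough illustration only (not AndesVL's actual implementation; the adapter paths and names below are hypothetical), the pattern looks like this with the `peft` library:

```python
# Conceptual sketch of 1+N LoRA serving with the `peft` library.
# NOTE: illustration only; adapter paths/names are hypothetical and
# AndesVL's own adapter mechanism may differ.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# One shared base model (the "1") kept in memory.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-1.7B", torch_dtype=torch.bfloat16)

# N lightweight task adapters loaded alongside it.
model = PeftModel.from_pretrained(base, "adapters/ocr", adapter_name="ocr")
model.load_adapter("adapters/gui", adapter_name="gui")

# Route a GUI-understanding request to the matching adapter;
# only the small LoRA weights switch, the base stays shared.
model.set_adapter("gui")
```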

Detailed model sizes and components are provided below:

| Model | Total Parameters (B) | Visual Encoder | LLM |
|---|---|---|---|
| AndesVL-0.6B | 0.695 | SigLIP2-Base | Qwen3-0.6B |
| AndesVL-1B | 0.927 | AIMv2-Large | Qwen3-0.6B |
| **AndesVL-2B** | 2.055 | AIMv2-Large | Qwen3-1.7B |
| AndesVL-4B | 4.360 | AIMv2-Large | Qwen3-4B |

# Quick Start
```python
# Requires transformers >= 4.52.4

import torch
from transformers import AutoModel, AutoTokenizer, CLIPImageProcessor

model_dir = "OPPOer/AndesVL-2B-Instruct"

# Load the model, tokenizer, and image processor from the Hub.
model = AutoModel.from_pretrained(model_dir, trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
image_processor = CLIPImageProcessor.from_pretrained(model_dir, trust_remote_code=True)

# OpenAI-style message list with one text part and one image part.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "描述这张图片。"},  # "Describe this image."
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://i-blog.csdnimg.cn/blog_migrate/2f4c88e71f7eabe46d062d2f1ec77d10.jpeg"  # or a local path to an image
                },
            },
        ],
    },
]
res = model.chat(messages, tokenizer, image_processor, max_new_tokens=1024, do_sample=True, temperature=0.6)
print(res)
```
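
For multi-turn use, one plausible pattern is to append the model's reply to the message list and ask a follow-up. This is a sketch under the assumption that `model.chat` accepts conversation history in the same message format; the model card does not document this explicitly.

```python
# Hypothetical follow-up turn; assumes `model.chat` replays prior messages.
messages.append({"role": "assistant", "content": [{"type": "text", "text": res}]})
messages.append({"role": "user", "content": [{"type": "text", "text": "What colors dominate the image?"}]})
res = model.chat(messages, tokenizer, image_processor, max_new_tokens=1024, do_sample=True, temperature=0.6)
print(res)
```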

# Citation
If you find our work helpful, please consider citing us:

```bibtex
@article{andesvl2025jin,
  title={AndesVL Technical Report: An Efficient Mobile-side Multimodal Large Language Model},
  author={Zhiwei Jin and Nan Wang and Yafei Liu and Chao Li and Yuqing Qiu and Xin Li and
          Ruichen Wang and Zhihao Li and Qi Qi and Xiaohui Song and Ke Chen and Huafei Li and
          Chuangchuang Wang and Kai Tang and Zhiguang Zhu and Wenmei Gao and Rui Wang and
          Jun Wu and Chao Liu and Qin Xie and Chen Chen and Haonan Lu},
  journal={arXiv preprint arXiv:*****},
  year={2025}
}
```

# Acknowledgements
We are very grateful for the efforts of the [Qwen](https://huggingface.co/Qwen), [AIMv2](https://huggingface.co/apple/aimv2-large-patch14-224), and [SigLIP 2](https://arxiv.org/abs/2502.14786) projects.