Update README.md
README.md CHANGED

@@ -1,8 +1,5 @@
 ---
 license: apache-2.0
-base_model:
-- Qwen/Qwen3-4B-Thinking-2507
-- apple/aimv2-large-patch14-448
 pipeline_tag: image-text-to-text
 library_name: transformers
 ---
@@ -11,6 +8,7 @@ library_name: transformers
 <h1>AndesVL-4B-Thinking</h1>
 <a href='https://arxiv.org/abs/2510.11496'><img src='https://img.shields.io/badge/arXiv-2510.11496-b31b1b.svg'></a>
 <a href='https://huggingface.co/OPPOer'><img src='https://img.shields.io/badge/🤗%20HuggingFace-AndesVL-ffd21f.svg'></a>
+<a href='https://github.com/OPPO-Mente-Lab/AndesVL_Evaluation'><img src="https://img.shields.io/badge/GitHub-OPPOer-blue.svg?logo=github" alt="GitHub"></a>
 </div>
 AndesVL is a suite of mobile-optimized Multimodal Large Language Models (MLLMs) with **0.6B to 4B parameters**, built upon Qwen3's LLM and various visual encoders. Designed for efficient edge deployment, it achieves first-tier performance on diverse benchmarks, including those for text-rich tasks, reasoning tasks, Visual Question Answering (VQA), multi-image tasks, multilingual tasks, and GUI tasks. Its "1+N" LoRA architecture and QALFT framework facilitate efficient task adaptation and model compression, enabling a 6.7x peak decoding speedup and a 1.8 bits-per-weight compression ratio on mobile chips.
 
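Since the card tags the model as an image-text-to-text `transformers` model, a minimal usage sketch may help orient readers. Note the assumptions: the checkpoint id `OPPOer/AndesVL-4B-Thinking` is inferred from the badges above, and the chat-message layout follows the common `transformers` multimodal convention — the model card itself does not specify either, so check the repository for the exact API.

```python
# Minimal sketch of querying an image-text-to-text model via transformers.
# Assumptions (not stated in the card): the repo id "OPPOer/AndesVL-4B-Thinking"
# and chat-template support through AutoProcessor.
from typing import Any


def build_messages(image: str, question: str) -> list[dict[str, Any]]:
    """Assemble one user turn in the common transformers multimodal
    chat-message format: an image part followed by a text part."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": question},
            ],
        }
    ]


def run(image: str, question: str) -> str:
    """Load the model and generate an answer (downloads ~4B weights)."""
    from transformers import AutoModelForCausalLM, AutoProcessor

    model_id = "OPPOer/AndesVL-4B-Thinking"  # hypothetical repo id
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    inputs = processor.apply_chat_template(
        build_messages(image, question),
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    )
    out = model.generate(**inputs, max_new_tokens=128)
    return processor.decode(out[0], skip_special_tokens=True)


if __name__ == "__main__":
    # Message construction alone is cheap; run() requires the weights.
    print(build_messages("photo.jpg", "What is in this picture?"))
```

The message-building step is kept separate from model loading so the prompt format can be inspected without downloading the checkpoint.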