---
license: cc-by-nc-sa-4.0
base_model:
  - Qwen/Qwen3-VL-2B-Instruct
tags:
  - robotics
  - vision-language-action-model
datasets:
  - InternRobotics/InternData-A1
---

# InternVLA-A1: Unifying Understanding, Generation and Action for Robotic Manipulation

*(Teaser image)*

Paper · Code · Data · Website

InternVLA-A1 integrates understanding, generation, and action experts into a unified model, synergizing the semantic reasoning of MLLMs with world-model-style dynamics prediction to guide action execution.

Building upon InternVL3 and Qwen3-VL, we instantiate InternVLA-A1 at 2B and 3B parameter scales, and release the InternVLA-A1 series covering different model scales and pre-training data configurations.

## 🔑 Key Features

Regarding model architecture, InternVLA-A1 employs a Mixture-of-Transformers (MoT) design that unifies scene understanding, visual foresight, and action execution in a single framework. It synergizes the MLLM's semantic understanding with world-model-style dynamics prediction to "imagine" the future and guide adaptive actions.

*(Figure: model architecture)*
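The exact layer layout of InternVLA-A1 is defined in the paper and code, not in this card; the PyTorch sketch below is only a minimal illustration of the general Mixture-of-Transformers pattern: all tokens share self-attention, while understanding, foresight, and action tokens each route through their own feed-forward expert. All module names, sizes, and the hard routing scheme are assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

class MoTBlock(nn.Module):
    """Illustrative Mixture-of-Transformers block: tokens of all modalities
    attend jointly, but each modality (understanding / foresight / action)
    is processed by its own feed-forward expert."""

    def __init__(self, dim: int, n_heads: int, n_experts: int = 3):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor, modality_ids: torch.Tensor) -> torch.Tensor:
        # Joint self-attention couples all modalities in one sequence.
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)
        x = x + attn_out
        # Hard routing: each token goes through its modality's expert FFN.
        h = self.norm2(x)
        out = torch.zeros_like(h)
        for i, expert in enumerate(self.experts):
            mask = modality_ids == i  # (batch, seq) boolean mask
            if mask.any():
                out[mask] = expert(h[mask])
        return x + out

# Toy usage: 0 = understanding, 1 = visual-foresight, 2 = action tokens.
block = MoTBlock(dim=256, n_heads=8)
x = torch.randn(2, 10, 256)
modality_ids = torch.randint(0, 3, (2, 10))
print(block(x, modality_ids).shape)  # torch.Size([2, 10, 256])
```

Hard routing keeps each expert specialized, while the shared attention lets action tokens attend to semantic and predicted-future tokens within the same sequence.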

Regarding training data, we pre-train InternVLA-A1 on hybrid synthetic-real datasets spanning InternData-A1 and open-source real-world data (e.g., AgiBot-World). This hybrid synthetic-real pre-training strategy combines the scene diversity of simulation with the physical fidelity of real-world data.

*(Figure: pre-training data composition)*
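As one way to picture the hybrid strategy, here is a toy sampler that draws training examples from synthetic and real sources with fixed mixing weights. The weights and the way sources are combined below are purely illustrative assumptions, not the released training configuration.

```python
import random

# Illustrative mixing weights -- NOT the actual training ratios.
sources = {
    "InternData-A1 (synthetic)": 0.6,  # contributes scene diversity
    "AgiBot-World (real)":       0.4,  # contributes physical fidelity
}

def sample_source(rng: random.Random) -> str:
    """Pick a data source with probability proportional to its mixing weight."""
    names, weights = zip(*sources.items())
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random(0)
print([sample_source(rng) for _ in range(8)])
```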

## Usage

Please refer to our official repo InternVLA-A1.
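For orientation only, a minimal loading sketch is shown below, assuming the checkpoint exposes a `transformers`-compatible interface via `trust_remote_code`; the actual entry points, processor, and inference API are defined in the official repo and may differ.

```python
# Hypothetical sketch -- the real API lives in the official InternVLA-A1 repo.
import torch
from transformers import AutoModel, AutoProcessor

model_id = "InternRobotics/InternVLA-A1-3B"  # assumed Hub ID for this card

# trust_remote_code is assumed to pull the model's custom classes from the Hub.
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).eval()
```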

## Demonstrations

### ⚡ Dynamic Manipulation

InternVLA-A1 exhibits exceptional robustness in highly dynamic scenarios.

### 🤖 Daily Tasks

InternVLA-A1 also demonstrates superior proficiency in dexterous and fine-grained manipulation.

## License and Citation

All code within this repo is released under CC BY-NC-SA 4.0. Please consider citing our project if it helps your research.

```bibtex
@article{contributors2026internvla_a1,
  title={InternVLA-A1: Unifying Understanding, Generation and Action for Robotic Manipulation},
  author={InternVLA-A1 contributors},
  journal={arXiv preprint arXiv:2601.02456},
  year={2026}
}
```

## Acknowledgments