Tags: Robotics · Safetensors · vision-language-action-model
Commit 6eb3b8f (verified) · Parent: 5227503
Jia-Zeng committed

update the comparative experiments on the RoboTwin 2.0 benchmark

Files changed (1)
  1. README.md +34 -1
README.md CHANGED
@@ -6,7 +6,7 @@ tags:
  - robotics
  - vision-language-action-model
  datasets:
- - InternRobotics/InternData-A1
+ - hxma/RoboTwin-LeRobot-v3.0
  ---
 
  # InternVLA-A1: Unifying Understanding, Generation and Action for Robotic Manipulation
@@ -31,6 +31,39 @@ Building upon InternVL3 and Qwen3-VL, we instantiate InternVLA-A1 at 2B and 3B p
  - [ ] [InternVLA-A1-3B-Pretrain-InternData-A1](https://huggingface.co/InternRobotics/InternVLA-A1-3B-Pretrain-InternData-A1): pretrained on InternData-A1 only
  - [ ] [InternVLA-A1-2B-Pretrain-InternData-A1](https://huggingface.co/InternRobotics/InternVLA-A1-2B-Pretrain-InternData-A1): pretrained on InternData-A1 only
 
+
+ ## **Evaluation on RoboTwin 2.0 Simulation Benchmark**
+
+ **Setting:** All models are jointly fine-tuned across 50 tasks (50 clean + 500 randomized demos per task).
+
+ **Performance Summary:** InternVLA-A1-3B achieves the highest success rates in both the Easy and Hard settings of the RoboTwin 2.0 benchmark (averaged over 50 tasks).
+
+ | Metric | $\pi_0$ | $\pi_{0.5}$ | **InternVLA-A1-3B** |
+ | :--- | :---: | :---: | :---: |
+ | **Avg. Success (Easy)** | 79.98% | 84.70% | **88.30%** 🥇 |
+ | **Avg. Success (Hard)** | 79.50% | 85.02% | **88.48%** 🥇 |
+
+ <details>
+ <summary>🔻 <b>Click to view detailed results for specific tasks</b></summary>
+
+ <br>
+
+ The table below shows success rates formatted as <code>Easy / Hard</code>.
+
+ | Task Name | $\pi_0$ | $\pi_{0.5}$ | **InternVLA-A1-3B** |
+ | :--- | :---: | :---: | :---: |
+ | **Click Bell** | 70.0% / 69.0% | **97.0%** / 93.0% | **97.0%** / **94.0%** |
+ | **Move Pillbottle Pad** | 83.0% / 82.0% | 92.0% / 89.0% | **95.0%** / **99.0%** |
+ | **Open Laptop** | 90.0% / 97.0% | 92.0% / 97.0% | **99.0%** / **99.0%** |
+ | **Handover Block** | 70.0% / 53.0% | 60.0% / 59.0% | **87.0%** / **81.0%** |
+ | **Blocks Ranking Size** | 59.0% / 57.0% | 73.0% / 77.0% | **82.0%** / **92.0%** |
+ | **Place Dual Shoes** | 69.0% / 76.0% | 57.0% / 65.0% | **93.0%** / **85.0%** |
+ | **Stamp Seal** | 62.0% / 65.0% | 66.0% / **73.0%** | **71.0%** / 71.0% |
+ | **Stack Bowls Three** | 81.0% / 75.0% | **88.0%** / 85.0% | 86.0% / **95.0%** |
+
+ </details>
+
+
  ## 🔑 Key Features
 
  Regarding model architecture, InternVLA-A1 employs a Mixture-of-Transformers (MoT) design to unify scene understanding, visual foresight, and action execution into a single framework.
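As a rough illustration of the MoT idea named above — one shared attention pass over a joint token sequence, with per-modality expert weights — here is a minimal NumPy sketch. This is *not* the InternVLA-A1 implementation; the dimensions, the single-head attention, and the ReLU feed-forward experts are all illustrative assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
d = 16

# One joint token sequence: understanding (0), foresight (1), action (2) tokens.
x = rng.standard_normal((12, d))
expert_ids = np.array([0] * 6 + [1] * 4 + [2] * 2)

# Shared self-attention: every token attends to every token, so the
# three streams exchange information within a single framework.
Wq, Wk, Wv = (0.1 * rng.standard_normal((d, d)) for _ in range(3))
scores = (x @ Wq) @ (x @ Wk).T / np.sqrt(d)
h = x + softmax(scores) @ (x @ Wv)

# Modality-specific experts: each token is routed through the
# feed-forward weights of its own stream (the "mixture" part of MoT).
ffn_weights = [0.1 * rng.standard_normal((d, d)) for _ in range(3)]
out = np.empty_like(h)
for i, W in enumerate(ffn_weights):
    mask = expert_ids == i
    out[mask] = np.maximum(h[mask] @ W, 0.0)  # per-expert ReLU FFN
y = h + out

print(y.shape)  # (12, 16)
```

The key design point sketched here is that routing is by token *modality*, not learned per token: attention parameters are shared across streams while feed-forward parameters are expert-specific.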