---
license: apache-2.0
language:
- en
datasets:
- nvidia/PhysicalAI-Robotics-GR00T-Teleop-Sim
---

# DIAL Checkpoints

<p align="center">
  <a href="https://xpeng-robotics.github.io/dial/"><b>Project Page</b></a> &nbsp;|&nbsp;
  <a href="https://xpeng-robotics.github.io/dial/DIAL.pdf"><b>Paper</b></a> &nbsp;|&nbsp;
  <a href="https://github.com/xpeng-robotics/DIAL"><b>Code</b></a>
</p>

Model weights for **DIAL** (**D**ecoupling **I**ntent and **A**ction via **L**atent World Modeling), an end-to-end Vision-Language-Action (VLA) framework built on [NVIDIA Isaac GR00T N1.5](https://github.com/NVIDIA/Isaac-GR00T/tree/n1.5-release) with a [Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) backbone.


## Available Checkpoints

| Checkpoint | Training Data | Steps | Description |
|---|---|---|---|
| `DIAL-3B-fewshot` | EgoDex human data + 10% GR1 simulation data | 20K per stage (3-stage) | Co-trained with heterogeneous human demonstrations |
| `DIAL-3B-fulldata` | All GR1 simulation data (~24,000 demos) | 40K per stage (2-stage) | Trained on full teleoperation trajectories in simulation |
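
The per-stage step counts in the table can be turned into total training steps with a little arithmetic. The sketch below assumes that "K" means thousand and that every stage runs for the same number of steps, one after another; the `CheckpointInfo` class and `CHECKPOINTS` dictionary are illustrative names, not part of the DIAL codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckpointInfo:
    """Training schedule for one checkpoint, as listed in the table above."""

    name: str
    steps_per_stage: int  # assumption: "20K" means 20,000 optimizer steps
    num_stages: int

    @property
    def total_steps(self) -> int:
        # Assumption: stages run sequentially with equal step counts.
        return self.steps_per_stage * self.num_stages


# Hypothetical lookup table mirroring the two rows above.
CHECKPOINTS = {
    "DIAL-3B-fewshot": CheckpointInfo("DIAL-3B-fewshot", 20_000, 3),
    "DIAL-3B-fulldata": CheckpointInfo("DIAL-3B-fulldata", 40_000, 2),
}

for info in CHECKPOINTS.values():
    print(f"{info.name}: {info.total_steps:,} total steps")
```

Under these assumptions, the few-shot checkpoint comes to 60,000 total steps and the full-data checkpoint to 80,000.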

For installation, training, and evaluation instructions, please refer to the [GitHub repository](https://github.com/xpeng-robotics/DIAL).
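
As a rough sketch of fetching a single checkpoint rather than the whole repository, the snippet below uses `snapshot_download` from `huggingface_hub` with an `allow_patterns` filter. It assumes that each checkpoint lives in a top-level folder named after it, which this card does not confirm; the repository ID is a placeholder, and `checkpoint_patterns` and `download_checkpoint` are hypothetical helpers rather than part of the DIAL code.

```python
KNOWN_CHECKPOINTS = ("DIAL-3B-fewshot", "DIAL-3B-fulldata")


def checkpoint_patterns(name: str) -> list[str]:
    """Build file patterns that select one checkpoint folder.

    Assumption: the repo stores each checkpoint under a folder with its name.
    """
    if name not in KNOWN_CHECKPOINTS:
        raise ValueError(f"unknown checkpoint {name!r}; expected one of {KNOWN_CHECKPOINTS}")
    return [f"{name}/*"]


def download_checkpoint(repo_id: str, name: str, local_dir: str) -> str:
    """Download only the files of one checkpoint and return the local path."""
    # Imported lazily so the helper above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=checkpoint_patterns(name),
        local_dir=local_dir,
    )


if __name__ == "__main__":
    # Placeholder repo ID: replace with the actual ID of this model repository.
    path = download_checkpoint("<org>/<repo>", "DIAL-3B-fewshot", "./checkpoints")
    print(path)
```

Filtering with `allow_patterns` avoids pulling both checkpoints when only one is needed; if the repository layout differs from the assumption above, adjust the pattern accordingly.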