---
title: OXE-AugE
emoji: 🦾
colorFrom: blue
colorTo: indigo
sdk: static
pinned: true
---
Augmentation for the OXE dataset
🌐 Project Page · 📄 Paper · 💻 GitHub · 🐦 Twitter
---

## TL;DR

## What we do

We present **OXE-AugE**, a high-quality open-source dataset that augments **16** popular OXE datasets with **9** different robot embodiments, using a scalable robot augmentation pipeline which we call **AugE-Toolkit**.

## Why it matters

We find that **robot augmentation scales**. While the Open X-Embodiment (OXE) dataset aggregates demonstrations from over **60** real-world robot datasets, it is highly imbalanced: over **85%** of real trajectories come from just four robots (**Franka**, **xArm**, **Kuka iiwa**, and **Google Robot**), while many other robots appear in only **1–2** datasets. By diversifying the robot embodiment while preserving task and scene, **OXE-AugE** provides a new resource for training robust and transferable visuomotor policies. Through both simulated and real-world experiments, we find that scaling robot augmentation improves robustness, transfer, and generalization.

## What's included

**OXE-AugE** currently augments **16** commonly used OXE datasets, resulting in over **4.4 million** trajectories (more than **triple** the size of the original OXE) and covering **60%** of the widely used Octo pre-training mixture.

---

## 🤖 Robots & Coverage

**Legend:** **●** = source robot | ✓ = augmented demos available.

*For the full, current table, see the Dashboard or dataset READMEs.*

| Dataset | Panda | UR5e | Xarm7 | Google | WidowX | Sawyer | Kinova3 | IIWA | Jaco | # Episodes |
| :--------------------- | :---: | :---: | :---: | :----: | :----: | :----: | :-----: | :---: | :---: | :--------: |
| Berkeley AUTOLab UR5   | ✓ | **●** | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 1000 |
| TACO Play              | **●** | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 3603 |
| Austin BUDS            | **●** | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 50 |
| Austin Mutex           | **●** | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 1500 |
| Austin Sailor          | **●** | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 240 |
| CMU Franka Pick-Insert | **●** | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 631 |
| KAIST Nonprehensile    | **●** | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 201 |
| NYU Franka Play        | **●** | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 456 |
| TOTO                   | **●** | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 1003 |
| UTokyo xArm PickPlace  | ✓ | ✓ | **●** | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 102 |
| UCSD Kitchen           | ✓ | ✓ | **●** | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 150 |
| Austin VIOLA           | **●** | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 150 |
| Bridge                 | ✓ | ✓ | ✓ | ✓ | **●** | ✓ | ✓ | ✓ | ✓ | 28935 |
| RT-1 Robot Action      | ✓ | ✓ | ✓ | **●** |  | ✓ | ✓ | ✓ | ✓ | 87212 |
| Jaco Play              | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | **●** | 1084 |
| Language Table         | ✓ | ✓ | **●** | ✓ |  | ✓ | ✓ | ✓ | ✓ | 442226 |

---

## 📦 How to Use

Below is an example showing how to load a dataset from Hugging Face with `LeRobotDataset`, then iterate through the first episode and print robot-specific fields for each frame.
```python
import torch
from lerobot.datasets.lerobot_dataset import LeRobotDataset

REPO_ID = "oxe-auge/nyu_franka_play_dataset_val_augmented"
ROBOT = "kuka_iiwa"
EPISODE = 0

# Robot-specific observation keys for the chosen embodiment.
KEYS = [
    f"observation.images.{ROBOT}",
    f"observation.{ROBOT}.joints",
    f"observation.{ROBOT}.ee_pose",
    f"observation.{ROBOT}.base_position",
    f"observation.{ROBOT}.base_orientation",
    f"observation.{ROBOT}.ee_error",
]

dataset = LeRobotDataset(REPO_ID)

# Frames are stored contiguously per episode, so walk forward until the
# episode index changes.
i = 0
while i < len(dataset) and int(dataset[i]["episode_index"]) == EPISODE:
    sample = dataset[i]
    print(f"\n--- episode_index={EPISODE}, frame_index={int(sample['frame_index'])}, dataset_index={i} ---")
    for k in KEYS:
        print(f"{k}: {sample[k]}")
    i += 1

print(f"\nNumber of frames in episode {EPISODE}: {i}")
```

## Check EE Error Threshold

During simulator replay, a target robot (e.g., a Kuka iiwa) may not be able to exactly reach the source robot's end-effector pose due to embodiment differences (size, joint limits, etc.). This mismatch is recorded per frame as `observation.{robot}.ee_error`.
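Because the replay error is stored as a regular observation, one common use is to filter out frames whose error exceeds a tolerance before training on augmented demos. Below is a minimal sketch, assuming the `ee_error` field can be reduced to a scalar magnitude with a norm; the `EE_ERROR_THRESHOLD` value (and its units) is a hypothetical placeholder to tune for your application, not a recommended setting.

```python
import torch
from lerobot.datasets.lerobot_dataset import LeRobotDataset

REPO_ID = "oxe-auge/nyu_franka_play_dataset_val_augmented"
ROBOT = "kuka_iiwa"
ERROR_KEY = f"observation.{ROBOT}.ee_error"

# Hypothetical tolerance; check the dataset README for the units in which
# ee_error is reported before choosing a real value.
EE_ERROR_THRESHOLD = 0.01

dataset = LeRobotDataset(REPO_ID)

kept, dropped = 0, 0
for i in range(len(dataset)):
    # Reduce ee_error to a scalar magnitude, whether it is stored as a
    # scalar or as a small vector.
    err = torch.as_tensor(dataset[i][ERROR_KEY], dtype=torch.float32).flatten().norm()
    if err.item() <= EE_ERROR_THRESHOLD:
        kept += 1
    else:
        dropped += 1

print(f"kept={kept}, dropped={dropped}, threshold={EE_ERROR_THRESHOLD}")
```

Note that indexing the dataset decodes full frames (including images), so for large datasets you may prefer to read the error values from the underlying tabular data instead; the loop above simply keeps the example self-contained.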