HDF5 Training Data Format

Structure inside each `record_dataset_<EnvID>.h5` file:

episode_1/
  setup/
  timestep_1/
    obs/
    action/
    info/
  timestep_2/
    obs/
    action/
    info/
  ...
...
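
The layout above can be walked with any mapping-style HDF5 reader. The following is a minimal sketch, assuming the groups are named exactly `episode_<n>` and `timestep_<n>` as in the tree. The helper names `numeric_suffix`, `ordered_children`, and `iter_timesteps` are hypothetical and not part of RoboMME. Sorting by the numeric suffix matters because name-ordered iteration would place `timestep_10` before `timestep_2`.

```python
import re

def numeric_suffix(name):
    """Return the trailing integer of a name such as 'timestep_12', or None."""
    match = re.search(r"_(\d+)$", name)
    return int(match.group(1)) if match else None

def ordered_children(group, prefix):
    """List child names that start with `prefix`, sorted by numeric suffix."""
    names = [n for n in group.keys()
             if n.startswith(prefix) and numeric_suffix(n) is not None]
    return sorted(names, key=numeric_suffix)

def iter_timesteps(root):
    """Yield (episode_name, timestep_name, timestep_group) in numeric order.

    `root` is any mapping-like object: nested dicts here, or an open HDF5
    file handle that supports `keys()` and item access.
    """
    for ep_name in ordered_children(root, "episode_"):
        episode = root[ep_name]
        for ts_name in ordered_children(episode, "timestep_"):
            yield ep_name, ts_name, episode[ts_name]

# Toy stand-in for a file, using nested dicts instead of HDF5 groups.
demo = {
    "episode_1": {
        "setup": {},
        "timestep_10": {"obs": {}, "action": {}, "info": {}},
        "timestep_2": {"obs": {}, "action": {}, "info": {}},
        "timestep_1": {"obs": {}, "action": {}, "info": {}},
    }
}
order = [ts for _, ts, _ in iter_timesteps(demo)]
print(order)  # ['timestep_1', 'timestep_2', 'timestep_10']
```

With h5py, an opened `h5py.File(path, "r")` can be passed as `root`, since files and groups expose the same `keys()` and item-access interface.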

Each episode's `setup/` group contains:

| Field | Type | Description |
| --- | --- | --- |
| `seed` | `int` | Environment seed (fixed for benchmarking) |
| `difficulty` | `str` | Difficulty level (fixed for benchmarking) |
| `task_goal` | `list[str]` | Possible language goals for the task |
| `front_camera_intrinsic` | `float32 (3, 3)` | Front camera intrinsic matrix |
| `wrist_camera_intrinsic` | `float32 (3, 3)` | Wrist camera intrinsic matrix |
| `available_multi_choices` | `str` | Available options for the multi-choice Video-QA problem |

Each timestep's `obs/` group contains:

| Field | Type / shape | Description |
| --- | --- | --- |
| `front_rgb` | `uint8 (512, 512, 3)` | Front camera RGB |
| `wrist_rgb` | `uint8 (256, 256, 3)` | Wrist camera RGB |
| `front_depth` | `int16 (512, 512, 1)` | Front camera depth (mm) |
| `wrist_depth` | `int16 (256, 256, 1)` | Wrist camera depth (mm) |
| `joint_state` | `float32 (7,)` | Joint positions (7 joints) |
| `eef_state` | `float32 (6,)` | End-effector pose `[x, y, z, roll, pitch, yaw]` |
| `gripper_state` | `float32 (2,)` | Gripper opening width in [0, 0.04] |
| `is_gripper_close` | `bool` | Whether the gripper is closed |
| `front_camera_extrinsic` | `float32 (3, 4)` | Front camera extrinsic matrix |
| `wrist_camera_extrinsic` | `float32 (3, 4)` | Wrist camera extrinsic matrix |
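
Because depth is stored as integer millimetres, it usually needs converting before it is combined with an intrinsic matrix. The sketch below back-projects one pixel into the camera frame. It assumes the intrinsics follow the common pinhole layout `[[fx, 0, cx], [0, fy, cy], [0, 0, 1]]`, which this document does not confirm. The helper names and the demo matrix are hypothetical.

```python
def depth_mm_to_m(depth_mm):
    """Convert a depth reading in millimetres to metres."""
    return float(depth_mm) / 1000.0

def backproject(u, v, depth_mm, K):
    """Back-project pixel (u, v) with depth in mm to a camera-frame point in metres.

    Assumes the pinhole layout K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]],
    with u the column index and v the row index.
    """
    fx, fy = K[0][0], K[1][1]
    cx, cy = K[0][2], K[1][2]
    z = depth_mm_to_m(depth_mm)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return (x, y, z)

# Hypothetical intrinsics for a 512x512 image; not taken from a real file.
K_demo = [[256.0, 0.0, 256.0],
          [0.0, 256.0, 256.0],
          [0.0, 0.0, 1.0]]
point = backproject(384, 256, 2000, K_demo)
print(point)  # (1.0, 0.0, 2.0)
```

For a depth array of shape `(512, 512, 1)`, the reading at pixel `(u, v)` would be `front_depth[v, u, 0]`. Mapping the result into the world frame depends on the direction of the extrinsic matrix, which is not specified here.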

Each timestep's `action/` group contains:

| Field | Type / shape | Description |
| --- | --- | --- |
| `joint_action` | `float32 (8,)` | Joint-space action: 7 joint angles + gripper |
| `eef_action` | `float32 (7,)` | End-effector action `[x, y, z, roll, pitch, yaw, gripper]` |
| `waypoint_action` | `float32 (7,)` | End-effector action at discrete time steps; a subtask may contain multiple waypoint actions. Used for data generation. |
| `choice_action` | `str` | JSON string for multi-choice selection, with an optional grounded pixel location on the front image, e.g., `{"choice": "A", "point": [y, x]}` |

In RoboMME, a gripper action of -1 means close and 1 means open.
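
The action fields can be unpacked with a few small helpers. This is a minimal sketch: the function names are hypothetical, and thresholding the gripper value at 0 is an assumption for values that fall between the documented -1 and 1.

```python
import json

def parse_choice_action(raw):
    """Decode a `choice_action` JSON string into (choice, point).

    `point` is returned as a (y, x) tuple, or None when the optional
    "point" key is absent.
    """
    if isinstance(raw, bytes):  # HDF5 readers may hand strings back as bytes
        raw = raw.decode("utf-8")
    data = json.loads(raw)
    point = data.get("point")
    return data["choice"], (tuple(point) if point is not None else None)

def split_eef_action(eef_action):
    """Split a 7-element `eef_action` into (pose, gripper)."""
    values = [float(v) for v in eef_action]
    if len(values) != 7:
        raise ValueError(f"expected 7 values, got {len(values)}")
    return values[:6], values[6]

def gripper_is_open(gripper):
    """Map the gripper command to a bool: 1 means open, -1 means close.

    Thresholding at 0 is an assumption for values between the two.
    """
    return gripper > 0

choice, point = parse_choice_action('{"choice": "A", "point": [120, 340]}')
pose, grip = split_eef_action([0.4, 0.0, 0.3, 3.14, 0.0, 0.0, -1.0])
print(choice, point, gripper_is_open(grip))  # A (120, 340) False
```

Note that `point` is ordered `[y, x]` (row first), so it indexes an image array directly as `front_rgb[y, x]`.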

Each timestep's `info/` group contains:

| Field | Type | Description |
| --- | --- | --- |
| `simple_subgoal` | `bytes (UTF-8)` | Simple subgoal text (built-in planner view) |
| `simple_subgoal_online` | `bytes (UTF-8)` | Simple subgoal text (online view; may advance to the next subgoal earlier than the planner view) |
| `grounded_subgoal` | `bytes (UTF-8)` | Grounded subgoal text (built-in planner view) |
| `grounded_subgoal_online` | `bytes (UTF-8)` | Grounded subgoal text (online view; may advance to the next subgoal earlier than the planner view) |
| `is_video_demo` | `bool` | Whether this frame is from the conditioning video shown before execution |
| `is_subgoal_boundary` | `bool` | Whether this is a keyframe (i.e., a boundary between subtasks) |
| `is_completed` | `bool` | Whether the task is finished |
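
The flags above can be used to separate conditioning-video frames from execution frames and to group execution frames by subgoal. The sketch below works on a list of per-timestep dictionaries holding already-loaded values. The helper names are hypothetical, and treating the boundary frame as the last frame of its segment is an assumption.

```python
def decode_text(value):
    """Return a str for a value stored as UTF-8 bytes (or already a str)."""
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)

def execution_segments(infos, key="simple_subgoal"):
    """Group execution-frame indices by subgoal.

    Skips frames flagged `is_video_demo` and closes a segment at each frame
    flagged `is_subgoal_boundary`.
    Returns a list of (subgoal_text, [frame indices]).
    """
    segments, current = [], []
    for i, info in enumerate(infos):
        if info["is_video_demo"]:
            continue
        current.append(i)
        if info["is_subgoal_boundary"]:
            segments.append((decode_text(info[key]), current))
            current = []
    if current:  # trailing frames after the last boundary
        segments.append((decode_text(infos[current[-1]][key]), current))
    return segments

# Toy per-timestep info records; the subgoal strings are made up.
infos = [
    {"simple_subgoal": b"", "is_video_demo": True, "is_subgoal_boundary": False},
    {"simple_subgoal": b"pick up the cube", "is_video_demo": False, "is_subgoal_boundary": False},
    {"simple_subgoal": b"pick up the cube", "is_video_demo": False, "is_subgoal_boundary": True},
    {"simple_subgoal": b"place the cube", "is_video_demo": False, "is_subgoal_boundary": False},
]
segments = execution_segments(infos)
print(segments)  # [('pick up the cube', [1, 2]), ('place the cube', [3])]
```

With h5py, a scalar dataset is read with `dataset[()]` before being placed in such a dictionary. Passing `key="grounded_subgoal"` or one of the `_online` variants selects a different subgoal view.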