# Environment Input/Output

In RoboMME, a key difference from traditional Gym-like envs is that every observation value is a **list** rather than a single item. This is because some RoboMME tasks use conditioning video input, and for discrete action types (e.g. `waypoint` or `multi_choice`) we also return intermediate observations for potential use with video-based policy models.

## Env Input Format

We support four `ACTION_SPACE` types:

- `joint_angle`: 7 joint angles + gripper open/close
- `ee_pose`: 3 position (xyz) + 3 rotation (rpy) + gripper open/close
- `waypoint`: Same format as `ee_pose`, but executed in discrete keyframe steps
- `multi_choice`: Command dict, e.g. `{"choice": "A", "point": [y, x]}`. The full set of choices can be found in `info["available_multi_choices"]`, and `point` is the pixel location on the front image. This action is designed for Video-QA research.

Note: Gripper closed is -1, gripper open is 1.

## Env Output Format

When calling the `step` function:

```python
obs, reward, terminated, truncated, info = env.step(action)
```

| Return | Description | Typical type |
|--------|-------------|--------------|
| `obs` | Observation dict | `dict[str, list]` |
| `info` | Info dict | `dict[str, Any]` |
| `reward` | Reward value (not used) | scalar tensor |
| `terminated` | Termination flag | scalar boolean tensor |
| `truncated` | Truncation flag | scalar boolean tensor |

### `obs` dict

| Key | Meaning | Typical content |
|-----|---------|-----------------|
| `maniskill_obs` | The original raw env observation from ManiSkill | Raw observation dict |
| `front_rgb_list` | Front camera RGB list | Image frames, e.g. `(H, W, 3)` |
| `wrist_rgb_list` | Wrist camera RGB list | Image frames, e.g. `(H, W, 3)` |
| `front_depth_list` | Front camera depth list | Depth map, e.g. `(H, W, 1)` |
| `wrist_depth_list` | Wrist camera depth list | Depth map, e.g. `(H, W, 1)` |
| `eef_state_list` | End-effector state list | `[x, y, z, roll, pitch, yaw]` |
| `joint_state_list` | Robot joint state list | Joint vector, often 7-D |
| `gripper_state_list` | Robot gripper state list | 2-D |
| `front_camera_extrinsic_list` | Front camera extrinsic list | Camera extrinsic matrix |
| `wrist_camera_extrinsic_list` | Wrist camera extrinsic list | Camera extrinsic matrix |

To use only the current (latest) observation, use `obs[key][-1]`.

### Optional field switches (`include_*`)

`BenchmarkEnvBuilder.make_env_for_episode(...)` controls optional observation/info fields through `include_*` flags.

Default behavior:

- All `include_*` flags default to `False`.
- Without extra flags, the env returns only the RGB and state-related fields.

Mapping:

| Flag | Added key |
|------|-----------|
| `include_maniskill_obs` | `obs["maniskill_obs"]` |
| `include_front_depth` | `obs["front_depth_list"]` |
| `include_wrist_depth` | `obs["wrist_depth_list"]` |
| `include_front_camera_extrinsic` | `obs["front_camera_extrinsic_list"]` |
| `include_wrist_camera_extrinsic` | `obs["wrist_camera_extrinsic_list"]` |
| `include_available_multi_choices` | `info["available_multi_choices"]` |
| `include_front_camera_intrinsic` | `info["front_camera_intrinsic"]` |
| `include_wrist_camera_intrinsic` | `info["wrist_camera_intrinsic"]` |

Special case:

- If `action_space="multi_choice"`, the following front camera parameters are forced on internally, even if the corresponding `include_front_camera_*` flags are `False`:
  - `front_camera_extrinsic_list`
  - `front_camera_intrinsic`

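The flag-to-key mapping and the `multi_choice` special case described above can be sketched as a small lookup. This is only an illustration: `FLAG_TO_KEY` and `expected_optional_keys` are hypothetical names that are not part of RoboMME, and the sketch merely restates the documented mapping so you can check which optional keys to expect for a given configuration.

```python
# Hypothetical helper (not part of RoboMME): restates the documented
# include_* flag -> added key mapping as plain Python data.
FLAG_TO_KEY = {
    "include_maniskill_obs": ("obs", "maniskill_obs"),
    "include_front_depth": ("obs", "front_depth_list"),
    "include_wrist_depth": ("obs", "wrist_depth_list"),
    "include_front_camera_extrinsic": ("obs", "front_camera_extrinsic_list"),
    "include_wrist_camera_extrinsic": ("obs", "wrist_camera_extrinsic_list"),
    "include_available_multi_choices": ("info", "available_multi_choices"),
    "include_front_camera_intrinsic": ("info", "front_camera_intrinsic"),
    "include_wrist_camera_intrinsic": ("info", "wrist_camera_intrinsic"),
}


def expected_optional_keys(action_space, **flags):
    """Return the optional keys expected in `obs` and `info` for the given flags."""
    unknown = set(flags) - set(FLAG_TO_KEY)
    if unknown:
        raise ValueError(f"Unknown include_* flags: {sorted(unknown)}")

    expected = {"obs": set(), "info": set()}
    for flag, (target, key) in FLAG_TO_KEY.items():
        # Every flag defaults to False, as documented.
        if flags.get(flag, False):
            expected[target].add(key)

    # Documented special case: multi_choice forces the front camera
    # parameters on regardless of the include_front_camera_* flags.
    if action_space == "multi_choice":
        expected["obs"].add("front_camera_extrinsic_list")
        expected["info"].add("front_camera_intrinsic")
    return expected
```

For instance, `expected_optional_keys("joint_angle", include_front_depth=True)` reports only `front_depth_list` under `obs`, while `expected_optional_keys("multi_choice")` reports the two forced front camera keys even though no flag was set.
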
Example:

```python
from robomme.env_record_wrapper import BenchmarkEnvBuilder

builder = BenchmarkEnvBuilder(
    env_id="VideoUnmaskSwap",
    dataset="test",
    action_space="joint_angle",
    gui_render=False,
)

# Enable front depth plus front camera extrinsic/intrinsic; leave the rest off.
env = builder.make_env_for_episode(
    episode_idx=0,
    max_steps=1000,
    include_maniskill_obs=False,
    include_front_depth=True,
    include_wrist_depth=False,
    include_front_camera_extrinsic=True,
    include_wrist_camera_extrinsic=False,
    include_available_multi_choices=False,
    include_front_camera_intrinsic=True,
    include_wrist_camera_intrinsic=False,
)

obs, info = env.reset()
```

### `info` dict

| Key | Meaning | Typical content |
|-----|---------|-----------------|
| `task_goal` | Task goal list | `list[str]` |
| `simple_subgoal_online` | Oracle online simple subgoal | Description of the current simple subgoal |
| `grounded_subgoal_online` | Oracle online grounded subgoal | Description of the current grounded subgoal |
| `available_multi_choices` | Currently available options for the multi-choice action | List of e.g. `{"label": "a/b/...", "action": str, "need_parameter": bool}`; `need_parameter` means the action needs grounding info such as `[y, x]` |
| `front_camera_intrinsic` | Front camera intrinsic | Camera intrinsic matrix |
| `wrist_camera_intrinsic` | Wrist camera intrinsic | Camera intrinsic matrix |
| `status` | Status flag | One of `success`, `fail`, `timeout`, `ongoing`, `error` |
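
A minimal sketch of how the list-valued `obs` and the `status` field might be consumed in a policy loop. The helper names `latest_observation` and `episode_finished` are hypothetical and not part of RoboMME. Two assumptions are made here: keys ending in `_list` hold lists while other keys (such as `maniskill_obs`) are passed through unchanged, and any status other than `ongoing` means the episode is over.

```python
from typing import Any

# Assumption: every documented status except "ongoing" ends the episode.
TERMINAL_STATUSES = {"success", "fail", "timeout", "error"}


def latest_observation(obs: dict[str, Any]) -> dict[str, Any]:
    """Keep only the most recent entry of each list-valued observation key."""
    latest = {}
    for key, value in obs.items():
        if key.endswith("_list"):
            latest[key] = value[-1]  # documented access pattern: obs[key][-1]
        else:
            latest[key] = value  # assumed non-list entry, e.g. maniskill_obs
    return latest


def episode_finished(info: dict[str, Any]) -> bool:
    """Return True once info["status"] is no longer "ongoing"."""
    return info["status"] in TERMINAL_STATUSES


# Stand-in data shaped like the documented dicts (not real env output).
fake_obs = {
    "front_rgb_list": ["frame_0", "frame_1", "frame_2"],
    "eef_state_list": [[0.0] * 6, [0.1, 0.0, 0.2, 0.0, 0.0, 0.0]],
}
fake_info = {"status": "ongoing"}

current = latest_observation(fake_obs)
done = episode_finished(fake_info)
```

With a real env, the same helpers would be applied to the `obs` and `info` returned by `env.reset()` and `env.step(action)`; a video-based policy would instead consume the full lists.
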