# RoboMME: A Robotic Benchmark for Memory-Augmented Manipulation

## Announcements
[03/2026] We are thrilled to release RoboMME, the first large-scale robotic benchmark dedicated to memory-augmented manipulation! Spanning 4 cognitively motivated task suites with 16 carefully designed tasks, RoboMME pushes robots to remember, reason, and act.
## Installation
After cloning the repo, install [uv](https://docs.astral.sh/uv/getting-started/installation/), then run:
```bash
uv sync
uv pip install -e .
```
## Gradio Docker Deployment (HF Space + GPU)
This repository also supports Docker deployment for the Gradio app entrypoint:
```bash
python3 gradio-web/main.py
```
Build image:
```bash
docker build -t robomme-gradio:gpu .
```
Run container (GPU + Vulkan for ManiSkill/SAPIEN):
```bash
docker run --rm --gpus all -p 7860:7860 robomme-gradio:gpu
```
The image sets `NVIDIA_DRIVER_CAPABILITIES=compute,utility,graphics` so the NVIDIA container runtime exposes Vulkan/graphics driver files inside the container. Without graphics capability, ManiSkill/SAPIEN may fail with `vk::createInstanceUnique: ErrorIncompatibleDriver`.
Optional metadata override:
```bash
docker run --rm --gpus all -p 7860:7860 \
  -e ROBOMME_METADATA_ROOT=/home/user/app/src/robomme/env_metadata/train \
  robomme-gradio:gpu
```
Notes:
- Docker deployment is focused on `gradio-web/main.py`.
- The existing `uv` workflow for training/testing remains unchanged.
- Space metadata is configured via the root `README.md` with `sdk: docker` and `app_port: 7860`.
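An override like `ROBOMME_METADATA_ROOT` is typically consumed with a plain environment-variable lookup plus a default. A minimal sketch, assuming the app resolves the path this way (the `resolve_metadata_root` helper is illustrative, not the app's actual code; the fallback path mirrors the `docker run` example above):

```python
import os
from pathlib import Path

# Fallback used when the container does not set ROBOMME_METADATA_ROOT.
# This default mirrors the docker run example; the real app may differ.
DEFAULT_METADATA_ROOT = "/home/user/app/src/robomme/env_metadata/train"

def resolve_metadata_root() -> Path:
    """Return the metadata directory, preferring the env var override."""
    return Path(os.environ.get("ROBOMME_METADATA_ROOT", DEFAULT_METADATA_ROOT))
```

With `-e ROBOMME_METADATA_ROOT=/data/val`, the helper would return `/data/val`; without it, the default `train` directory.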
## Quick Start
Start an environment with a specified setup:
```bash
uv run scripts/run_example.py
```
This generates a rollout video in the `sample_run_videos` directory.
We provide four action types: `joint_action`, `ee_pose`, `waypoint`, and `multi_choice`. Predict continuous actions with `joint_action` or `ee_pose`, predict discrete waypoint actions with `waypoint`, or answer VideoQA-style problems with `multi_choice`.
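The four action types can be sketched as follows. Note that the shapes and ranges below are hypothetical placeholders for illustration only; the actual action spaces are defined per environment (see `doc/env_format.md`):

```python
import random

# Hypothetical action formats; the real spaces come from each environment.
def sample_action(action_type: str):
    if action_type == "joint_action":
        # Continuous joint targets (e.g. a 7-DoF arm plus gripper -> 8 dims).
        return [random.uniform(-1.0, 1.0) for _ in range(8)]
    if action_type == "ee_pose":
        # Continuous end-effector pose: xyz + quaternion + gripper -> 8 dims.
        return [random.uniform(-1.0, 1.0) for _ in range(8)]
    if action_type == "waypoint":
        # Discrete waypoint index (placeholder vocabulary of 16 waypoints).
        return random.randrange(16)
    if action_type == "multi_choice":
        # VideoQA-style answer choice (placeholder of 4 options).
        return random.randrange(4)
    raise ValueError(f"unknown action type: {action_type}")
```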
## Benchmark
### Tasks
We have four task suites, each with four tasks:
| Suite | Focus | Tasks |
| ---------- | ----------------- | --------------------------------------------------------------------- |
| Counting | Temporal memory | BinFill, PickXtimes, SwingXtimes, StopCube |
| Permanence | Spatial memory | VideoUnmask, VideoUnmaskSwap, ButtonUnmask, ButtonUnmaskSwap |
| Reference | Object memory | PickHighlight, VideoRepick, VideoPlaceButton, VideoPlaceOrder |
| Imitation | Procedural memory | MoveCube, InsertPeg, PatternLock, RouteStick |

All tasks are defined in `src/robomme/robomme_env`. Detailed descriptions can be found in our paper appendix.
### Training Data
Training data can be downloaded [here](https://huggingface.co/datasets/Yinpei/robomme_data). There are 1,600 demonstrations in total (100 per task). The HDF5 format is described in [doc/h5_data_format.md](doc/h5_data_format.md).
After downloading, replay the dataset for a sanity check:
```bash
uv run scripts/dataset_replay.py --h5-data-dir <your_downloaded_data_dir>
```
### Evaluation
To evaluate on the test set, set the `dataset` argument of `BenchmarkEnvBuilder`:
```python
task_id = "PickXtimes"
episode_idx = 0
env_builder = BenchmarkEnvBuilder(
    env_id=task_id,
    dataset="test",
    ...
)
env = env_builder.make_env_for_episode(episode_idx)
obs, info = env.reset()  # initial step
...
obs, _, terminated, truncated, info = env.step(action)  # each step
```
The train split has 100 episodes. The val/test splits each have 50 episodes. All seeds are fixed for benchmarking.
The environment input/output format is described in [doc/env_format.md](doc/env_format.md).
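A full evaluation over a split then reduces to rolling out each episode and aggregating successes. The sketch below uses a stand-in environment so it is self-contained; `StubEnv` and the `info["success"]` bookkeeping are illustrative assumptions, while the step/reset signature matches the snippet above:

```python
class StubEnv:
    """Minimal stand-in mimicking the Gymnasium-style reset/step API above."""
    def __init__(self, succeeds: bool, horizon: int = 3):
        self.succeeds, self.horizon, self.t = succeeds, horizon, 0

    def reset(self):
        self.t = 0
        return {"obs": None}, {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.horizon
        info = {"success": terminated and self.succeeds}
        return {"obs": None}, 0.0, terminated, False, info

def evaluate(envs, policy, max_steps: int = 100) -> float:
    """Roll out each episode environment and return the success rate."""
    successes = 0
    for env in envs:
        obs, info = env.reset()
        for _ in range(max_steps):
            obs, _, terminated, truncated, info = env.step(policy(obs))
            if terminated or truncated:
                break
        successes += bool(info.get("success"))
    return successes / len(envs)
```

For example, `evaluate([StubEnv(True), StubEnv(False)], policy=lambda obs: None)` returns `0.5`. With the real benchmark, the list of environments would come from `env_builder.make_env_for_episode(i)` over the split's episode indices.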
> Currently, environment spawning is set up only for imitation learning. We are working on extending it to support more general parallel environments for reinforcement learning in the future.
### Data Generation
You can also regenerate the HDF5 data yourself via parallel processing:
```bash
uv run scripts/dev/xxxx
```
## Model Training
### MME-VLA-Suite
The [MME Policy Learning](https://github.com/RoboMME/robomme_policy_learning) repo provides the MME-VLA model training and evaluation used in our paper. It contains a family of memory-augmented VLA models built on the [pi05](https://github.com/Physical-Intelligence/openpi) backbone, along with our implementation of [MemER](https://jen-pan.github.io/memer/).
### Prior Methods
**MemER**: The [MME Policy Learning](https://github.com/RoboMME/robomme_policy_learning) repo also provides our implementation of [MemER](https://jen-pan.github.io/memer/), using the same GroundSG policy model as in MME-VLA.
**SAM2Act+**: The [RoboMME_SAM2Act](https://github.com/RoboMME/SAM2Act) repo provides our implementation adapted from the [SAM2Act](https://github.com/sam2act/sam2act) repo.
**MemoryVLA**: The [RoboMME_MemoryVLA](https://github.com/RoboMME/MemoryVLA) repo provides our implementation adapted from the [MemoryVLA](https://github.com/shihao1895/MemoryVLA) repo.
**Diffusion Policy**: The [RoboMME_DP](https://github.com/RoboMME/DP) repo provides our implementation adapted from the [diffusion_policy](https://github.com/real-stanford/diffusion_policy) repo.
## Submit Your Models
Want to add your model? Download the [dataset](https://huggingface.co/datasets/Yinpei/robomme_data) from Hugging Face, run evaluation using our [eval scripts](scripts/evaluation.py), then submit a PR with your results by adding `<your_model>.md` to the `doc/submission/` [directory](https://github.com/RoboMME/robomme_benchmark/tree/main/doc/submission). We will review it and update our leaderboard.
## Troubleshooting
**Q1: RuntimeError: Create window failed: Renderer does not support display.**
A1: Use a physical display or set up a virtual display for GUI rendering (e.g., install a VNC server and set the `DISPLAY` variable correctly).
**Q2: Failure related to Vulkan installation.**
A2: ManiSkill/SAPIEN requires both Vulkan userspace packages inside the container and NVIDIA graphics capability exposed by the container runtime. This image installs `libvulkan1`, `vulkan-tools`, and `libglvnd-dev`, and sets `NVIDIA_DRIVER_CAPABILITIES=compute,utility,graphics`. If it still does not work, first verify that the host machine itself supports Vulkan (run `vulkaninfo` on the host), then switch to CPU rendering:
```python
import os

os.environ['SAPIEN_RENDER_DEVICE'] = 'cpu'
os.environ['MUJOCO_GL'] = 'osmesa'
```
## Acknowledgements
This work was supported in part by NSF SES-2128623, NSF CAREER #2337870, NSF NRI #2220876, and NSF NAIRR250085. We would also like to thank Physical Intelligence for the wonderful [OpenPi](https://github.com/Physical-Intelligence/openpi/tree/main) codebase.
## Citation
```
...
```