# RoboMME: A Robotic Benchmark for Memory-Augmented Manipulation

![RoboMME bench](assets/robomme_bench.jpg)

## 📢 Announcements

[03/2026] We are thrilled to release RoboMME, the first large-scale robotic benchmark dedicated to memory-augmented manipulation! Spanning 4 cognitively motivated task suites with 16 carefully designed tasks, RoboMME pushes robots to remember, reason, and act.

## 📦 Installation

After cloning the repo, install [uv](https://docs.astral.sh/uv/getting-started/installation/), then run:

```bash
uv sync
uv pip install -e .
```

## 🐳 Gradio Docker Deployment (HF Space + GPU)

This repository also supports Docker deployment for the Gradio app entrypoint:

```bash
python3 gradio-web/main.py
```

Build the image:

```bash
docker build -t robomme-gradio:gpu .
```

Run the container (GPU + Vulkan for ManiSkill/SAPIEN):

```bash
docker run --rm --gpus all -p 7860:7860 robomme-gradio:gpu
```

The image sets `NVIDIA_DRIVER_CAPABILITIES=compute,utility,graphics` so that the NVIDIA container runtime exposes the Vulkan/graphics driver files inside the container. Without the graphics capability, ManiSkill/SAPIEN may fail with `vk::createInstanceUnique: ErrorIncompatibleDriver`.

Optional metadata override:

```bash
docker run --rm --gpus all -p 7860:7860 \
  -e ROBOMME_METADATA_ROOT=/home/user/app/src/robomme/env_metadata/train \
  robomme-gradio:gpu
```

Notes:

- Docker deployment is focused on `gradio-web/main.py`.
- The existing `uv` workflow for training/testing remains unchanged.
- Space metadata is configured via the root `README.md` with `sdk: docker` and `app_port: 7860`.

## 🚀 Quick Start

Start an environment with a specified setup:

```bash
uv run scripts/run_example.py
```

This generates a rollout video in the `sample_run_videos` directory. We provide four action types: predict continuous actions with `joint_action` or `ee_pose`, discrete waypoint actions with `waypoint`, or use `multi_choice` for VideoQA-style problems.
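As a side note on the `ROBOMME_METADATA_ROOT` override in the Docker section above: such overrides are typically resolved as an environment lookup with a packaged default. A minimal sketch, where the default path mirrors the Docker example and the function name is illustrative (not the app's actual code):

```python
import os
from pathlib import Path

# Default baked into the image in the Docker example above; the exact
# resolution logic here is illustrative, not the app's actual code.
DEFAULT_METADATA_ROOT = "/home/user/app/src/robomme/env_metadata/train"


def resolve_metadata_root() -> Path:
    """Honor ROBOMME_METADATA_ROOT if set, else fall back to the default."""
    return Path(os.environ.get("ROBOMME_METADATA_ROOT", DEFAULT_METADATA_ROOT))
```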
## 📁 Benchmark

### 🤖 Tasks

We provide four task suites, each with four tasks:

| Suite | Focus | Task IDs |
| ---------- | ----------------- | --------------------------------------------------------------------- |
| Counting | Temporal memory | BinFill, PickXtimes, SwingXtimes, StopCube |
| Permanence | Spatial memory | VideoUnmask, VideoUnmaskSwap, ButtonUnmask, ButtonUnmaskSwap |
| Reference | Object memory | PickHighlight, VideoRepick, VideoPlaceButton, VideoPlaceOrder |
| Imitation | Procedural memory | MoveCube, InsertPeg, PatternLock, RouteStick |

All tasks are defined in `src/robomme/robomme_env`. Detailed descriptions can be found in our paper appendix.

### 📥 Training Data

Training data can be downloaded [here](https://huggingface.co/datasets/Yinpei/robomme_data). There are 1,600 demonstrations in total (100 per task). The HDF5 format is described in [doc/h5_data_format.md](doc/h5_data_format.md).

After downloading, replay the dataset as a sanity check:

```bash
uv run scripts/dataset_replay.py --h5-data-dir <path>
```

### 📊 Evaluation

To evaluate on the test set, set the `dataset` argument of `BenchmarkEnvBuilder`:

```python
task_id = "PickXtimes"
episode_idx = 0

env_builder = BenchmarkEnvBuilder(
    env_id=task_id,
    dataset="test",
    ...
)
env = env_builder.make_env_for_episode(episode_idx)

obs, info = env.reset()  # initial step
...
obs, _, terminated, truncated, info = env.step(action)  # each step
```

The train split has 100 episodes; the val and test splits each have 50 episodes. All seeds are fixed for benchmarking. The environment input/output format is described in [doc/env_format.md](doc/env_format.md).

> Currently, environment spawning is set up only for imitation learning. We are working on extending it to support more general parallel environments for reinforcement learning.
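Building on the evaluation snippet above, a full benchmark run simply loops `env.step` until the episode ends and averages success over the fixed-seed episodes of a split. A minimal sketch, assuming the Gymnasium-style API shown above; the `info["success"]` flag and both helper names are hypothetical:

```python
def run_episode(env, policy, max_steps=300):
    """Roll out one episode; return True if it ends in success.
    Assumes the Gymnasium-style env API shown above; the
    info["success"] key is a hypothetical success flag."""
    obs, info = env.reset()
    for _ in range(max_steps):
        action = policy(obs)
        obs, _, terminated, truncated, info = env.step(action)
        if terminated or truncated:
            break
    return bool(info.get("success", False))


def success_rate(make_env, policy, n_episodes=50):
    """Average success over the fixed-seed episodes of a split."""
    wins = sum(run_episode(make_env(i), policy) for i in range(n_episodes))
    return wins / n_episodes
```

With a real `BenchmarkEnvBuilder`, `make_env` would be `env_builder.make_env_for_episode` and `policy` your model's action-prediction function.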
### 🔧 Data Generation

You can also regenerate your own HDF5 data via parallel processing:

```bash
uv run scripts/dev/xxxx
```

## 🧠 Model Training

### 🌟 MME-VLA-Suite

The [MME Policy Learning](https://github.com/RoboMME/robomme_policy_learning) repo provides the MME-VLA model training and evaluation used in our paper. It contains a family of memory-augmented VLA models built on the [pi05](https://github.com/Physical-Intelligence/openpi) backbone, as well as our implementation of [MemER](https://jen-pan.github.io/memer/).

### 📚 Prior Methods

**MemER**: The [MME Policy Learning](https://github.com/RoboMME/robomme_policy_learning) repo also provides our implementation of [MemER](https://jen-pan.github.io/memer/), using the same GroundSG policy model as in MME-VLA.

**SAM2Act+**: The [RoboMME_SAM2Act](https://github.com/RoboMME/SAM2Act) repo provides our implementation, adapted from the [SAM2Act](https://github.com/sam2act/sam2act) repo.

**MemoryVLA**: The [RoboMME_MemoryVLA](https://github.com/RoboMME/MemoryVLA) repo provides our implementation, adapted from the [MemoryVLA](https://github.com/shihao1895/MemoryVLA) repo.

**Diffusion Policy**: The [RoboMME_DP](https://github.com/RoboMME/DP) repo provides our implementation, adapted from the [diffusion_policy](https://github.com/real-stanford/diffusion_policy) repo.

## 🏆 Submit Your Models

Want to add your model? Download the [dataset](https://huggingface.co/datasets/Yinpei/robomme_data) from Hugging Face, run evaluation using our [eval scripts](scripts/evaluation.py), then submit a PR with your results by adding a `.md` file to the `doc/submission/` [directory](https://github.com/RoboMME/robomme_benchmark/tree/main/doc/submission). We will review it and update our leaderboard.

## 🔧 Troubleshooting

**Q1: RuntimeError: Create window failed: Renderer does not support display.**

A1: Use a physical display or set up a virtual display for GUI rendering (e.g., install a VNC server and set the `DISPLAY` variable correctly).
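On a headless server, a common alternative to a VNC server is a virtual framebuffer. A minimal sketch, assuming `Xvfb` is installed (e.g. via `apt install xvfb`); the display number and resolution are arbitrary:

```shell
# Start a virtual X server on display :99 and point DISPLAY at it
Xvfb :99 -screen 0 1280x720x24 &
export DISPLAY=:99
uv run scripts/run_example.py
```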
**Q2: Failure related to Vulkan installation.**

A2: ManiSkill/SAPIEN requires both the Vulkan userspace packages inside the container and the NVIDIA graphics capability exposed by the container runtime. This image installs `libvulkan1`, `vulkan-tools`, and `libglvnd-dev`, and sets `NVIDIA_DRIVER_CAPABILITIES=compute,utility,graphics`. If it still does not work, first verify that the host machine itself supports Vulkan (run `vulkaninfo` on the host), then fall back to CPU rendering:

```python
import os

os.environ['SAPIEN_RENDER_DEVICE'] = 'cpu'
os.environ['MUJOCO_GL'] = 'osmesa'
```

## 🙏 Acknowledgements

This work was supported in part by NSF SES-2128623, NSF CAREER #2337870, NSF NRI #2220876, and NSF NAIRR250085. We would also like to thank Physical Intelligence for the wonderful [OpenPi](https://github.com/Physical-Intelligence/openpi/tree/main) codebase.

## 📄 Citation

```
...
```