RoboMME: A Robotic Benchmark for Memory-Augmented Manipulation
Announcements
[03/2026] We are thrilled to release RoboMME, the first large-scale robotic benchmark dedicated to memory-augmented manipulation! Spanning 4 cognitively motivated task suites with 16 carefully designed tasks, RoboMME pushes robots to remember, reason, and act.
Installation
After cloning the repo, install uv, then run:
uv sync
uv pip install -e .
Gradio Docker Deployment (HF Space + GPU)
This repository also supports Docker deployment for the Gradio app entrypoint:
python3 gradio-web/main.py
Build image:
docker build -t robomme-gradio:gpu .
Run container (GPU + Vulkan for ManiSkill/SAPIEN):
docker run --rm --gpus all -p 7860:7860 robomme-gradio:gpu
The image sets NVIDIA_DRIVER_CAPABILITIES=compute,utility,graphics so the NVIDIA container runtime exposes Vulkan/graphics driver files inside the container. Without graphics capability, ManiSkill/SAPIEN may fail with vk::createInstanceUnique: ErrorIncompatibleDriver.
Optional metadata override:
docker run --rm --gpus all -p 7860:7860 \
-e ROBOMME_METADATA_ROOT=/home/user/app/src/robomme/env_metadata/train \
robomme-gradio:gpu
Notes:
- Docker deployment is focused on gradio-web/main.py.
- The existing uv workflow for training/testing remains unchanged.
- Space metadata is configured via the root README.md with sdk: docker and app_port: 7860.
Quick Start
Start an environment with a specified setup:
uv run scripts/run_example.py
This generates a rollout video in the sample_run_videos directory.
We provide four action types: joint_action, ee_pose, waypoint, and multi_choice, e.g., predict continuous actions with joint_action or ee_pose, discrete waypoint actions with waypoint, or use multi_choice for VideoQA-style problems.
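As a rough illustration, the payload for each action type might look like the sketch below. The shapes and field names here are assumptions for illustration only, not the benchmark's exact specification; see doc/env_format.md for the actual input/output format.

```python
import numpy as np

# Hypothetical action payloads for the four action types.
# Shapes and values are illustrative assumptions.
actions = {
    # joint_action: continuous target joint positions (e.g. 7-DoF arm + gripper)
    "joint_action": np.zeros(8, dtype=np.float32),
    # ee_pose: continuous end-effector pose (e.g. xyz + quaternion + gripper)
    "ee_pose": np.array([0.3, 0.0, 0.2, 1.0, 0.0, 0.0, 0.0, 1.0], dtype=np.float32),
    # waypoint: a discrete index into a fixed set of candidate waypoints
    "waypoint": 3,
    # multi_choice: a discrete answer index for VideoQA-style problems
    "multi_choice": 1,
}

def is_continuous(action_type: str) -> bool:
    """Continuous action types are regressed; discrete ones are classified."""
    return action_type in ("joint_action", "ee_pose")

for name, action in actions.items():
    kind = "continuous" if is_continuous(name) else "discrete"
    print(f"{name}: {kind}")
```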
Benchmark
Tasks
We have four task suites, each with four tasks:
| Suite | Focus | Tasks |
|---|---|---|
| Counting | Temporal memory | BinFill, PickXtimes, SwingXtimes, StopCube |
| Permanence | Spatial memory | VideoUnmask, VideoUnmaskSwap, ButtonUnmask, ButtonUnmaskSwap |
| Reference | Object memory | PickHighlight, VideoRepick, VideoPlaceButton, VideoPlaceOrder |
| Imitation | Procedural memory | MoveCube, InsertPeg, PatternLock, RouteStick |
All tasks are defined in src/robomme/robomme_env. A detailed description can be found in our paper appendix.
Training Data
Training data can be downloaded here. There are 1,600 demonstrations in total (100 per task). The HDF5 format is described in doc/h5_data_format.md.
After downloading, replay the dataset for a sanity check:
uv run scripts/dataset_replay.py --h5-data-dir <your_downloaded_data_dir>
Evaluation
To evaluate on the test set, set the dataset argument of BenchmarkEnvBuilder:
task_id = "PickXtimes"
episode_idx = 0
env_builder = BenchmarkEnvBuilder(
env_id=task_id,
dataset="test",
...
)
env = env_builder.make_env_for_episode(episode_idx)
obs, info = env.reset() # initial step
...
obs, _, terminated, truncated, info = env.step(action) # each step
The train split has 100 episodes. The val/test splits each have 50 episodes. All seeds are fixed for benchmarking.
The environment input/output format is described in doc/env_format.md.
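Putting the snippets above together, a full evaluation rollout can be sketched as follows. DummyEnv is a hypothetical stand-in for the environment returned by make_env_for_episode, and the constant-action policy is a placeholder; only the reset()/step() signatures mirror the benchmark API shown above.

```python
# Minimal rollout-loop sketch following the reset()/step() signatures
# shown above. DummyEnv is a stand-in, NOT the real benchmark env.

class DummyEnv:
    """Stand-in with the gymnasium-style API used by the benchmark."""
    def __init__(self, horizon: int = 5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return {"obs": self.t}, {}  # (observation, info)

    def step(self, action):
        self.t += 1
        terminated = False                   # task success would set this
        truncated = self.t >= self.horizon   # time limit reached
        return {"obs": self.t}, 0.0, terminated, truncated, {}

def rollout(env, policy):
    """Run one episode and return the number of steps taken."""
    obs, info = env.reset()  # initial step
    steps = 0
    while True:
        action = policy(obs)
        obs, _, terminated, truncated, info = env.step(action)  # each step
        steps += 1
        if terminated or truncated:
            return steps

steps = rollout(DummyEnv(horizon=5), policy=lambda obs: 0)
print(steps)  # → 5
```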
Currently, environment spawning is set up only for imitation learning; we are working on extending it to more general parallel environments for reinforcement learning.
Data Generation
You can also re-generate your own HDF5 data via parallel processing:
uv run scripts/dev/xxxx
Model Training
MME-VLA-Suite
The MME Policy Learning repo provides the MME-VLA model training and evaluation code used in our paper. It contains a family of memory-augmented VLA models built on the pi05 backbone, along with our implementation of MemER.
Prior Methods
MemER: The MME Policy Learning repo also provides our implementation of MemER, using the same GroundSG policy model as in MME-VLA.
SAM2Act+: The RoboMME_SAM2Act repo provides our implementation adapted from the SAM2Act repo.
MemoryVLA: The RoboMME_MemoryVLA repo provides our implementation adapted from the MemoryVLA repo.
Diffusion Policy: The RoboMME_DP repo provides our implementation adapted from the diffusion_policy repo.
Submit Your Models
Want to add your model? Download the dataset from Hugging Face, run evaluation using our eval scripts, then submit a PR with your results by adding <your_model>.md to the doc/submission/ directory. We will review it and update our leaderboard.
Troubleshooting
Q1: RuntimeError: Create window failed: Renderer does not support display.
A1: Use a physical display or set up a virtual display for GUI rendering (e.g. install a VNC server and set the DISPLAY variable correctly).
Q2: Failure related to Vulkan installation.
A2: ManiSkill/SAPIEN requires both Vulkan userspace packages inside the container and NVIDIA graphics capability exposed by the container runtime. This image installs libvulkan1, vulkan-tools, and libglvnd-dev, and sets NVIDIA_DRIVER_CAPABILITIES=compute,utility,graphics. If it still does not work, first verify the host machine itself supports Vulkan (vulkaninfo on the host), then switch to CPU rendering:
import os
# Set these before importing sapien / creating the environment,
# so the renderer picks them up.
os.environ['SAPIEN_RENDER_DEVICE'] = 'cpu'
os.environ['MUJOCO_GL'] = 'osmesa'
Acknowledgements
This work was supported in part by NSF SES-2128623, NSF CAREER #2337870, NSF NRI #2220876, NSF NAIRR250085. We would also like to thank the wonderful OpenPi codebase from Physical-Intelligence.
Citation
...
