# RoboMME: A Robotic Benchmark for Memory-Augmented Manipulation

## Announcements
[03/2026] We are thrilled to release RoboMME, the first large-scale robotic benchmark dedicated to memory-augmented manipulation! Spanning 4 cognitively motivated task suites with 16 carefully designed tasks, RoboMME pushes robots to remember, reason, and act.
## Installation
After cloning the repo, install [uv](https://docs.astral.sh/uv/getting-started/installation/), then run:
```bash
uv sync
uv pip install -e .
```
## Gradio Docker Deployment (HF Space + GPU)
This repository also supports Docker deployment for the Gradio app entrypoint:
```bash
python3 gradio-web/main.py
```
Build image:
```bash
docker build -t robomme-gradio:gpu .
```
Run container (GPU + Vulkan for ManiSkill/SAPIEN):
```bash
docker run --rm --gpus all -p 7860:7860 robomme-gradio:gpu
```
The image sets `NVIDIA_DRIVER_CAPABILITIES=compute,utility,graphics` so the NVIDIA container runtime exposes Vulkan/graphics driver files inside the container. Without graphics capability, ManiSkill/SAPIEN may fail with `vk::createInstanceUnique: ErrorIncompatibleDriver`.
Optional metadata override:
```bash
docker run --rm --gpus all -p 7860:7860 \
  -e ROBOMME_METADATA_ROOT=/home/user/app/src/robomme/env_metadata/train \
  robomme-gradio:gpu
```
Notes:
- Docker deployment is focused on `gradio-web/main.py`.
- The existing `uv` workflow for training/testing remains unchanged.
- Space metadata is configured via the root `README.md` with `sdk: docker` and `app_port: 7860`.
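An override like `ROBOMME_METADATA_ROOT` is typically consumed with a plain environment-variable lookup plus a default. A minimal sketch, assuming the app resolves the path this way (the `resolve_metadata_root` helper is illustrative, not the app's actual code; the fallback path mirrors the `docker run` example above):

```python
import os
from pathlib import Path

# Fallback used when the container does not set ROBOMME_METADATA_ROOT.
# This default mirrors the docker run example; the real app may differ.
DEFAULT_METADATA_ROOT = "/home/user/app/src/robomme/env_metadata/train"

def resolve_metadata_root() -> Path:
    """Return the metadata directory, preferring the env var override."""
    return Path(os.environ.get("ROBOMME_METADATA_ROOT", DEFAULT_METADATA_ROOT))
```

With `-e ROBOMME_METADATA_ROOT=/data/val`, the helper would return `/data/val`; without it, the default `train` directory.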
## Quick Start
Start an environment with a specified setup:
```bash
uv run scripts/run_example.py
```
This generates a rollout video in the `sample_run_videos` directory.
We provide four action types: `joint_action`, `ee_pose`, `waypoint`, and `multi_choice`. Predict continuous actions with `joint_action` or `ee_pose`, predict discrete waypoint actions with `waypoint`, or answer VideoQA-style problems with `multi_choice`.
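The four action types can be sketched as follows. Note that the shapes and ranges below are hypothetical placeholders for illustration only; the actual action spaces are defined per environment (see `doc/env_format.md`):

```python
import random

# Hypothetical action formats; the real spaces come from each environment.
def sample_action(action_type: str):
    if action_type == "joint_action":
        # Continuous joint targets (e.g. a 7-DoF arm plus gripper -> 8 dims).
        return [random.uniform(-1.0, 1.0) for _ in range(8)]
    if action_type == "ee_pose":
        # Continuous end-effector pose: xyz + quaternion + gripper -> 8 dims.
        return [random.uniform(-1.0, 1.0) for _ in range(8)]
    if action_type == "waypoint":
        # Discrete waypoint index (placeholder vocabulary of 16 waypoints).
        return random.randrange(16)
    if action_type == "multi_choice":
        # VideoQA-style answer choice (placeholder of 4 options).
        return random.randrange(4)
    raise ValueError(f"unknown action type: {action_type}")
```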
## Benchmark
### Tasks
We have four task suites, each with four tasks:
| Suite | Focus | Tasks |
| ---------- | ----------------- | --------------------------------------------------------------------- |
| Counting | Temporal memory | BinFill, PickXtimes, SwingXtimes, StopCube |
| Permanence | Spatial memory | VideoUnmask, VideoUnmaskSwap, ButtonUnmask, ButtonUnmaskSwap |
| Reference | Object memory | PickHighlight, VideoRepick, VideoPlaceButton, VideoPlaceOrder |
| Imitation | Procedural memory | MoveCube, InsertPeg, PatternLock, RouteStick |

All tasks are defined in `src/robomme/robomme_env`. Detailed descriptions can be found in our paper appendix.
### Training Data
Training data can be downloaded [here](https://huggingface.co/datasets/Yinpei/robomme_data). There are 1,600 demonstrations in total (100 per task). The HDF5 format is described in [doc/h5_data_format.md](doc/h5_data_format.md).
After downloading, replay the dataset for a sanity check:
```bash
uv run scripts/dataset_replay.py --h5-data-dir <your_downloaded_data_dir>
```
### Evaluation
To evaluate on the test set, set the `dataset` argument of `BenchmarkEnvBuilder`:
```python
task_id = "PickXtimes"
episode_idx = 0
env_builder = BenchmarkEnvBuilder(
    env_id=task_id,
    dataset="test",
    ...
)
env = env_builder.make_env_for_episode(episode_idx)
obs, info = env.reset()  # initial step
...
obs, _, terminated, truncated, info = env.step(action)  # each step
```
The train split has 100 episodes. The val/test splits each have 50 episodes. All seeds are fixed for benchmarking.
The environment input/output format is described in [doc/env_format.md](doc/env_format.md).
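A full evaluation over a split then reduces to rolling out each episode and aggregating successes. The sketch below uses a stand-in environment so it is self-contained; `StubEnv` and the `info["success"]` bookkeeping are illustrative assumptions, while the step/reset signature matches the snippet above:

```python
class StubEnv:
    """Minimal stand-in mimicking the Gymnasium-style reset/step API above."""
    def __init__(self, succeeds: bool, horizon: int = 3):
        self.succeeds, self.horizon, self.t = succeeds, horizon, 0

    def reset(self):
        self.t = 0
        return {"obs": None}, {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.horizon
        info = {"success": terminated and self.succeeds}
        return {"obs": None}, 0.0, terminated, False, info

def evaluate(envs, policy, max_steps: int = 100) -> float:
    """Roll out each episode environment and return the success rate."""
    successes = 0
    for env in envs:
        obs, info = env.reset()
        for _ in range(max_steps):
            obs, _, terminated, truncated, info = env.step(policy(obs))
            if terminated or truncated:
                break
        successes += bool(info.get("success"))
    return successes / len(envs)
```

For example, `evaluate([StubEnv(True), StubEnv(False)], policy=lambda obs: None)` returns `0.5`. With the real benchmark, the list of environments would come from `env_builder.make_env_for_episode(i)` over the split's episode indices.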
> Currently, environment spawning is set up only for imitation learning. We are working on extending it to support more general parallel environments for reinforcement learning in the future.
### Data Generation
You can also regenerate the HDF5 data yourself via parallel processing:
```bash
uv run scripts/dev/xxxx
```
## Model Training
### MME-VLA-Suite
The [MME Policy Learning](https://github.com/RoboMME/robomme_policy_learning) repo provides the MME-VLA model training and evaluation used in our paper. It contains a family of memory-augmented VLA models built on the [pi05](https://github.com/Physical-Intelligence/openpi) backbone, along with our implementation of [MemER](https://jen-pan.github.io/memer/).
### Prior Methods
**MemER**: The [MME Policy Learning](https://github.com/RoboMME/robomme_policy_learning) repo also provides our implementation of [MemER](https://jen-pan.github.io/memer/), using the same GroundSG policy model as in MME-VLA.
**SAM2Act+**: The [RoboMME_SAM2Act](https://github.com/RoboMME/SAM2Act) repo provides our implementation adapted from the [SAM2Act](https://github.com/sam2act/sam2act) repo.
**MemoryVLA**: The [RoboMME_MemoryVLA](https://github.com/RoboMME/MemoryVLA) repo provides our implementation adapted from the [MemoryVLA](https://github.com/shihao1895/MemoryVLA) repo.
**Diffusion Policy**: The [RoboMME_DP](https://github.com/RoboMME/DP) repo provides our implementation adapted from the [diffusion_policy](https://github.com/real-stanford/diffusion_policy) repo.
## Submit Your Models
Want to add your model? Download the [dataset](https://huggingface.co/datasets/Yinpei/robomme_data) from Hugging Face, run evaluation using our [eval scripts](scripts/evaluation.py), then submit a PR with your results by adding `<your_model>.md` to the `doc/submission/` [directory](https://github.com/RoboMME/robomme_benchmark/tree/main/doc/submission). We will review it and update our leaderboard.
## Troubleshooting
**Q1: RuntimeError: Create window failed: Renderer does not support display.**
A1: Use a physical display or set up a virtual display for GUI rendering (e.g., install a VNC server and set the `DISPLAY` variable correctly).
**Q2: Failure related to Vulkan installation.**
A2: ManiSkill/SAPIEN requires both Vulkan userspace packages inside the container and NVIDIA graphics capability exposed by the container runtime. This image installs `libvulkan1`, `vulkan-tools`, and `libglvnd-dev`, and sets `NVIDIA_DRIVER_CAPABILITIES=compute,utility,graphics`. If it still does not work, first verify that the host machine itself supports Vulkan (run `vulkaninfo` on the host), then switch to CPU rendering:
```python
import os

os.environ['SAPIEN_RENDER_DEVICE'] = 'cpu'
os.environ['MUJOCO_GL'] = 'osmesa'
```
## Acknowledgements
This work was supported in part by NSF SES-2128623, NSF CAREER #2337870, NSF NRI #2220876, and NSF NAIRR250085. We would also like to thank Physical Intelligence for the wonderful [OpenPi](https://github.com/Physical-Intelligence/openpi/tree/main) codebase.
## Citation
```
...
```