# RoboMME: A Robotic Benchmark for Memory-Augmented Manipulation

## Announcements
[03/2026] We are thrilled to release RoboMME, the first large-scale robotic benchmark dedicated to memory-augmented manipulation! Spanning 4 cognitively motivated task suites with 16 carefully designed tasks, RoboMME pushes robots to remember, reason, and act.
## Installation
After cloning the repo, install [uv](https://docs.astral.sh/uv/getting-started/installation/), then run:
```bash
uv sync
uv pip install -e .
```
## Gradio Docker Deployment (HF Space + GPU)
This repository also supports Docker deployment of the Gradio app, whose entrypoint is:
```bash
python3 gradio-web/main.py
```
Build image:
```bash
docker build -t robomme-gradio:gpu .
```
Run container (GPU + Vulkan for ManiSkill/SAPIEN):
```bash
docker run --rm --gpus all -p 7860:7860 robomme-gradio:gpu
```
The image sets `NVIDIA_DRIVER_CAPABILITIES=compute,utility,graphics` so the NVIDIA container runtime exposes Vulkan/graphics driver files inside the container. Without graphics capability, ManiSkill/SAPIEN may fail with `vk::createInstanceUnique: ErrorIncompatibleDriver`.
Optional metadata override:
```bash
docker run --rm --gpus all -p 7860:7860 \
-e ROBOMME_METADATA_ROOT=/home/user/app/src/robomme/env_metadata/train \
robomme-gradio:gpu
```
Notes:
- Docker deployment is focused on `gradio-web/main.py`.
- Existing `uv` workflow for training/testing remains unchanged.
- Space metadata is configured via root `README.md` with `sdk: docker` and `app_port: 7860`.
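For reference, an abridged Docker-Space front matter at the top of the root `README.md` looks like this (the `title` value is illustrative; `sdk` and `app_port` are the fields named above):

```yaml
---
title: RoboMME        # illustrative
sdk: docker
app_port: 7860
---
```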
## Quick Start
Start an environment with a specified setup:
```bash
uv run scripts/run_example.py
```
This generates a rollout video in the `sample_run_videos` directory.
We provide four action types: `joint_action`, `ee_pose`, `waypoint`, and `multi_choice`. Predict continuous actions with `joint_action` or `ee_pose`, discrete waypoint actions with `waypoint`, or answer VideoQA-style problems with `multi_choice`.
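As an illustration, a policy wrapper might dispatch on the action type along these lines. Only the four type names come from the benchmark; the converters, shapes, and function name below are our assumptions, not the real interface.

```python
import numpy as np

# Illustrative sketch: the dict keys are the benchmark's four action types,
# but the converters/shapes are assumptions for this example.
ACTION_CONVERTERS = {
    "joint_action": lambda a: np.asarray(a, dtype=np.float32),  # continuous joint targets
    "ee_pose": lambda a: np.asarray(a, dtype=np.float32),       # continuous end-effector pose
    "waypoint": lambda a: np.asarray(a, dtype=np.int64),        # discrete waypoint index
    "multi_choice": lambda a: int(a),                           # VideoQA-style answer index
}

def build_action(action_type, raw):
    """Normalize a raw policy output for the chosen action type."""
    try:
        convert = ACTION_CONVERTERS[action_type]
    except KeyError:
        raise ValueError(f"unknown action type: {action_type}") from None
    return convert(raw)
```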
## Benchmark
### Tasks
We provide four task suites, each with four tasks:
| Suite | Focus | Task IDs |
| ---------- | ----------------- | --------------------------------------------------------------------- |
| Counting | Temporal memory | BinFill, PickXtimes, SwingXtimes, StopCube |
| Permanence | Spatial memory | VideoUnmask, VideoUnmaskSwap, ButtonUnmask, ButtonUnmaskSwap |
| Reference | Object memory | PickHighlight, VideoRepick, VideoPlaceButton, VideoPlaceOrder |
| Imitation | Procedural memory | MoveCube, InsertPeg, PatternLock, RouteStick |
All tasks are defined in `src/robomme/robomme_env`. A detailed description can be found in our paper appendix.
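For scripting over the benchmark, the table above maps directly to a small Python structure (suite and task names copied verbatim from the table):

```python
# Suite -> task IDs, as listed in the table above.
TASK_SUITES = {
    "Counting":   ["BinFill", "PickXtimes", "SwingXtimes", "StopCube"],
    "Permanence": ["VideoUnmask", "VideoUnmaskSwap", "ButtonUnmask", "ButtonUnmaskSwap"],
    "Reference":  ["PickHighlight", "VideoRepick", "VideoPlaceButton", "VideoPlaceOrder"],
    "Imitation":  ["MoveCube", "InsertPeg", "PatternLock", "RouteStick"],
}
```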
### Training Data
Training data can be downloaded [here](https://huggingface.co/datasets/Yinpei/robomme_data). There are 1,600 demonstrations in total (100 per task). The HDF5 format is described in [doc/h5_data_format.md](doc/h5_data_format.md).
After downloading, replay the dataset for a sanity check:
```bash
uv run scripts/dataset_replay.py --h5-data-dir <your_downloaded_data_dir>
```
### Evaluation
To evaluate on the test set, set the `dataset` argument of `BenchmarkEnvBuilder`:
```python
task_id = "PickXtimes"
episode_idx = 0
env_builder = BenchmarkEnvBuilder(
    env_id=task_id,
    dataset="test",
    ...
)
env = env_builder.make_env_for_episode(episode_idx)
obs, info = env.reset() # initial step
...
obs, _, terminated, truncated, info = env.step(action) # each step
```
The train split has 100 episodes. The val/test splits each have 50 episodes. All seeds are fixed for benchmarking.
The environment input/output format is described in [doc/env_format.md](doc/env_format.md).
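End to end, an evaluation episode follows the standard Gymnasium-style loop. The stub environment below only stands in for the env returned by `make_env_for_episode` so the sketch is self-contained and runnable; the loop structure around `terminated`/`truncated` is the point.

```python
class StubEnv:
    """Placeholder with the same reset/step surface as the benchmark envs."""
    def __init__(self, horizon=5):
        self.horizon, self.t = horizon, 0

    def reset(self):
        self.t = 0
        return {"image": None}, {}  # obs, info

    def step(self, action):
        self.t += 1
        terminated = False                  # task success/failure signal
        truncated = self.t >= self.horizon  # time-limit signal
        return {"image": None}, 0.0, terminated, truncated, {}

def rollout(env, policy):
    """Run one episode to completion; return the number of steps taken."""
    obs, info = env.reset()
    steps = 0
    while True:
        action = policy(obs)
        obs, _, terminated, truncated, info = env.step(action)
        steps += 1
        if terminated or truncated:
            return steps
```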
> Currently, environment spawning is set up only for imitation learning. We are working on extending it to support more general parallel environments for reinforcement learning in the future.
### Data Generation
You can also regenerate your own HDF5 data via parallel processing using:
```bash
uv run scripts/dev/xxxx
```
## Model Training
### MME-VLA-Suite
The [MME Policy Learning](https://github.com/RoboMME/robomme_policy_learning) repo provides the MME-VLA model training and evaluation code used in our paper. It contains a family of memory-augmented VLA models built on the [pi05](https://github.com/Physical-Intelligence/openpi) backbone, as well as our implementation of [MemER](https://jen-pan.github.io/memer/).
### Prior Methods
**MemER**: The [MME Policy Learning](https://github.com/RoboMME/robomme_policy_learning) repo also provides our implementation of [MemER](https://jen-pan.github.io/memer/), using the same GroundSG policy model as in MME-VLA.
**SAM2Act+**: The [RoboMME_SAM2Act](https://github.com/RoboMME/SAM2Act) repo provides our implementation adapted from the [SAM2Act](https://github.com/sam2act/sam2act) repo.
**MemoryVLA**: The [RoboMME_MemoryVLA](https://github.com/RoboMME/MemoryVLA) repo provides our implementation adapted from the [MemoryVLA](https://github.com/shihao1895/MemoryVLA) repo.
**Diffusion Policy**: The [RoboMME_DP](https://github.com/RoboMME/DP) repo provides our implementation adapted from the [diffusion_policy](https://github.com/real-stanford/diffusion_policy) repo.
## Submit Your Models
Want to add your model? Download the [dataset](https://huggingface.co/datasets/Yinpei/robomme_data) from Hugging Face and run evaluation with our [eval scripts](scripts/evaluation.py). Then submit a PR with your results by adding `<your_model>.md` to the [`doc/submission/`](https://github.com/RoboMME/robomme_benchmark/tree/main/doc/submission) directory; we will review it and update our leaderboard.
## Troubleshooting
**Q1: RuntimeError: Create window failed: Renderer does not support display.**
A1: Use a physical display or set up a virtual display for GUI rendering (e.g., install a VNC server and set the `DISPLAY` variable correctly).
**Q2: Failure related to Vulkan installation.**
A2: ManiSkill/SAPIEN requires both Vulkan userspace packages inside the container and NVIDIA graphics capability exposed by the container runtime. This image installs `libvulkan1`, `vulkan-tools`, and `libglvnd-dev`, and sets `NVIDIA_DRIVER_CAPABILITIES=compute,utility,graphics`. If it still does not work, first verify the host machine itself supports Vulkan (`vulkaninfo` on the host), then switch to CPU rendering:
```python
import os

# Fall back to CPU/software rendering when no usable Vulkan driver exists.
os.environ['SAPIEN_RENDER_DEVICE'] = 'cpu'
os.environ['MUJOCO_GL'] = 'osmesa'
```
## Acknowledgements
This work was supported in part by NSF SES-2128623, NSF CAREER #2337870, NSF NRI #2220876, and NSF NAIRR250085. We would also like to thank Physical Intelligence for the wonderful [OpenPi](https://github.com/Physical-Intelligence/openpi/tree/main) codebase.
## Citation
```
...
```