# RoboMME: A Robotic Benchmark for Memory-Augmented Manipulation

![Robomme bench](assets/robomme_bench.jpg)

## 📢 Announcements

[03/2026] We are thrilled to release RoboMME, the first large-scale robotic benchmark dedicated to memory-augmented manipulation! Spanning 4 cognitively motivated task suites with 16 carefully designed tasks, RoboMME pushes robots to remember, reason, and act.

## 📦 Installation

After cloning the repo, install [uv](https://docs.astral.sh/uv/getting-started/installation/), then run:

```bash
uv sync
uv pip install -e .
```

## 🐳 Gradio Docker Deployment (HF Space CPU-only)

This repository also supports Docker deployment for the Gradio app entrypoint:

```bash
python3 gradio-web/main.py
```

Build image:

```bash
docker build -t robomme-gradio:cpu .
```

Run container:

```bash
docker run --rm -p 7860:7860 robomme-gradio:cpu
```

The container forces CPU-only ManiSkill/SAPIEN backends and does not require NVIDIA runtime or `--gpus all`, which keeps it aligned with Hugging Face Docker Spaces CPU deployments.

Optional metadata override:

```bash
docker run --rm -p 7860:7860 \
  -e ROBOMME_METADATA_ROOT=/home/user/app/src/robomme/env_metadata/train \
  robomme-gradio:cpu
```

Notes:
- Docker deployment is focused on `gradio-web/main.py`.
- Existing `uv` workflow for training/testing remains unchanged.
- Space metadata is configured via root `README.md` with `sdk: docker` and `app_port: 7860`.

## 🚀 Quick Start

Start an environment with a specified setup:

```bash
uv run scripts/run_example.py
```

This generates a rollout video in the `sample_run_videos` directory.

We provide four action types: `joint_action`, `ee_pose`, `waypoint`, and `multi_choice`. Use `joint_action` or `ee_pose` to predict continuous actions, `waypoint` to predict discrete waypoint actions, and `multi_choice` for VideoQA-style problems.

## 📏 Benchmark

### 🤖 Tasks

We have four task suites, each with 4 tasks:

| Suite      | Focus             | Task ID                                                                 |
| ---------- | ----------------- | --------------------------------------------------------------------- |
| Counting   | Temporal memory   | BinFill, PickXtimes, SwingXtimes, StopCube                            |
| Permanence | Spatial memory    | VideoUnmask, VideoUnmaskSwap, ButtonUnmask, ButtonUnmaskSwap         |
| Reference  | Object memory     | PickHighlight, VideoRepick, VideoPlaceButton, VideoPlaceOrder         |
| Imitation  | Procedural memory | MoveCube, InsertPeg, PatternLock, RouteStick                          |

All tasks are defined in `src/robomme/robomme_env`. Detailed task descriptions can be found in the appendix of our paper.
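
For scripts that loop over the whole benchmark, the table above can be mirrored as a plain dictionary. This is only a convenience sketch: `TASK_SUITES` and `all_task_ids` are hypothetical names and not part of the `robomme` package. The suite names and task IDs are copied from the table.

```python
# Hypothetical helper, NOT part of the robomme package.
# Suite names and task IDs are copied from the task table.
TASK_SUITES = {
    "Counting": ["BinFill", "PickXtimes", "SwingXtimes", "StopCube"],
    "Permanence": ["VideoUnmask", "VideoUnmaskSwap", "ButtonUnmask", "ButtonUnmaskSwap"],
    "Reference": ["PickHighlight", "VideoRepick", "VideoPlaceButton", "VideoPlaceOrder"],
    "Imitation": ["MoveCube", "InsertPeg", "PatternLock", "RouteStick"],
}


def all_task_ids():
    """Return every task ID, grouped in suite order."""
    return [task for tasks in TASK_SUITES.values() for task in tasks]


if __name__ == "__main__":
    for suite, tasks in TASK_SUITES.items():
        print(f"{suite}: {', '.join(tasks)}")
```

Each returned ID could then be used wherever a task ID is expected, such as the `env_id` argument shown in the Evaluation section.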

### 📥 Training Data

Training data can be downloaded [here](https://huggingface.co/datasets/Yinpei/robomme_data). There are 1,600 demonstrations in total (100 per task). The HDF5 format is described in [doc/h5_data_format.md](doc/h5_data_format.md).

After downloading, replay the dataset for a sanity check:

```bash
uv run scripts/dataset_replay.py --h5-data-dir <your_downloaded_data_dir>
```

### 📊 Evaluation

To evaluate on the test set, set the `dataset` argument of `BenchmarkEnvBuilder` to `"test"`:

```python
task_id = "PickXtimes"
episode_idx = 0
env_builder = BenchmarkEnvBuilder(
    env_id=task_id,
    dataset="test",
    ...
)

env = env_builder.make_env_for_episode(episode_idx)
obs, info = env.reset() # initial step
...
obs, _, terminated, truncated, info = env.step(action) # each step
```
The train split has 100 episodes. The val/test splits each have 50 episodes. All seeds are fixed for benchmarking.

The environment input/output format is described in [doc/env_format.md](doc/env_format.md).
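
The reset/step pattern above can be wrapped into a per-task success-rate loop. The following is a minimal sketch, not the repository's evaluation script: `evaluate_task`, `policy`, and `max_steps` are hypothetical names, and reading success from `info["success"]` is an assumption that should be checked against [doc/env_format.md](doc/env_format.md).

```python
from typing import Any, Callable


def evaluate_task(
    make_env: Callable[[int], Any],
    policy: Callable[[Any, dict], Any],
    num_episodes: int = 50,
    max_steps: int = 1000,
) -> float:
    """Roll out `policy` on episodes 0..num_episodes-1 and return the success rate.

    `make_env` maps an episode index to an environment, e.g.
    `env_builder.make_env_for_episode`. `max_steps` is a hypothetical safety cap.
    """
    successes = 0
    for episode_idx in range(num_episodes):
        env = make_env(episode_idx)
        obs, info = env.reset()  # initial step
        for _ in range(max_steps):
            action = policy(obs, info)
            obs, _, terminated, truncated, info = env.step(action)
            if terminated or truncated:
                break
        # ASSUMPTION: the final info dict reports success under the key "success".
        successes += int(bool(info.get("success", False)))
    return successes / num_episodes
```

Under these assumptions, a call such as `evaluate_task(env_builder.make_env_for_episode, my_policy)` would cover the 50 test episodes of one task, where `my_policy` is any callable that maps `(obs, info)` to an action.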

> Currently, environment spawning is set up only for imitation learning. We are working on extending it to support more general parallel environments for reinforcement learning in the future.

### 🔧 Data Generation

You can also regenerate your own HDF5 data via parallel processing:

```bash
uv run scripts/dev/xxxx
```


## 🧠 Model Training

### 🌟 MME-VLA-Suite

The [MME Policy Learning](https://github.com/RoboMME/robomme_policy_learning) repo provides the MME-VLA model training and evaluation code used in our paper. It contains a family of memory-augmented VLA models built on the [pi05](https://github.com/Physical-Intelligence/openpi) backbone, along with our implementation of [MemER](https://jen-pan.github.io/memer/).

### 📚 Prior Methods

**MemER**: The [MME Policy Learning](https://github.com/RoboMME/robomme_policy_learning) repo also provides our implementation of [MemER](https://jen-pan.github.io/memer/), which uses the same GroundSG policy model as MME-VLA.

**SAM2Act+**: The [RoboMME_SAM2Act](https://github.com/RoboMME/SAM2Act) repo provides our implementation adapted from the [SAM2Act](https://github.com/sam2act/sam2act) repo.

**MemoryVLA**: The [RoboMME_MemoryVLA](https://github.com/RoboMME/MemoryVLA) repo provides our implementation adapted from the [MemoryVLA](https://github.com/shihao1895/MemoryVLA) repo.
 
**Diffusion Policy**: The [RoboMME_DP](https://github.com/RoboMME/DP) repo provides our implementation adapted from the [diffusion_policy](https://github.com/real-stanford/diffusion_policy) repo.



## 🏆 Submit Your Models
Want to add your model? Download the [dataset](https://huggingface.co/datasets/Yinpei/robomme_data) from Hugging Face, run evaluation using our [eval scripts](scripts/evaluation.py), then submit a PR with your results by adding `<your_model>.md` to the `doc/submission/` [directory](https://github.com/RoboMME/robomme_benchmark/tree/main/doc/submission). We will review it and update our leaderboard.


## 🔧 Troubleshooting

**Q1: RuntimeError: Create window failed: Renderer does not support display.**

A1: Use a physical display or set up a virtual display for GUI rendering (e.g. install a VNC server and set the `DISPLAY` variable correctly).

**Q2: Failure related to ManiSkill/SAPIEN rendering initialization.**

A2: This Docker image is configured for CPU-only execution and should not rely on NVIDIA runtime settings. If rendering still fails, first check that no external environment variables are forcing GPU paths, then keep the container on the CPU-only defaults:

```python
import os

# Hide all GPUs from CUDA and the NVIDIA container runtime.
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
os.environ['NVIDIA_VISIBLE_DEVICES'] = 'void'
os.environ.setdefault('ROBOMME_RENDER_BACKEND', 'pci:0')  # llvmpipe software Vulkan on CPU
# Drop variables that could steer rendering toward a GPU path.
os.environ.pop('SAPIEN_RENDER_DEVICE', None)
os.environ.pop('NVIDIA_DRIVER_CAPABILITIES', None)
os.environ.pop('MUJOCO_GL', None)
os.environ.setdefault('VK_ICD_FILENAMES', '/usr/share/vulkan/icd.d/lvp_icd.x86_64.json')
```
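
To check whether external environment variables are forcing GPU paths, a small script can compare the current environment with the defaults above. This is an illustrative sketch using only the standard library. `find_gpu_overrides` and the two constants are hypothetical names, and the expected values are taken from the snippet above, not from any official RoboMME tooling.

```python
import os

# Variables that the CPU-only defaults above remove entirely.
SHOULD_BE_UNSET = ("SAPIEN_RENDER_DEVICE", "NVIDIA_DRIVER_CAPABILITIES", "MUJOCO_GL")
# Variables that the CPU-only defaults above pin to a fixed value.
EXPECTED_VALUES = {
    "CUDA_VISIBLE_DEVICES": "-1",
    "NVIDIA_VISIBLE_DEVICES": "void",
}


def find_gpu_overrides(environ=None):
    """Return {name: value} for variables that deviate from the CPU-only defaults."""
    environ = os.environ if environ is None else environ
    overrides = {}
    for name in SHOULD_BE_UNSET:
        if name in environ:
            overrides[name] = environ[name]
    for name, expected in EXPECTED_VALUES.items():
        value = environ.get(name)
        # An unset variable is not flagged; only a conflicting value is.
        if value is not None and value != expected:
            overrides[name] = value
    return overrides


if __name__ == "__main__":
    for name, value in find_gpu_overrides().items():
        print(f"{name}={value!r} may force a GPU path")
```

Running it inside the container before launching the app lists any variables that differ from the CPU-only configuration. If nothing is printed, no conflicts were found.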


## 🙏 Acknowledgements

This work was supported in part by NSF SES-2128623, NSF CAREER #2337870, NSF NRI #2220876, and NSF NAIRR250085. We would also like to thank Physical Intelligence for the wonderful [OpenPi](https://github.com/Physical-Intelligence/openpi/tree/main) codebase.


## 📄 Citation

```
...
```