# Mem-0 Execution Module: `m1_mix` (RMBench / RoboTwin 2.0)
A single **Mem-0** low-level execution-module checkpoint trained jointly on all five
RMBench **M1** tasks (the `m1_mix` dataset) and evaluated on each task in turn. M1 tasks
require only the execution module: **no high-level planner / vLLM is needed for
inference.**
- **Backbone:** Qwen3-VL-2B-Instruct (vision-language); *weights fine-tuned and bundled in the checkpoint*
- **Action head:** DiT-B flow-matching policy (action chunk of 30, 16-D action)
- **Memory:** MemoryBank (instant + anchor memory fusion across the episode)
- **Aux head:** subtask-end classifier (used for Mn multi-stage tasks; inert for M1)
- **Total parameters:** ≈ 2.67 B
---
## Results
`task_config = demo_clean`, `instruction_type = unseen`, 100 episodes per task, `action_horizon = 30`.
The **same** checkpoint and **same** `m1_mix` normalization stats are used for every task.
| Task | Success Rate | Reward |
|---------------------|:------------:|:------:|
| put_back_block | **1.00** | 1.00 |
| rearrange_blocks | **0.86** | 0.86 |
| swap_blocks | **0.81** | 0.81 |
| swap_T | **0.13** | 0.13 |
| observe_and_pickup | **0.03** | 0.00 |
| **Average** | **0.566** | n/a |
Per-episode logs and rollout videos for all five tasks are under `eval_results/`.
See `eval_results/summary.md` for details and `task_instructions.json` for the exact
per-task language instruction used.
---
## Contents of this bundle
```
m1_mix_submit/
├── README.md                                   # this file
├── task_instructions.json                      # verbatim --global_task per task + scores
├── checkpoint/
│   ├── m1_mix_final_step50000.pt.part00 … part08   # 15.3 GB full training ckpt, split into 9 parts (2×4 GB + 7×≤1 GB)
│   ├── m1_mix_final_step50000.pt.sha256        # SHA-256 of the reassembled checkpoint
│   └── README_REASSEMBLE.md                    # how to cat the parts back together + verify
├── norm_stats/
│   └── norm_stats.json                         # min-max state/action stats → [-1, 1]
├── configs/
│   ├── execution_module_train_m1_mix.yaml      # training config (reproducibility)
│   └── deploy_policy.yml                       # inference / deployment config
├── qwen_base_config/                           # Qwen3-VL-2B-Instruct config/processor ONLY
│   ├── config.json, generation_config.json
│   ├── tokenizer*.json, vocab.json, merges.txt
│   ├── preprocessor_config.json, video_preprocessor_config.json, chat_template.json
│   └── README_Qwen3-VL-2B-Instruct.md          # upstream model card (Apache-2.0)
└── eval_results/
    ├── summary.md
    └── <task>/                                 # _result.txt, eval_log.txt, episode*.mp4 (×100)
```
### About the checkpoint
> **Reassemble first.** The 15.3 GB checkpoint is uploaded as 9 byte-split parts
> (`m1_mix_final_step50000.pt.part00…08`) because the upload path capped the size of single
> files and throttled the bytes accepted per time window. Concatenation reproduces the original **bit-for-bit**:
>
> ```bash
> cat m1_mix_final_step50000.pt.part?? > m1_mix_final_step50000.pt
> sha256sum -c m1_mix_final_step50000.pt.sha256 # -> m1_mix_final_step50000.pt: OK
> ```
>
> See `checkpoint/README_REASSEMBLE.md` for details.
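
On systems without `cat` and `sha256sum`, the same reassembly can be sketched in Python. This is a minimal illustration, not part of the bundle: the helper names `reassemble` and `expected_digest` are hypothetical, and it assumes the `.sha256` file uses the usual `sha256sum` layout (`<hex digest>  <filename>`).

```python
import hashlib
from pathlib import Path


def reassemble(parts_dir, stem, out_path, chunk_size=1 << 20):
    """Concatenate <stem>.part00, <stem>.part01, ... into out_path.

    Returns the SHA-256 hex digest of the reassembled file.
    """
    parts = sorted(Path(parts_dir).glob(stem + ".part??"))
    if not parts:
        raise FileNotFoundError(f"no files matching {stem}.part?? in {parts_dir}")
    digest = hashlib.sha256()
    with open(out_path, "wb") as out:
        # Lexicographic order equals numeric order for two-digit suffixes.
        for part in parts:
            with open(part, "rb") as src:
                while chunk := src.read(chunk_size):
                    out.write(chunk)
                    digest.update(chunk)
    return digest.hexdigest()


def expected_digest(sha256_file):
    """Read the hash from a sha256sum-style file (assumed layout: '<hex>  <name>')."""
    return Path(sha256_file).read_text().split()[0].lower()


# Usage, run from the checkpoint/ directory:
#   got = reassemble(".", "m1_mix_final_step50000.pt", "m1_mix_final_step50000.pt")
#   assert got == expected_digest("m1_mix_final_step50000.pt.sha256")
```

Hashing while writing avoids a second pass over the 15.3 GB file.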
Once reassembled, `m1_mix_final_step50000.pt` is the **full training checkpoint** at step 50000:
| key | content |
|------------------------|----------------------------------------------------------------------|
| `model_state_dict` | 910 tensors, ≈ 2.67 B params (`qwen_model` ≈ 2.44 B, `action_model` ≈ 160 M, `memory_bank` ≈ 39 M, `classifier` ≈ 32 M); bf16 + fp32 |
| `optimizer_state_dict` | AdamW moments (for resume/fine-tune only) |
| `scheduler_state_dict` | cosine LR scheduler state |
| `global_step` | 50000 |
The `model_state_dict` is **self-contained**: it already includes the fine-tuned
Qwen3-VL-2B backbone weights. The bundled `qwen_base_config/` provides only the
*architecture/tokenizer/processor* config; the base model weights (`model.safetensors`,
~4 GB) are **not** re-distributed here; download them from the official repo (see below).
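
To sanity-check the per-component sizes listed in the table, the tensors can be grouped by key prefix. The sketch below assumes that `qwen_model`, `action_model`, `memory_bank`, and `classifier` are the top-level prefixes of the `model_state_dict` keys, which this README does not state explicitly; `count_params_by_prefix` is a hypothetical helper. If the state dict also stores non-parameter buffers, the totals will be slightly higher than the parameter counts above.

```python
from collections import defaultdict


def count_params_by_prefix(state_dict):
    """Sum element counts per top-level key prefix (the text before the first '.')."""
    totals = defaultdict(int)
    for name, tensor in state_dict.items():
        totals[name.split(".", 1)[0]] += tensor.numel()
    return dict(totals)


# Usage (requires torch and the reassembled checkpoint):
#   import torch
#   ck = torch.load("checkpoint/m1_mix_final_step50000.pt",
#                   map_location="cpu", weights_only=False)
#   counts = count_params_by_prefix(ck["model_state_dict"])
#   for prefix, n in sorted(counts.items()):
#       print(f"{prefix:>14s}: {n / 1e6:10.1f} M")
```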
**Inference-only slimming** (15.3 GB → ≈ 6 GB) if you don't need to resume training:
```python
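import torch
# weights_only=False unpickles arbitrary Python objects, so only load files you trust.
ck = torch.load("checkpoint/m1_mix_final_step50000.pt", map_location="cpu", weights_only=False)
# Keep only the entries the deploy loader needs; drop optimizer and scheduler state.
torch.save({"model_state_dict": ck["model_state_dict"], "global_step": ck["global_step"]},
           "m1_mix_inference.pt")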
```
The deploy loader reads `payload["model_state_dict"]` and calls
`load_state_dict(..., strict=False)`, so either the full or the slimmed file works
unchanged.
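
Because a non-strict load does not raise on key mismatches, it can be worth checking which keys were skipped. In PyTorch, `load_state_dict(..., strict=False)` returns an object with `missing_keys` and `unexpected_keys`; the pure-Python sketch below computes the same two lists from key names alone, so a slimmed file can be checked without building the model twice. `diff_state_dict_keys` is a hypothetical helper, not part of the Mem-0 code.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists for a non-strict load.

    missing:    keys the model expects but the checkpoint lacks
                (those parameters keep their current values)
    unexpected: keys in the checkpoint with no matching slot in the model
                (those tensors are ignored)
    """
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# Usage (assumes `model` is the instantiated policy and `sd` the loaded state dict):
#   missing, unexpected = diff_state_dict_keys(model.state_dict().keys(), sd.keys())
#   print(len(missing), "missing;", len(unexpected), "unexpected")
```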
---
## Dependencies
1. **Code:** the RMBench / Mem-0 repository (this checkpoint targets its
`policy/Mem-0` execution module and `script/eval_policy.py`). Follow the repo README
for the RoboTwin 2.0 simulator environment setup.
2. **Base VLM:** `Qwen/Qwen3-VL-2B-Instruct` (Apache-2.0). Required at model
instantiation for the architecture + image/text processor. Its weights are
overwritten by this checkpoint at load time (`strict=False`), but the directory must
exist and contain `model.safetensors`:
```bash
huggingface-cli download Qwen/Qwen3-VL-2B-Instruct \
--local-dir policy/Mem-0/checkpoints/Qwen3-VL-2B-Instruct
```
The small config/processor files in `qwen_base_config/` are exactly the ones used for
training and evaluation; you may overlay them onto the downloaded directory if the
upstream revision differs.
---
## How to run evaluation
Point the deploy config at the checkpoint and the `m1_mix` stats, then run one task at a
time. This mirrors exactly how the numbers above were produced:
```bash
python script/eval_policy.py --config policy/Mem-0/deploy_policy.yml --overrides \
--task_name swap_blocks \
--execution_ckpt /path/to/m1_mix_final_step50000.pt \
--state_stats_path /path/to/norm_stats/norm_stats.json \
--ckpt_setting m1mix \
--global_task "There are three traies on the table, and two blocks are placed in two different traies. You may move only one block at a time, and each tray can hold at most one block. Swap the positions of the two blocks. Finally press the button." \
--action_horizon 30
```
- Replace `--task_name` and `--global_task` with each of the five tasks (strings in
`task_instructions.json`). The checkpoint and `--state_stats_path` stay the same.
- `--ckpt_setting m1mix` only labels the output directory
(`eval_result/<task>/Mem-0/demo_clean/m1mix/<timestamp>/`).
- `--vllm_url` is accepted but unused for M1 tasks (the global instruction is set
directly; the planner client is constructed but never queried).
- Ensure `execution_module.qwen_vl.model_path` in `deploy_policy.yml` points to your
downloaded Qwen3-VL-2B-Instruct directory.
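
Running all five tasks means repeating the command above with a different `--task_name` and `--global_task` each time. The sketch below builds the argument list for one run using only the flags shown above; `build_eval_command` is a hypothetical helper, and because the layout of `task_instructions.json` is not documented here, extracting the task-to-instruction mapping from it is left to the reader.

```python
import shlex


def build_eval_command(task_name, global_task, ckpt_path, stats_path,
                       config="policy/Mem-0/deploy_policy.yml",
                       ckpt_setting="m1mix", action_horizon=30):
    """Return the argv list for one evaluation run, mirroring the command above."""
    return [
        "python", "script/eval_policy.py", "--config", config, "--overrides",
        "--task_name", task_name,
        "--execution_ckpt", ckpt_path,
        "--state_stats_path", stats_path,
        "--ckpt_setting", ckpt_setting,
        "--global_task", global_task,
        "--action_horizon", str(action_horizon),
    ]


# Usage, from the repository root. `instructions` is a dict {task_name: instruction}
# that you fill from task_instructions.json:
#   import subprocess
#   for task, text in instructions.items():
#       cmd = build_eval_command(task, text, "/path/to/m1_mix_final_step50000.pt",
#                                "/path/to/norm_stats/norm_stats.json")
#       print(shlex.join(cmd))          # show the exact shell command
#       subprocess.run(cmd, check=True)
```

Passing the instruction as a single list element avoids shell-quoting problems with the long `--global_task` strings.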
---
## Model architecture (from `configs/`)
- **VLM backbone:** Qwen3-VL-2B-Instruct, 224×224 head-camera image + language
instruction, last-layer hidden states (hidden size 2048).
- **MemoryBank:** `window_size 30`, `initial_anchor_size 1`, `num_heads 8`,
`memory_accumulation 8`, `dropout 0.1`; fuses an instant-memory and an anchor-memory
token; concatenated with the text feature → a 3-token summary `(B, 3, 2048)`.
- **DiT-B action head** (`FlowmatchingActionHead`): `num_layers 16`,
`cross_attention_dim 2048`, `action_dim 16`, `state_dim 16`, `action_horizon 30`,
`num_inference_timesteps 8`; flow-matching regression of a 30-step action chunk.
- **Subtask-end classifier:** MLP `hidden_sizes [6144, 2048, 512]`, `pos_weight 10.0`,
`focal_gamma 1.0`, `threshold 0.5`. Drives stage transitions in Mn tasks; for M1 the
episode is a single stage so it does not affect rollout.
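
To illustrate what `num_inference_timesteps 8` typically means for a flow-matching head, the sketch below integrates a velocity field from noise to an action chunk with 8 Euler steps. It is a generic illustration under stated assumptions, not the Mem-0 implementation: the time convention (noise at t=0, action at t=1), the Euler integrator, and the names `sample_action_chunk` and `velocity_fn` are all assumptions. In the real model the velocity would come from the DiT conditioned on the memory summary and the robot state.

```python
import random


def sample_action_chunk(velocity_fn, horizon=30, action_dim=16, num_steps=8, seed=0):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 (noise) to t=1 with Euler steps.

    x is a flat list of horizon * action_dim floats (one action chunk).
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(horizon * action_dim)]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        v = velocity_fn(x, k * dt)  # predicted velocity at the current point and time
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy check: for a straight-line path toward `target`, the exact velocity at
# (x, t) is (target - x) / (1 - t), and Euler integration lands on the target.
#   target = [0.5] * (30 * 16)
#   toy = lambda x, t: [(ti - xi) / (1.0 - t) for xi, ti in zip(x, target)]
#   chunk = sample_action_chunk(toy)
```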
## Training (from `configs/execution_module_train_m1_mix.yaml`)
- **Data:** `m1_mix` (the five M1 tasks merged into one LeRobot dataset with globally
unique `episode_id`s). Features: head-camera image, state, action, subtask,
subtask_end, episode_id.
- **Schedule:** `train_steps 50000`, `batch_size 56`, cosine scheduler,
`warmup_ratio 0.05`, `grad_clip_norm 2.5`, `weight_decay 0.005`, `seed 42`.
- **Learning rates:** base `1e-5`, qwen_model `1e-5`, action_model `1e-4`,
classifier `1e-4` (min LRs `1e-6 / 1e-6 / 5e-6 / 5e-6`).
- **Loss:** `lambda_action 1.0`, `lambda_classifier 0.2`.
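
As a worked example of the schedule, the sketch below evaluates a learning rate at a given step. It assumes linear warmup over `warmup_ratio * train_steps` steps followed by cosine decay to the minimum LR, and that the min LRs are listed in the same order as the peak LRs; the training code may differ in these details, and `lr_at` is a hypothetical helper.

```python
import math


def lr_at(step, peak_lr, min_lr, train_steps=50000, warmup_ratio=0.05):
    """Linear warmup to peak_lr, then cosine decay to min_lr (assumed schedule shape)."""
    warmup_steps = round(train_steps * warmup_ratio)  # 2500 for the values above
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, train_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Usage, for the action_model group (peak 1e-4, min 5e-6):
#   lr_at(2500, 1e-4, 5e-6)    # end of warmup: the peak LR
#   lr_at(50000, 1e-4, 5e-6)   # final step: the minimum LR
```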
## Normalization
State and action are min-max normalized to `[-1, 1]` over the 14 arm dimensions using
`norm_stats/norm_stats.json` (`NORM_WAY = "minmax"` in `deploy_policy.py`). Use the same
stats file at inference; predicted actions are denormalized with it before being sent to
the environment. Action chunks from overlapping predictions are averaged (mean smoothing)
before execution.
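
The min-max mapping and its inverse can be sketched as follows. This uses the standard formula for scaling to `[-1, 1]`; the exact handling in `deploy_policy.py` (for example the guard against a zero range) and the key layout of `norm_stats.json` are not shown in this README, so `normalize`, `denormalize`, `lo`, `hi`, and `eps` are illustrative names only.

```python
def normalize(x, lo, hi, eps=1e-8):
    """Map each x[i] from [lo[i], hi[i]] to [-1, 1]."""
    return [2.0 * (xi - l) / max(h - l, eps) - 1.0 for xi, l, h in zip(x, lo, hi)]


def denormalize(y, lo, hi, eps=1e-8):
    """Inverse of normalize: map each y[i] from [-1, 1] back to [lo[i], hi[i]]."""
    return [(yi + 1.0) / 2.0 * max(h - l, eps) + l for yi, l, h in zip(y, lo, hi)]


# Usage: lo and hi are the per-dimension minima and maxima from the stats file.
#   state_norm = normalize(state, lo, hi)          # before feeding the policy
#   action = denormalize(action_norm, lo, hi)      # before sending to the environment
```

Using a different stats file at inference than at training shifts both mappings, which is why the same `norm_stats.json` is used for every task.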
## Limitations
- **swap_T (0.13)** and **observe_and_pickup (0.03)** are weak: the former needs precise
T-block position *and* orientation alignment; the latter needs cross-view target
re-identification after a visual occlusion followed by a pickup. The joint `m1_mix`
model does not solve these reliably.
- Numbers are on RoboTwin 2.0 `demo_clean` with `unseen` instruction phrasings; other
task configs / domain randomization will differ.
## License & attribution
- Base VLM **Qwen3-VL-2B-Instruct** is © the Qwen team, licensed **Apache-2.0**
(see `qwen_base_config/README_Qwen3-VL-2B-Instruct.md`). Because the checkpoint
embeds fine-tuned Qwen weights, that license applies to the corresponding components.
- RMBench / RoboTwin and the Mem-0 policy code are governed by their respective upstream
licenses; refer to the source repository.