	# MagicBot-VGA

	This repository documents how to evaluate our RoboTwin model
	[`zaleni/MagicBot-VGA-Robotwin`](https://huggingface.co/zaleni/MagicBot-VGA-Robotwin)
	with the MagicBot-VGA codebase.

	[![Repository](https://img.shields.io/badge/Repository-GitHub-181717?logo=github)](https://github.com/zaleni/MagicBot-VGA)
	[![Model](https://img.shields.io/badge/Model-HuggingFace-FFD21E?logo=huggingface&logoColor=000000)](https://huggingface.co/zaleni/MagicBot-VGA-Robotwin)

	This README focuses on RoboTwin 2.0 environment preparation and evaluation.

	It covers:

	- MagicBot environment installation
	- RoboTwin evaluation setup
	- required external model assets
	- single-task evaluation
	- 50-task randomized evaluation
	- CVPR 2026 RoboTwin Track 11-task evaluation
	- submission package generation for the leaderboard workflow

	## 1. Requirements

	The codebase is built and tested with:

	- Python 3.10
	- CUDA 12.8
	- PyTorch 2.7.1

	We recommend using a Linux machine with NVIDIA GPUs.

	## 2. Install the MagicBot Base Environment

	Clone the repository:

	```bash
	git clone https://github.com/zaleni/MagicBot-VGA.git
	cd MagicBot-VGA
	```

	Create a conda environment:

	```bash
	conda create -y -n magicbot python=3.10
	conda activate magicbot
	pip install --upgrade pip
	```

	Install the basic system dependencies used by the codebase:

	```bash
	conda install -c conda-forge ffmpeg=7.1.1 svt-av1 -y
	```

	Install PyTorch for CUDA 12.8:

	```bash
	pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 \
	--index-url https://download.pytorch.org/whl/cu128
	```

	Install Python dependencies:

	```bash
	pip install torchcodec numpy scipy transformers==4.57.1 mediapy loguru pytest omegaconf
	pip install -e .
	```
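
After installation, a quick version check can catch mismatches before a long evaluation run. The following is a minimal sketch, not part of the repository: `check_environment` and `EXPECTED` are hypothetical names, and the pinned versions are simply the ones listed in this README.

```python
import sys
from importlib import metadata

# Versions copied from the Requirements and install steps in this README.
EXPECTED = {"torch": "2.7.1", "torchvision": "0.22.1", "transformers": "4.57.1"}


def check_environment(expected=EXPECTED, python=(3, 10)):
    """Return a list of human-readable mismatches (empty list means all good)."""
    problems = []
    if tuple(sys.version_info[:2]) != tuple(python):
        found = ".".join(str(part) for part in sys.version_info[:2])
        problems.append(f"python: expected {python[0]}.{python[1]}, found {found}")
    for name, want in expected.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (expected {want})")
            continue
        # Wheels from the cu128 index may report versions like "2.7.1+cu128",
        # so compare only the part before the local version tag.
        if have.split("+")[0] != want:
            problems.append(f"{name}: expected {want}, found {have}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment matches the documented versions"]:
        print(line)
```

An empty result only means the version strings match. It does not verify that CUDA is usable on the machine.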

	## 3. Qwen3-VL Dependency

	For `CubeV2`, the recommended dependency is the official Hugging Face `Qwen3-VL`
	implementation provided by `transformers>=4.57.0`.

	In this repository, `CubeV2` imports Qwen3-VL directly from:

	```python
	from transformers.models.qwen3_vl import modeling_qwen3_vl
	from transformers.models.qwen3_vl import Qwen3VLForConditionalGeneration, Qwen3VLTextModel
	```

	So for standard evaluation, you do not need to patch `transformers` if your environment
	already uses a recent enough official version such as `transformers==4.57.1`.

	This repo also contains a vendored replacement file under:

	```text
	src/lerobot/policies/cubev2/transformers_replace/models/qwen3_vl/modeling_qwen3_vl.py
	```

	That file is best understood as a repo-side override copy. Most users evaluating
	`zaleni/MagicBot-VGA-Robotwin` should not need it unless they intentionally want to
	reproduce a specific local patched behavior.
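
To confirm that the installed `transformers` is recent enough and ships the `qwen3_vl` module, a small check like the one below may help. It is a sketch under simple assumptions: the helper names are hypothetical, and the version parser only compares leading integer components, so it treats pre-releases such as `4.57.0.dev0` the same as `4.57.0`.

```python
from importlib import metadata, util


def parse_version(text):
    """Turn '4.57.1' or '2.7.1+cu128' into a tuple of leading integers."""
    parts = []
    for piece in text.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def version_at_least(installed, minimum):
    return parse_version(installed) >= parse_version(minimum)


def has_module(dotted_name):
    """True if the module can be located.

    find_spec imports parent packages (here: transformers) but does not
    import the target module itself.
    """
    try:
        return util.find_spec(dotted_name) is not None
    except ImportError:
        # Raised when a parent package is missing or fails to import.
        return False


if __name__ == "__main__":
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        installed = None
    ok = installed is not None and version_at_least(installed, "4.57.0")
    print("transformers:", installed, "(ok)" if ok else "(too old or missing)")
    print("qwen3_vl found:", has_module("transformers.models.qwen3_vl"))
```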

	## 4. Prepare RoboTwin for Evaluation

	This section is specifically for RoboTwin evaluation. If you only want to load the
	model or run other parts of the codebase, the extra RoboTwin setup below is not required.

	### Option A: initialize the bundled RoboTwin submodule

	```bash
	git submodule update --init third_party/RoboTwin
	```

	### Option B: copy an existing RoboTwin checkout

	You do not have to download RoboTwin from scratch if you already have a prepared copy.
	You can copy it into this repository instead.

	The evaluation code assumes RoboTwin is located exactly at:

	```text
	<repo_root>/third_party/RoboTwin
	```

	So a valid layout looks like:

	```text
	MagicBot-VGA/
	    evaluation/
	    launch/
	    src/
	    third_party/
	        RoboTwin/
	```

	If your RoboTwin directory already exists elsewhere, either:

	- copy it to `third_party/RoboTwin`, or
	- create a symlink at `third_party/RoboTwin` pointing to your existing RoboTwin directory

	This path requirement comes from the evaluation code, which imports RoboTwin modules
	and task configs from `third_party/RoboTwin` directly.
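
If you go the symlink route, the step can be scripted. The sketch below is illustrative only: `link_robotwin` is a hypothetical helper, not something the repository provides, and it deliberately refuses to overwrite anything already present at `third_party/RoboTwin`.

```python
from pathlib import Path


def link_robotwin(repo_root, existing_checkout):
    """Symlink <repo_root>/third_party/RoboTwin to an existing RoboTwin checkout."""
    source = Path(existing_checkout).resolve()
    if not source.is_dir():
        raise FileNotFoundError(f"RoboTwin checkout not found: {source}")
    target = Path(repo_root) / "third_party" / "RoboTwin"
    if target.is_symlink() or target.exists():
        # Do not overwrite: a submodule directory or an older link may be there.
        raise FileExistsError(f"{target} already exists")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.symlink_to(source, target_is_directory=True)
    return target
```

For example, `link_robotwin(".", "/data/RoboTwin")` run from the repository root would create the link (the `/data/RoboTwin` path is a placeholder). If an empty, uninitialized submodule directory is already present, remove it first.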

	### Install RoboTwin-specific system dependency

	RoboTwin rendering requires Vulkan:

	```bash
	sudo apt install -y libvulkan1 mesa-vulkan-drivers vulkan-tools
	```

	### Install RoboTwin Python dependencies and assets

	```bash
	cp evaluation/RoboTwin/requirements.txt third_party/RoboTwin/script/requirements.txt
	cd third_party/RoboTwin
	bash script/_install.sh
	bash script/_download_assets.sh
	cd ../../
	```

	For more RoboTwin installation details, you can also refer to the official documentation:
	https://robotwin-platform.github.io/doc/usage/robotwin-install.html

	## 5. Prepare External Model Assets

	The released checkpoint `zaleni/MagicBot-VGA-Robotwin` is intended to be lightweight.
	For RoboTwin action evaluation, you should provide the external backbone/tokenizer assets explicitly.

	Recommended values:

	- Qwen3-VL backbone and processor: `Qwen/Qwen3-VL-2B-Instruct`
	- Cosmos tokenizer: `nvidia/Cosmos-Tokenizer-CI8x8`

	You can use either:

	- public Hugging Face repo ids
	- local directories downloaded in advance

	Example for offline/local usage:

	```bash
	QWEN3_VL_PATH=/path/to/Qwen3-VL-2B-Instruct
	COSMOS_TOKENIZER_PATH=/path/to/Cosmos-Tokenizer-CI8x8
	```

	For standard RoboTwin action evaluation, we recommend disabling DA3 teacher instantiation:

	```bash
	DISABLE_DA3_TEACHER_FOR_EVAL=true
	```

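	This avoids loading the frozen DA3 teacher during evaluation while keeping the policy architecture compatible. The variable above is the one read by the batch script; when calling `inference.py` directly, the corresponding flag is `--args.disable_3d_teacher_for_eval`.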
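
The choice between a repo id and a local directory can be made once and reused across commands. The sketch below assumes nothing about the evaluation code: `resolve_asset` and `download_asset` are hypothetical helpers. The only third-party call is `snapshot_download` from the `huggingface_hub` package, which must be installed separately if you use the download helper.

```python
import os
from pathlib import Path


def resolve_asset(env_var, default_repo_id):
    """Prefer a value from the environment, else fall back to a public repo id."""
    value = os.environ.get(env_var, "").strip()
    if not value:
        return default_repo_id
    if Path(value).is_dir():
        # Local directory prepared in advance: return it as an absolute path.
        return str(Path(value).resolve())
    # Not a local directory: pass it through unchanged (it may be a repo id).
    return value


def download_asset(repo_id, local_dir):
    """Optional pre-download for offline servers (needs network access once)."""
    from huggingface_hub import snapshot_download  # imported lazily on purpose

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

For instance, `resolve_asset("QWEN3_VL_PRETRAINED_PATH", "Qwen/Qwen3-VL-2B-Instruct")` returns the local directory if that variable points to one, and the public repo id otherwise.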

	## 6. Single-Task Evaluation

	The most direct way to evaluate is to run `evaluation/RoboTwin/inference.py` on a single RoboTwin task.

	Example: evaluate task `0` (`adjust_bottle`) on `demo_clean`:

	```bash
	cd third_party/RoboTwin

	python ../../evaluation/RoboTwin/inference.py \
	--args.ckpt_path zaleni/MagicBot-VGA-Robotwin \
	--args.video_dir ../../evaluation/RoboTwin/output_magicbot/demo_clean/task_00 \
	--args.task_config demo_clean \
	--args.task_idx 0 \
	--args.action_mode delta \
	--args.stats_key aloha \
	--args.dtype bfloat16 \
	--args.qwen3_vl_pretrained_path Qwen/Qwen3-VL-2B-Instruct \
	--args.qwen3_vl_processor_path Qwen/Qwen3-VL-2B-Instruct \
	--args.cosmos_tokenizer_path_or_name nvidia/Cosmos-Tokenizer-CI8x8 \
	--args.disable_3d_teacher_for_eval
	```

	If you use local asset directories, replace the public repo ids with your local paths.

	Important arguments:

	- `--args.ckpt_path`: model repo id or local `pretrained_model` directory
	- `--args.task_config`: `demo_clean` or `demo_randomized`
	- `--args.task_idx`: index into the task list defined in `evaluation/RoboTwin/inference.py` (see Section 11)
	- `--args.action_mode`: usually `delta` for this model
	- `--args.stats_key`: usually `aloha` for RoboTwin
	- `--args.dtype`: `bfloat16` is recommended on modern GPUs

	Outputs are written to `--args.video_dir`, including:

	- replay videos
	- `summary.json`
	- `summary.txt`
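
If you launch runs from Python rather than the shell, the same command line can be assembled programmatically. This is a minimal sketch: `build_inference_cmd` is a hypothetical helper, and it only reuses the flags shown in the example above, with paths relative to `third_party/RoboTwin`.

```python
import shlex


def build_inference_cmd(task_idx, task_config, video_dir,
                        ckpt="zaleni/MagicBot-VGA-Robotwin",
                        qwen="Qwen/Qwen3-VL-2B-Instruct",
                        cosmos="nvidia/Cosmos-Tokenizer-CI8x8"):
    """Build the argv list for one inference.py run."""
    if task_config not in ("demo_clean", "demo_randomized"):
        raise ValueError(f"unknown task_config: {task_config}")
    return [
        "python", "../../evaluation/RoboTwin/inference.py",
        "--args.ckpt_path", ckpt,
        "--args.video_dir", str(video_dir),
        "--args.task_config", task_config,
        "--args.task_idx", str(task_idx),
        "--args.action_mode", "delta",
        "--args.stats_key", "aloha",
        "--args.dtype", "bfloat16",
        "--args.qwen3_vl_pretrained_path", qwen,
        "--args.qwen3_vl_processor_path", qwen,
        "--args.cosmos_tokenizer_path_or_name", cosmos,
        "--args.disable_3d_teacher_for_eval",  # boolean flag, takes no value
    ]


cmd = build_inference_cmd(
    0, "demo_clean", "../../evaluation/RoboTwin/output_magicbot/demo_clean/task_00"
)
print(shlex.join(cmd))  # copy-pasteable shell form of the same command
```

The list can then be passed to `subprocess.run(cmd, cwd="third_party/RoboTwin", check=True)`.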

	## 7. 50-Task Randomized Evaluation

	For batch evaluation on RoboTwin randomized tasks, use:

	```bash
	PRETRAINED_CKPT=zaleni/MagicBot-VGA-Robotwin \
	QWEN3_VL_PRETRAINED_PATH=Qwen/Qwen3-VL-2B-Instruct \
	QWEN3_VL_PROCESSOR_PATH=Qwen/Qwen3-VL-2B-Instruct \
	COSMOS_TOKENIZER_PATH_OR_NAME=nvidia/Cosmos-Tokenizer-CI8x8 \
	DISABLE_DA3_TEACHER_FOR_EVAL=true \
	GPU_IDS=0,1 \
	MAX_JOBS_PER_GPU=2 \
	bash evaluation/RoboTwin/eval_randomized_50.sh
	```

	Useful environment variables:

	- `PRETRAINED_CKPT`: model repo id or local checkpoint directory
	- `GPU_IDS`: comma-separated GPU ids, for example `0,1,2,3`
	- `MAX_JOBS_PER_GPU`: parallel RoboTwin jobs per GPU
	- `TASK_CONFIG`: defaults to `demo_randomized`
	- `TEST_NUM`: number of episodes per task
	- `DTYPE`: `bfloat16` or `float32`
	- `BASE_OUTPUT_PATH`: output root directory

	This script writes:

	- per-task logs and videos under `tasks/`
	- aggregated `summary.json`
	- aggregated `summary.txt`
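
When choosing `GPU_IDS` and `MAX_JOBS_PER_GPU`, it helps to estimate how many jobs run concurrently. The sketch below is back-of-the-envelope arithmetic only. It assumes each job slot processes one task at a time, which may not match how `eval_randomized_50.sh` actually schedules work, and all function names are hypothetical.

```python
def parse_gpu_ids(text):
    """Parse a GPU_IDS string such as '0,1,2,3' into a list of ints."""
    return [int(item) for item in text.split(",") if item.strip()]


def job_slots(gpu_ids, max_jobs_per_gpu):
    """One entry per concurrent job slot, e.g. GPU 0 appears twice for two jobs."""
    return [gpu for gpu in gpu_ids for _ in range(max_jobs_per_gpu)]


def rounds_needed(num_tasks, gpu_ids, max_jobs_per_gpu):
    """Number of sequential rounds if every slot runs one task per round."""
    slots = len(gpu_ids) * max_jobs_per_gpu
    return -(-num_tasks // slots)  # ceiling division


gpus = parse_gpu_ids("0,1")
print(job_slots(gpus, 2))          # four concurrent slots
print(rounds_needed(50, gpus, 2))  # rounds for the 50-task run
```

With `GPU_IDS=0,1` and `MAX_JOBS_PER_GPU=2` this gives 4 concurrent slots, so 50 tasks take at least 13 rounds under this assumption.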

	## 8. Evaluate a Contiguous Task Range

	`eval_randomized_50.sh` supports contiguous task ranges through two environment variables:

	- `START_TASK_IDX`
	- `TASK_COUNT`

	Example: evaluate tasks `10` to `19`:

	```bash
	PRETRAINED_CKPT=zaleni/MagicBot-VGA-Robotwin \
	QWEN3_VL_PRETRAINED_PATH=Qwen/Qwen3-VL-2B-Instruct \
	QWEN3_VL_PROCESSOR_PATH=Qwen/Qwen3-VL-2B-Instruct \
	COSMOS_TOKENIZER_PATH_OR_NAME=nvidia/Cosmos-Tokenizer-CI8x8 \
	DISABLE_DA3_TEACHER_FOR_EVAL=true \
	START_TASK_IDX=10 \
	TASK_COUNT=10 \
	bash evaluation/RoboTwin/eval_randomized_50.sh
	```
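
The range covered by a given `START_TASK_IDX` and `TASK_COUNT` can be checked ahead of time. This is a sketch with two assumptions: task indices run from 0 to 49, and out-of-range values should be rejected. The script itself may handle them differently, and `task_indices` is a hypothetical helper.

```python
NUM_TASKS = 50  # assumed size of the task list for the randomized-50 evaluation


def task_indices(start_task_idx=0, task_count=NUM_TASKS):
    """Indices covered by START_TASK_IDX / TASK_COUNT."""
    if start_task_idx < 0 or task_count < 1:
        raise ValueError("start must be >= 0 and count must be >= 1")
    end = start_task_idx + task_count  # exclusive upper bound
    if end > NUM_TASKS:
        raise ValueError(f"range {start_task_idx}..{end - 1} exceeds {NUM_TASKS} tasks")
    return list(range(start_task_idx, end))


print(task_indices(10, 10))  # the example above: tasks 10 through 19
```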

	## 9. Evaluate the CVPR 2026 RoboTwin Track 11-Task Subset

	For the Hugging Face leaderboard
	[`open-gigaai/CVPR-2026-RoboTwin-Track-LeaderBoard`](https://huggingface.co/spaces/open-gigaai/CVPR-2026-RoboTwin-Track-LeaderBoard),
	we use the following 11-task subset:

	```text
	[2, 3, 9, 10, 12, 15, 17, 25, 28, 30, 44]
	```

	The exact task names in `evaluation/RoboTwin/inference.py` are:

	- `blocks_ranking_rgb`
	- `blocks_ranking_size`
	- `handover_mic`
	- `hanging_mug`
	- `move_can_pot`
	- `move_stapler_pad`
	- `open_microwave`
	- `place_can_basket`
	- `place_dual_shoes`
	- `place_fan`
	- `stack_blocks_three`

	The current batch script does not accept a non-contiguous task list, so the recommended approach is to run a shell loop:

	```bash
	cd third_party/RoboTwin

	TASKS=(2 3 9 10 12 15 17 25 28 30 44)
	for t in "${TASKS[@]}"; do
	  python ../../evaluation/RoboTwin/inference.py \
	    --args.ckpt_path zaleni/MagicBot-VGA-Robotwin \
	    --args.video_dir ../../evaluation/RoboTwin/output_magicbot/custom_subset/task_${t} \
	    --args.task_config demo_randomized \
	    --args.task_idx "${t}" \
	    --args.action_mode delta \
	    --args.stats_key aloha \
	    --args.dtype bfloat16 \
	    --args.qwen3_vl_pretrained_path Qwen/Qwen3-VL-2B-Instruct \
	    --args.qwen3_vl_processor_path Qwen/Qwen3-VL-2B-Instruct \
	    --args.cosmos_tokenizer_path_or_name nvidia/Cosmos-Tokenizer-CI8x8 \
	    --args.disable_3d_teacher_for_eval
	done
	```

	This produces one output directory per task, each containing replay videos plus `summary.json` and `summary.txt`.
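
Because the loop runs tasks one after another, an interrupted run leaves some tasks without results. The sketch below lists which tasks still lack a `summary.json`. It assumes the `task_<idx>` directory naming used in the loop above and that `summary.json` is only written when a task finishes. `unfinished_tasks` is a hypothetical helper.

```python
from pathlib import Path

TRACK_TASKS = [2, 3, 9, 10, 12, 15, 17, 25, 28, 30, 44]


def unfinished_tasks(output_root, tasks=TRACK_TASKS):
    """Return task indices whose output directory has no summary.json yet."""
    root = Path(output_root)
    return [t for t in tasks if not (root / f"task_{t}" / "summary.json").is_file()]
```

For example, `unfinished_tasks("evaluation/RoboTwin/output_magicbot/custom_subset")` returns the indices to rerun.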

	## 10. Package the 11-Task Submission and Export Success Rates

	After the randomized evaluation run finishes, you can package the 11 track tasks from that run into a submission-style folder with:

	```bash
	python util_scripts/package_robotwin_submission.py \
	--run /path/to/output_randomized_50/<run_name>/summary.txt \
	--dst /path/to/output_randomized_50/<run_name>/submission_package \
	--overwrite
	```

	To also bundle a policy folder, append this argument to the command above:

	```bash
	--policy-dir /path/to/policy/Your_Policy
	```

	The packaging script will:

	- create `submission_package/<task_name>/episode0.mp4`, `episode1.mp4`, ...
	- preserve the 11-task ordering by task index
	- write `package_manifest.txt`
	- write `selected_task_summary.json`
	- write `selected_task_summary.txt`

	The selected-task summary files include:

	- per-task `success_rate`
	- per-task `success_count` and `test_num`
	- `avg_task_success_rate` across the 11 tasks
	- `overall_episode_success_rate` across all episodes in the 11-task subset

	This is useful when you want a leaderboard-facing summary for the competition subset rather than the full randomized-50 report.
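
The two aggregate numbers differ when tasks have different episode counts. The sketch below illustrates the distinction as the field names suggest it: a per-task mean versus a pooled per-episode rate. It is not the packaging script's code, and the counts in the example are made up purely for illustration.

```python
def summarize(results):
    """results maps task name -> (success_count, test_num)."""
    per_task = {name: s / n for name, (s, n) in results.items()}
    total_success = sum(s for s, _ in results.values())
    total_episodes = sum(n for _, n in results.values())
    return {
        "success_rate": per_task,
        # Mean of per-task rates: every task counts equally.
        "avg_task_success_rate": sum(per_task.values()) / len(per_task),
        # Pooled rate: every episode counts equally.
        "overall_episode_success_rate": total_success / total_episodes,
    }


# Made-up counts with unequal test_num to show the difference.
example = summarize({"handover_mic": (9, 10), "hanging_mug": (10, 40)})
print(example["avg_task_success_rate"], example["overall_episode_success_rate"])
```

In this example the per-task mean is 0.575 while the pooled rate is 0.38. When every task uses the same `test_num`, the two values coincide.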

	## 11. Task Index Reference

	Task indices are defined in [`evaluation/RoboTwin/inference.py`](evaluation/RoboTwin/inference.py).

	For example:

	- `0`: `adjust_bottle`
	- `2`: `blocks_ranking_rgb`
	- `3`: `blocks_ranking_size`
	- `9`: `handover_mic`
	- `10`: `hanging_mug`
	- `12`: `move_can_pot`
	- `15`: `move_stapler_pad`
	- `17`: `open_microwave`
	- `25`: `place_can_basket`
	- `28`: `place_dual_shoes`
	- `30`: `place_fan`
	- `44`: `stack_blocks_three`
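
For scripting, the indices listed above can be kept in a small lookup table. This sketch covers only the entries shown in this README. The authoritative list remains `evaluation/RoboTwin/inference.py`, and the names `KNOWN_TASKS`, `TRACK_SUBSET`, and `task_name` are hypothetical.

```python
# Only the indices documented in this README; inference.py holds the full list.
KNOWN_TASKS = {
    0: "adjust_bottle",
    2: "blocks_ranking_rgb",
    3: "blocks_ranking_size",
    9: "handover_mic",
    10: "hanging_mug",
    12: "move_can_pot",
    15: "move_stapler_pad",
    17: "open_microwave",
    25: "place_can_basket",
    28: "place_dual_shoes",
    30: "place_fan",
    44: "stack_blocks_three",
}

# The CVPR 2026 RoboTwin Track subset from Section 9.
TRACK_SUBSET = [2, 3, 9, 10, 12, 15, 17, 25, 28, 30, 44]


def task_name(idx):
    """Look up a task name, with a clear error for indices not listed here."""
    if idx not in KNOWN_TASKS:
        raise KeyError(f"task index {idx} is not listed here; check inference.py")
    return KNOWN_TASKS[idx]


print([task_name(i) for i in TRACK_SUBSET])
```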

	## 12. Common Notes

	- `inference.py` can load checkpoints from either a local directory or a Hugging Face repo id.
	- If your server cannot access Hugging Face online, download the external assets in advance and pass local paths.
	- If you use the lightweight checkpoint release for action evaluation, keeping `--args.disable_3d_teacher_for_eval` enabled is recommended.
	- If you want to inspect reconstructed future images during inference, enable `--args.decode_image_flag`, though this is not required for standard RoboTwin scoring.

	## 13. Model Link

	Released RoboTwin checkpoint:

	- https://huggingface.co/zaleni/MagicBot-VGA-Robotwin

	## 14. Acknowledgments

	MagicBot-VGA is developed on top of the excellent InternVLA framework. Our codebase
	started from that foundation and has since been substantially modified and extended
	for our own model architecture, training pipeline, and evaluation workflow.

	We sincerely thank the [InternVLA](https://github.com/InternRobotics/InternVLA-A1)
	authors and contributors for open-sourcing their framework and making follow-up
	research and development much easier.

	We also thank the following open-source projects:

	- [InternVLA](https://github.com/InternRobotics/InternVLA-A1)
	- [LeRobot](https://github.com/huggingface/lerobot)
	- [RoboTwin](https://github.com/RoboTwin-Platform/RoboTwin)
	- [Qwen3-VL](https://github.com/QwenLM/Qwen3-VL)
	- [NVIDIA Cosmos](https://github.com/nvidia-cosmos)