MagicBot-VGA
This repository documents how to evaluate our RoboTwin model, zaleni/MagicBot-VGA-Robotwin, with the MagicBot-VGA codebase.
This README focuses on RoboTwin 2.0 environment preparation and evaluation.
It covers:
- MagicBot environment installation
- RoboTwin evaluation setup
- required external model assets
- single-task evaluation
- 50-task randomized evaluation
- CVPR 2026 RoboTwin Track 11-task evaluation
- submission package generation for the leaderboard workflow
1. Requirements
The codebase is built and tested with:
- Python 3.10
- CUDA 12.8
- PyTorch 2.7.1
We recommend using a Linux machine with NVIDIA GPUs.
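To check an existing environment against the versions listed above, a small script along these lines may help. It is only a sketch: the helper names matches_pin and installed_versions are illustrative and not part of the MagicBot-VGA codebase, and the prefix-matching rule (so that a build such as 2.7.1+cu128 counts as 2.7.1) is an assumption.

```python
import sys
from importlib import metadata

# Versions listed in the Requirements section.
EXPECTED = {"python": "3.10", "torch": "2.7.1"}


def matches_pin(installed: str, pinned: str) -> bool:
    """Return True if `installed` equals `pinned` or is a more specific release of it."""
    # Drop a local build suffix such as "+cu128" before comparing.
    base = installed.split("+", 1)[0]
    return base == pinned or base.startswith(pinned + ".")


def installed_versions() -> dict:
    """Collect the running Python version and, if present, the installed torch version."""
    found = {"python": f"{sys.version_info.major}.{sys.version_info.minor}"}
    try:
        # Reads package metadata only; torch itself is not imported.
        found["torch"] = metadata.version("torch")
    except metadata.PackageNotFoundError:
        found["torch"] = None
    return found


if __name__ == "__main__":
    for name, installed in installed_versions().items():
        ok = installed is not None and matches_pin(installed, EXPECTED[name])
        print(f"{name}: installed={installed} expected={EXPECTED[name]} ok={ok}")
```

The CUDA version is not covered here; once PyTorch is installed it can be read from torch.version.cuda.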
2. Install the MagicBot Base Environment
Clone the repository:
git clone https://github.com/zaleni/MagicBot-VGA.git
cd MagicBot-VGA
Create a conda environment:
conda create -y -n magicbot python=3.10
conda activate magicbot
pip install --upgrade pip
Install the basic system dependencies used by the codebase:
conda install -c conda-forge ffmpeg=7.1.1 svt-av1 -y
Install PyTorch for CUDA 12.8:
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 \
--index-url https://download.pytorch.org/whl/cu128
Install Python dependencies:
pip install torchcodec numpy scipy transformers==4.57.1 mediapy loguru pytest omegaconf
pip install -e .
3. Qwen3-VL Dependency
For CubeV2, the recommended dependency is the official Hugging Face Qwen3-VL
implementation provided by transformers>=4.57.0.
In this repository, CubeV2 imports Qwen3-VL directly from:
from transformers.models.qwen3_vl import modeling_qwen3_vl
from transformers.models.qwen3_vl import Qwen3VLForConditionalGeneration, Qwen3VLTextModel
So for standard evaluation, you do not need to patch transformers if your environment
already uses a recent enough official version such as transformers==4.57.1.
This repo also contains a vendored replacement file under:
src/lerobot/policies/cubev2/transformers_replace/models/qwen3_vl/modeling_qwen3_vl.py
That file is best understood as a repo-side override copy. Most users evaluating
zaleni/MagicBot-VGA-Robotwin should not need it unless they intentionally want to
reproduce a specific local patched behavior.
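Whether an installed transformers release satisfies the transformers>=4.57.0 requirement can be checked with a sketch like the one below. The helpers parse_release and meets_minimum are hypothetical names introduced for illustration, and the parser deliberately ignores pre-release suffixes (an assumption that treats 4.57.0rc1 like 4.57.0).

```python
from importlib import metadata

MIN_TRANSFORMERS = (4, 57, 0)  # from "transformers>=4.57.0"


def parse_release(version: str) -> tuple:
    """Parse the leading numeric release, e.g. '4.57.1' -> (4, 57, 1)."""
    parts = []
    for piece in version.split("+", 1)[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad short versions such as "4.57" so tuple comparison behaves as expected.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(version: str) -> bool:
    """Return True if `version` is at least the minimum release named above."""
    return parse_release(version) >= MIN_TRANSFORMERS


if __name__ == "__main__":
    try:
        installed = metadata.version("transformers")
        print(f"transformers {installed}: recent enough = {meets_minimum(installed)}")
    except metadata.PackageNotFoundError:
        print("transformers is not installed in this environment")
```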
4. Prepare RoboTwin for Evaluation
This section is specifically for RoboTwin evaluation. If you only want to load the model or run other parts of the codebase, the extra RoboTwin setup below is not required.
Option A: initialize the bundled RoboTwin submodule
git submodule update --init third_party/RoboTwin
Option B: copy an existing RoboTwin checkout
You do not have to download RoboTwin from scratch if you already have a prepared copy. You can copy it into this repository instead.
The evaluation code assumes RoboTwin is located exactly at:
<repo_root>/third_party/RoboTwin
So a valid layout looks like:
MagicBot-VGA/
  evaluation/
  launch/
  src/
  third_party/
    RoboTwin/
If your RoboTwin directory already exists elsewhere, either:
- copy it to third_party/RoboTwin, or
- create a symlink at third_party/RoboTwin pointing to your existing RoboTwin directory
This path requirement comes from the evaluation code, which imports RoboTwin modules
and task configs from third_party/RoboTwin directly.
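For the symlink option, a minimal sketch is shown below. The function name link_robotwin and the choice to leave an existing directory or link untouched are assumptions made for illustration; the only detail taken from this README is the required location <repo_root>/third_party/RoboTwin.

```python
import os
from pathlib import Path


def link_robotwin(repo_root: str, existing_checkout: str) -> Path:
    """Symlink <repo_root>/third_party/RoboTwin to an existing RoboTwin checkout."""
    source = Path(existing_checkout).resolve()
    if not source.is_dir():
        raise FileNotFoundError(f"RoboTwin checkout not found: {source}")

    target = Path(repo_root) / "third_party" / "RoboTwin"
    if target.is_symlink() or target.exists():
        # Something is already in place (submodule, copy, or link); do not overwrite it.
        return target

    target.parent.mkdir(parents=True, exist_ok=True)
    os.symlink(source, target, target_is_directory=True)
    return target


if __name__ == "__main__":
    # Both paths are placeholders; replace them with your own locations.
    print(link_robotwin(".", "/path/to/existing/RoboTwin"))
```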
Install RoboTwin-specific system dependency
RoboTwin rendering requires Vulkan:
sudo apt install -y libvulkan1 mesa-vulkan-drivers vulkan-tools
Install RoboTwin Python dependencies and assets
cp evaluation/RoboTwin/requirements.txt third_party/RoboTwin/script/requirements.txt
cd third_party/RoboTwin
bash script/_install.sh
bash script/_download_assets.sh
cd ../../
For more RoboTwin installation details, you can also refer to the official documentation: https://robotwin-platform.github.io/doc/usage/robotwin-install.html
5. Prepare External Model Assets
The released checkpoint zaleni/MagicBot-VGA-Robotwin is intended to be lightweight.
For RoboTwin action evaluation, you should provide the external backbone/tokenizer assets explicitly.
Recommended values:
- Qwen3-VL backbone and processor: Qwen/Qwen3-VL-2B-Instruct
- Cosmos tokenizer: nvidia/Cosmos-Tokenizer-CI8x8
You can use either:
- public Hugging Face repo ids
- local directories downloaded in advance
Example for offline/local usage:
QWEN3_VL_PATH=/path/to/Qwen3-VL-2B-Instruct
COSMOS_TOKENIZER_PATH=/path/to/Cosmos-Tokenizer-CI8x8
For standard RoboTwin action evaluation, we recommend disabling DA3 teacher instantiation:
DISABLE_DA3_TEACHER_FOR_EVAL=true
This avoids loading the frozen DA3 teacher during evaluation while keeping the policy architecture compatible.
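If you wrap the evaluation in your own launcher, the choice between a local directory and a public repo id can be made explicit. The sketch below assumes the two shell variable names from the offline example above and falls back to the recommended repo ids; resolve_asset is a hypothetical helper, not a function provided by this repository.

```python
import os
from pathlib import Path

# Recommended public repo ids, used when no local path is configured.
DEFAULT_ASSETS = {
    "QWEN3_VL_PATH": "Qwen/Qwen3-VL-2B-Instruct",
    "COSMOS_TOKENIZER_PATH": "nvidia/Cosmos-Tokenizer-CI8x8",
}


def resolve_asset(env_name: str, environ=None) -> str:
    """Return the local directory from `env_name` if set, else the public repo id."""
    environ = os.environ if environ is None else environ
    value = environ.get(env_name, "").strip()
    if not value:
        return DEFAULT_ASSETS[env_name]
    if not Path(value).is_dir():
        # Fail early instead of letting the model loader attempt a download.
        raise FileNotFoundError(f"{env_name} points to a missing directory: {value}")
    return value


if __name__ == "__main__":
    for name in DEFAULT_ASSETS:
        print(name, "->", resolve_asset(name))
```

Failing early on a missing directory is a design choice for offline servers, where a silent fallback to a repo id would only surface later as a network error.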
6. Single-Task Evaluation
The most direct way is to call evaluation/RoboTwin/inference.py on a single RoboTwin task.
Example: evaluate task 0 (adjust_bottle) on demo_clean:
cd third_party/RoboTwin
python ../../evaluation/RoboTwin/inference.py \
--args.ckpt_path zaleni/MagicBot-VGA-Robotwin \
--args.video_dir ../../evaluation/RoboTwin/output_magicbot/demo_clean/task_00 \
--args.task_config demo_clean \
--args.task_idx 0 \
--args.action_mode delta \
--args.stats_key aloha \
--args.dtype bfloat16 \
--args.qwen3_vl_pretrained_path Qwen/Qwen3-VL-2B-Instruct \
--args.qwen3_vl_processor_path Qwen/Qwen3-VL-2B-Instruct \
--args.cosmos_tokenizer_path_or_name nvidia/Cosmos-Tokenizer-CI8x8 \
--args.disable_3d_teacher_for_eval
If you use local asset directories, replace the public repo ids with your local paths.
Important arguments:
- --args.ckpt_path: model repo id or local pretrained_model directory
- --args.task_config: demo_clean or demo_randomized
- --args.task_idx: task index in evaluation/RoboTwin/inference.py
- --args.action_mode: usually delta for this model
- --args.stats_key: usually aloha for RoboTwin
- --args.dtype: bfloat16 is recommended on modern GPUs
Outputs are written to --args.video_dir, including:
- replay videos
- summary.json
- summary.txt
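When launching single-task runs from Python rather than the shell, the same command can be assembled programmatically. The sketch below only reuses flags that appear in the example above; build_inference_cmd is a hypothetical helper, and the task_XX output naming simply mirrors the example path.

```python
import shlex
import sys

# Default values taken from the single-task example command.
QWEN3_VL = "Qwen/Qwen3-VL-2B-Instruct"
COSMOS_TOKENIZER = "nvidia/Cosmos-Tokenizer-CI8x8"
OUTPUT_ROOT = "../../evaluation/RoboTwin/output_magicbot"


def build_inference_cmd(task_idx, task_config="demo_clean",
                        ckpt_path="zaleni/MagicBot-VGA-Robotwin",
                        qwen3_vl=QWEN3_VL, cosmos_tokenizer=COSMOS_TOKENIZER):
    """Build the argument list for one task; paths are relative to third_party/RoboTwin."""
    video_dir = f"{OUTPUT_ROOT}/{task_config}/task_{task_idx:02d}"
    return [
        sys.executable, "../../evaluation/RoboTwin/inference.py",
        "--args.ckpt_path", ckpt_path,
        "--args.video_dir", video_dir,
        "--args.task_config", task_config,
        "--args.task_idx", str(task_idx),
        "--args.action_mode", "delta",
        "--args.stats_key", "aloha",
        "--args.dtype", "bfloat16",
        "--args.qwen3_vl_pretrained_path", qwen3_vl,
        "--args.qwen3_vl_processor_path", qwen3_vl,
        "--args.cosmos_tokenizer_path_or_name", cosmos_tokenizer,
        "--args.disable_3d_teacher_for_eval",
    ]


if __name__ == "__main__":
    # Print the command for inspection. To execute it, pass the list to
    # subprocess.run(cmd, cwd="third_party/RoboTwin", check=True).
    print(shlex.join(build_inference_cmd(0)))
```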
7. 50-Task Randomized Evaluation
For batch evaluation on RoboTwin randomized tasks, use:
PRETRAINED_CKPT=zaleni/MagicBot-VGA-Robotwin \
QWEN3_VL_PRETRAINED_PATH=Qwen/Qwen3-VL-2B-Instruct \
QWEN3_VL_PROCESSOR_PATH=Qwen/Qwen3-VL-2B-Instruct \
COSMOS_TOKENIZER_PATH_OR_NAME=nvidia/Cosmos-Tokenizer-CI8x8 \
DISABLE_DA3_TEACHER_FOR_EVAL=true \
GPU_IDS=0,1 \
MAX_JOBS_PER_GPU=2 \
bash evaluation/RoboTwin/eval_randomized_50.sh
Useful environment variables:
- PRETRAINED_CKPT: model repo id or local checkpoint directory
- GPU_IDS: comma-separated GPU ids, for example 0,1,2,3
- MAX_JOBS_PER_GPU: parallel RoboTwin jobs per GPU
- TASK_CONFIG: defaults to demo_randomized
- TEST_NUM: number of episodes per task
- DTYPE: bfloat16 or float32
- BASE_OUTPUT_PATH: output root directory
This script writes:
- per-task logs and videos under tasks/
- aggregated summary.json
- aggregated summary.txt
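To size a run, it can help to estimate how many jobs execute at once. The arithmetic below assumes the script keeps at most (number of GPUs) x MAX_JOBS_PER_GPU jobs running, which is how the variable descriptions read but is not verified against the script itself; job_slots and min_waves are illustrative names.

```python
import math


def job_slots(gpu_ids: str, max_jobs_per_gpu: int) -> int:
    """Number of jobs that can run at once for a GPU_IDS string such as '0,1'."""
    gpus = [g.strip() for g in gpu_ids.split(",") if g.strip()]
    return len(gpus) * max_jobs_per_gpu


def min_waves(num_tasks: int, slots: int) -> int:
    """Lower bound on the number of sequential batches needed to cover all tasks."""
    return math.ceil(num_tasks / slots)


slots = job_slots("0,1", 2)
print(slots)                 # 4
print(min_waves(50, slots))  # 13
```

With the example settings (GPU_IDS=0,1 and MAX_JOBS_PER_GPU=2), the 50 tasks need at least 13 batches of up to 4 concurrent jobs under this assumption.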
8. Evaluate a Continuous Task Range
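eval_randomized_50.sh supports continuous ranges through two environment variables:
- START_TASK_IDX: index of the first task to evaluate
- TASK_COUNT: number of consecutive tasks to evaluate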
Example: evaluate tasks 10 to 19:
PRETRAINED_CKPT=zaleni/MagicBot-VGA-Robotwin \
QWEN3_VL_PRETRAINED_PATH=Qwen/Qwen3-VL-2B-Instruct \
QWEN3_VL_PROCESSOR_PATH=Qwen/Qwen3-VL-2B-Instruct \
COSMOS_TOKENIZER_PATH_OR_NAME=nvidia/Cosmos-Tokenizer-CI8x8 \
DISABLE_DA3_TEACHER_FOR_EVAL=true \
START_TASK_IDX=10 \
TASK_COUNT=10 \
bash evaluation/RoboTwin/eval_randomized_50.sh
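The range covered by a given pair of values can be worked out as follows. This sketch assumes the range is half-open, [START_TASK_IDX, START_TASK_IDX + TASK_COUNT), which matches the "tasks 10 to 19" example, and that valid indices run from 0 to 49 for the 50-task benchmark; task_range is a hypothetical helper.

```python
NUM_TASKS = 50  # size of the randomized-50 benchmark


def task_range(start_task_idx: int, task_count: int) -> list:
    """Task indices covered by START_TASK_IDX and TASK_COUNT."""
    end = start_task_idx + task_count
    if start_task_idx < 0 or task_count < 1 or end > NUM_TASKS:
        raise ValueError(f"range {start_task_idx}..{end - 1} is outside 0..{NUM_TASKS - 1}")
    return list(range(start_task_idx, end))


indices = task_range(10, 10)
print(indices[0], indices[-1], len(indices))  # 10 19 10
```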
9. Evaluate the CVPR 2026 RoboTwin Track 11-Task Subset
For the Hugging Face leaderboard open-gigaai/CVPR-2026-RoboTwin-Track-LeaderBoard, we use the following 11-task subset:
[2, 3, 9, 10, 12, 15, 17, 25, 28, 30, 44]
The exact task names in evaluation/RoboTwin/inference.py are:
- blocks_ranking_rgb
- blocks_ranking_size
- handover_mic
- hanging_mug
- move_can_pot
- move_stapler_pad
- open_microwave
- place_can_basket
- place_dual_shoes
- place_fan
- stack_blocks_three
The current batch script does not take a sparse task list directly, so the recommended approach is to run a shell loop:
cd third_party/RoboTwin
TASKS=(2 3 9 10 12 15 17 25 28 30 44)
for t in "${TASKS[@]}"; do
python ../../evaluation/RoboTwin/inference.py \
--args.ckpt_path zaleni/MagicBot-VGA-Robotwin \
--args.video_dir ../../evaluation/RoboTwin/output_magicbot/custom_subset/task_${t} \
--args.task_config demo_randomized \
--args.task_idx "${t}" \
--args.action_mode delta \
--args.stats_key aloha \
--args.dtype bfloat16 \
--args.qwen3_vl_pretrained_path Qwen/Qwen3-VL-2B-Instruct \
--args.qwen3_vl_processor_path Qwen/Qwen3-VL-2B-Instruct \
--args.cosmos_tokenizer_path_or_name nvidia/Cosmos-Tokenizer-CI8x8 \
--args.disable_3d_teacher_for_eval
done
This produces one output directory per task, each containing replay videos plus summary.json and summary.txt.
10. Package the 11-Task Submission and Export Success Rates
After you finish the randomized evaluation run, you can convert those 11 tasks into a submission-style folder with:
python util_scripts/package_robotwin_submission.py \
--run /path/to/output_randomized_50/<run_name>/summary.txt \
--dst /path/to/output_randomized_50/<run_name>/submission_package \
--overwrite
If you also want to bundle a policy folder, add:
--policy-dir /path/to/policy/Your_Policy
The packaging script will:
- create submission_package/<task_name>/episode0.mp4, episode1.mp4, ...
- preserve the 11-task ordering by task index
- write package_manifest.txt
- write selected_task_summary.json
- write selected_task_summary.txt
The selected-task summary files include:
- per-task success_rate
- per-task success_count and test_num
- avg_task_success_rate across the 11 tasks
- overall_episode_success_rate across all episodes in the 11-task subset
This is useful when you want a leaderboard-facing summary for the competition subset rather than the full randomized-50 report.
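The two aggregate metrics can differ when tasks have different episode counts. The sketch below illustrates one plausible reading, in which avg_task_success_rate is the unweighted mean of per-task rates and overall_episode_success_rate pools all episodes; this interpretation is an assumption based on the field names, and the counts are made-up illustration values, not results.

```python
def aggregate(per_task: dict) -> dict:
    """Compute both aggregates from {task_name: (success_count, test_num)}."""
    rates = {name: s / n for name, (s, n) in per_task.items()}
    total_success = sum(s for s, _ in per_task.values())
    total_episodes = sum(n for _, n in per_task.values())
    return {
        # Every task counts equally, regardless of how many episodes it ran.
        "avg_task_success_rate": sum(rates.values()) / len(rates),
        # Every episode counts equally, so larger tasks weigh more.
        "overall_episode_success_rate": total_success / total_episodes,
    }


# Hypothetical counts chosen only to show the difference between the two metrics.
example = {"place_fan": (8, 10), "open_microwave": (1, 2)}
summary = aggregate(example)
print(f"{summary['avg_task_success_rate']:.2f}")         # 0.65
print(f"{summary['overall_episode_success_rate']:.2f}")  # 0.75
```

When every task uses the same test_num, the two values coincide.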
11. Task Index Reference
Task indices are defined in evaluation/RoboTwin/inference.py.
For example:
- 0: adjust_bottle
- 2: blocks_ranking_rgb
- 3: blocks_ranking_size
- 9: handover_mic
- 10: hanging_mug
- 12: move_can_pot
- 15: move_stapler_pad
- 17: open_microwave
- 25: place_can_basket
- 28: place_dual_shoes
- 30: place_fan
- 44: stack_blocks_three
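For scripting, the indices listed in this README can be kept in a small lookup table. The sketch below contains only the twelve entries documented here, not the full 50-task list, which lives in evaluation/RoboTwin/inference.py; the names KNOWN_TASKS, TRACK_SUBSET, and subset_names are illustrative.

```python
# Index-to-name pairs documented in this README (a partial view of the full list).
KNOWN_TASKS = {
    0: "adjust_bottle",
    2: "blocks_ranking_rgb",
    3: "blocks_ranking_size",
    9: "handover_mic",
    10: "hanging_mug",
    12: "move_can_pot",
    15: "move_stapler_pad",
    17: "open_microwave",
    25: "place_can_basket",
    28: "place_dual_shoes",
    30: "place_fan",
    44: "stack_blocks_three",
}

# CVPR 2026 RoboTwin Track 11-task subset.
TRACK_SUBSET = [2, 3, 9, 10, 12, 15, 17, 25, 28, 30, 44]


def subset_names(indices=TRACK_SUBSET) -> list:
    """Return task names in index order; raises KeyError for an undocumented index."""
    return [KNOWN_TASKS[i] for i in sorted(indices)]


if __name__ == "__main__":
    for idx, name in zip(TRACK_SUBSET, subset_names()):
        print(f"{idx:2d}: {name}")
```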
12. Common Notes
- inference.py can load checkpoints from either a local directory or a Hugging Face repo id.
- If your server cannot access Hugging Face online, download the external assets in advance and pass local paths.
- If you use the lightweight checkpoint release for action evaluation, keeping --args.disable_3d_teacher_for_eval enabled is recommended.
- If you want to inspect reconstructed future images during inference, enable --args.decode_image_flag, though this is not required for standard RoboTwin scoring.
13. Model Link
Released RoboTwin checkpoint: zaleni/MagicBot-VGA-Robotwin
14. Acknowledgments
MagicBot-VGA is developed on top of the excellent InternVLA framework. Our codebase started from that foundation and has since been substantially modified and extended for our own model architecture, training pipeline, and evaluation workflow.
We sincerely thank the InternVLA authors and contributors for open-sourcing their framework and making follow-up research and development much easier.
We also thank the following open-source projects: