# MagicBot-VGA

This repository documents how to evaluate our RoboTwin model
[`zaleni/MagicBot-VGA-Robotwin`](https://huggingface.co/zaleni/MagicBot-VGA-Robotwin)
with the MagicBot-VGA codebase.

[![Repository](https://img.shields.io/badge/Repository-GitHub-181717?logo=github)](https://github.com/zaleni/MagicBot-VGA)
[![Model](https://img.shields.io/badge/Model-HuggingFace-FFD21E?logo=huggingface&logoColor=000000)](https://huggingface.co/zaleni/MagicBot-VGA-Robotwin)

This README focuses on RoboTwin 2.0 environment preparation and evaluation.

It covers:

- MagicBot environment installation
- RoboTwin evaluation setup
- required external model assets
- single-task evaluation
- 50-task randomized evaluation
- CVPR 2026 RoboTwin Track 11-task evaluation
- submission package generation for the leaderboard workflow

## 1. Requirements

The codebase is built and tested with:

- Python 3.10
- CUDA 12.8
- PyTorch 2.7.1

We recommend using a Linux machine with NVIDIA GPUs.
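
To compare an existing environment against the versions listed above, a small check such as the following can help. This is a minimal sketch, not part of the repository: the helper names (`matches_prefix`, `collect_versions`, `report`) are made up for illustration, and the check only reports differences rather than enforcing them.

```python
import importlib.util
import platform

# Versions this README says the codebase was built and tested with.
EXPECTED = {"python": "3.10", "torch": "2.7.1", "cuda": "12.8"}


def matches_prefix(actual, expected):
    """True when `actual` starts with the dotted components of `expected`."""
    wanted = expected.split(".")
    # Drop local build tags such as "+cu128" before comparing.
    return actual.split("+")[0].split(".")[: len(wanted)] == wanted


def collect_versions():
    """Gather the installed versions; torch is optional at this point."""
    found = {"python": platform.python_version()}
    if importlib.util.find_spec("torch") is not None:
        import torch

        found["torch"] = torch.__version__
        found["cuda"] = torch.version.cuda or "none"
    return found


def report(found):
    """Return one human-readable line per expected component."""
    lines = []
    for name, expected in EXPECTED.items():
        actual = found.get(name, "not installed")
        status = "ok" if matches_prefix(actual, expected) else "differs"
        lines.append(f"{name}: found {actual}, tested with {expected} ({status})")
    return lines


if __name__ == "__main__":
    print("\n".join(report(collect_versions())))
```

A "differs" line does not necessarily mean the setup is broken; it only means the combination has not been tested here.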

## 2. Install the MagicBot Base Environment

Clone the repository:

```bash
git clone https://github.com/zaleni/MagicBot-VGA.git
cd MagicBot-VGA
```

Create a conda environment:

```bash
conda create -y -n magicbot python=3.10
conda activate magicbot
pip install --upgrade pip
```

Install the basic system dependencies used by the codebase:

```bash
conda install -c conda-forge ffmpeg=7.1.1 svt-av1 -y
```

Install PyTorch for CUDA 12.8:

```bash
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 \
  --index-url https://download.pytorch.org/whl/cu128
```

Install Python dependencies:

```bash
pip install torchcodec numpy scipy transformers==4.57.1 mediapy loguru pytest omegaconf
pip install -e .
```

## 3. Qwen3-VL Dependency

For `CubeV2`, the recommended dependency is the official Hugging Face `Qwen3-VL`
implementation provided by `transformers>=4.57.0`.

In this repository, `CubeV2` imports Qwen3-VL directly from:

```python
from transformers.models.qwen3_vl import modeling_qwen3_vl
from transformers.models.qwen3_vl import Qwen3VLForConditionalGeneration, Qwen3VLTextModel
```

So for standard evaluation, you do not need to patch `transformers` if your environment
already uses a recent enough official version such as `transformers==4.57.1`.
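
As a quick sanity check, the installed `transformers` version can be compared against the `4.57.0` minimum mentioned above. The sketch below uses only the standard library; the function names are illustrative, and pre-release suffixes such as `.dev0` are ignored for simplicity.

```python
import re
from importlib import metadata

# Minimum version stated in this README for the official Qwen3-VL implementation.
MIN_TRANSFORMERS = (4, 57, 0)


def parse_version(text):
    """Return the first three numeric components of a version string."""
    numbers = [int(p) for p in re.findall(r"\d+", text.split("+")[0])[:3]]
    # Pad short versions such as "4.57" so tuple comparison behaves as expected.
    numbers += [0] * (3 - len(numbers))
    return tuple(numbers)


def transformers_is_recent_enough(version_text):
    """True when the given version is at least MIN_TRANSFORMERS."""
    return parse_version(version_text) >= MIN_TRANSFORMERS


if __name__ == "__main__":
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        print("transformers is not installed")
    else:
        ok = transformers_is_recent_enough(installed)
        print(f"transformers {installed}: {'ok' if ok else 'older than 4.57.0'}")
```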

This repo also contains a vendored replacement file under:

```text
src/lerobot/policies/cubev2/transformers_replace/models/qwen3_vl/modeling_qwen3_vl.py
```

That file is a repo-side override copy of the upstream `modeling_qwen3_vl.py`. Most users
evaluating `zaleni/MagicBot-VGA-Robotwin` should not need it unless they intentionally want
to reproduce a specific locally patched behavior.

## 4. Prepare RoboTwin for Evaluation

This section is specifically for RoboTwin evaluation. If you only want to load the
model or run other parts of the codebase, the extra RoboTwin setup below is not required.

### Option A: initialize the bundled RoboTwin submodule

```bash
git submodule update --init third_party/RoboTwin
```

### Option B: copy an existing RoboTwin checkout

You do not have to download RoboTwin from scratch if you already have a prepared copy.
You can copy it into this repository instead.

The evaluation code assumes RoboTwin is located exactly at:

```text
<repo_root>/third_party/RoboTwin
```

So a valid layout looks like:

```text
MagicBot-VGA/
  evaluation/
  launch/
  src/
  third_party/
    RoboTwin/
```

If your RoboTwin directory already exists elsewhere, either:

- copy it to `third_party/RoboTwin`, or
- create a symlink at `third_party/RoboTwin` pointing to your existing RoboTwin directory

This path requirement comes from the evaluation code, which imports RoboTwin modules
and task configs from `third_party/RoboTwin` directly.
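
The symlink variant of Option B can be sketched in Python as follows. This is an illustrative helper, not a script shipped with the repository; the function name `link_robotwin` is hypothetical, and it deliberately refuses to overwrite anything that already exists at `third_party/RoboTwin`.

```python
import os
from pathlib import Path


def link_robotwin(repo_root, existing_checkout):
    """Create <repo_root>/third_party/RoboTwin as a symlink to an existing checkout."""
    source = Path(existing_checkout).resolve()
    if not source.is_dir():
        raise FileNotFoundError(f"RoboTwin checkout not found: {source}")

    target = Path(repo_root) / "third_party" / "RoboTwin"
    if target.is_symlink() or target.exists():
        # Never overwrite: a submodule directory or an earlier copy may be there.
        return target

    target.parent.mkdir(parents=True, exist_ok=True)
    os.symlink(source, target, target_is_directory=True)
    return target


if __name__ == "__main__":
    # Placeholder paths; replace them with your own locations.
    print(link_robotwin(".", "/path/to/RoboTwin"))
```

If the directory already exists (for example, an empty submodule folder), the helper leaves it alone, so remove or rename it yourself before linking.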

### Install RoboTwin-specific system dependency

RoboTwin rendering requires Vulkan:

```bash
sudo apt install -y libvulkan1 mesa-vulkan-drivers vulkan-tools
```

### Install RoboTwin Python dependencies and assets

```bash
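# Overwrite RoboTwin's requirements file with the one shipped in this repository
cp evaluation/RoboTwin/requirements.txt third_party/RoboTwin/script/requirements.txt
cd third_party/RoboTwin
bash script/_install.sh          # install RoboTwin Python dependencies
bash script/_download_assets.sh  # download RoboTwin assets
cd ../../                        # return to the repository root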
```

For more RoboTwin installation details, you can also refer to the official documentation:
https://robotwin-platform.github.io/doc/usage/robotwin-install.html

## 5. Prepare External Model Assets

The released checkpoint `zaleni/MagicBot-VGA-Robotwin` is intentionally lightweight, so for
RoboTwin action evaluation you need to provide the external backbone and tokenizer assets explicitly.

Recommended values:

- Qwen3-VL backbone and processor: `Qwen/Qwen3-VL-2B-Instruct`
- Cosmos tokenizer: `nvidia/Cosmos-Tokenizer-CI8x8`

You can use either:

- public Hugging Face repo ids
- local directories downloaded in advance

Example for offline/local usage:

```bash
QWEN3_VL_PRETRAINED_PATH=/path/to/Qwen3-VL-2B-Instruct
QWEN3_VL_PROCESSOR_PATH=/path/to/Qwen3-VL-2B-Instruct
COSMOS_TOKENIZER_PATH_OR_NAME=/path/to/Cosmos-Tokenizer-CI8x8
```

For standard RoboTwin action evaluation, we recommend disabling DA3 teacher instantiation:

```bash
DISABLE_DA3_TEACHER_FOR_EVAL=true
```

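This avoids loading the frozen DA3 teacher during evaluation while keeping the policy architecture compatible. The environment variable applies to the batch script; when calling `inference.py` directly, pass `--args.disable_3d_teacher_for_eval` instead.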
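
When mixing repo ids and local directories, a mistyped local path is easy to miss until model loading fails. The following sketch classifies an asset reference before launching evaluation. It assumes that an existing directory should be treated as a local asset and anything else as a Hugging Face repo id; the helper name `describe_asset` is hypothetical and not part of this codebase.

```python
from pathlib import Path


def describe_asset(value):
    """Classify an asset reference as a local directory or a Hugging Face repo id."""
    path = Path(value).expanduser()
    if path.is_dir():
        return ("local", str(path.resolve()))
    # Values that look like filesystem paths but do not exist are almost
    # certainly typos, so fail early instead of treating them as repo ids.
    if value.startswith(("/", "./", "../", "~")):
        raise FileNotFoundError(f"local asset directory does not exist: {value}")
    return ("repo_id", value)


if __name__ == "__main__":
    for ref in ("Qwen/Qwen3-VL-2B-Instruct", "nvidia/Cosmos-Tokenizer-CI8x8"):
        print(ref, "->", describe_asset(ref))
```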

## 6. Single-Task Evaluation

The most direct way to evaluate is to run `evaluation/RoboTwin/inference.py` on a single RoboTwin task.

Example: evaluate task `0` (`adjust_bottle`) on `demo_clean`:

```bash
cd third_party/RoboTwin

python ../../evaluation/RoboTwin/inference.py \
  --args.ckpt_path zaleni/MagicBot-VGA-Robotwin \
  --args.video_dir ../../evaluation/RoboTwin/output_magicbot/demo_clean/task_00 \
  --args.task_config demo_clean \
  --args.task_idx 0 \
  --args.action_mode delta \
  --args.stats_key aloha \
  --args.dtype bfloat16 \
  --args.qwen3_vl_pretrained_path Qwen/Qwen3-VL-2B-Instruct \
  --args.qwen3_vl_processor_path Qwen/Qwen3-VL-2B-Instruct \
  --args.cosmos_tokenizer_path_or_name nvidia/Cosmos-Tokenizer-CI8x8 \
  --args.disable_3d_teacher_for_eval
```

If you use local asset directories, replace the public repo ids with your local paths.

Important arguments:

- `--args.ckpt_path`: model repo id or local `pretrained_model` directory
- `--args.task_config`: `demo_clean` or `demo_randomized`
- `--args.task_idx`: index into the task list defined in `evaluation/RoboTwin/inference.py` (see Section 11)
- `--args.action_mode`: usually `delta` for this model
- `--args.stats_key`: usually `aloha` for RoboTwin
- `--args.dtype`: `bfloat16` is recommended on modern GPUs

Outputs are written to `--args.video_dir`, including:

- replay videos
- `summary.json`
- `summary.txt`
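
If you prefer to launch single-task runs from Python, the command above can be assembled programmatically. This is a minimal sketch: the function `build_inference_cmd` and its parameter names are made up for illustration, while the flags and default values are copied from the example in this section.

```python
import shlex


def build_inference_cmd(
    task_idx,
    task_config="demo_clean",
    ckpt_path="zaleni/MagicBot-VGA-Robotwin",
    qwen3_vl_path="Qwen/Qwen3-VL-2B-Instruct",
    cosmos_tokenizer="nvidia/Cosmos-Tokenizer-CI8x8",
    output_root="../../evaluation/RoboTwin/output_magicbot",
):
    """Return the argv list for one run (paths are relative to third_party/RoboTwin)."""
    video_dir = f"{output_root}/{task_config}/task_{task_idx:02d}"
    return [
        "python", "../../evaluation/RoboTwin/inference.py",
        "--args.ckpt_path", ckpt_path,
        "--args.video_dir", video_dir,
        "--args.task_config", task_config,
        "--args.task_idx", str(task_idx),
        "--args.action_mode", "delta",
        "--args.stats_key", "aloha",
        "--args.dtype", "bfloat16",
        "--args.qwen3_vl_pretrained_path", qwen3_vl_path,
        "--args.qwen3_vl_processor_path", qwen3_vl_path,
        "--args.cosmos_tokenizer_path_or_name", cosmos_tokenizer,
        "--args.disable_3d_teacher_for_eval",
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(build_inference_cmd(0)))
```

To actually run it, pass the list to `subprocess.run(cmd, cwd="third_party/RoboTwin", check=True)` from the repository root.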

## 7. 50-Task Randomized Evaluation

For batch evaluation on RoboTwin randomized tasks, use:

```bash
PRETRAINED_CKPT=zaleni/MagicBot-VGA-Robotwin \
QWEN3_VL_PRETRAINED_PATH=Qwen/Qwen3-VL-2B-Instruct \
QWEN3_VL_PROCESSOR_PATH=Qwen/Qwen3-VL-2B-Instruct \
COSMOS_TOKENIZER_PATH_OR_NAME=nvidia/Cosmos-Tokenizer-CI8x8 \
DISABLE_DA3_TEACHER_FOR_EVAL=true \
GPU_IDS=0,1 \
MAX_JOBS_PER_GPU=2 \
bash evaluation/RoboTwin/eval_randomized_50.sh
```

Useful environment variables:

- `PRETRAINED_CKPT`: model repo id or local checkpoint directory
- `GPU_IDS`: comma-separated GPU ids, for example `0,1,2,3`
- `MAX_JOBS_PER_GPU`: parallel RoboTwin jobs per GPU
- `TASK_CONFIG`: defaults to `demo_randomized`
- `TEST_NUM`: number of episodes per task
- `DTYPE`: `bfloat16` or `float32`
- `BASE_OUTPUT_PATH`: output root directory

This script writes:

- per-task logs and videos under `tasks/`
- aggregated `summary.json`
- aggregated `summary.txt`
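
The same configuration can be prepared from Python when the batch script is driven by another tool. The sketch below only builds the environment dictionary; the helper names are hypothetical. The concurrency figure is an upper bound that assumes the script fills every slot on every GPU, which this README does not spell out.

```python
import os


def build_batch_env(gpu_ids, max_jobs_per_gpu, base_env=None, **overrides):
    """Return an environment dict for eval_randomized_50.sh."""
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        "PRETRAINED_CKPT": "zaleni/MagicBot-VGA-Robotwin",
        "QWEN3_VL_PRETRAINED_PATH": "Qwen/Qwen3-VL-2B-Instruct",
        "QWEN3_VL_PROCESSOR_PATH": "Qwen/Qwen3-VL-2B-Instruct",
        "COSMOS_TOKENIZER_PATH_OR_NAME": "nvidia/Cosmos-Tokenizer-CI8x8",
        "DISABLE_DA3_TEACHER_FOR_EVAL": "true",
        "GPU_IDS": ",".join(str(g) for g in gpu_ids),
        "MAX_JOBS_PER_GPU": str(max_jobs_per_gpu),
    })
    # Optional variables such as TEST_NUM, DTYPE, or BASE_OUTPUT_PATH.
    env.update({key: str(value) for key, value in overrides.items()})
    return env


def max_concurrent_jobs(gpu_ids, max_jobs_per_gpu):
    """Upper bound on simultaneous RoboTwin jobs (assumes all slots are used)."""
    return len(gpu_ids) * max_jobs_per_gpu


if __name__ == "__main__":
    env = build_batch_env([0, 1], 2, base_env={}, DTYPE="bfloat16")
    print(env["GPU_IDS"], env["MAX_JOBS_PER_GPU"], max_concurrent_jobs([0, 1], 2))
```

The resulting dictionary can be passed as `env=` to `subprocess.run(["bash", "evaluation/RoboTwin/eval_randomized_50.sh"], check=True)`.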

## 8. Evaluate a Contiguous Task Range

`eval_randomized_50.sh` supports contiguous task ranges through:

- `START_TASK_IDX`
- `TASK_COUNT`

Example: evaluate tasks `10` through `19` (10 tasks starting at index `10`):

```bash
PRETRAINED_CKPT=zaleni/MagicBot-VGA-Robotwin \
QWEN3_VL_PRETRAINED_PATH=Qwen/Qwen3-VL-2B-Instruct \
QWEN3_VL_PROCESSOR_PATH=Qwen/Qwen3-VL-2B-Instruct \
COSMOS_TOKENIZER_PATH_OR_NAME=nvidia/Cosmos-Tokenizer-CI8x8 \
DISABLE_DA3_TEACHER_FOR_EVAL=true \
START_TASK_IDX=10 \
TASK_COUNT=10 \
bash evaluation/RoboTwin/eval_randomized_50.sh
```
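
The range arithmetic is simple but easy to get off by one. The sketch below assumes what the example implies: `START_TASK_IDX` is a zero-based, inclusive start index and `TASK_COUNT` is the number of consecutive tasks. The helper names are illustrative only, and the total of 50 tasks is taken from the script name.

```python
def covered_tasks(start_task_idx, task_count):
    """Task indices covered by one START_TASK_IDX / TASK_COUNT pair."""
    return list(range(start_task_idx, start_task_idx + task_count))


def split_into_chunks(total_tasks, chunk_size):
    """Return (START_TASK_IDX, TASK_COUNT) pairs that tile 0..total_tasks-1."""
    return [
        (start, min(chunk_size, total_tasks - start))
        for start in range(0, total_tasks, chunk_size)
    ]


if __name__ == "__main__":
    print(covered_tasks(10, 10))      # indices 10 through 19
    print(split_into_chunks(50, 20))  # e.g. to spread 50 tasks over several machines
```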

## 9. Evaluate the CVPR 2026 RoboTwin Track 11-Task Subset

For the Hugging Face leaderboard
[`open-gigaai/CVPR-2026-RoboTwin-Track-LeaderBoard`](https://huggingface.co/spaces/open-gigaai/CVPR-2026-RoboTwin-Track-LeaderBoard),
we use the following 11-task subset:

```text
[2, 3, 9, 10, 12, 15, 17, 25, 28, 30, 44]
```

The exact task names in `evaluation/RoboTwin/inference.py` are:

- `blocks_ranking_rgb`
- `blocks_ranking_size`
- `handover_mic`
- `hanging_mug`
- `move_can_pot`
- `move_stapler_pad`
- `open_microwave`
- `place_can_basket`
- `place_dual_shoes`
- `place_fan`
- `stack_blocks_three`

The current batch script does not take a sparse task list directly, so the recommended approach is to run a shell loop:

```bash
cd third_party/RoboTwin

TASKS=(2 3 9 10 12 15 17 25 28 30 44)
for t in "${TASKS[@]}"; do
  python ../../evaluation/RoboTwin/inference.py \
    --args.ckpt_path zaleni/MagicBot-VGA-Robotwin \
    --args.video_dir ../../evaluation/RoboTwin/output_magicbot/custom_subset/task_${t} \
    --args.task_config demo_randomized \
    --args.task_idx "${t}" \
    --args.action_mode delta \
    --args.stats_key aloha \
    --args.dtype bfloat16 \
    --args.qwen3_vl_pretrained_path Qwen/Qwen3-VL-2B-Instruct \
    --args.qwen3_vl_processor_path Qwen/Qwen3-VL-2B-Instruct \
    --args.cosmos_tokenizer_path_or_name nvidia/Cosmos-Tokenizer-CI8x8 \
    --args.disable_3d_teacher_for_eval
done
```

This produces one output directory per task, each containing replay videos plus `summary.json` and `summary.txt`.
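
As an alternative to the per-task loop, the sparse subset can be expressed as contiguous runs, each of which maps onto one `START_TASK_IDX` / `TASK_COUNT` pair of the batch script. The sketch below only computes those pairs; the function name is hypothetical. Note that each pair would be a separate batch run with its own output, so whether this is more convenient than the loop is a judgment call.

```python
def contiguous_runs(task_indices):
    """Group task indices into (START_TASK_IDX, TASK_COUNT) pairs."""
    runs = []
    for idx in sorted(set(task_indices)):
        if runs and idx == runs[-1][0] + runs[-1][1]:
            # idx directly follows the current run, so extend it by one.
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)
        else:
            runs.append((idx, 1))
    return runs


if __name__ == "__main__":
    track_subset = [2, 3, 9, 10, 12, 15, 17, 25, 28, 30, 44]
    for start, count in contiguous_runs(track_subset):
        print(f"START_TASK_IDX={start} TASK_COUNT={count}")
```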

## 10. Package the 11-Task Submission and Export Success Rates

After a randomized batch evaluation run (Section 7) finishes, you can extract the 11 track tasks into a submission-style folder with:

```bash
python util_scripts/package_robotwin_submission.py \
  --run /path/to/output_randomized_50/<run_name>/summary.txt \
  --dst /path/to/output_randomized_50/<run_name>/submission_package \
  --overwrite
```

If you also want to bundle a policy folder, add:

```bash
  --policy-dir /path/to/policy/Your_Policy
```

The packaging script will:

- create `submission_package/<task_name>/episode0.mp4`, `episode1.mp4`, ...
- preserve the 11-task ordering by task index
- write `package_manifest.txt`
- write `selected_task_summary.json`
- write `selected_task_summary.txt`

The selected-task summary files include:

- per-task `success_rate`
- per-task `success_count` and `test_num`
- `avg_task_success_rate` across the 11 tasks
- `overall_episode_success_rate` across all episodes in the 11-task subset

This is useful when you want a leaderboard-facing summary for the competition subset rather than the full randomized-50 report.
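
The two aggregate metrics can differ when tasks have different episode counts. Reading the field names literally, `avg_task_success_rate` is the mean of the per-task rates and `overall_episode_success_rate` is total successes divided by total episodes. The sketch below illustrates that reading; the input layout (a dict keyed by task name) and the numbers are invented for illustration and are not the actual schema or results of the packaging script.

```python
def summarize(per_task):
    """Compute both aggregate metrics from per-task success counts."""
    rates = [t["success_count"] / t["test_num"] for t in per_task.values()]
    total_success = sum(t["success_count"] for t in per_task.values())
    total_episodes = sum(t["test_num"] for t in per_task.values())
    return {
        # Every task weighs the same, regardless of how many episodes it ran.
        "avg_task_success_rate": sum(rates) / len(rates),
        # Every episode weighs the same, so larger tasks dominate.
        "overall_episode_success_rate": total_success / total_episodes,
    }


if __name__ == "__main__":
    # Made-up counts, chosen so that the two metrics visibly disagree.
    example = {
        "task_a": {"success_count": 9, "test_num": 10},
        "task_b": {"success_count": 10, "test_num": 40},
    }
    print(summarize(example))
```

When every task uses the same `test_num`, the two values coincide.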

## 11. Task Index Reference

Task indices are defined in [`evaluation/RoboTwin/inference.py`](evaluation/RoboTwin/inference.py).

For example:

- `0`: `adjust_bottle`
- `2`: `blocks_ranking_rgb`
- `3`: `blocks_ranking_size`
- `9`: `handover_mic`
- `10`: `hanging_mug`
- `12`: `move_can_pot`
- `15`: `move_stapler_pad`
- `17`: `open_microwave`
- `25`: `place_can_basket`
- `28`: `place_dual_shoes`
- `30`: `place_fan`
- `44`: `stack_blocks_three`
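
For scripting convenience, the indices listed above can be kept in a small lookup table. This sketch contains only the entries shown in this README; the authoritative list remains the one in `evaluation/RoboTwin/inference.py`, and the names `TASK_NAMES`, `TRACK_SUBSET`, and `task_label` are illustrative.

```python
# Partial index-to-name mapping, copied from the list in this section.
TASK_NAMES = {
    0: "adjust_bottle",
    2: "blocks_ranking_rgb",
    3: "blocks_ranking_size",
    9: "handover_mic",
    10: "hanging_mug",
    12: "move_can_pot",
    15: "move_stapler_pad",
    17: "open_microwave",
    25: "place_can_basket",
    28: "place_dual_shoes",
    30: "place_fan",
    44: "stack_blocks_three",
}

# The CVPR 2026 RoboTwin Track subset from Section 9.
TRACK_SUBSET = [2, 3, 9, 10, 12, 15, 17, 25, 28, 30, 44]


def task_label(idx):
    """Return a readable label such as '09_handover_mic' for logs and folders."""
    return f"{idx:02d}_{TASK_NAMES.get(idx, 'unknown')}"


if __name__ == "__main__":
    for idx in TRACK_SUBSET:
        print(task_label(idx))
```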

## 12. Common Notes

- `inference.py` can load checkpoints from either a local directory or a Hugging Face repo id.
- If your server cannot access Hugging Face online, download the external assets in advance and pass local paths.
- If you use the lightweight checkpoint release for action evaluation, keeping `--args.disable_3d_teacher_for_eval` enabled is recommended.
- If you want to inspect reconstructed future images during inference, enable `--args.decode_image_flag`, though this is not required for standard RoboTwin scoring.

## 13. Model Link

Released RoboTwin checkpoint:

- https://huggingface.co/zaleni/MagicBot-VGA-Robotwin

## 14. Acknowledgments

MagicBot-VGA is developed on top of the excellent InternVLA framework. Our codebase
started from that foundation and has since been substantially modified and extended
for our own model architecture, training pipeline, and evaluation workflow.

We sincerely thank the [InternVLA](https://github.com/InternRobotics/InternVLA-A1)
authors and contributors for open-sourcing their framework and making follow-up
research and development much easier.

We also thank the following open-source projects:

- [InternVLA](https://github.com/InternRobotics/InternVLA-A1)
- [LeRobot](https://github.com/huggingface/lerobot)
- [RoboTwin](https://github.com/RoboTwin-Platform/RoboTwin)
- [Qwen3-VL](https://github.com/QwenLM/Qwen3-VL)
- [NVIDIA Cosmos](https://github.com/nvidia-cosmos)