
Environment Notes

This benchmark bundle was validated on a Linux GPU machine with:

  • CoppeliaSim v4.1.0 available at COPPELIASIM_ROOT
  • xvfb and xauth installed
  • Qt xcb runtime libraries installed:
    • libxrender1
    • libxkbcommon0
    • libxkbcommon-x11-0
    • libxcb-icccm4
    • libxcb-image0
    • libxcb-keysyms1
    • libxcb-randr0
    • libxcb-render-util0
    • libxcb-shape0
    • libxcb-shm0
    • libxcb-sync1
    • libxcb-xfixes0
    • libxcb-xinerama0
    • libxcb-xkb1
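
The package list above can be checked before launching the simulator. The following is a minimal sketch, assuming the names are Debian/Ubuntu package names and that `dpkg -s` is an acceptable installed-check; `REQUIRED_PKGS` and `missing_pkgs` are hypothetical names, not part of this bundle.

```shell
# Package names copied from the list above (Debian/Ubuntu naming is assumed).
REQUIRED_PKGS="xvfb xauth libxrender1 libxkbcommon0 libxkbcommon-x11-0 \
libxcb-icccm4 libxcb-image0 libxcb-keysyms1 libxcb-randr0 libxcb-render-util0 \
libxcb-shape0 libxcb-shm0 libxcb-sync1 libxcb-xfixes0 libxcb-xinerama0 libxcb-xkb1"

# Print every package for which the given check command fails.
# Usage: missing_pkgs <check-command> [args...]
missing_pkgs() {
  for pkg in $REQUIRED_PKGS; do
    "$@" "$pkg" >/dev/null 2>&1 || printf '%s\n' "$pkg"
  done
}

# On a Debian/Ubuntu host (assumption):
#   missing_pkgs dpkg -s
```

An empty output means every listed package passed the check. The check command is passed in as an argument so the same loop can be reused with a different package manager.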

The successful RLBench2 oven evaluation (task bimanual_take_tray_out_of_oven) was run with:

xvfb-run -a -s "-screen 0 1400x900x24" python \
  online_evaluation_rlbench/evaluate_policy.py \
  --checkpoint models/3dfa_peract2/3dfa_peract2.pth \
  --task bimanual_take_tray_out_of_oven \
  --data_dir data/3dfa/peract2_test \
  --dataset Peract2_3dfront_3dwrist \
  --image_size 256,256 \
  --model_type denoise3d \
  --bimanual true \
  --prediction_len 1 \
  --backbone clip \
  --fps_subsampling_factor 4 \
  --embedding_dim 120 \
  --num_attn_heads 8 \
  --num_vis_instr_attn_layers 3 \
  --num_history 3 \
  --num_shared_attn_layers 4 \
  --relative_action false \
  --rotation_format quat_xyzw \
  --denoise_timesteps 5 \
  --denoise_model rectified_flow
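
Because the command above depends on several relative paths, a quick existence check from the bundle root can save a simulator start-up. This is a sketch only; `check_paths` is a hypothetical helper, and the paths in the usage comment are the ones that appear in the command.

```shell
# Report each argument that does not exist; return non-zero if any is missing.
check_paths() {
  status=0
  for p in "$@"; do
    if [ ! -e "$p" ]; then
      printf 'missing: %s\n' "$p" >&2
      status=1
    fi
  done
  return "$status"
}

# Example, run from the bundle root:
#   check_paths online_evaluation_rlbench/evaluate_policy.py \
#     models/3dfa_peract2/3dfa_peract2.pth \
#     data/3dfa/peract2_test && echo "inputs present"
```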

The "take shoes out of box" validation path was run from the bundled PointFlowMatch source on a Blackwell GPU machine, after the RLBench environment was upgraded to Torch 2.11.0+cu128, torchvision 0.26.0+cu128, and torchaudio 2.11.0+cu128. The bundled PointFlowMatch tree also contains local compatibility fixes in two files, both required for this workspace:

  • pfp/envs/rlbench_env.py
    • adapts PointFlowMatch to the local RLBench camera naming and observation API
    • broadens motion-planning failure recovery to handle simulator-side runtime failures
  • pfp/policy/fm_policy.py
    • adds an inference-only fallback for when the legacy composer import fails on modern Torch
    • loads checkpoints with weights_only=False, which PyTorch 2.6+ no longer uses by default
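
The PyTorch 2.6 boundary matters because that release switched the default of `torch.load`'s `weights_only` argument to `True`. A minimal sketch of a version test follows; the function name is hypothetical and the parsing assumes a `MAJOR.MINOR.PATCH[+tag]` version string such as the ones listed above.

```shell
# Succeed if the given Torch version is 2.6 or newer, i.e. a release where
# torch.load defaults to weights_only=True.
torch_defaults_to_weights_only() {
  ver=${1%%+*}        # drop a local build tag such as "+cu128"
  major=${ver%%.*}    # text before the first dot
  rest=${ver#*.}      # text after the first dot
  minor=${rest%%.*}   # text between the first and second dot
  [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 6 ]; }
}

# Example:
#   torch_defaults_to_weights_only "$(python -c 'import torch; print(torch.__version__)')" \
#     && echo "checkpoints that pickle full objects need weights_only=False"
```

Passing `weights_only=False` lets `torch.load` unpickle arbitrary objects, so it is only appropriate for checkpoints from a trusted source.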

The shoes evaluation command used here was:

scripts/run_pointflowmatch_take_shoes_out_of_box.sh 10 50

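That wrapper expands to the following equivalent raw command, in which the two arguments (10 and 50) appear as env_runner.num_episodes and policy.num_k_infer: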

export PYTHONPATH=third_party/diffusion_policy:third_party/PointFlowMatch:${PYTHONPATH:-}
export COPPELIASIM_ROOT=/path/to/CoppeliaSim
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-}:$COPPELIASIM_ROOT
export QT_QPA_PLATFORM_PLUGIN_PATH=$COPPELIASIM_ROOT
xvfb-run -a -s "-screen 0 1400x900x24" python \
  third_party/PointFlowMatch/scripts/evaluate.py \
  log_wandb=False \
  env_runner.env_config.vis=False \
  env_runner.num_episodes=10 \
  env_runner.max_episode_length=200 \
  policy.ckpt_name=1717447341-indigo-quokka/1717447341-indigo-quokka \
  policy.num_k_infer=50
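
One detail of the exports above: when LD_LIBRARY_PATH is unset, `${LD_LIBRARY_PATH:-}:$COPPELIASIM_ROOT` produces a value with a leading empty entry, which the dynamic loader interprets as the current directory. The sketch below shows one way to avoid that; `append_path` is a hypothetical helper and is not part of the bundled wrapper.

```shell
# append_path is a hypothetical helper; it is not part of the bundled wrapper.
# $1: current colon-separated list (may be empty), $2: directory to append.
append_path() {
  # "${1:+$1:}" expands to "$1:" only when $1 is non-empty.
  printf '%s\n' "${1:+$1:}$2"
}

# Possible use in place of the plain export:
#   export LD_LIBRARY_PATH="$(append_path "${LD_LIBRARY_PATH:-}" "$COPPELIASIM_ROOT")"
```

The same helper would apply to PYTHONPATH, where the original export leaves a trailing empty entry when the variable is unset.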

Result note for shoes:

  • reports/pointflowmatch_take_shoes_out_of_box_ep10_k50_gpu/summary.json records a verified non-zero result, logged before RLBench/PyRep crashed later in the same, longer rollout.