reveal_vla_bimanual

Simulation-first prototype for a language-conditioned bimanual reveal-and-retrieve policy under elastic occlusion.

This repo is not a generalist VLA backbone in the RT-2 / OpenVLA / Octo sense. The current contribution is the reveal-state machinery layered on top of a frozen vision-language encoder.

This repo is structured around five top-level modules:

  • sim_rlbench/: RLBench2 / PerAct2 wrappers, dataset hooks, camera setup, and benchmark evaluation helpers.
  • sim_reveal/: reveal-proxy environments, scripted teachers, and privileged label extraction.
  • models/: shared backbone wrappers, multi-view fusion, bimanual decoder, reveal-state head, world model, and planner.
  • train/: trainers, losses, checkpointing, and Hydra/YAML configs.
  • eval/: benchmark scripts, ablations, metrics, plots, and report generation.

Current bootstrap priorities:

  1. Reproduce the RLBench2 / PerAct2 stack with a fixed 3-camera interface.
  2. Stand up a backbone-only 3-camera policy in the same training/eval harness.
  3. Add reveal-state supervision and short-horizon planning for synthetic reveal proxies.

Upstream dependencies are kept in /workspace/third_party and pinned in docs/upstream_pins.md.

RLBench env A

The RLBench / PerAct2 stack is pinned to Python 3.10 and lives in /workspace/envs/rlbench.

Bring it up with:

/workspace/reveal_vla_bimanual/scripts/setup_env_a_rlbench.sh
/workspace/reveal_vla_bimanual/scripts/setup_rlbench_headless_x.sh
/workspace/reveal_vla_bimanual/scripts/start_rlbench_x.sh

Verify GPU GL on the headless display:

DISPLAY=:99 glxinfo -B
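What matters in the `glxinfo -B` output is the `OpenGL renderer string`: a software rasterizer (llvmpipe/softpipe) means GPU GL is not wired up. A small hypothetical helper for parsing that output (the function names are ours, not part of the repo):

```python
# Hypothetical check: parse `glxinfo -B` output and confirm the renderer
# is hardware-backed. Run `DISPLAY=:99 glxinfo -B` yourself and feed the
# captured text in; nothing here talks to the display.
def gl_renderer(glxinfo_output: str) -> str:
    """Extract the OpenGL renderer string from `glxinfo -B` output."""
    for line in glxinfo_output.splitlines():
        if line.strip().startswith("OpenGL renderer string:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("no 'OpenGL renderer string' line found")

def is_gpu_gl(glxinfo_output: str) -> bool:
    """Software rasterizers report llvmpipe/softpipe; treat anything
    else as GPU-backed GL."""
    renderer = gl_renderer(glxinfo_output).lower()
    return not any(sw in renderer for sw in ("llvmpipe", "softpipe"))
```

If `is_gpu_gl` is false on your capture, fix the headless X setup before running the smoke test; CoppeliaSim on llvmpipe is both slow and a common source of silent rendering differences.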

Run the RLBench launch/reset/step smoke test:

DRIVER_MAJOR=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1 | cut -d. -f1)
env \
  DISPLAY=:99 \
  XDG_RUNTIME_DIR=/tmp/runtime-root \
  COPPELIASIM_ROOT=/workspace/assets/coppeliasim_v4_1_0 \
  LD_LIBRARY_PATH=/workspace/system_shims/nvidia${DRIVER_MAJOR}/usr/lib/x86_64-linux-gnu:/workspace/system_shims/nvidia${DRIVER_MAJOR}/usr/lib/x86_64-linux-gnu/nvidia:/workspace/assets/coppeliasim_v4_1_0 \
  QT_QPA_PLATFORM_PLUGIN_PATH=/workspace/assets/coppeliasim_v4_1_0 \
  /workspace/.tools/micromamba/bin/micromamba run \
    -r /workspace/.micromamba \
    -p /workspace/envs/rlbench \
    python -m sim_rlbench.launch_smoke --headless
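The only non-obvious part of that invocation is `LD_LIBRARY_PATH`: the NVIDIA driver major version (e.g. "535" from "535.104.05", as reported by `nvidia-smi --query-gpu=driver_version`) selects a shim directory. A sketch of that derivation, with the shim root copied from the command above and the helper name ours:

```python
# Sketch of how the LD_LIBRARY_PATH entries are assembled from the driver
# version string. The shim layout under /workspace/system_shims mirrors the
# command above; the helper itself is illustrative, not repo code.
def nvidia_shim_paths(driver_version: str,
                      shim_root: str = "/workspace/system_shims") -> list[str]:
    # "535.104.05" -> "535"
    major = driver_version.split(".", 1)[0]
    base = f"{shim_root}/nvidia{major}/usr/lib/x86_64-linux-gnu"
    # Shim libs, their nvidia/ subdirectory, then CoppeliaSim's own libs.
    return [base, f"{base}/nvidia", "/workspace/assets/coppeliasim_v4_1_0"]
```

Joining the returned list with `:` reproduces the `LD_LIBRARY_PATH` value used above.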

The working benchmark interface is fixed to three cameras only:

  • front
  • wrist_left
  • wrist_right
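Since everything downstream assumes exactly these three views, it is worth guarding observation dicts at the boundary. A minimal sketch, assuming camera-keyed observations; the guard and its name are ours, only the three camera names come from the list above:

```python
# Hypothetical guard for the fixed camera interface: reject any observation
# dict whose camera keys differ from the three supported views.
SUPPORTED_CAMERAS = frozenset({"front", "wrist_left", "wrist_right"})

def check_cameras(obs: dict) -> None:
    """Raise if obs does not contain exactly the fixed 3-camera set."""
    got = frozenset(obs)
    if got != SUPPORTED_CAMERAS:
        raise ValueError(
            f"camera interface mismatch: got {sorted(got)}, "
            f"expected {sorted(SUPPORTED_CAMERAS)}")
```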

The smoke test covers launch, bimanual task reset, canonical observation extraction, and one bimanual action step, all with headless=True, the same mode the upstream PerAct2-style training stack uses.

Generate the PerAct2-compatible train command for the fixed 3-camera interface with:

micromamba run -r /workspace/.micromamba -p /workspace/envs/rlbench \
  python -m sim_rlbench.smoke_test --print-train-command

Download the published PerAct2 demos into /workspace/data/rlbench2 with checksum verification:

micromamba run -r /workspace/.micromamba -p /workspace/envs/rlbench \
  python -m sim_rlbench.dataset_download --resolution 256 --splits train
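The checksum step is the important part: a partially downloaded archive extracts without error but yields corrupt demos. A minimal sketch of the verification, assuming SHA-256 digests compared against a published manifest (the function names and manifest format are hypothetical, not the `dataset_download` internals):

```python
# Sketch of archive checksum verification: stream the file in chunks so
# multi-GB archives are hashed without loading them into memory.
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, computed in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path: Path, expected_hex: str) -> bool:
    """True iff the file's digest matches the manifest entry."""
    return sha256_of(path) == expected_hex.lower()
```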

If you want the archives unpacked directly into the demo root expected by RLBench, add --extract:

apt-get install -y squashfs-tools
micromamba run -r /workspace/.micromamba -p /workspace/envs/rlbench \
  python -m sim_rlbench.dataset_download --resolution 256 --splits train --extract