reveal_vla_bimanual
Simulation-first prototype for a language-conditioned bimanual reveal-and-retrieve policy under elastic occlusion.
This repo does not aim to be a generalist VLA backbone in the RT-2 / OpenVLA / Octo sense; the current contribution is the reveal-state machinery layered on top of a frozen vision-language encoder.
This repo is structured around five top-level modules:
- sim_rlbench/: RLBench2 / PerAct2 wrappers, dataset hooks, camera setup, and benchmark evaluation helpers.
- sim_reveal/: reveal-proxy environments, scripted teachers, and privileged label extraction.
- models/: shared backbone wrappers, multi-view fusion, bimanual decoder, reveal-state head, world model, and planner.
- train/: trainers, losses, checkpointing, and Hydra/YAML configs.
- eval/: benchmark scripts, ablations, metrics, plots, and report generation.
Current bootstrap priorities:
- Reproduce the RLBench2 / PerAct2 stack with a fixed 3-camera interface.
- Stand up a backbone-only 3-camera policy in the same training/eval harness.
- Add reveal-state supervision and short-horizon planning for synthetic reveal proxies.
Upstream dependencies are kept in /workspace/third_party and pinned in docs/upstream_pins.md.
RLBench env A
The RLBench / PerAct2 stack is pinned to Python 3.10 and lives in /workspace/envs/rlbench.
Bring it up with:
/workspace/reveal_vla_bimanual/scripts/setup_env_a_rlbench.sh
/workspace/reveal_vla_bimanual/scripts/setup_rlbench_headless_x.sh
/workspace/reveal_vla_bimanual/scripts/start_rlbench_x.sh
Verify GPU GL on the headless display:
DISPLAY=:99 glxinfo -B
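The renderer string in that output should name the NVIDIA GPU rather than a software rasterizer. A minimal sketch of that check (a hypothetical helper, not part of the repo) that parses the `glxinfo -B` output:

```python
def renderer_is_gpu(glxinfo_output: str) -> bool:
    """Return True if `glxinfo -B` output reports a hardware renderer.

    A headless X server that silently fell back to software rendering
    reports llvmpipe/softpipe, which CoppeliaSim will run on, but slowly.
    """
    for line in glxinfo_output.splitlines():
        if line.strip().startswith("OpenGL renderer string"):
            renderer = line.split(":", 1)[1].strip()
            return not any(s in renderer for s in ("llvmpipe", "softpipe"))
    return False  # no renderer line at all: GL is not working on this display
```

Feed it the captured output, e.g. `renderer_is_gpu(subprocess.check_output(["glxinfo", "-B"], text=True))` with `DISPLAY=:99` set in the environment.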
Run the RLBench launch/reset/step smoke test:
env \
DISPLAY=:99 \
XDG_RUNTIME_DIR=/tmp/runtime-root \
COPPELIASIM_ROOT=/workspace/assets/coppeliasim_v4_1_0 \
LD_LIBRARY_PATH=/workspace/system_shims/nvidia$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1 | cut -d. -f1)/usr/lib/x86_64-linux-gnu:/workspace/system_shims/nvidia$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1 | cut -d. -f1)/usr/lib/x86_64-linux-gnu/nvidia:/workspace/assets/coppeliasim_v4_1_0 \
QT_QPA_PLATFORM_PLUGIN_PATH=/workspace/assets/coppeliasim_v4_1_0 \
/workspace/.tools/micromamba/bin/micromamba run \
-r /workspace/.micromamba \
-p /workspace/envs/rlbench \
python -m sim_rlbench.launch_smoke --headless
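The LD_LIBRARY_PATH above inlines `nvidia-smi --query-gpu=driver_version ... | cut -d. -f1` twice to select the shim directory matching the installed driver's major version. The same derivation as a Python sketch (the `shim_lib_dirs` helper is a hypothetical reconstruction, not repo code):

```python
def driver_major(driver_version: str) -> str:
    """Major component of an NVIDIA driver version, e.g. '535.129.03' -> '535'.

    Mirrors the `cut -d. -f1` in the LD_LIBRARY_PATH above.
    """
    return driver_version.strip().split(".", 1)[0]


def shim_lib_dirs(driver_version: str) -> list:
    # Hypothetical reconstruction of the shim paths assembled in the command:
    # /workspace/system_shims/nvidia<major>/... plus the CoppeliaSim directory.
    base = (
        f"/workspace/system_shims/nvidia{driver_major(driver_version)}"
        "/usr/lib/x86_64-linux-gnu"
    )
    return [base, f"{base}/nvidia", "/workspace/assets/coppeliasim_v4_1_0"]
```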
The working benchmark interface is fixed to three cameras only:
front, wrist_left, wrist_right
The smoke test covers launch, bimanual task reset, canonical observation extraction, and one bimanual action step with headless=True, the same mode used by the upstream PerAct2-style training stack.
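Code that consumes observations can guard against interface drift by asserting the fixed camera set up front. A minimal sketch; the key naming convention (`<camera>_rgb`) is an assumption for illustration, not the repo's actual observation layout:

```python
# Fixed 3-camera interface for the working benchmark.
CAMERAS = ("front", "wrist_left", "wrist_right")


def check_camera_keys(obs: dict, suffix: str = "_rgb") -> None:
    """Raise if any of the three fixed cameras is missing from an observation.

    The `<camera><suffix>` key scheme is illustrative; align it with the
    actual observation dict produced by sim_rlbench before relying on it.
    """
    missing = [c for c in CAMERAS if f"{c}{suffix}" not in obs]
    if missing:
        raise KeyError(f"observation missing cameras: {missing}")
```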
Generate the PerAct2-compatible train command for the fixed 3-camera interface with:
micromamba run -r /workspace/.micromamba -p /workspace/envs/rlbench \
python -m sim_rlbench.smoke_test --print-train-command
Download the published PerAct2 demos into /workspace/data/rlbench2 with checksum verification:
micromamba run -r /workspace/.micromamba -p /workspace/envs/rlbench \
python -m sim_rlbench.dataset_download --resolution 256 --splits train
If you want the archives unpacked directly into the demo root expected by RLBench, add --extract:
apt-get install -y squashfs-tools
micromamba run -r /workspace/.micromamba -p /workspace/envs/rlbench \
python -m sim_rlbench.dataset_download --resolution 256 --splits train --extract