# Installation Guide

This guide provides step-by-step instructions for setting up EVOLVE-VLA.

We maintain **two separate conda environments**:

1. **`evolve-vla`**: For RL training (verl framework, OpenVLA, LIBERO)
2. **`vlac`**: For the VLAC reward model service

---

## System Requirements

- **OS**: Linux (Ubuntu 20.04/22.04 recommended)
- **GPU**: NVIDIA GPU with CUDA 12.1 support. Recommended: H100 80GB for distributed training
- **CUDA**: 12.1
- **Python**: 3.10

---

## Environment 1: RL Training (evolve-vla)

**Important**: Follow the exact order below to avoid dependency conflicts.

```bash
# Create conda environment
conda create -n evolve-vla python=3.10 -y
conda activate evolve-vla

# Pin pip and setuptools (critical for LIBERO installation)
pip install setuptools==78.1.1 pip==23.0

# Install verl framework
cd /path/to/EVOLVE-VLA
pip install --no-deps -e verl/

# Install OpenVLA-OFT (will install its own dependencies including torch)
cd /path/to/workspace
git clone https://github.com/moojink/openvla-oft.git
cd openvla-oft
pip install -e .

# Install LIBERO benchmark
cd /path/to/workspace
git clone https://github.com/Lifelong-Robot-Learning/LIBERO.git
cd LIBERO
pip install -e .
# The LIBERO requirements file ships with openvla-oft, not with LIBERO itself
pip install -r ../openvla-oft/experiments/robot/libero/libero_requirements.txt

# Install additional tools
pip install packaging ninja
pip install git+https://github.com/NICTA/pyairports.git

# CRITICAL: Reinstall correct PyTorch version (OpenVLA-OFT/LIBERO may have installed different versions)
pip uninstall -y torch torchvision torchaudio
pip3 install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 \
    --index-url https://download.pytorch.org/whl/cu121

# Install customized transformers for OpenVLA
pip install transformers@git+https://github.com/moojink/transformers-openvla-oft.git

# Install Flash Attention
pip uninstall -y flash_attn
pip install flash-attn==2.5.5 --no-build-isolation --no-cache-dir

# Install remaining dependencies
pip install tensordict==0.9.0 click==8.2.1
pip install "ray[default]==2.9.0"
pip install wandb  # For experiment tracking

# Install MuJoCo rendering dependencies
conda install -c conda-forge -y libegl-devel libstdcxx-ng

# System packages (requires sudo, mainly for simulation rendering)
sudo apt install -y libosmesa6 libosmesa6-dev
sudo apt-get install -y libgl1-mesa-dev libegl1-mesa-dev libgles2-mesa-dev libglew-dev
```

---

## Environment 2: Reward Model Service (vlac)

```bash
# Create conda environment
conda create -n vlac python=3.10 -y
conda activate vlac

# Install VLAC dependencies first (before PyTorch)
pip install ms-swift==3.3 transformers==4.51.0 peft==0.15.2
pip install opencv-python loguru timm

# Install PyTorch
pip3 install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 \
    --index-url https://download.pytorch.org/whl/cu121

# Install Flash Attention
pip install packaging ninja
pip install flash-attn==2.5.5 --no-build-isolation --no-cache-dir

# Download VLAC checkpoint
cd /path/to/EVOLVE-VLA
mkdir -p checkpoints/VLAC
# Download checkpoint from HuggingFace: https://huggingface.co/InternRobotics/VLAC

# Set VLAC checkpoint path for service startup
export VLAC_CKPT_PATH=/path/to/EVOLVE-VLA/checkpoints/VLAC
```

---

## Ray Cluster Setup (Optional, for Multi-Node Training)

Ray is used for distributed training across multiple nodes. **If you're training on a single node, you can skip this section**: the training script initializes Ray automatically.

For multi-node distributed training (recommended for reproducing paper results):

**On Head Node (Machine 1):**

```bash
# Activate environment
conda activate evolve-vla

# Start Ray head
MUJOCO_GL=osmesa PYOPENGL_PLATFORM=osmesa ray start --head --port=6379

# The command output shows the head node IP
```

**On Worker Nodes (Machine 2, 3, ...):**

```bash
# Activate environment
conda activate evolve-vla

# Connect to head node (replace <HEAD_NODE_IP> with the actual IP from above)
MUJOCO_GL=osmesa PYOPENGL_PLATFORM=osmesa ray start --address='<HEAD_NODE_IP>:6379'

# Example:
# ray start --address='10.124.104.163:6379'
```

**Verify Cluster:**

```bash
# On any node
ray status
```

You should see all nodes listed with their CPU/GPU resources.

**Stopping Ray:**

```bash
ray stop          # Stop Ray on current node
ray stop --force  # Force-kill Ray processes that did not stop gracefully
```

---

## Next Steps

After successful installation:

1. **Setup VLAC Service**: Follow [README Quick Start](../README.md#-quick-start)
2. **Set training environment variables**:
   - `EVOLVE_SFT_CHECKPOINT`
   - `EVOLVE_OUTPUT_DIR`
   - `EVOLVE_ALIGN_JSON`
3. **Run reproduction checklist**: see [REPRODUCTION.md](REPRODUCTION.md)
4. **Run training**: check [Quick Start](../README.md#-quick-start) in main README
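
The training variables from step 2 can be checked before launching a run. The following is a minimal sketch, assuming bash; `check_evolve_env` is a hypothetical helper name and not part of EVOLVE-VLA. It only tests that each variable is non-empty, not that the path it points to exists.

```bash
#!/usr/bin/env bash
# Illustrative pre-flight check: report which training variables are unset.
# check_evolve_env is a hypothetical helper, not shipped with EVOLVE-VLA.
check_evolve_env() {
    local missing=0
    local name
    for name in EVOLVE_SFT_CHECKPOINT EVOLVE_OUTPUT_DIR EVOLVE_ALIGN_JSON; do
        # ${!name:-} is bash indirect expansion: the value of the variable
        # whose name is stored in $name, or empty if it is unset.
        if [ -z "${!name:-}" ]; then
            echo "MISSING: $name"
            missing=1
        else
            echo "OK: $name=${!name}"
        fi
    done
    # Return 0 only when all three variables are set and non-empty.
    return "$missing"
}
```

After sourcing the snippet, `check_evolve_env || echo "Set the missing variables before training"` prints one line per variable and flags any that still need to be exported.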