| # VLAC Integration with SimpleVLA-RL |
|
|
| This document describes the integration of Vision-Language-Action-Critic (VLAC) into the SimpleVLA-RL training framework. |
|
|
| ## Overview |
|
|
| The VLAC integration replaces the simulator's terminal success signal with VLAC-predicted values during training, while evaluation keeps using the simulator signal. Training therefore receives a graded terminal reward that reflects task progress rather than only a binary success flag. |
|
|
| ## Architecture |
|
|
| ``` |
| SimpleVLA-RL Training Process: |
| ┌─────────────────┐    HTTP/JSON     ┌─────────────────┐ |
| │    Training     │◄────────────────►│  VLAC Service   │ |
| │  (rob_rollout)  │   (done check,   │   (port 8111)   │ |
| │                 │  terminal value) │                 │ |
| └─────────────────┘                  └─────────────────┘ |
| ``` |
|
|
| ### Key Components |
|
|
| 1. **VLAC Service** (`vlac_service.py`): HTTP API exposing VLAC model functionality |
| 2. **VLAC Client** (`verl/utils/vlac_client.py`): Python client for service communication |
| 3. **Enhanced Rollout** (`verl/workers/rollout/rob_rollout.py`): Modified rollout with VLAC integration |
|
|
| ## Usage |
|
|
| ### 1. Start VLAC Service |
|
|
| ```bash |
| # Start VLAC service (required for training) |
| python vlac_service.py --port 8111 --gpu-ids 0,1,2,3 |
| ``` |
|
|
| ### 2. Training with VLAC |
|
|
| ```bash |
| # Run training with VLAC integration enabled |
| bash examples/run_openvla_oft_rl_vlac.sh |
| ``` |
|
|
| **Key Variables (edit at top of script):** |
| ```bash |
| PROJECT_NAME='SimpleVLA-RL-VLAC' |
| EXPERIMENT_NAME='vlac-libero10-sftall_node1_trial' |
| SFT_MODEL_PATH="CKPT/Openvla-oft-SFT-libero10-trajall" |
| DATASET_NAME="libero_10" |
| VLAC_SERVICE_URL="http://localhost:8111" |
| ``` |
|
|
| **Key VLAC Configuration (Hydra command-line overrides):** |
| ```bash |
| +actor_rollout_ref.rollout.use_vlac=true |
| +actor_rollout_ref.rollout.vlac_service_url=$VLAC_SERVICE_URL |
| trainer.val_before_train=False # Avoid val_only issue |
| ``` |
|
|
| ### 3. Evaluation (Environment Done) |
|
|
| ```bash |
| # Run evaluation with environment done signal |
| bash examples/eval_openvla_oft_vlac.sh |
| ``` |
|
|
| **Key Variables (edit at top of script):** |
| ```bash |
| PROJECT_NAME='SimpleVLA-RL-VLAC-Eval' |
| EXPERIMENT_NAME='vlac-libero10-sftall_node1_eval' |
| SFT_MODEL_PATH="CKPT/Openvla-oft-SFT-libero10-trajall" |
| DATASET_NAME="libero_10" |
| ``` |
|
|
| **Key Configuration (Hydra command-line overrides):** |
| ```bash |
| trainer.val_only=True # Pure evaluation mode |
| +actor_rollout_ref.rollout.use_vlac=false # Explicit disable |
| ``` |
|
|
| ## Training Logic |
|
|
| ### Training Mode (`use_vlac=true`, `val_only=false`) |
|
|
| 1. **Episode Step**: After each action, collect trajectory frames |
| 2. **Done Check**: Call VLAC `/done` endpoint with (first_frame, prev_frame, curr_frame) |
| - If VLAC says `done=true`: terminate episode, reward = 1.0 |
| - Otherwise: continue episode |
| 3. **Max Steps**: When the maximum number of steps is reached without a VLAC `done`: |
| - Call VLAC `/trajectory-critic` endpoint |
| - Use final value as terminal reward (normalized to 0-1) |
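
The three steps above can be sketched as a single episode loop. This is a minimal illustration, not the code in `rob_rollout.py`: the `CriticClient` interface, the `step_env` callable, and the assumption that trajectory values are already on a 0-1 scale (so clamping is enough) are all hypothetical.

```python
from typing import Callable, List, Protocol, Tuple


class CriticClient(Protocol):
    """Hypothetical interface; the real VLACClient signatures may differ."""

    def check_done(self, first, prev, curr) -> bool: ...

    def compute_trajectory_values(self, frames: list) -> List[float]: ...


def run_training_episode(
    step_env: Callable[[], object],
    first_frame: object,
    client: CriticClient,
    max_steps: int,
) -> Tuple[bool, float, int]:
    """Return (vlac_done, terminal_reward, steps_taken) for one episode."""
    frames = [first_frame]
    for step in range(1, max_steps + 1):
        frames.append(step_env())  # frame observed after this action
        # Done check on (first_frame, prev_frame, curr_frame).
        if client.check_done(frames[0], frames[-2], frames[-1]):
            return True, 1.0, step
    # Max steps reached: use the critic's final value as the terminal reward.
    values = client.compute_trajectory_values(frames)
    # Assumption: values are already on a 0-1 scale, so only clamp.
    terminal_reward = min(1.0, max(0.0, values[-1]))
    return False, terminal_reward, max_steps
```

The returned pair `(vlac_done, terminal_reward)` corresponds to the two output fields described under Integration Details.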
| |
| ### Evaluation Mode (`val_only=true`) |
|
|
| - Use original environment `done` signal |
| - No VLAC service calls required |
| - Compute success rate using simulator feedback |
|
|
| ## Integration Details |
|
|
| ### Environment Worker (`env_worker`) |
| |
| **New Features:** |
| - Trajectory frame collection (`trajectory_frames[]`) |
| - VLAC client initialization per worker process |
| - VLAC done detection after each step |
| - Terminal value computation at episode end |
|
|
| **New Output Fields:** |
| ```python |
| { |
| 'vlac_done': bool, # Whether VLAC detected completion |
| 'terminal_reward': float # VLAC-computed terminal reward (0-1) |
| } |
| ``` |
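
As an illustration of how the collected `trajectory_frames` could feed the per-step done check, the sketch below picks the (first, previous, current) triple. The function name and the single-frame behavior are assumptions, not taken from the worker code.

```python
from typing import List, Tuple, TypeVar

Frame = TypeVar("Frame")


def select_done_frames(trajectory_frames: List[Frame]) -> Tuple[Frame, Frame, Frame]:
    """Pick (first_frame, prev_frame, curr_frame) from the collected frames."""
    if not trajectory_frames:
        raise ValueError("no frames collected yet")
    first = trajectory_frames[0]
    curr = trajectory_frames[-1]
    # Assumption: with a single frame, reuse it as the previous frame.
    prev = trajectory_frames[-2] if len(trajectory_frames) > 1 else first
    return first, prev, curr
```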
|
|
| ### Rollout Class (`RobHFRollout`) |
|
|
| **New Configuration:** |
| ```python |
| self.use_vlac = getattr(config, 'use_vlac', False) |
| self.vlac_service_url = getattr(config, 'vlac_service_url', 'http://localhost:8111') |
| ``` |
|
|
| **New Batch Fields:** |
| ```python |
| batch["vlac_done"] = torch.tensor(...) # VLAC termination flags |
| batch["terminal_reward"] = torch.tensor(...) # VLAC terminal rewards |
| ``` |
|
|
| ### VLAC Client (`VLACClient`) |
|
|
| **Key Methods:** |
| - `check_done()`: Episode termination detection |
| - `compute_trajectory_values()`: Terminal value computation |
| - `pairwise_critic()`: Frame comparison (optional) |
|
|
| **Error Handling:** |
| - Graceful fallback if VLAC service unavailable |
| - Automatic retry logic for transient failures |
| - Timeout protection for long-running requests |
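
The three error-handling behaviors can be combined in one small wrapper. The sketch below is an assumption about how such logic might look, not the contents of `vlac_client.py`: `call_with_retry`, `post_json`, the retry count, the backoff schedule, and the caller-supplied `fallback` value are all hypothetical, and the request payload is left to the caller because the service schema is not given here.

```python
import json
import time
import urllib.request
from typing import Callable


def post_json(url: str, payload: dict, timeout_s: float) -> dict:
    """POST a JSON payload and decode the JSON response (standard library only)."""
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.load(resp)


def call_with_retry(
    request: Callable[[float], dict],
    fallback: dict,
    retries: int = 3,
    timeout_s: float = 10.0,
    backoff_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call request(timeout_s); retry on I/O errors, then return the fallback."""
    for attempt in range(retries):
        try:
            return request(timeout_s)
        except OSError:
            # OSError covers ConnectionError, TimeoutError, and urllib's URLError.
            if attempt < retries - 1:
                sleep(backoff_s * (2 ** attempt))  # exponential backoff
    return fallback
```

A caller would wrap the HTTP request in a closure, for example `call_with_retry(lambda t: post_json(url, payload, t), fallback)`, so that an unreachable service yields the fallback instead of an exception in the rollout worker.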
|
|
| ## Configuration Options |
|
|
| | Parameter | Default | Description | |
| |-----------|---------|-------------| |
| | `use_vlac` | `false` | Enable VLAC integration | |
| | `vlac_service_url` | `http://localhost:8111` | VLAC service endpoint | |
| | `val_only` | `false` | Evaluation mode (disables VLAC) | |
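
The interaction between the two flags in the table reduces to one condition. The helper below is only a sketch of that rule; its name is hypothetical and the rollout code may express the check differently.

```python
def vlac_active(use_vlac: bool, val_only: bool) -> bool:
    """VLAC supplies done/reward signals only for training rollouts."""
    return use_vlac and not val_only
```

With `val_only=true` the result is always `False`, which matches the evaluation behavior of using the simulator's `done` signal without any service calls.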
|
|
| ## Performance Considerations |
|
|
| ### GPU Memory Sharing |
| - VLAC service: ~20-30 GB during inference |
| - SimpleVLA-RL: ~60-70 GB during training |
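| - Combined: ~80-100 GB if both figures are per GPU, which does not fit an 80 GB H100 at peak; verify with `nvidia-smi` before co-locating the service and training on the same cards |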
|
|
| ### Latency Impact |
| - Done check: ~300-800ms per step (depends on frames) |
| - Terminal value: ~1-5s per episode (depends on trajectory length) |
| - Overall training throughput: ~10-20% lower due to VLAC calls |
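
To make the per-call figures concrete, the sketch below adds up the serial VLAC latency for a single episode. It is a back-of-the-envelope estimate under the assumption that calls within one episode run one after another; it ignores overlap across worker processes, so it does not by itself reproduce the throughput figure above.

```python
def vlac_episode_overhead_s(
    steps: int,
    done_check_s: float,
    terminal_value_s: float,
    reached_max_steps: bool,
) -> float:
    """Serial VLAC latency added to one episode, in seconds."""
    overhead = steps * done_check_s  # one done check after every step
    if reached_max_steps:
        overhead += terminal_value_s  # single trajectory-critic call at the end
    return overhead


# Episode that VLAC ends at step 45, at the low and high latency bounds.
low = vlac_episode_overhead_s(45, 0.3, 1.0, reached_max_steps=False)
high = vlac_episode_overhead_s(45, 0.8, 5.0, reached_max_steps=False)
print(f"{low:.1f} s to {high:.1f} s")
```

For a 45-step episode this gives roughly 13.5 s to 36 s of VLAC time, dominated by the per-step done checks rather than the terminal value call.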
|
|
| ### Scaling |
| - Multiple VLAC service instances on different GPUs |
| - Load balancing across service instances |
| - Batch optimization for trajectory processing |
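
Because each worker process creates its own VLAC client, one simple way to spread load over several service instances is a static assignment by worker index. The sketch below assumes this scheme; the function name and the second port are hypothetical.

```python
from typing import Sequence


def assign_service_url(worker_index: int, service_urls: Sequence[str]) -> str:
    """Map a worker to one VLAC service instance in round-robin order."""
    if not service_urls:
        raise ValueError("at least one VLAC service URL is required")
    return service_urls[worker_index % len(service_urls)]


# Hypothetical setup: two service instances on ports 8111 and 8112.
urls = ["http://localhost:8111", "http://localhost:8112"]
assignments = [assign_service_url(i, urls) for i in range(4)]
```

Static assignment needs no shared state between workers; a dedicated load balancer in front of the instances would be the alternative when workers have uneven request rates.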
|
|
| ## Debugging & Monitoring |
|
|
| ### Service Health |
| ```bash |
| # Check VLAC service status |
| curl -X POST http://localhost:8111/healthcheck |
| |
| # Enable debug image saving |
| export VLAC_SAVE_INPUTS=1 |
| ``` |
|
|
| ### Training Logs |
| ``` |
| VLAC integration enabled. Service URL: http://localhost:8111 |
| Training mode: True |
| VLAC detected task completion at step 45 (prob: 0.847) |
| Max steps reached, computing VLAC terminal value... |
| VLAC terminal value: 0.632 |
| ``` |
|
|
| ### Common Issues |
|
|
| **Service Connection Failed:** |
| - Verify VLAC service is running: `ps aux | grep vlac_service` |
| - Check service logs for errors |
| - Test manual service calls |
|
|
| **Out of Memory:** |
| - Reduce VLAC batch sizes in service |
| - Use fewer reference images |
| - Monitor GPU usage: `nvidia-smi` |
|
|
| **Slow Training:** |
| - Check VLAC service response times |
| - Reduce trajectory frame collection frequency |
| - Use multiple VLAC service instances |
|
|
| ## File Structure |
|
|
| ``` |
| SimpleVLA-RL/ |
| ├── vlac_service.py                    # VLAC HTTP service |
| ├── test_vlac_service.py               # Service test suite |
| ├── vlac_service_contract.md           # API specification |
| ├── README_VLAC_SERVICE.md             # Service documentation |
| ├── requirements_vlac_service.txt      # Service dependencies |
| ├── examples/ |
| │   ├── run_openvla_oft_rl_vlac.sh     # Training with VLAC |
| │   └── eval_openvla_oft_vlac.sh       # Evaluation script |
| └── verl/ |
|     ├── utils/vlac_client.py           # VLAC service client |
|     └── workers/rollout/rob_rollout.py # Enhanced rollout worker |
| ``` |
|
|
| ## Next Steps |
|
|
| 1. **Performance Optimization**: |
| - Implement request batching |
| - Add async processing |
| - Cache frequent computations |
|
|
| 2. **Robustness**: |
| - Add circuit breaker pattern |
| - Implement request queuing |
| - Add health monitoring |
|
|
| 3. **Advanced Features**: |
| - Reference frame caching |
| - Multi-task adaptation |
| - Progressive difficulty scaling |
|
|
| ## Support |
|
|
| For questions or issues with VLAC integration: |
| 1. Check service health endpoints |
| 2. Review training logs for VLAC messages |
| 3. Test service manually with `test_vlac_service.py` |
| 4. Verify configuration parameters match examples |
|
|