# VLAC Integration with SimpleVLA-RL
This document describes the integration of Vision-Language-Action-Critic (VLAC) into the SimpleVLA-RL training framework.
## Overview
The VLAC integration replaces the simulator's terminal success signal during training with VLAC-predicted values, while preserving the simulator signal for evaluation. Training then uses a graded terminal reward in the 0-1 range that reflects task progress, rather than only a binary success flag.
## Architecture
```
SimpleVLA-RL Training Process:
┌─────────────────┐     HTTP/JSON     ┌─────────────────┐
│    Training     │◄─────────────────►│  VLAC Service   │
│  (rob_rollout)  │   (done check,    │   (port 8111)   │
│                 │  terminal value)  │                 │
└─────────────────┘                   └─────────────────┘
```
### Key Components
1. **VLAC Service** (`vlac_service.py`): HTTP API exposing VLAC model functionality
2. **VLAC Client** (`verl/utils/vlac_client.py`): Python client for service communication
3. **Enhanced Rollout** (`verl/workers/rollout/rob_rollout.py`): Modified rollout with VLAC integration
## Usage
### 1. Start VLAC Service
```bash
# Start VLAC service (required for training)
python vlac_service.py --port 8111 --gpu-ids 0,1,2,3
```
### 2. Training with VLAC
```bash
# Run training with VLAC integration enabled
bash examples/run_openvla_oft_rl_vlac.sh
```
**Key Variables (edit at top of script):**
```bash
PROJECT_NAME='SimpleVLA-RL-VLAC'
EXPERIMENT_NAME='vlac-libero10-sftall_node1_trial'
SFT_MODEL_PATH="CKPT/Openvla-oft-SFT-libero10-trajall"
DATASET_NAME="libero_10"
VLAC_SERVICE_URL="http://localhost:8111"
```
**Key VLAC Configuration:**
```bash
+actor_rollout_ref.rollout.use_vlac=true
+actor_rollout_ref.rollout.vlac_service_url=$VLAC_SERVICE_URL
trainer.val_before_train=False # Avoid val_only issue
```
### 3. Evaluation (Environment Done)
```bash
# Run evaluation with environment done signal
bash examples/eval_openvla_oft_vlac.sh
```
**Key Variables (edit at top of script):**
```bash
PROJECT_NAME='SimpleVLA-RL-VLAC-Eval'
EXPERIMENT_NAME='vlac-libero10-sftall_node1_eval'
SFT_MODEL_PATH="CKPT/Openvla-oft-SFT-libero10-trajall"
DATASET_NAME="libero_10"
```
**Key Configuration:**
```bash
trainer.val_only=True # Pure evaluation mode
+actor_rollout_ref.rollout.use_vlac=false # Explicit disable
```
## Training Logic
### Training Mode (`use_vlac=true`, `val_only=false`)
1. **Episode Step**: After each action, collect trajectory frames
2. **Done Check**: Call the VLAC `/done` endpoint with (first_frame, prev_frame, curr_frame)
   - If VLAC returns `done=true`: terminate the episode, reward = 1.0
   - Otherwise: continue the episode
3. **Max Steps**: When the maximum number of steps is reached:
   - Call the VLAC `/trajectory-critic` endpoint
   - Use the final value as the terminal reward (normalized to 0-1)
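
The decision flow above can be sketched in a few lines. This is a minimal illustration, not the code in `rob_rollout.py`: `StubCritic`, `episode_reward`, and the method `terminal_value` are hypothetical names, the sketch walks an already recorded trajectory instead of checking online after each action, and it assumes the critic value is already on a 0-1 scale (it only clamps as a guard).

```python
from typing import Optional, Sequence, Tuple


class StubCritic:
    """Hypothetical stand-in for the VLAC client; the real one calls the HTTP service."""

    def __init__(self, done_at: Optional[int] = None, value: float = 0.0):
        self.done_at = done_at  # frame at which the stub reports completion
        self.value = value      # value the stub returns for a whole trajectory

    def check_done(self, first_frame, prev_frame, curr_frame) -> bool:
        return self.done_at is not None and curr_frame == self.done_at

    def terminal_value(self, frames: Sequence) -> float:
        return self.value


def episode_reward(frames: Sequence, critic, max_steps: int) -> Tuple[bool, float, int]:
    """Return (vlac_done, terminal_reward, steps_used).

    frames[0] is the initial observation; frames[t] is the observation after step t.
    """
    first = frames[0]
    last_step = min(max_steps, len(frames) - 1)
    for t in range(1, last_step + 1):
        # Done check with (first_frame, prev_frame, curr_frame).
        if critic.check_done(first, frames[t - 1], frames[t]):
            return True, 1.0, t
    # Max steps reached without a done signal: fall back to the trajectory value.
    value = critic.terminal_value(frames[: last_step + 1])
    reward = min(1.0, max(0.0, value))  # guard: keep the reward inside [0, 1]
    return False, reward, last_step
```

With integer placeholders for frames, `episode_reward(list(range(6)), StubCritic(done_at=3), max_steps=5)` returns `(True, 1.0, 3)`, while a critic that never reports completion yields its clamped trajectory value at step 5.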
### Evaluation Mode (`val_only=true`)
- Use original environment `done` signal
- No VLAC service calls required
- Compute success rate using simulator feedback
## Integration Details
### Environment Worker (`env_worker`)
**New Features:**
- Trajectory frame collection (`trajectory_frames[]`)
- VLAC client initialization per worker process
- VLAC done detection after each step
- Terminal value computation at episode end
**New Output Fields:**
```python
{
'vlac_done': bool, # Whether VLAC detected completion
'terminal_reward': float # VLAC-computed terminal reward (0-1)
}
```
### Rollout Class (`RobHFRollout`)
**New Configuration:**
```python
self.use_vlac = getattr(config, 'use_vlac', False)
self.vlac_service_url = getattr(config, 'vlac_service_url', 'http://localhost:8111')
```
**New Batch Fields:**
```python
batch["vlac_done"] = torch.tensor(...) # VLAC termination flags
batch["terminal_reward"] = torch.tensor(...) # VLAC terminal rewards
```
### VLAC Client (`VLACClient`)
**Key Methods:**
- `check_done()`: Episode termination detection
- `compute_trajectory_values()`: Terminal value computation
- `pairwise_critic()`: Frame comparison (optional)
**Error Handling:**
- Graceful fallback if VLAC service unavailable
- Automatic retry logic for transient failures
- Timeout protection for long-running requests
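
One way the retry and fallback behavior listed above could look is sketched below. `call_with_retry` and its parameters are hypothetical and are not taken from `vlac_client.py`; the per-request timeout is assumed to live inside the `request` callable (for example via the `timeout` argument of the HTTP call it wraps).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    request: Callable[[], T],
    fallback: T,
    retries: int = 3,
    backoff_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run request(); retry on network-style errors, then return fallback.

    OSError covers ConnectionError, TimeoutError, and urllib.error.URLError.
    """
    for attempt in range(retries):
        try:
            return request()
        except OSError:
            # Exponential backoff between attempts, none after the last one.
            if attempt < retries - 1:
                sleep(backoff_s * (2 ** attempt))
    # Graceful fallback: the caller decides what a safe default is,
    # e.g. done=False so the episode simply continues.
    return fallback
```

Passing `fallback=False` for a done check means an unreachable service never terminates an episode early; whether that matches the actual client's fallback is an assumption.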
## Configuration Options
| Parameter | Default | Description |
|-----------|---------|-------------|
| `use_vlac` | `false` | Enable VLAC integration |
| `vlac_service_url` | `http://localhost:8111` | VLAC service endpoint |
| `val_only` | `false` | Evaluation mode (disables VLAC) |
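
The interaction between these options can be expressed as a small helper. This is a sketch under the assumption that VLAC is consulted only when `use_vlac` is set and `val_only` is not; `vlac_active` is a hypothetical name, and the `getattr` default mirrors the table.

```python
from types import SimpleNamespace


def vlac_active(rollout_cfg, val_only: bool) -> bool:
    """True when VLAC should be called: enabled in config and not in evaluation mode."""
    return bool(getattr(rollout_cfg, "use_vlac", False)) and not val_only


# Example configs; SimpleNamespace stands in for the real config object.
train_cfg = SimpleNamespace(use_vlac=True, vlac_service_url="http://localhost:8111")
legacy_cfg = SimpleNamespace()  # no VLAC keys at all -> defaults apply

print(vlac_active(train_cfg, val_only=False))   # training with VLAC
print(vlac_active(train_cfg, val_only=True))    # evaluation: simulator signal only
print(vlac_active(legacy_cfg, val_only=False))  # VLAC keys absent: disabled
```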
## Performance Considerations
### GPU Memory Sharing
- VLAC service: ~20-30 GB during inference
- SimpleVLA-RL: ~60-70 GB during training
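- Total: ~80-100 GB combined, which is at or above the capacity of a single H100 80GB card; each workload fits on its own card, so sharing one GPU between the service and training leaves no headroom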
### Latency Impact
- Done check: ~300-800ms per step (depends on frames)
- Terminal value: ~1-5s per episode (depends on trajectory length)
- Overall training throughput: ~10-20% slower due to VLAC calls
### Scaling
- Multiple VLAC service instances on different GPUs
- Load balancing across service instances
- Batch optimization for trajectory processing
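
Load balancing across several service instances can be as simple as round-robin URL selection. The sketch below is illustrative only: `round_robin` is a hypothetical helper, and port 8112 is an assumed second instance that this document does not define.

```python
import itertools
from typing import Iterator, Sequence


def round_robin(urls: Sequence[str]) -> Iterator[str]:
    """Yield service URLs in rotation so requests spread evenly across instances."""
    if not urls:
        raise ValueError("at least one VLAC service URL is required")
    return itertools.cycle(urls)


# Two hypothetical instances, e.g. started on different GPUs.
pool = round_robin(["http://localhost:8111", "http://localhost:8112"])
picked = [next(pool) for _ in range(3)]
```

Here `picked` alternates between the two URLs, starting again with the first one on the third request. Round-robin ignores per-instance load, so a least-busy policy may suit long trajectory requests better.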
## Debugging & Monitoring
### Service Health
```bash
# Check VLAC service status
curl -X POST http://localhost:8111/healthcheck
# Enable debug image saving
export VLAC_SAVE_INPUTS=1
```
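
The same health check can be done from Python, for example to wait for the service before launching training. This is a sketch under assumptions: `service_is_healthy` is a hypothetical helper, it treats HTTP 200 as healthy, and it does not inspect the response body, whose format is not described here.

```python
import urllib.request


def service_is_healthy(base_url: str, timeout_s: float = 5.0,
                       opener=urllib.request.urlopen) -> bool:
    """POST to <base_url>/healthcheck and report whether the service answered 200."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/healthcheck", data=b"", method="POST"
    )
    try:
        with opener(req, timeout=timeout_s) as resp:
            return resp.status == 200
    except OSError:
        # Covers refused connections, timeouts, and urllib.error.URLError.
        return False
```

The `opener` parameter exists only so the function can be exercised without a running service; in normal use, call `service_is_healthy("http://localhost:8111")`.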
### Training Logs
```
VLAC integration enabled. Service URL: http://localhost:8111
Training mode: True
VLAC detected task completion at step 45 (prob: 0.847)
Max steps reached, computing VLAC terminal value...
VLAC terminal value: 0.632
```
### Common Issues
**Service Connection Failed:**
- Verify VLAC service is running: `ps aux | grep vlac_service`
- Check service logs for errors
- Test manual service calls
**Out of Memory:**
- Reduce VLAC batch sizes in service
- Use fewer reference images
- Monitor GPU usage: `nvidia-smi`
**Slow Training:**
- Check VLAC service response times
- Reduce trajectory frame collection frequency
- Use multiple VLAC service instances
## File Structure
```
SimpleVLA-RL/
├── vlac_service.py                      # VLAC HTTP service
├── test_vlac_service.py                 # Service test suite
├── vlac_service_contract.md             # API specification
├── README_VLAC_SERVICE.md               # Service documentation
├── requirements_vlac_service.txt        # Service dependencies
├── examples/
│   ├── run_openvla_oft_rl_vlac.sh       # Training with VLAC
│   └── eval_openvla_oft_vlac.sh         # Evaluation script
└── verl/
    ├── utils/vlac_client.py             # VLAC service client
    └── workers/rollout/rob_rollout.py   # Enhanced rollout worker
```
## Next Steps
1. **Performance Optimization**:
   - Implement request batching
   - Add async processing
   - Cache frequent computations
2. **Robustness**:
   - Add circuit breaker pattern
   - Implement request queuing
   - Add health monitoring
3. **Advanced Features**:
   - Reference frame caching
   - Multi-task adaptation
   - Progressive difficulty scaling
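
As a starting point for the circuit breaker item, the pattern could be sketched as follows. This is not existing project code: `CircuitBreaker`, its thresholds, and its method names are hypothetical, and the half-open behavior shown (allow probes again after a cool-down) is one common variant.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Stop calling a failing service for a cool-down period after repeated errors."""

    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def allow(self) -> bool:
        """Return True if a request may be sent right now."""
        if self.opened_at is None:
            return True
        # Half-open: after the cool-down, let probe requests through again.
        return self.clock() - self.opened_at >= self.reset_after_s

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # open (or re-open) the circuit
```

A caller would check `allow()` before each VLAC request, skip the request and use its fallback when it returns `False`, and report the outcome with `record_success()` or `record_failure()`.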
## Support
For questions or issues with VLAC integration:
1. Check service health endpoints
2. Review training logs for VLAC messages
3. Test service manually with `test_vlac_service.py`
4. Verify configuration parameters match examples