# VLAC Integration with SimpleVLA-RL

This document describes the integration of Vision-Language-Action-Critic (VLAC) into the SimpleVLA-RL training framework.

## Overview

The VLAC integration replaces the simulator's terminal success signal with VLAC-predicted values during training, while evaluation still uses the simulator signal. Because VLAC returns a continuous value rather than a binary success flag, episodes that run out of steps still receive a reward that reflects partial task progress.

## Architecture

```
SimpleVLA-RL Training Process:
┌─────────────────┐    HTTP/JSON     ┌─────────────────┐
│   Training      │◄─────────────────►│   VLAC Service  │
│   (rob_rollout) │    (done check,   │   (port 8111)   │
│                 │   terminal value) │                 │
└─────────────────┘                   └─────────────────┘
```

### Key Components

1. **VLAC Service** (`vlac_service.py`): HTTP API exposing VLAC model functionality
2. **VLAC Client** (`verl/utils/vlac_client.py`): Python client for service communication  
3. **Enhanced Rollout** (`verl/workers/rollout/rob_rollout.py`): Modified rollout with VLAC integration

## Usage

### 1. Start VLAC Service

```bash
# Start VLAC service (required for training)
python vlac_service.py --port 8111 --gpu-ids 0,1,2,3
```

### 2. Training with VLAC

```bash
# Run training with VLAC integration enabled
bash examples/run_openvla_oft_rl_vlac.sh
```

**Key Variables (edit at top of script):**
```bash
PROJECT_NAME='SimpleVLA-RL-VLAC'
EXPERIMENT_NAME='vlac-libero10-sftall_node1_trial'
SFT_MODEL_PATH="CKPT/Openvla-oft-SFT-libero10-trajall"
DATASET_NAME="libero_10"
VLAC_SERVICE_URL="http://localhost:8111"
```

**Key VLAC Configuration:**
```bash
+actor_rollout_ref.rollout.use_vlac=true
+actor_rollout_ref.rollout.vlac_service_url=$VLAC_SERVICE_URL
trainer.val_before_train=False  # Avoid val_only issue
```

### 3. Evaluation (Environment Done)

```bash
# Run evaluation with environment done signal  
bash examples/eval_openvla_oft_vlac.sh
```

**Key Variables (edit at top of script):**
```bash
PROJECT_NAME='SimpleVLA-RL-VLAC-Eval'
EXPERIMENT_NAME='vlac-libero10-sftall_node1_eval'
SFT_MODEL_PATH="CKPT/Openvla-oft-SFT-libero10-trajall"
DATASET_NAME="libero_10"
```

**Key Configuration:**
```bash
trainer.val_only=True  # Pure evaluation mode
+actor_rollout_ref.rollout.use_vlac=false  # Explicit disable
```

## Training Logic

### Training Mode (`use_vlac=true`, `val_only=false`)

1. **Episode Step**: After each action, collect trajectory frames
2. **Done Check**: Call VLAC `/done` endpoint with (first_frame, prev_frame, curr_frame)
   - If VLAC says `done=true`: terminate episode, reward = 1.0
   - Otherwise: continue episode
3. **Max Steps**: When max steps reached:
   - Call VLAC `/trajectory-critic` endpoint  
   - Use final value as terminal reward (normalized to 0-1)
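
The training-mode steps above can be sketched as a single loop. This is a minimal illustration, not the code in `rob_rollout.py`: `run_episode`, `step_fn`, `check_done`, `trajectory_values`, and `value_scale` are hypothetical names, the callables stand in for the `/done` and `/trajectory-critic` calls, and the normalization (divide by an assumed scale, then clamp) is only one possible reading of "normalized to 0-1".

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class EpisodeResult:
    vlac_done: bool         # True if the done check ended the episode early
    terminal_reward: float  # reward in [0, 1]
    steps: int              # number of environment steps taken


def normalize_value(value: float, value_scale: float = 1.0) -> float:
    """Map a raw critic value into [0, 1]; value_scale is an assumed range."""
    return max(0.0, min(1.0, value / value_scale))


def run_episode(
    step_fn: Callable[[], object],
    check_done: Callable[[object, object, object], bool],
    trajectory_values: Callable[[Sequence[object]], List[float]],
    first_frame: object,
    max_steps: int,
    value_scale: float = 1.0,
) -> EpisodeResult:
    frames = [first_frame]
    for step in range(1, max_steps + 1):
        # 1. Episode step: act in the environment and keep the new frame.
        frames.append(step_fn())
        # 2. Done check on (first_frame, prev_frame, curr_frame).
        if check_done(frames[0], frames[-2], frames[-1]):
            return EpisodeResult(vlac_done=True, terminal_reward=1.0, steps=step)
    # 3. Max steps reached: score the whole trajectory and use the final value.
    values = trajectory_values(frames)
    reward = normalize_value(values[-1], value_scale)
    return EpisodeResult(vlac_done=False, terminal_reward=reward, steps=max_steps)
```

Passing the two service calls in as callables keeps the control flow testable without a running VLAC service.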

### Evaluation Mode (`val_only=true`)

- Use original environment `done` signal
- No VLAC service calls required
- Compute success rate using simulator feedback
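
Success rate here is the fraction of evaluation episodes that the simulator marks as successful. A minimal sketch, where `success_rate` and the list of per-episode flags are illustrative names rather than the framework's own:

```python
from typing import Sequence


def success_rate(successes: Sequence[bool]) -> float:
    """Fraction of episodes whose simulator success flag is True."""
    if not successes:
        return 0.0  # assumption: report 0.0 when no episodes were run
    return sum(successes) / len(successes)
```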

## Integration Details

### Environment Worker (`env_worker`)

**New Features:**
- Trajectory frame collection (`trajectory_frames[]`)
- VLAC client initialization per worker process
- VLAC done detection after each step
- Terminal value computation at episode end

**New Output Fields:**
```python
{
  'vlac_done': bool,         # Whether VLAC detected completion
  'terminal_reward': float   # VLAC-computed terminal reward (0-1)
}
```

### Rollout Class (`RobHFRollout`)

**New Configuration:**
```python
self.use_vlac = getattr(config, 'use_vlac', False)
self.vlac_service_url = getattr(config, 'vlac_service_url', 'http://localhost:8111')
```

**New Batch Fields:**
```python
batch["vlac_done"] = torch.tensor(...)      # VLAC termination flags
batch["terminal_reward"] = torch.tensor(...)  # VLAC terminal rewards
```

### VLAC Client (`VLACClient`)

**Key Methods:**
- `check_done()`: Episode termination detection
- `compute_trajectory_values()`: Terminal value computation  
- `pairwise_critic()`: Frame comparison (optional)

**Error Handling:**
- Graceful fallback if VLAC service unavailable
- Automatic retry logic for transient failures
- Timeout protection for long-running requests
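
The retry-then-fallback behaviour listed above might look like the following. This is a sketch under stated assumptions, not the actual `VLACClient` code: `call_with_retry`, `max_retries`, and `backoff_s` are hypothetical names, exponential backoff is one possible retry policy, and the per-request timeout is assumed to be enforced inside `request_fn` by whatever HTTP library makes the call.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    request_fn: Callable[[], T],
    fallback: T,
    max_retries: int = 3,
    backoff_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run request_fn, retrying on I/O errors; return fallback if all fail."""
    for attempt in range(max_retries):
        try:
            return request_fn()
        except OSError:
            # ConnectionError and TimeoutError are both OSError subclasses.
            if attempt < max_retries - 1:
                sleep(backoff_s * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    # Graceful fallback: the caller decides what a safe default is,
    # for example done=False so the episode simply continues.
    return fallback
```

For a done check, a fallback of `False` means an unreachable service never terminates an episode early.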

## Configuration Options

| Parameter | Default | Description |
|-----------|---------|-------------|
| `use_vlac` | `false` | Enable VLAC integration |
| `vlac_service_url` | `http://localhost:8111` | VLAC service endpoint |
| `val_only` | `false` | Evaluation mode (disables VLAC) |
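
How these flags combine can be sketched as below, assuming (as the table states) that `val_only` takes precedence over `use_vlac`. The helper name `vlac_active` is hypothetical, and `SimpleNamespace` stands in for the real config object; the `getattr` default mirrors the one used in `RobHFRollout`.

```python
from types import SimpleNamespace


def vlac_active(rollout_cfg: object, val_only: bool) -> bool:
    """VLAC is used only when enabled and not in pure evaluation mode."""
    return bool(getattr(rollout_cfg, "use_vlac", False)) and not val_only


# Example configs (stand-ins for the real rollout config object).
train_cfg = SimpleNamespace(use_vlac=True, vlac_service_url="http://localhost:8111")
legacy_cfg = SimpleNamespace()  # no VLAC keys at all: falls back to the default
```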

## Performance Considerations

### GPU Memory Sharing
- VLAC service: ~20-30 GB during inference
- SimpleVLA-RL: ~60-70 GB during training
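- Combined peak: ~80-100 GB if both run on the same device at the same time, which can exceed an H100 80GB card; co-location only fits if the two peaks do not overlap, otherwise run the service on separate GPUs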

### Latency Impact
- Done check: ~300-800ms per step (depends on frames)
- Terminal value: ~1-5s per episode (depends on trajectory length)
- Overall training throughput: ~10-20% slower due to VLAC calls

### Scaling
- Multiple VLAC service instances on different GPUs
- Load balancing across service instances
- Batch optimization for trajectory processing
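
One simple way to spread load across instances, assuming one service per URL and a stable integer index per environment worker, is a static modulo assignment. The function name `url_for_worker` and the extra port `8112` are hypothetical; a real deployment could equally put a load balancer in front of the services.

```python
from typing import Sequence


def url_for_worker(worker_id: int, urls: Sequence[str]) -> str:
    """Pin each worker to one service instance by index."""
    if not urls:
        raise ValueError("at least one VLAC service URL is required")
    return urls[worker_id % len(urls)]


# Hypothetical two-instance setup.
SERVICE_URLS = ["http://localhost:8111", "http://localhost:8112"]
```

Static assignment needs no shared state between worker processes, at the cost of uneven load if workers differ in how often they call the service.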

## Debugging & Monitoring

### Service Health
```bash
# Check VLAC service status
curl -X POST http://localhost:8111/healthcheck

# Enable debug image saving
export VLAC_SAVE_INPUTS=1
```
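
The same check can be done from Python, for example to wait for the service before launching training. This sketch uses only the standard library; `service_is_healthy` is a hypothetical helper, and it assumes that a healthy service answers the POST with HTTP 200 (the response body is not inspected).

```python
import urllib.error
import urllib.request


def service_is_healthy(base_url: str, timeout_s: float = 5.0) -> bool:
    """POST to <base_url>/healthcheck and report whether it returned 200."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/healthcheck", data=b"", method="POST"
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all count as unhealthy.
        return False
```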

### Training Logs
```
VLAC integration enabled. Service URL: http://localhost:8111
Training mode: True
VLAC detected task completion at step 45 (prob: 0.847)
Max steps reached, computing VLAC terminal value...
VLAC terminal value: 0.632
```

### Common Issues

**Service Connection Failed:**
- Verify VLAC service is running: `ps aux | grep vlac_service`
- Check service logs for errors
- Call the service manually (for example with `test_vlac_service.py`) to isolate the failure

**Out of Memory:**
- Reduce VLAC batch sizes in service
- Use fewer reference images
- Monitor GPU usage: `nvidia-smi`

**Slow Training:**
- Check VLAC service response times
- Reduce trajectory frame collection frequency
- Use multiple VLAC service instances

## File Structure

```
SimpleVLA-RL/
├── vlac_service.py                    # VLAC HTTP service
├── test_vlac_service.py              # Service test suite
├── vlac_service_contract.md          # API specification
├── README_VLAC_SERVICE.md            # Service documentation
├── requirements_vlac_service.txt     # Service dependencies
├── examples/
│   ├── run_openvla_oft_rl_vlac.sh   # Training with VLAC
│   └── eval_openvla_oft_vlac.sh     # Evaluation script
└── verl/
    ├── utils/vlac_client.py         # VLAC service client
    └── workers/rollout/rob_rollout.py # Enhanced rollout worker
```

## Next Steps

1. **Performance Optimization**: 
   - Implement request batching
   - Add async processing
   - Cache frequent computations

2. **Robustness**: 
   - Add circuit breaker pattern
   - Implement request queuing
   - Add health monitoring

3. **Advanced Features**:
   - Reference frame caching  
   - Multi-task adaptation
   - Progressive difficulty scaling

## Support

For questions or issues with VLAC integration:
1. Check service health endpoints
2. Review training logs for VLAC messages  
3. Test service manually with `test_vlac_service.py`
4. Verify configuration parameters match examples