	# VLAC Integration with SimpleVLA-RL

	This document describes the integration of Vision-Language-Action-Critic (VLAC) into the SimpleVLA-RL training framework.

	## Overview

The VLAC integration replaces the simulator's terminal success signal during training with VLAC-predicted values, while preserving the simulator signal for evaluation. Because the VLAC terminal value is a continuous score in [0, 1] rather than a binary success flag, episodes that end without success can still receive partial credit that reflects task progress.

	## Architecture

```
SimpleVLA-RL Training Process:
┌──────────────────┐     HTTP/JSON      ┌──────────────────┐
│     Training     │◄──────────────────►│   VLAC Service   │
│  (rob_rollout)   │    (done check,    │   (port 8111)    │
│                  │  terminal value)   │                  │
└──────────────────┘                    └──────────────────┘
```

	### Key Components

	1. VLAC Service (`vlac_service.py`): HTTP API exposing VLAC model functionality
	2. VLAC Client (`verl/utils/vlac_client.py`): Python client for service communication
	3. Enhanced Rollout (`verl/workers/rollout/rob_rollout.py`): Modified rollout with VLAC integration
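To make the HTTP/JSON boundary concrete, here is a minimal sketch of what a done-check request from the training side could look like, using only the Python standard library. The JSON field names (`first_frame`, `prev_frame`, `curr_frame`, `done`) and the assumption that frames travel as pre-encoded strings (for example base64 PNG) are hypothetical; `vlac_service_contract.md` defines the actual schema, and the real `VLACClient` may be structured differently.

```python
import json
import urllib.request
from typing import Any, Dict


def post_json(base_url: str, path: str, payload: Dict[str, Any], timeout_s: float = 10.0) -> Dict[str, Any]:
    """POST a JSON body to base_url + path and decode the JSON reply."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))


def check_done(base_url: str, first_frame: str, prev_frame: str, curr_frame: str) -> bool:
    """Ask the VLAC service whether the task is complete.

    Hypothetical schema: frames are already-encoded strings and the
    reply carries a boolean "done" field.
    """
    reply = post_json(base_url, "/done", {
        "first_frame": first_frame,
        "prev_frame": prev_frame,
        "curr_frame": curr_frame,
    })
    return bool(reply.get("done", False))
```

Usage, against a service started as in the next section: `check_done("http://localhost:8111", f0, f_prev, f_curr)`.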

	## Usage

	### 1. Start VLAC Service

	```bash
	# Start VLAC service (required for training)
	python vlac_service.py --port 8111 --gpu-ids 0,1,2,3
	```

	### 2. Training with VLAC

	```bash
	# Run training with VLAC integration enabled
	bash examples/run_openvla_oft_rl_vlac.sh
	```

	Key Variables (edit at top of script):
	```bash
	PROJECT_NAME='SimpleVLA-RL-VLAC'
	EXPERIMENT_NAME='vlac-libero10-sftall_node1_trial'
	SFT_MODEL_PATH="CKPT/Openvla-oft-SFT-libero10-trajall"
	DATASET_NAME="libero_10"
	VLAC_SERVICE_URL="http://localhost:8111"
	```

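Key VLAC configuration (Hydra command-line overrides set in the script; the `+` prefix adds keys that are not in the base config):
```bash
+actor_rollout_ref.rollout.use_vlac=true
+actor_rollout_ref.rollout.vlac_service_url=$VLAC_SERVICE_URL
trainer.val_before_train=False  # skip validation before training (avoids a val_only issue)
```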

	### 3. Evaluation (Environment Done)

	```bash
	# Run evaluation with environment done signal
	bash examples/eval_openvla_oft_vlac.sh
	```

	Key Variables (edit at top of script):
	```bash
	PROJECT_NAME='SimpleVLA-RL-VLAC-Eval'
	EXPERIMENT_NAME='vlac-libero10-sftall_node1_eval'
	SFT_MODEL_PATH="CKPT/Openvla-oft-SFT-libero10-trajall"
	DATASET_NAME="libero_10"
	```

Key configuration (Hydra command-line overrides):
```bash
trainer.val_only=True                        # pure evaluation mode
+actor_rollout_ref.rollout.use_vlac=false    # explicitly disable VLAC
```

	## Training Logic

	### Training Mode (`use_vlac=true`, `val_only=false`)

1. Episode step: after each action, append the new frame to the trajectory.
2. Done check: call the VLAC `/done` endpoint with `(first_frame, prev_frame, curr_frame)`.
   - If VLAC returns `done=true`: terminate the episode with reward = 1.0.
   - Otherwise: continue the episode.
3. Max steps: when the step limit is reached without a VLAC `done`:
   - Call the VLAC `/trajectory-critic` endpoint on the collected trajectory.
   - Use the final value, normalized to [0, 1], as the terminal reward.
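The training-mode logic above can be sketched as a single episode loop. This is an illustrative sketch, not the code in `rob_rollout.py`: `env_step`, `check_done`, and `trajectory_critic` are hypothetical callables standing in for the simulator and the two VLAC endpoints, and the clipped min-max normalization with `value_range=(0, 100)` is an assumed scheme, since this document only states that the value is normalized to 0-1.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")  # stand-in for an image array


def run_training_episode(
    env_step: Callable[[int], Frame],
    first_frame: Frame,
    check_done: Callable[[Frame, Frame, Frame], bool],
    trajectory_critic: Callable[[Sequence[Frame]], float],
    max_steps: int,
    value_range: Tuple[float, float] = (0.0, 100.0),  # ASSUMED raw critic range
) -> Tuple[bool, float, int]:
    """Return (vlac_done, terminal_reward, steps_taken) for one training episode."""
    frames: List[Frame] = [first_frame]
    for step in range(max_steps):
        curr = env_step(step)        # act, then observe the next frame
        prev = frames[-1]            # on the first step this is first_frame
        frames.append(curr)          # trajectory frame collection
        if check_done(frames[0], prev, curr):
            return True, 1.0, step + 1  # VLAC reports completion: reward 1.0
    # Max steps reached: score the whole trajectory, then clip-normalize to [0, 1].
    lo, hi = value_range
    raw = trajectory_critic(frames)
    reward = min(max((raw - lo) / (hi - lo), 0.0), 1.0)
    return False, reward, max_steps
```

Keeping the loop free of I/O (everything goes through injected callables) makes it easy to unit-test the termination and normalization rules without a running VLAC service.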

	### Evaluation Mode (`val_only=true`)

	- Use original environment `done` signal
	- No VLAC service calls required
	- Compute success rate using simulator feedback
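How the two modes might pick a reward source is sketched below. It reuses the `getattr(config, 'use_vlac', False)` default shown in the Rollout Class section; the helper names `vlac_active` and `episode_reward` are hypothetical, and `val_only` is passed separately because it lives under `trainer.` rather than the rollout config.

```python
from typing import Any


def vlac_active(rollout_config: Any, val_only: bool) -> bool:
    """VLAC drives termination and reward only when use_vlac is set and val_only is off."""
    use_vlac = getattr(rollout_config, "use_vlac", False)  # same default as RobHFRollout
    return bool(use_vlac) and not val_only


def episode_reward(rollout_config: Any, val_only: bool, env_success: bool, terminal_reward: float) -> float:
    """Pick the reward source for one finished episode."""
    if vlac_active(rollout_config, val_only):
        # Training mode: 1.0 if VLAC said done, else the normalized critic value.
        return terminal_reward
    # Evaluation mode (or VLAC disabled): simulator success as 0/1.
    return 1.0 if env_success else 0.0
```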

	## Integration Details

	### Environment Worker (`env_worker`)

	New Features:
	- Trajectory frame collection (`trajectory_frames[]`)
	- VLAC client initialization per worker process
	- VLAC done detection after each step
	- Terminal value computation at episode end

	New Output Fields:
	```python
{
    'vlac_done': bool,        # Whether VLAC detected completion
    'terminal_reward': float  # VLAC-computed terminal reward (0-1)
}
	```

	### Rollout Class (`RobHFRollout`)

	New Configuration:
	```python
	self.use_vlac = getattr(config, 'use_vlac', False)
	self.vlac_service_url = getattr(config, 'vlac_service_url', 'http://localhost:8111')
	```

	New Batch Fields:
	```python
	batch["vlac_done"] = torch.tensor(...) # VLAC termination flags
	batch["terminal_reward"] = torch.tensor(...) # VLAC terminal rewards
	```
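A sketch of how per-episode worker outputs might be gathered into the batch-ordered lists that become `batch["vlac_done"]` and `batch["terminal_reward"]`. The function name and the defaults for episodes that carry no VLAC fields (for example with `use_vlac=false`) are hypothetical; the range check enforces the 0-1 contract stated for `terminal_reward`.

```python
from typing import Any, Dict, List, Sequence


def collect_vlac_fields(worker_outputs: Sequence[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Gather VLAC fields from env_worker outputs, preserving episode order."""
    vlac_done: List[bool] = []
    terminal_reward: List[float] = []
    for i, out in enumerate(worker_outputs):
        # Hypothetical defaults for episodes without VLAC fields.
        reward = float(out.get("terminal_reward", 0.0))
        if not 0.0 <= reward <= 1.0:
            raise ValueError(f"episode {i}: terminal_reward {reward} outside [0, 1]")
        vlac_done.append(bool(out.get("vlac_done", False)))
        terminal_reward.append(reward)
    return {"vlac_done": vlac_done, "terminal_reward": terminal_reward}
```

The rollout would then wrap each list, e.g. `batch["vlac_done"] = torch.tensor(fields["vlac_done"])`.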

	### VLAC Client (`VLACClient`)

	Key Methods:
	- `check_done()`: Episode termination detection
	- `compute_trajectory_values()`: Terminal value computation
	- `pairwise_critic()`: Frame comparison (optional)

	Error Handling:
	- Graceful fallback if VLAC service unavailable
	- Automatic retry logic for transient failures
	- Timeout protection for long-running requests
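The retry and fallback behavior described above can be approximated with a small wrapper. This is a sketch under assumptions: the function name, the exponential backoff schedule, the retry count, and the choice to catch `OSError` (which covers `urllib.error.URLError`, connection errors, and socket timeouts) are illustrative, not taken from `verl/utils/vlac_client.py`.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retries: int = 3,
    backoff_s: float = 0.5,
    fallback: Optional[T] = None,
) -> Optional[T]:
    """Call fn, retrying transient network failures; return fallback if all attempts fail."""
    for attempt in range(retries):
        try:
            return fn()
        except OSError as exc:  # URLError, ConnectionError, TimeoutError all subclass OSError
            if attempt == retries - 1:
                print(f"VLAC call failed after {retries} attempts ({exc}); using fallback")
                return fallback
            time.sleep(backoff_s * (2 ** attempt))  # 0.5 s, 1 s, 2 s, ...
    return fallback
```

For example, wrapping a done-check request with `fallback=False` means an unreachable service lets the episode continue instead of crashing the rollout worker.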

	## Configuration Options

| Parameter | Default | Description |
|-----------|---------|-------------|
| `use_vlac` | `false` | Enable VLAC integration |
| `vlac_service_url` | `http://localhost:8111` | VLAC service endpoint |
| `val_only` | `false` | Evaluation mode (disables VLAC) |

	## Performance Considerations

	### GPU Memory Sharing
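- VLAC service: ~20-30 GB during inference
- SimpleVLA-RL: ~60-70 GB during training
- Combined: ~80-100 GB, more than a single H100 80GB card holds; each side fits comfortably on its own H100 80GB card, so place the VLAC service and training on separate GPUs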

	### Latency Impact
	- Done check: ~300-800ms per step (depends on frames)
	- Terminal value: ~1-5s per episode (depends on trajectory length)
	- Overall training throughput: ~10-20% slower due to VLAC calls

	### Scaling
	- Multiple VLAC service instances on different GPUs
	- Load balancing across service instances
	- Batch optimization for trajectory processing
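Load balancing across several service instances can be as simple as rotating through their URLs. Below is a minimal thread-safe round-robin sketch; the class name and the second port (8112) are hypothetical.

```python
import itertools
import threading
from typing import Iterable


class RoundRobinURLs:
    """Hand out VLAC service URLs in rotation; safe to share across threads."""

    def __init__(self, urls: Iterable[str]):
        url_list = list(urls)
        if not url_list:
            raise ValueError("need at least one VLAC service URL")
        self._cycle = itertools.cycle(url_list)
        self._lock = threading.Lock()  # itertools.cycle is not thread-safe by itself

    def next_url(self) -> str:
        with self._lock:
            return next(self._cycle)
```

Each instance would be started with its own `--port` and `--gpu-ids`, e.g. `python vlac_service.py --port 8112 --gpu-ids 4`. Round-robin ignores per-request cost, so long `/trajectory-critic` calls can still pile up on one instance; a least-outstanding-requests policy is a natural next step.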

	## Debugging & Monitoring

	### Service Health
	```bash
	# Check VLAC service status
	curl -X POST http://localhost:8111/healthcheck

	# Enable debug image saving
	export VLAC_SAVE_INPUTS=1
	```
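Before launching training, a script can poll the same health endpoint until the service answers. This standard-library sketch mirrors the `curl -X POST .../healthcheck` call above and treats any HTTP 200 as healthy, which is an assumption about the service's response; the function name is hypothetical.

```python
import time
import urllib.request


def wait_for_vlac(base_url: str, timeout_s: float = 60.0, interval_s: float = 2.0) -> bool:
    """Poll POST {base_url}/healthcheck until it returns HTTP 200 or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    url = f"{base_url.rstrip('/')}/healthcheck"
    while True:
        try:
            req = urllib.request.Request(url, data=b"", method="POST")
            with urllib.request.urlopen(req, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Connection refused, timeout, or non-2xx (HTTPError): not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Usage: `assert wait_for_vlac("http://localhost:8111"), "VLAC service not reachable"` before invoking the training script.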

	### Training Logs
	```
	VLAC integration enabled. Service URL: http://localhost:8111
	Training mode: True
	VLAC detected task completion at step 45 (prob: 0.847)
	Max steps reached, computing VLAC terminal value...
	VLAC terminal value: 0.632
	```

	### Common Issues

	Service Connection Failed:
- Verify VLAC service is running: `ps aux | grep vlac_service`
	- Check service logs for errors
	- Test manual service calls

	Out of Memory:
	- Reduce VLAC batch sizes in service
	- Use fewer reference images
	- Monitor GPU usage: `nvidia-smi`

	Slow Training:
	- Check VLAC service response times
	- Reduce trajectory frame collection frequency
	- Use multiple VLAC service instances
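One way to cut frame volume, sketched under the assumption that the per-step done check still needs consecutive frames: keep only every `stride`-th frame for the `/trajectory-critic` request, while always keeping the first and last frames. The helper name and the `stride` parameter are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def subsample_frames(frames: Sequence[T], stride: int) -> List[T]:
    """Keep every stride-th frame, always including the first and the last."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    if not frames:
        return []
    kept = list(frames[::stride])          # includes frames[0]
    if (len(frames) - 1) % stride != 0:    # last frame not already on the grid
        kept.append(frames[-1])
    return kept
```

Keeping the final frame matters because the terminal value should reflect where the episode actually ended.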

	## File Structure

	```
SimpleVLA-RL/
├── vlac_service.py                     # VLAC HTTP service
├── test_vlac_service.py                # Service test suite
├── vlac_service_contract.md            # API specification
├── README_VLAC_SERVICE.md              # Service documentation
├── requirements_vlac_service.txt       # Service dependencies
├── examples/
│   ├── run_openvla_oft_rl_vlac.sh      # Training with VLAC
│   └── eval_openvla_oft_vlac.sh        # Evaluation script
└── verl/
    ├── utils/vlac_client.py            # VLAC service client
    └── workers/rollout/rob_rollout.py  # Enhanced rollout worker
	```

	## Next Steps

1. Performance optimization:
   - Implement request batching
   - Add async processing
   - Cache frequent computations

2. Robustness:
   - Add circuit breaker pattern
   - Implement request queuing
   - Add health monitoring

3. Advanced features:
   - Reference frame caching
   - Multi-task adaptation
   - Progressive difficulty scaling

	## Support

	For questions or issues with VLAC integration:
	1. Check service health endpoints
	2. Review training logs for VLAC messages
	3. Test service manually with `test_vlac_service.py`
	4. Verify configuration parameters match examples