# VLAC Service
A minimal HTTP API service that exposes the Vision-Language-Action-Critic (VLAC) model for use in SimpleVLA-RL training.
## Quick Start
### 1. Install Dependencies
```bash
pip install -r requirements_vlac_service.txt
```
### 2. Start the Service
```bash
python vlac_service.py --port 8111 --gpu-ids 0,1,2,3
```
### 3. Test the Service
```bash
python test_vlac_service.py --url http://localhost:8111
```
## Usage
### Command Line Options
```bash
python vlac_service.py --help
```
- `--port`: Port to run on (default: 8111)
- `--host`: Host to bind to (default: 0.0.0.0)
- `--ckpt-path`: Path to VLAC checkpoint (default: /home/zechen/SimpleVLA-RL/CKPT/VLAC)
- `--gpu-ids`: Comma-separated GPU IDs (default: "0")
- `--workers`: Number of workers (default: 1)
### Environment Variables
- `VLAC_SAVE_INPUTS=1`: Save decoded images to `/tmp/vlac_debug/` for debugging
## API Endpoints
### Health Check
```bash
curl -X POST http://localhost:8111/healthcheck
```
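When the service is launched from a training script, it can help to block until the health check responds before sending real requests. The following is a minimal sketch using only the standard library. It assumes that an HTTP 200 reply means the service is ready (the response body is not inspected), and `probe_healthcheck` and `wait_until_healthy` are hypothetical helper names, not part of `vlac_service.py`.

```python
import time
import urllib.request


def probe_healthcheck(base_url, timeout=2.0):
    """Return True if POST <base_url>/healthcheck answers with HTTP 200."""
    request = urllib.request.Request(
        f"{base_url}/healthcheck", data=b"", method="POST"
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error statuses
        # (urllib.error.URLError is a subclass of OSError).
        return False


def wait_until_healthy(base_url, probe=probe_healthcheck, retries=30, delay=2.0):
    """Poll the health check until it succeeds or the retries run out."""
    for attempt in range(retries):
        if probe(base_url):
            return True
        if attempt < retries - 1:
            time.sleep(delay)
    return False
```

For example, `wait_until_healthy("http://localhost:8111")` returns `True` once the service answers and `False` if it never does within the retry budget.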
### Pairwise Critic
```bash
curl -X POST http://localhost:8111/pairwise-critic \
-H "Content-Type: application/json" \
-d '{
"task": "Pick up the bowl and put it in the box",
"image_a": "<base64_image>",
"image_b": "<base64_image>",
"rich": false
}'
```
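The same request can be issued from Python. The sketch below builds the JSON body from the curl example and posts it with the standard library. It assumes that the image fields take the plain base64 encoding of the image file's bytes and that the service replies with JSON; `vlac_service_contract.md` remains the authority on both. The helper names are illustrative only.

```python
import base64
import json
import urllib.request


def encode_image(path):
    """Read an image file and return its bytes as a base64 ASCII string."""
    with open(path, "rb") as handle:
        return base64.b64encode(handle.read()).decode("ascii")


def build_pairwise_payload(task, image_a_b64, image_b_b64, rich=False):
    """Assemble the request body with the fields from the curl example."""
    return {
        "task": task,
        "image_a": image_a_b64,
        "image_b": image_b_b64,
        "rich": rich,
    }


def post_json(url, payload, timeout=30.0):
    """POST a JSON payload and return the decoded JSON response (assumed JSON)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would then look like `post_json("http://localhost:8111/pairwise-critic", build_pairwise_payload(task, encode_image("a.png"), encode_image("b.png")))`, where the two file names are placeholders.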
### Done Detection
```bash
curl -X POST http://localhost:8111/done \
-H "Content-Type: application/json" \
-d '{
"task": "Pick up the bowl and put it in the box",
"first_frame": "<base64_image>",
"prev_frame": "<base64_image>",
"curr_frame": "<base64_image>",
"reference": ["<base64_image>"]
}'
```
### Trajectory Critic
```bash
curl -X POST http://localhost:8111/trajectory-critic \
-H "Content-Type: application/json" \
-d '{
"task": "Pick up the bowl and put it in the box",
"frames": ["<base64_image>", "<base64_image>"],
"skip": 5,
"ref_num": 6,
"batch_size": 10,
"think": false,
"return_video": false
}'
```
## Integration with SimpleVLA-RL
The service is designed to be called from the `verl` training framework:
1. **During training**: Call `/done` after each step to determine episode termination
2. **At episode end**: Call `/trajectory-critic` to get value estimates for terminal rewards
3. **During evaluation**: Use the environment's own `done` signal and skip VLAC entirely
See `vlac_service_contract.md` for the full API specification.
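The three steps above can be sketched as a single rollout function. This is an illustration under stated assumptions, not the `verl` integration itself: `env_step` and `post` are caller-supplied callables, the `"done"` key in the `/done` response is a guess at the response shape, and the `/trajectory-critic` parameters are simply copied from the curl example.

```python
def run_episode(env_step, post, task, first_frame, reference,
                max_steps=200, training=True):
    """Roll out one episode; VLAC decides termination only during training.

    env_step() -> (frame_b64, env_done) and post(endpoint, payload) -> dict
    are hypothetical callables supplied by the caller.
    """
    frames = [first_frame]
    for _ in range(max_steps):
        frame, env_done = env_step()
        frames.append(frame)
        if training:
            # Step 1: ask VLAC whether the task is complete.
            reply = post("/done", {
                "task": task,
                "first_frame": first_frame,
                "prev_frame": frames[-2],
                "curr_frame": frame,
                "reference": reference,
            })
            finished = bool(reply.get("done", False))  # assumed response key
        else:
            # Step 3: evaluation trusts the environment signal.
            finished = env_done
        if finished:
            break

    critic = None
    if training:
        # Step 2: score the whole trajectory once the episode has ended.
        critic = post("/trajectory-critic", {
            "task": task,
            "frames": frames,
            "skip": 5,
            "ref_num": 6,
            "batch_size": 10,
            "think": False,
            "return_video": False,
        })
    return frames, critic
```

Passing `post` in as a parameter keeps the loop independent of the HTTP client, so it can be exercised with a stub before pointing it at a running service.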
## Architecture
- **Single process, single GPU**: Each service instance runs as one process and uses one GPU, which is selected automatically
- **Automatic batching**: Large requests are split into batches of at most 8 frames
- **Image processing**: Images are sent as base64-encoded strings and are automatically resized to 448×448
- **Simple deployment**: No Docker or complex orchestration required
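The batching rule can be illustrated with a short helper. This is a sketch of the chunking idea only; the name `chunk_frames` is hypothetical and the actual logic in `vlac_service.py` may differ.

```python
def chunk_frames(frames, max_batch=8):
    """Yield consecutive slices of `frames`, each with at most `max_batch` items."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    for start in range(0, len(frames), max_batch):
        yield frames[start:start + max_batch]


# A 20-frame request is processed as batches of 8, 8, and 4 frames.
batch_sizes = [len(batch) for batch in chunk_frames(list(range(20)))]
print(batch_sizes)  # [8, 8, 4]
```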
## Troubleshooting
### Service won't start
- Check that the checkpoint path exists: `/home/zechen/SimpleVLA-RL/CKPT/VLAC`
- Verify GPU availability with `nvidia-smi`
- Check that all dependencies are installed
### Out of memory errors
- Reduce `batch_size` in `/trajectory-critic` requests
- Use fewer reference images
- Check GPU memory usage with `nvidia-smi`
### Slow responses
- Use fewer reference images in `/done` requests
- Increase the `skip` parameter in `/trajectory-critic` so that fewer frames are evaluated
- Consider running multiple service instances on different GPUs
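If several instances are running, the client needs some way to spread requests across them. The sketch below rotates through a list of base URLs in round-robin order. The class name and the second port (8112) are assumptions for illustration; the service itself does not provide load balancing.

```python
import itertools


class RoundRobinEndpoints:
    """Hand out service URLs in rotation across several instances."""

    def __init__(self, base_urls):
        urls = [url.rstrip("/") for url in base_urls]
        if not urls:
            raise ValueError("at least one base URL is required")
        self._cycle = itertools.cycle(urls)

    def next_url(self, path):
        """Return the next instance's URL with `path` appended."""
        return f"{next(self._cycle)}{path}"


# Hypothetical setup: two instances started on ports 8111 and 8112.
endpoints = RoundRobinEndpoints(["http://localhost:8111", "http://localhost:8112"])
first = endpoints.next_url("/done")
second = endpoints.next_url("/done")
third = endpoints.next_url("/done")
```

Here `first` and `third` point at port 8111 and `second` at port 8112, so consecutive requests alternate between the two instances.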
## Files
- `vlac_service.py`: Main service implementation
- `test_vlac_service.py`: Test script with sample requests
- `requirements_vlac_service.txt`: Python dependencies
- `vlac_service_contract.md`: Full API specification
- `guidelines.md`: Integration guidelines for SimpleVLA-RL
## Performance Notes
- GPU memory usage: ~20-30 GB during inference
- Typical latency:
  - `/healthcheck`: <10ms
  - `/pairwise-critic`: ~200-500ms
  - `/done`: ~300-800ms (depends on reference images)
  - `/trajectory-critic`: ~1-5s (depends on trajectory length)
The service is optimized for the SimpleVLA-RL use case where GPU memory is shared with the main training process.