# VLAC Service
A minimal HTTP API service that exposes the Vision-Language-Action-Critic (VLAC) model for use in SimpleVLA-RL training.
## Quick Start
### 1. Install Dependencies
```bash
pip install -r requirements_vlac_service.txt
```
### 2. Start the Service
```bash
python vlac_service.py --port 8111 --gpu-ids 0,1,2,3
```
### 3. Test the Service
```bash
python test_vlac_service.py --url http://localhost:8111
```
## Usage
### Command Line Options
```bash
python vlac_service.py --help
```
- `--port`: Port to run on (default: 8111)
- `--host`: Host to bind to (default: 0.0.0.0)
- `--ckpt-path`: Path to VLAC checkpoint (default: /home/zechen/SimpleVLA-RL/CKPT/VLAC)
- `--gpu-ids`: Comma-separated GPU IDs (default: "0")
- `--workers`: Number of workers (default: 1)
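
The options above map naturally onto an `argparse` parser. The following is a minimal sketch and not the actual contents of `vlac_service.py`: the function names `build_parser` and `parse_gpu_ids` and the help strings are assumptions, and only the flag names and defaults come from the list above.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="VLAC service (illustrative sketch)")
    parser.add_argument("--port", type=int, default=8111, help="Port to run on")
    parser.add_argument("--host", default="0.0.0.0", help="Host to bind to")
    parser.add_argument(
        "--ckpt-path",
        default="/home/zechen/SimpleVLA-RL/CKPT/VLAC",
        help="Path to VLAC checkpoint",
    )
    parser.add_argument("--gpu-ids", default="0", help="Comma-separated GPU IDs")
    parser.add_argument("--workers", type=int, default=1, help="Number of workers")
    return parser


def parse_gpu_ids(raw: str) -> List[int]:
    """Turn a comma-separated string such as "0,1,2,3" into a list of ints."""
    return [int(part) for part in raw.split(",") if part.strip()]
```

Calling `build_parser().parse_args(["--gpu-ids", "0,1,2,3"])` leaves every other option at its documented default.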
### Environment Variables
- `VLAC_SAVE_INPUTS=1`: Save decoded images to `/tmp/vlac_debug/` for debugging
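
A flag like this is typically read once per request and used to gate a debug write. The sketch below illustrates that pattern only; the helper name `maybe_save_input`, the file naming, and writing raw bytes (rather than a decoded image) are assumptions, not the service's actual code.

```python
import os
from pathlib import Path
from typing import Optional


def maybe_save_input(raw: bytes, name: str, debug_dir: str = "/tmp/vlac_debug") -> Optional[Path]:
    """Write `raw` to `debug_dir/name` only when VLAC_SAVE_INPUTS=1 (hypothetical helper)."""
    if os.environ.get("VLAC_SAVE_INPUTS") != "1":
        return None  # debugging disabled: do nothing
    out_dir = Path(debug_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / name
    path.write_bytes(raw)
    return path
```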
## API Endpoints
### Health Check
```bash
curl -X POST http://localhost:8111/healthcheck
```
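
Because loading the model can take a while, a client may want to poll this endpoint before sending real requests. The following is a minimal sketch using only the standard library; the helper names, the retry parameters, and treating any HTTP 200 as "healthy" are assumptions, since the response body is not described here.

```python
import time
import urllib.request


def post_healthcheck(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if POST /healthcheck answers with HTTP 200 (assumed success criterion)."""
    req = urllib.request.Request(f"{base_url}/healthcheck", data=b"", method="POST")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


def wait_until_healthy(base_url, attempts=30, delay=1.0, probe=post_healthcheck):
    """Poll `probe` up to `attempts` times, sleeping `delay` seconds between failures."""
    for _ in range(attempts):
        if probe(base_url):
            return True
        time.sleep(delay)
    return False
```

The `probe` argument is injectable so the retry logic can be exercised without a running service.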
### Pairwise Critic
```bash
curl -X POST http://localhost:8111/pairwise-critic \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Pick up the bowl and put it in the box",
    "image_a": "<base64_image>",
    "image_b": "<base64_image>",
    "rich": false
  }'
```
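
The same request can be built from Python. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the images are assumed to be the raw bytes of an encoded image file (for example PNG or JPEG), and the response is assumed to be JSON. Only the field names come from the `curl` example above.

```python
import base64
import json
import urllib.request


def encode_image(raw: bytes) -> str:
    """Base64-encode raw image bytes into an ASCII string suitable for JSON."""
    return base64.b64encode(raw).decode("ascii")


def pairwise_payload(task: str, image_a: bytes, image_b: bytes, rich: bool = False) -> dict:
    """Build the /pairwise-critic request body from two images."""
    return {
        "task": task,
        "image_a": encode_image(image_a),
        "image_b": encode_image(image_b),
        "rich": rich,
    }


def post_json(url: str, payload: dict, timeout: float = 30.0) -> dict:
    """POST a JSON body and decode the response as JSON (assumed response format)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a running service, `post_json("http://localhost:8111/pairwise-critic", pairwise_payload(task, a_bytes, b_bytes))` would send the request.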
### Done Detection
```bash
curl -X POST http://localhost:8111/done \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Pick up the bowl and put it in the box",
    "first_frame": "<base64_image>",
    "prev_frame": "<base64_image>",
    "curr_frame": "<base64_image>",
    "reference": ["<base64_image>"]
  }'
```
### Trajectory Critic
```bash
curl -X POST http://localhost:8111/trajectory-critic \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Pick up the bowl and put it in the box",
    "frames": ["<base64_image>", "<base64_image>"],
    "skip": 5,
    "ref_num": 6,
    "batch_size": 10,
    "think": false,
    "return_video": false
  }'
```
## Integration with SimpleVLA-RL
The service is designed to be called from the `verl` training framework:
1. **During training**: Call `/done` after each step to determine episode termination
2. **At episode end**: Call `/trajectory-critic` to get value estimates for terminal rewards
3. **During evaluation**: Use the environment's `done` signal (skip VLAC)
See `vlac_service_contract.md` for the full API specification.
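
The three cases above can be sketched as a single rollout helper. This is an illustration only, not code from `verl` or SimpleVLA-RL: the function name `rollout_with_vlac`, the injected `post(path, payload)` callable, and the `"done"` key in the `/done` response are assumptions. The request fields and the `/trajectory-critic` parameter values are copied from the endpoint examples.

```python
from typing import Callable, Iterable, List, Optional, Tuple

PostFn = Callable[[str, dict], dict]  # hypothetical transport: (path, payload) -> response


def rollout_with_vlac(
    task: str,
    frames: Iterable[str],
    post: PostFn,
    training: bool = True,
    reference: Optional[List[str]] = None,
) -> Tuple[List[str], Optional[dict]]:
    """Consume base64 frames (one per env step) and query VLAC as described above."""
    seen: List[str] = []
    for frame in frames:
        seen.append(frame)
        if not training or len(seen) < 2:
            continue  # evaluation skips VLAC; /done needs a previous frame
        result = post("/done", {
            "task": task,
            "first_frame": seen[0],
            "prev_frame": seen[-2],
            "curr_frame": seen[-1],
            "reference": list(reference or []),
        })
        if result.get("done"):  # assumed response field
            break
    critic = None
    if training:
        # At episode end, request value estimates for the collected frames.
        critic = post("/trajectory-critic", {
            "task": task, "frames": seen, "skip": 5, "ref_num": 6,
            "batch_size": 10, "think": False, "return_video": False,
        })
    return seen, critic
```

During evaluation (`training=False`) the helper never calls `post`; the episode simply ends when the environment stops yielding frames.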
## Architecture
- **Single process, single GPU**: Each service instance uses one GPU selected automatically
- **Automatic batching**: Large requests are chunked into batches of at most 8 frames
- **Image processing**: Images are passed as base64-encoded strings and automatically resized to 448×448
- **Simple deployment**: No Docker or complex orchestration required
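
The batching behaviour described above amounts to splitting a frame list into fixed-size chunks. A minimal sketch, assuming a hypothetical helper named `chunk` and a `MAX_BATCH` constant set to the documented limit of 8; the service's real implementation may differ:

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")
MAX_BATCH = 8  # documented upper bound on frames per batch


def chunk(items: Sequence[T], size: int = MAX_BATCH) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

For example, 20 frames are split into batches of 8, 8, and 4, in their original order.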
## Troubleshooting
### Service won't start
- Check that the checkpoint path exists: `/home/zechen/SimpleVLA-RL/CKPT/VLAC`
- Verify GPU availability with `nvidia-smi`
- Check that all dependencies are installed
### Out of memory errors
- Reduce batch size in requests
- Use fewer reference images
- Check GPU memory usage with `nvidia-smi`
### Slow responses
- Use fewer reference images in `/done` requests
- Reduce `skip` parameter in `/trajectory-critic`
- Consider running multiple service instances on different GPUs
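
When several instances run on different GPUs, a client has to spread requests across them. One simple option is round-robin selection, sketched below; the class name `RoundRobin` and the example URLs (including port 8112) are hypothetical, and nothing here is part of the service itself.

```python
import itertools
from typing import Iterable


class RoundRobin:
    """Cycle through a fixed set of service base URLs (illustrative client-side helper)."""

    def __init__(self, urls: Iterable[str]):
        urls = list(urls)
        if not urls:
            raise ValueError("at least one URL is required")
        self._cycle = itertools.cycle(urls)

    def next_url(self) -> str:
        """Return the next base URL in rotation."""
        return next(self._cycle)


# Hypothetical deployment: two instances started on different ports.
balancer = RoundRobin(["http://localhost:8111", "http://localhost:8112"])
```

Each call to `balancer.next_url()` alternates between the two instances.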
## Files
- `vlac_service.py`: Main service implementation
- `test_vlac_service.py`: Test script with sample requests
- `requirements_vlac_service.txt`: Python dependencies
- `vlac_service_contract.md`: Full API specification
- `guidelines.md`: Integration guidelines for SimpleVLA-RL
## Performance Notes
- GPU memory usage: ~20-30 GB during inference
- Typical latency:
  - `/healthcheck`: <10ms
  - `/pairwise-critic`: ~200-500ms
  - `/done`: ~300-800ms (depends on reference images)
  - `/trajectory-critic`: ~1-5s (depends on trajectory length)
The service is optimized for the SimpleVLA-RL use case where GPU memory is shared with the main training process.