| # VLAC Service |
|
|
| A minimal HTTP API service that exposes the Vision-Language-Action-Critic (VLAC) model for use in SimpleVLA-RL training. |
|
|
| ## Quick Start |
|
|
| ### 1. Install Dependencies |
| ```bash |
| pip install -r requirements_vlac_service.txt |
| ``` |
|
|
| ### 2. Start the Service |
| ```bash |
| python vlac_service.py --port 8111 --gpu-ids 0,1,2,3 |
| ``` |
|
|
| ### 3. Test the Service |
| ```bash |
| python test_vlac_service.py --url http://localhost:8111 |
| ``` |
|
|
| ## Usage |
|
|
| ### Command Line Options |
| ```bash |
| python vlac_service.py --help |
| ``` |
|
|
| - `--port`: Port to run on (default: 8111) |
| - `--host`: Host to bind to (default: 0.0.0.0) |
| - `--ckpt-path`: Path to VLAC checkpoint (default: /home/zechen/SimpleVLA-RL/CKPT/VLAC) |
| - `--gpu-ids`: Comma-separated GPU IDs (default: "0") |
| - `--workers`: Number of workers (default: 1) |
|
|
| ### Environment Variables |
| - `VLAC_SAVE_INPUTS=1`: Save decoded images to `/tmp/vlac_debug/` for debugging |
|
|
| ## API Endpoints |
|
|
| ### Health Check |
| ```bash |
| curl -X POST http://localhost:8111/healthcheck |
| ``` |
|
|
| ### Pairwise Critic |
| ```bash |
| curl -X POST http://localhost:8111/pairwise-critic \ |
| -H "Content-Type: application/json" \ |
| -d '{ |
| "task": "Pick up the bowl and put it in the box", |
| "image_a": "<base64_image>", |
| "image_b": "<base64_image>", |
| "rich": false |
| }' |
| ``` |
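The `<base64_image>` placeholders stand for base64-encoded image bytes. As a rough sketch, the same request could be built and sent from Python with only the standard library. The helper names (`encode_image`, `build_pairwise_payload`, `post_json`) are hypothetical and not part of the service, and the sketch assumes the service accepts plain base64 strings without a data-URI prefix. The response is returned as parsed JSON without assuming any of its fields; see `vlac_service_contract.md` for the actual schema.

```python
import base64
import json
import urllib.request


def encode_image(image_bytes: bytes) -> str:
    """Base64-encode raw image file bytes (e.g. the contents of a PNG or JPEG)."""
    return base64.b64encode(image_bytes).decode("ascii")


def build_pairwise_payload(task, image_a, image_b, rich=False):
    """Build the JSON body used in the /pairwise-critic curl example."""
    return {
        "task": task,
        "image_a": encode_image(image_a),
        "image_b": encode_image(image_b),
        "rich": rich,
    }


def post_json(url, payload, timeout=30.0):
    """POST a JSON payload and return the parsed JSON response (hypothetical helper)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `post_json("http://localhost:8111/pairwise-critic", build_pairwise_payload(task, bytes_a, bytes_b))` would send the request shown above, where `bytes_a` and `bytes_b` are the raw contents of two image files.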
|
|
| ### Done Detection |
| ```bash |
| curl -X POST http://localhost:8111/done \ |
| -H "Content-Type: application/json" \ |
| -d '{ |
| "task": "Pick up the bowl and put it in the box", |
| "first_frame": "<base64_image>", |
| "prev_frame": "<base64_image>", |
| "curr_frame": "<base64_image>", |
| "reference": ["<base64_image>"] |
| }' |
| ``` |
|
|
| ### Trajectory Critic |
| ```bash |
| curl -X POST http://localhost:8111/trajectory-critic \ |
| -H "Content-Type: application/json" \ |
| -d '{ |
| "task": "Pick up the bowl and put it in the box", |
| "frames": ["<base64_image>", "<base64_image>"], |
| "skip": 5, |
| "ref_num": 6, |
| "batch_size": 10, |
| "think": false, |
| "return_video": false |
| }' |
| ``` |
|
|
| ## Integration with SimpleVLA-RL |
|
|
| The service is designed to be called from the `verl` training framework: |
|
|
| 1. **During training**: Call `/done` after each step to determine episode termination |
| 2. **At episode end**: Call `/trajectory-critic` to get value estimates for terminal rewards |
| 3. **During evaluation**: Use the environment's `done` signal and skip VLAC |
|
|
| See `vlac_service_contract.md` for the full API specification. |
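The training-time flow described above could look roughly like the following sketch. It is not the `verl` integration itself: `rollout_with_vlac` and the injected `post` callable are hypothetical names, and reading a boolean `"done"` key from the `/done` response is an assumption that should be checked against `vlac_service_contract.md`. The request fields mirror the curl examples in this README.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

# A callable that POSTs a JSON payload to a URL and returns the parsed response.
PostFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def rollout_with_vlac(
    task: str,
    frame_stream: Iterable[str],
    reference: List[str],
    post: PostFn,
    base_url: str = "http://localhost:8111",
) -> Tuple[List[str], Dict[str, Any]]:
    """Collect base64 frames, ask /done after each step, then score the episode."""
    frames: List[str] = []
    for frame in frame_stream:
        frames.append(frame)
        if len(frames) < 2:
            continue  # /done needs both a previous and a current frame
        result = post(
            f"{base_url}/done",
            {
                "task": task,
                "first_frame": frames[0],
                "prev_frame": frames[-2],
                "curr_frame": frames[-1],
                "reference": reference,
            },
        )
        # ASSUMPTION: the response carries a boolean "done" field.
        if result.get("done"):
            break

    # At episode end, request value estimates for the collected trajectory.
    critic = post(
        f"{base_url}/trajectory-critic",
        {
            "task": task,
            "frames": frames,
            "skip": 5,
            "ref_num": 6,
            "batch_size": 10,
            "think": False,
            "return_video": False,
        },
    )
    return frames, critic
```

Passing `post` in as an argument keeps the loop independent of any particular HTTP client and makes it easy to stub out in tests.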
|
|
| ## Architecture |
|
|
| - **Single process, single GPU**: Each service instance runs as one process on one automatically selected GPU |
| - **Automatic batching**: Large requests are split into batches of at most 8 frames |
| - **Image processing**: Images are passed as base64-encoded strings and automatically resized to 448×448 |
| - **Simple deployment**: No Docker or complex orchestration required |
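The batching idea can be illustrated with a short sketch. This is not the service's actual code; `chunk_frames` is a hypothetical helper that only shows how a long frame list could be split into batches of at most 8.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunk_frames(frames: Sequence[T], max_batch: int = 8) -> Iterator[List[T]]:
    """Yield consecutive slices of `frames`, each holding at most `max_batch` items."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    for start in range(0, len(frames), max_batch):
        yield list(frames[start:start + max_batch])
```

With 20 frames and the default limit, this yields batches of 8, 8, and 4 frames.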
|
|
| ## Troubleshooting |
|
|
| ### Service won't start |
| - Check that the checkpoint path exists: `/home/zechen/SimpleVLA-RL/CKPT/VLAC` |
| - Verify GPU availability with `nvidia-smi` |
| - Check that all dependencies are installed |
|
|
| ### Out of memory errors |
| - Reduce `batch_size` in `/trajectory-critic` requests |
| - Use fewer reference images |
| - Check GPU memory usage with `nvidia-smi` |
|
|
| ### Slow responses |
| - Use fewer reference images in `/done` requests |
| - Increase the `skip` parameter in `/trajectory-critic` so that fewer frames are evaluated |
| - Consider running multiple service instances on different GPUs |
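If several instances are started on different ports (for example with different `--port` and `--gpu-ids` values), a client can spread requests across them. The sketch below shows simple round-robin selection; `RoundRobinEndpoints` is a hypothetical helper, and the second port in the usage note is only an example.

```python
import itertools
from typing import Iterable


class RoundRobinEndpoints:
    """Cycle through the base URLs of several service instances."""

    def __init__(self, base_urls: Iterable[str]) -> None:
        urls = [url.rstrip("/") for url in base_urls]
        if not urls:
            raise ValueError("at least one base URL is required")
        self._cycle = itertools.cycle(urls)

    def next_url(self, path: str) -> str:
        """Return the full URL for `path` on the next instance in the rotation."""
        return next(self._cycle) + "/" + path.lstrip("/")
```

For instance, a pool built from `["http://localhost:8111", "http://localhost:8112"]` would alternate `/done` requests between the two instances.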
|
|
| ## Files |
|
|
| - `vlac_service.py`: Main service implementation |
| - `test_vlac_service.py`: Test script with sample requests |
| - `requirements_vlac_service.txt`: Python dependencies |
| - `vlac_service_contract.md`: Full API specification |
| - `guidelines.md`: Integration guidelines for SimpleVLA-RL |
|
|
| ## Performance Notes |
|
|
| - GPU memory usage: ~20-30 GB during inference |
| - Typical latency: |
|   - `/healthcheck`: <10ms |
|   - `/pairwise-critic`: ~200-500ms |
|   - `/done`: ~300-800ms (depends on the number of reference images) |
|   - `/trajectory-critic`: ~1-5s (depends on trajectory length) |
|
|
| The service is optimized for the SimpleVLA-RL use case where GPU memory is shared with the main training process. |
|
|