# VLAC Service

A minimal HTTP API service that exposes the Vision-Language-Action-Critic (VLAC) model for use in SimpleVLA-RL training.

## Quick Start

### 1. Install Dependencies

```bash
pip install -r requirements_vlac_service.txt
```

### 2. Start the Service

```bash
python vlac_service.py --port 8111 --gpu-ids 0,1,2,3
```

### 3. Test the Service

```bash
python test_vlac_service.py --url http://localhost:8111
```

## Usage

### Command Line Options

```bash
python vlac_service.py --help
```

- `--port`: Port to run on (default: 8111)
- `--host`: Host to bind to (default: 0.0.0.0)
- `--ckpt-path`: Path to VLAC checkpoint (default: /home/zechen/SimpleVLA-RL/CKPT/VLAC)
- `--gpu-ids`: Comma-separated GPU IDs (default: "0")
- `--workers`: Number of workers (default: 1)

### Environment Variables

- `VLAC_SAVE_INPUTS=1`: Save decoded images to `/tmp/vlac_debug/` for debugging

## API Endpoints

### Health Check

```bash
curl -X POST http://localhost:8111/healthcheck
```

### Pairwise Critic

```bash
curl -X POST http://localhost:8111/pairwise-critic \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Pick up the bowl and put it in the box",
    "image_a": "",
    "image_b": "",
    "rich": false
  }'
```

### Done Detection

```bash
curl -X POST http://localhost:8111/done \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Pick up the bowl and put it in the box",
    "first_frame": "",
    "prev_frame": "",
    "curr_frame": "",
    "reference": [""]
  }'
```

### Trajectory Critic

```bash
curl -X POST http://localhost:8111/trajectory-critic \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Pick up the bowl and put it in the box",
    "frames": ["", ""],
    "skip": 5,
    "ref_num": 6,
    "batch_size": 10,
    "think": false,
    "return_video": false
  }'
```

## Integration with SimpleVLA-RL

The service is designed to be called from the `verl` training framework:

1. **During training**: Call `/done` after each step to determine episode termination
2. **At episode end**: Call `/trajectory-critic` to get value estimates for terminal rewards
3. **During evaluation**: Use environment `done` signal (skip VLAC)

See `vlac_service_contract.md` for the full API specification.

## Architecture

- **Single process, single GPU**: Each service instance uses one GPU selected automatically
- **Automatic batching**: Large requests are chunked into batches ≤ 8 frames
- **Image processing**: All images auto-resized to 448×448, base64 encoded
- **Simple deployment**: No Docker or complex orchestration required

## Troubleshooting

### Service won't start

- Check that the checkpoint path exists: `/home/zechen/SimpleVLA-RL/CKPT/VLAC`
- Verify GPU availability with `nvidia-smi`
- Check that all dependencies are installed

### Out of memory errors

- Reduce batch size in requests
- Use fewer reference images
- Check GPU memory usage with `nvidia-smi`

### Slow responses

- Use fewer reference images in `/done` requests
- Reduce `skip` parameter in `/trajectory-critic`
- Consider running multiple service instances on different GPUs

## Files

- `vlac_service.py`: Main service implementation
- `test_vlac_service.py`: Test script with sample requests
- `requirements_vlac_service.txt`: Python dependencies
- `vlac_service_contract.md`: Full API specification
- `guidelines.md`: Integration guidelines for SimpleVLA-RL

## Performance Notes

- GPU memory usage: ~20-30 GB during inference
- Typical latency:
  - `/healthcheck`: <10ms
  - `/pairwise-critic`: ~200-500ms
  - `/done`: ~300-800ms (depends on reference images)
  - `/trajectory-critic`: ~1-5s (depends on trajectory length)

The service is optimized for the SimpleVLA-RL use case where GPU memory is shared with the main training process.
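
## Example: Python Client for `/pairwise-critic`

The curl example for `/pairwise-critic` leaves the image fields empty. The sketch below shows one way to fill them from Python using only the standard library. It assumes the service accepts plain base64 of the encoded image file bytes (this README only states that images are base64 encoded; `vlac_service_contract.md` remains the authority on the exact format). The helper names `encode_image`, `build_pairwise_payload`, and `post_json` are hypothetical and not part of the service.

```python
import base64
import json
import urllib.request


def encode_image(image_bytes: bytes) -> str:
    """Base64-encode raw image file bytes (assumed format, see the contract)."""
    return base64.b64encode(image_bytes).decode("ascii")


def build_pairwise_payload(task: str, image_a: bytes, image_b: bytes,
                           rich: bool = False) -> dict:
    """Assemble a JSON body with the same keys as the curl example."""
    return {
        "task": task,
        "image_a": encode_image(image_a),
        "image_b": encode_image(image_b),
        "rich": rich,
    }


def post_json(url: str, payload: dict, timeout: float = 30.0) -> dict:
    """POST a JSON body and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage against a running service (file names are placeholders):
#   with open("a.jpg", "rb") as fa, open("b.jpg", "rb") as fb:
#       payload = build_pairwise_payload(
#           "Pick up the bowl and put it in the box", fa.read(), fb.read())
#   result = post_json("http://localhost:8111/pairwise-critic", payload)
```

The response is returned as a plain dictionary because its fields are not described in this README.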
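
## Example: What "Batches ≤ 8 Frames" Means

The Architecture section says large requests are chunked into batches of at most 8 frames. The service does this itself, so clients do not need to. The sketch below only illustrates the idea and is not the code in `vlac_service.py`; the name `chunk_frames` and the constant `MAX_BATCH_FRAMES` are hypothetical.

```python
from typing import List, Sequence

MAX_BATCH_FRAMES = 8  # limit stated in the Architecture section


def chunk_frames(frames: Sequence[str],
                 max_batch: int = MAX_BATCH_FRAMES) -> List[List[str]]:
    """Split a frame list into consecutive batches of at most max_batch items."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    return [
        list(frames[start:start + max_batch])
        for start in range(0, len(frames), max_batch)
    ]
```

With this scheme a 19-frame trajectory would be processed as three batches of 8, 8, and 3 frames, which is why latency and memory grow with trajectory length.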
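
## Example: Training-Time Call Pattern

The integration steps (call `/done` after each step, then `/trajectory-critic` at episode end) can be sketched as a small loop. This is an illustration under stated assumptions, not the `verl` integration itself. The function `run_training_episode` and its parameters are hypothetical. Because the response schema is not given in this README, the caller supplies `is_done` to interpret the `/done` response, and `post` is any function that sends a JSON payload to an endpoint path and returns the decoded response. The `skip`, `ref_num`, and `batch_size` values are copied from the curl example.

```python
from typing import Callable, List, Tuple

PostFn = Callable[[str, dict], dict]


def run_training_episode(
    task: str,
    first_frame: str,
    step_env: Callable[[], str],      # advances the env, returns a base64 frame
    post: PostFn,                     # sends payload to an endpoint path
    is_done: Callable[[dict], bool],  # interprets the /done response (assumed)
    reference: List[str],
    max_steps: int = 200,
) -> Tuple[List[str], dict]:
    """Query /done after every step, then /trajectory-critic once at the end."""
    frames = [first_frame]
    for _ in range(max_steps):
        curr_frame = step_env()
        done_payload = {
            "task": task,
            "first_frame": frames[0],
            "prev_frame": frames[-1],
            "curr_frame": curr_frame,
            "reference": reference,
        }
        frames.append(curr_frame)
        if is_done(post("/done", done_payload)):
            break
    critic_payload = {
        "task": task,
        "frames": frames,
        "skip": 5,
        "ref_num": 6,
        "batch_size": 10,
        "think": False,
        "return_video": False,
    }
    return frames, post("/trajectory-critic", critic_payload)
```

During evaluation this loop would not be used, since the environment's own `done` signal replaces the VLAC calls.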