# VLAC Service

A minimal HTTP API service that exposes the Vision-Language-Action-Critic (VLAC) model for use in SimpleVLA-RL training.

## Quick Start

### 1. Install Dependencies
```bash
pip install -r requirements_vlac_service.txt
```

### 2. Start the Service
```bash
python vlac_service.py --port 8111 --gpu-ids 0,1,2,3
```

### 3. Test the Service
```bash
python test_vlac_service.py --url http://localhost:8111
```

## Usage

### Command Line Options
```bash
python vlac_service.py --help
```

- `--port`: Port to run on (default: 8111)
- `--host`: Host to bind to (default: 0.0.0.0)
- `--ckpt-path`: Path to VLAC checkpoint (default: /home/zechen/SimpleVLA-RL/CKPT/VLAC)
- `--gpu-ids`: Comma-separated GPU IDs (default: "0")
- `--workers`: Number of workers (default: 1)

### Environment Variables
- `VLAC_SAVE_INPUTS=1`: Save decoded images to `/tmp/vlac_debug/` for debugging
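
When several instances have to be started (for example one per GPU), building the command line in code avoids typos in flags. A minimal sketch that uses only the options and the environment variable listed above; `build_launch_command` and `launch_instances` are hypothetical helper names, and the sketch assumes `vlac_service.py` sits in the current working directory.

```python
import os
import subprocess
import sys
from typing import Dict, List, Optional, Sequence, Tuple


# Hypothetical launcher helpers: they only use the documented flags and env var,
# and assume vlac_service.py is in the current working directory.
def build_launch_command(
    port: int = 8111,
    gpu_ids: Sequence[int] = (0,),
    host: str = "0.0.0.0",
    ckpt_path: Optional[str] = None,
    save_inputs: bool = False,
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for starting one vlac_service.py instance."""
    argv = [
        sys.executable, "vlac_service.py",
        "--host", host,
        "--port", str(port),
        "--gpu-ids", ",".join(str(g) for g in gpu_ids),  # comma-separated, e.g. "0,1"
    ]
    if ckpt_path is not None:  # otherwise the service falls back to its default --ckpt-path
        argv += ["--ckpt-path", ckpt_path]
    env = dict(os.environ)
    if save_inputs:
        env["VLAC_SAVE_INPUTS"] = "1"  # decoded images are written to /tmp/vlac_debug/
    return argv, env


def launch_instances(gpus: Sequence[int], base_port: int = 8111) -> List[subprocess.Popen]:
    """Start one instance per GPU on consecutive ports (8111, 8112, ...)."""
    procs = []
    for offset, gpu in enumerate(gpus):
        argv, env = build_launch_command(port=base_port + offset, gpu_ids=[gpu])
        procs.append(subprocess.Popen(argv, env=env))
    return procs
```

For example, `launch_instances([0, 1])` would start two services on ports 8111 and 8112, each given a single GPU ID.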

## API Endpoints

### Health Check
```bash
curl -X POST http://localhost:8111/healthcheck
```
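
Because the service has to load the VLAC checkpoint before it can answer, a launcher or training script may want to block until `/healthcheck` responds. A minimal standard-library sketch, assuming a healthy service answers `POST /healthcheck` with HTTP 200 (the response body is defined in `vlac_service_contract.md` and is not inspected here); `wait_until_healthy` and its default timeout are hypothetical choices.

```python
import http.client
import time
import urllib.request


# Hypothetical client helper; assumes a ready service answers POST /healthcheck with HTTP 200.
def wait_until_healthy(base_url: str, timeout: float = 600.0, interval: float = 5.0) -> bool:
    """Poll POST {base_url}/healthcheck until it returns HTTP 200 or `timeout` seconds pass."""
    url = base_url.rstrip("/") + "/healthcheck"
    deadline = time.monotonic() + timeout
    while True:
        try:
            # data=b"" turns this into a POST with an empty body, like the curl example.
            req = urllib.request.Request(url, data=b"", method="POST")
            with urllib.request.urlopen(req, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, socket timeout, or HTTP error status
            # (URLError and HTTPError are OSError subclasses): not ready yet.
            pass
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval, remaining))
```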

### Pairwise Critic
```bash
curl -X POST http://localhost:8111/pairwise-critic \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Pick up the bowl and put it in the box",
    "image_a": "<base64_image>",
    "image_b": "<base64_image>",
    "rich": false
  }'
```
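
To fill the `<base64_image>` placeholders with real frames, a client base64-encodes each image and POSTs the JSON body. A minimal standard-library sketch, assuming the service expects the plain base64 of an encoded image file (PNG or JPEG bytes) without a `data:` URI prefix; `encode_image`, `pairwise_payload` and `post_json` are hypothetical helpers, and the response fields are whatever `vlac_service_contract.md` specifies. Building the body in Python also avoids putting very long base64 strings on a shell command line.

```python
import base64
import json
import urllib.request
from pathlib import Path
from typing import Any, Dict, Union

ImageLike = Union[str, Path, bytes]


# Hypothetical helpers; assumption: images are sent as base64 of encoded PNG/JPEG files.
def encode_image(image: ImageLike) -> str:
    """Base64-encode an image file given as a path or as raw file bytes."""
    data = image if isinstance(image, bytes) else Path(image).read_bytes()
    return base64.b64encode(data).decode("ascii")


def pairwise_payload(task: str, image_a: ImageLike, image_b: ImageLike, rich: bool = False) -> Dict[str, Any]:
    """Request body for POST /pairwise-critic, mirroring the curl example above."""
    return {
        "task": task,
        "image_a": encode_image(image_a),
        "image_b": encode_image(image_b),
        "rich": rich,
    }


def post_json(base_url: str, endpoint: str, payload: Dict[str, Any], timeout: float = 60.0) -> Any:
    """POST `payload` as JSON to base_url + endpoint and decode the JSON response."""
    req = urllib.request.Request(
        base_url.rstrip("/") + endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage, with hypothetical file names: `post_json("http://localhost:8111", "/pairwise-critic", pairwise_payload("Pick up the bowl and put it in the box", "frame_a.png", "frame_b.png"))`.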

### Done Detection
```bash
curl -X POST http://localhost:8111/done \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Pick up the bowl and put it in the box",
    "first_frame": "<base64_image>",
    "prev_frame": "<base64_image>",
    "curr_frame": "<base64_image>",
    "reference": ["<base64_image>"]
  }'
```
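
Since `/done` is called once per step, the client has to track which frames go into each request. A minimal sketch, assuming `first_frame` is the episode's initial observation, `prev_frame` and `curr_frame` are the frames before and after the latest step, and every frame is held as encoded image bytes; `done_payload` is a hypothetical helper, and the reference frames are passed through unchanged.

```python
import base64
from typing import Any, Dict, Sequence


def _b64(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")


# Hypothetical helper; assumes frames are encoded image bytes (e.g. PNG) and that
# episode_frames[0] is the observation after reset, with one frame appended per step.
def done_payload(task: str, episode_frames: Sequence[bytes], reference: Sequence[bytes]) -> Dict[str, Any]:
    """Build a POST /done body from the frames collected so far in one episode."""
    if len(episode_frames) < 2:
        raise ValueError("/done needs the first frame plus at least one post-step frame")
    return {
        "task": task,
        "first_frame": _b64(episode_frames[0]),
        "prev_frame": _b64(episode_frames[-2]),
        "curr_frame": _b64(episode_frames[-1]),
        # Fewer reference images make requests smaller and faster (see Troubleshooting).
        "reference": [_b64(r) for r in reference],
    }
```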

### Trajectory Critic
```bash
curl -X POST http://localhost:8111/trajectory-critic \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Pick up the bowl and put it in the box",
    "frames": ["<base64_image>", "<base64_image>"],
    "skip": 5,
    "ref_num": 6,
    "batch_size": 10,
    "think": false,
    "return_video": false
  }'
```
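
The same request can be assembled from a list of frames in Python. A minimal sketch: `trajectory_payload` is a hypothetical helper whose defaults copy the values in the curl example (not necessarily the service's own defaults), frames are assumed to be encoded image bytes, and the range checks are client-side assumptions rather than rules from the contract.

```python
import base64
from typing import Any, Dict, Sequence


# Hypothetical helper; defaults mirror the curl example, checks are assumed sanity checks.
def trajectory_payload(
    task: str,
    frames: Sequence[bytes],
    skip: int = 5,
    ref_num: int = 6,
    batch_size: int = 10,
    think: bool = False,
    return_video: bool = False,
) -> Dict[str, Any]:
    """Build a POST /trajectory-critic body from encoded image frames."""
    if not frames:
        raise ValueError("a trajectory needs at least one frame")
    if skip < 1 or batch_size < 1:
        raise ValueError("skip and batch_size must be >= 1")
    return {
        "task": task,
        "frames": [base64.b64encode(f).decode("ascii") for f in frames],
        "skip": skip,
        "ref_num": ref_num,
        "batch_size": batch_size,
        "think": think,
        "return_video": return_video,
    }
```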

## Integration with SimpleVLA-RL

The service is designed to be called from the `verl` training framework:

1. During training: call `/done` after each environment step to decide whether the episode should terminate.
2. At episode end: call `/trajectory-critic` to get the value estimates used for terminal rewards.
3. During evaluation: use the environment's own `done` signal and skip VLAC entirely.
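
The three rules above can be expressed as a rollout loop. A minimal sketch, not verl's actual code: `run_episode`, `vlac_done`, `vlac_trajectory` and `max_steps` are hypothetical, the two VLAC callables stand for thin wrappers around `POST /done` and `POST /trajectory-critic` (how their JSON responses map to a boolean and a float is defined in `vlac_service_contract.md`, not here), and the environment and policy are plain callables.

```python
from typing import Callable, List, Optional, Tuple

Frame = bytes
# Hypothetical signatures: env_step(action) -> (next_frame, env_done)
EnvStep = Callable[[object], Tuple[Frame, bool]]


def run_episode(
    reset: Callable[[], Frame],
    env_step: EnvStep,
    policy: Callable[[Frame], object],
    vlac_done: Callable[[List[Frame]], bool],         # assumed wrapper around POST /done
    vlac_trajectory: Callable[[List[Frame]], float],  # assumed wrapper around POST /trajectory-critic
    training: bool,
    max_steps: int = 200,
) -> Tuple[List[Frame], Optional[float]]:
    """Roll out one episode; in training mode also return a terminal value estimate."""
    frames = [reset()]
    for _ in range(max_steps):
        frame, env_done = env_step(policy(frames[-1]))
        frames.append(frame)
        # Rule 1 (training): VLAC decides termination. Rule 3 (evaluation): trust the env.
        done = vlac_done(frames) if training else env_done
        if done:
            break
    # Rule 2: at episode end (training only), score the whole trajectory once.
    value = vlac_trajectory(frames) if training else None
    return frames, value
```

Passing the network calls in as callables keeps the termination logic testable with stubs and without a GPU.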

See `vlac_service_contract.md` for the full API specification.

## Architecture

- Single process, single GPU: each service instance runs on one GPU, selected automatically.
- Automatic batching: large requests are split into batches of at most 8 frames.
- Image processing: all images are exchanged base64-encoded and are automatically resized to 448×448.
- Simple deployment: no Docker or complex orchestration required.
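
The batching rule amounts to splitting a frame list into consecutive chunks of at most 8. A minimal illustrative sketch of that rule; `chunk_frames` is a hypothetical name, and how the service combines this limit with a request's `batch_size` is not specified here.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

MAX_FRAMES_PER_BATCH = 8  # limit stated in the Architecture notes


# Hypothetical illustration of the chunking rule, not the service's own code.
def chunk_frames(frames: Sequence[T], max_batch: int = MAX_FRAMES_PER_BATCH) -> Iterator[List[T]]:
    """Yield consecutive, order-preserving slices of at most max_batch frames."""
    if max_batch < 1:
        raise ValueError("max_batch must be >= 1")
    for start in range(0, len(frames), max_batch):
        yield list(frames[start:start + max_batch])
```

A 20-frame request, for instance, becomes three forward batches of 8, 8 and 4 frames.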

## Troubleshooting

### Service won't start
- Check that the checkpoint path passed via `--ckpt-path` exists (default: `/home/zechen/SimpleVLA-RL/CKPT/VLAC`)
- Verify GPU availability with `nvidia-smi`
- Check that all dependencies in `requirements_vlac_service.txt` are installed
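
These three checks can be run in one go before starting the service. A minimal sketch: `preflight` and `DEFAULT_CKPT` are hypothetical names, `nvidia-smi -L` is used only to list GPUs, and `modules` takes top-level Python import names, which can differ from pip package names (for example Pillow is imported as `PIL`).

```python
import importlib.util
import shutil
import subprocess
from pathlib import Path
from typing import List, Sequence

# Default --ckpt-path documented in this README.
DEFAULT_CKPT = "/home/zechen/SimpleVLA-RL/CKPT/VLAC"


# Hypothetical pre-flight helper mirroring the three checks listed above.
def preflight(ckpt_path: str = DEFAULT_CKPT, modules: Sequence[str] = ()) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems: List[str] = []
    if not Path(ckpt_path).exists():
        problems.append(f"checkpoint path does not exist: {ckpt_path}")
    if shutil.which("nvidia-smi") is None:
        problems.append("nvidia-smi not found on PATH (is the NVIDIA driver installed?)")
    else:
        try:
            # `nvidia-smi -L` prints one line per GPU the driver can see.
            out = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True, timeout=30)
            if out.returncode != 0 or not out.stdout.strip():
                problems.append("nvidia-smi did not list any GPUs")
        except (OSError, subprocess.TimeoutExpired) as exc:
            problems.append(f"nvidia-smi failed: {exc}")
    # Top-level import names only, e.g. those behind requirements_vlac_service.txt.
    for name in modules:
        if importlib.util.find_spec(name) is None:
            problems.append(f"Python module not importable: {name}")
    return problems
```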

### Out of memory errors
- Reduce `batch_size` in `/trajectory-critic` requests
- Use fewer reference images (`reference` in `/done`, `ref_num` in `/trajectory-critic`)
- Check GPU memory usage with `nvidia-smi`

### Slow responses
- Use fewer reference images in `/done` requests
- Increase the `skip` parameter in `/trajectory-critic` so that fewer frames are scored
- Consider running multiple service instances on different GPUs
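
With several instances running (for example one started with `--port 8111 --gpu-ids 0` and another with `--port 8112 --gpu-ids 1`), the client has to spread requests across them. A minimal round-robin sketch; `RoundRobin` is a hypothetical class that ignores instance health and request cost, so a long `/trajectory-critic` call can still delay the next request sent to the same instance.

```python
import itertools
import threading
from typing import Iterable


# Hypothetical client-side dispatcher; not part of vlac_service.py.
class RoundRobin:
    """Thread-safe round-robin over several VLAC service base URLs."""

    def __init__(self, urls: Iterable[str]):
        cleaned = [u.rstrip("/") for u in urls]
        if not cleaned:
            raise ValueError("need at least one service URL")
        self._cycle = itertools.cycle(cleaned)
        self._lock = threading.Lock()  # next() on a shared cycle is guarded for threaded callers

    def next_url(self) -> str:
        with self._lock:
            return next(self._cycle)
```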

## Files

- `vlac_service.py`: Main service implementation
- `test_vlac_service.py`: Test script with sample requests
- `requirements_vlac_service.txt`: Python dependencies
- `vlac_service_contract.md`: Full API specification
- `guidelines.md`: Integration guidelines for SimpleVLA-RL

## Performance Notes

- GPU memory usage: ~20-30 GB during inference
- Typical latency:
  - `/healthcheck`: <10ms
  - `/pairwise-critic`: ~200-500ms
  - `/done`: ~300-800ms (depends on reference images)
  - `/trajectory-critic`: ~1-5s (depends on trajectory length)
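
Because latency depends on the GPU, the number of reference images and the trajectory length, it is worth measuring in your own setup. A minimal timing sketch; `measure_latency` is a hypothetical helper, `call` would wrap a single request (for example one `/pairwise-critic` POST), and the warm-up count is an arbitrary choice to exclude slower first calls such as connection setup.

```python
import statistics
import time
from typing import Callable, Dict


# Hypothetical benchmarking helper; `call` should perform exactly one request.
def measure_latency(call: Callable[[], object], n: int = 20, warmup: int = 2) -> Dict[str, float]:
    """Time `call` n times after `warmup` untimed calls; return median and max in ms."""
    if n < 1:
        raise ValueError("n must be >= 1")
    for _ in range(warmup):
        call()  # untimed: first calls can be slower than steady state
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        call()
        samples.append((time.perf_counter() - t0) * 1000.0)
    return {"median_ms": statistics.median(samples), "max_ms": max(samples)}
```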

The service is optimized for the SimpleVLA-RL use case, where GPU memory is shared with the main training process.