# VLAC Service

A minimal HTTP API service that exposes the Vision-Language-Action-Critic (VLAC) model for use in SimpleVLA-RL training.

## Quick Start

### 1. Install Dependencies
```bash
pip install -r requirements_vlac_service.txt
```

### 2. Start the Service
```bash
python vlac_service.py --port 8111 --gpu-ids 0,1,2,3
```

### 3. Test the Service
```bash
python test_vlac_service.py --url http://localhost:8111
```

## Usage

### Command Line Options
```bash
python vlac_service.py --help
```

- `--port`: Port to run on (default: 8111)
- `--host`: Host to bind to (default: 0.0.0.0)
- `--ckpt-path`: Path to VLAC checkpoint (default: /home/zechen/SimpleVLA-RL/CKPT/VLAC)
- `--gpu-ids`: Comma-separated GPU IDs (default: "0")
- `--workers`: Number of workers (default: 1)

### Environment Variables
- `VLAC_SAVE_INPUTS=1`: Save decoded images to `/tmp/vlac_debug/` for debugging

## API Endpoints

### Health Check
```bash
curl -X POST http://localhost:8111/healthcheck
```
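When the service is launched from a script, it can help to wait until the health check succeeds before sending real requests. The following is a minimal sketch using only the standard library. The helper names `probe_healthcheck` and `wait_until_ready` are hypothetical, and it assumes a healthy service answers `POST /healthcheck` with HTTP 200.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def probe_healthcheck(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if POST {base_url}/healthcheck answers with HTTP 200 (assumed success code)."""
    request = urllib.request.Request(f"{base_url}/healthcheck", data=b"", method="POST")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: not ready yet.
        return False


def wait_until_ready(probe: Callable[[], bool], attempts: int = 30, delay_s: float = 2.0) -> bool:
    """Call probe() up to `attempts` times, sleeping `delay_s` between failed tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False


# Example (requires a running service):
# ready = wait_until_ready(lambda: probe_healthcheck("http://localhost:8111"))
```

Passing the probe as a callable keeps the retry loop independent of the network, so it can be exercised without a running service.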

### Pairwise Critic
```bash
curl -X POST http://localhost:8111/pairwise-critic \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Pick up the bowl and put it in the box",
    "image_a": "<base64_image>",
    "image_b": "<base64_image>",
    "rich": false
  }'
```
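The same request can be built from Python. This sketch assumes that `<base64_image>` means the plain base64 text of the encoded image file (no `data:` URL prefix) and that the service replies with JSON. See `vlac_service_contract.md` for the authoritative format. The helper names are hypothetical.

```python
import base64
import json
import urllib.request


def encode_image(image_bytes: bytes) -> str:
    """Return the base64 text of raw image file bytes (e.g. the contents of a PNG or JPEG)."""
    return base64.b64encode(image_bytes).decode("ascii")


def build_pairwise_payload(task: str, image_a: bytes, image_b: bytes, rich: bool = False) -> dict:
    """Assemble the JSON body used in the curl example above."""
    return {
        "task": task,
        "image_a": encode_image(image_a),
        "image_b": encode_image(image_b),
        "rich": rich,
    }


def post_json(url: str, payload: dict, timeout: float = 30.0) -> dict:
    """POST a JSON body and decode the response, assuming the reply is JSON."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (requires a running service and two image files):
# with open("before.png", "rb") as a, open("after.png", "rb") as b:
#     payload = build_pairwise_payload("Pick up the bowl and put it in the box", a.read(), b.read())
# print(post_json("http://localhost:8111/pairwise-critic", payload))
```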

### Done Detection
```bash
curl -X POST http://localhost:8111/done \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Pick up the bowl and put it in the box",
    "first_frame": "<base64_image>",
    "prev_frame": "<base64_image>", 
    "curr_frame": "<base64_image>",
    "reference": ["<base64_image>"]
  }'
```

### Trajectory Critic
```bash
curl -X POST http://localhost:8111/trajectory-critic \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Pick up the bowl and put it in the box",
    "frames": ["<base64_image>", "<base64_image>"],
    "skip": 5,
    "ref_num": 6,
    "batch_size": 10,
    "think": false,
    "return_video": false
  }'
```

## Integration with SimpleVLA-RL

The service is designed to be called from the `verl` training framework:

1. **During training**: Call `/done` after each step to determine episode termination
2. **At episode end**: Call `/trajectory-critic` to get value estimates for terminal rewards
3. **During evaluation**: Use the environment's `done` signal and skip VLAC entirely

See `vlac_service_contract.md` for the full API specification.
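The three rules above can be sketched as a single rollout helper. This is an illustration under stated assumptions, not the actual `verl` integration: the function name `rollout_with_vlac` is hypothetical, the frames are pre-recorded base64 strings rather than live environment observations, and the `"done"` key in the `/done` response is an assumed field name. The `post` argument is any callable you supply that sends a JSON body to an endpoint path and returns the decoded reply.

```python
from typing import Callable, Optional, Sequence

PostFn = Callable[[str, dict], dict]  # (endpoint path, JSON body) -> decoded JSON reply


def rollout_with_vlac(
    post: PostFn,
    task: str,
    frames: Sequence[str],
    reference: Sequence[str],
    env_done: Optional[Sequence[bool]] = None,
    training: bool = True,
) -> dict:
    """Step through base64 frames, asking VLAC for termination only during training."""
    if not frames:
        raise ValueError("frames must contain at least the first observation")
    first = frames[0]
    seen = [first]
    for step in range(1, len(frames)):
        seen.append(frames[step])
        if training:
            # Rule 1: query /done after every step.
            reply = post("/done", {
                "task": task,
                "first_frame": first,
                "prev_frame": frames[step - 1],
                "curr_frame": frames[step],
                "reference": list(reference),
            })
            finished = bool(reply.get("done", False))  # "done" is an assumed response key
        else:
            # Rule 3: during evaluation, trust the environment and skip VLAC.
            finished = bool(env_done and env_done[step])
        if finished:
            break
    critic = None
    if training:
        # Rule 2: score the whole trajectory once the episode has ended.
        critic = post("/trajectory-critic", {
            "task": task,
            "frames": seen,
            "skip": 5,
            "ref_num": 6,
            "batch_size": 10,
            "think": False,
            "return_video": False,
        })
    return {"steps": len(seen) - 1, "critic": critic}
```

The parameter values in the `/trajectory-critic` body are copied from the curl example in the API section. Tune them for your own trajectories.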

## Architecture

- **Single process, single GPU**: Each service instance uses one GPU selected automatically
- **Automatic batching**: Large requests are chunked into batches ≤ 8 frames
- **Image processing**: Images are sent base64-encoded and automatically resized to 448×448 by the service
- **Simple deployment**: No Docker or complex orchestration required
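To illustrate the batching rule, here is a minimal sketch of splitting a frame list into chunks of at most 8. It is not the service's actual code. The name `chunk_frames` is hypothetical, and only the limit of 8 comes from the list above.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")
MAX_BATCH_FRAMES = 8  # limit quoted in the Architecture list


def chunk_frames(frames: Sequence[T], max_batch: int = MAX_BATCH_FRAMES) -> Iterator[List[T]]:
    """Yield consecutive slices of `frames`, each holding at most `max_batch` items."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    for start in range(0, len(frames), max_batch):
        yield list(frames[start:start + max_batch])


# A 20-frame request would be processed as three batches of 8, 8, and 4 frames.
batch_sizes = [len(batch) for batch in chunk_frames(list(range(20)))]
```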

## Troubleshooting

### Service won't start
- Check that the checkpoint path exists: `/home/zechen/SimpleVLA-RL/CKPT/VLAC`
- Verify GPU availability with `nvidia-smi`
- Check that all dependencies are installed

### Out of memory errors
- Reduce batch size in requests
- Use fewer reference images
- Check GPU memory usage with `nvidia-smi`

### Slow responses
- Use fewer reference images in `/done` requests
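- Increase the `skip` parameter in `/trajectory-critic` (assuming `skip` is the stride between compared frames, a larger value means fewer frame pairs to score)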
- Consider running multiple service instances on different GPUs
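For the multi-instance option, one simple client-side approach is to start one instance per GPU on consecutive ports (for example `--port 8111 --gpu-ids 0` and `--port 8112 --gpu-ids 1`) and rotate requests across them. The sketch below shows only the rotation. The helper names and the consecutive-port layout are assumptions, not something the service provides.

```python
import itertools
from typing import Iterator, List, Sequence


def instance_urls(host: str, base_port: int, count: int) -> List[str]:
    """Build base URLs for `count` instances listening on consecutive ports."""
    return [f"http://{host}:{base_port + offset}" for offset in range(count)]


def round_robin(urls: Sequence[str]) -> Iterator[str]:
    """Return an endless iterator that cycles through the given base URLs in order."""
    if not urls:
        raise ValueError("at least one service URL is required")
    return itertools.cycle(urls)


# Two instances on ports 8111 and 8112; each next() picks the target for one request.
urls = instance_urls("localhost", 8111, 2)
picker = round_robin(urls)
targets = [next(picker) for _ in range(3)]
```

Round-robin ignores how long each request takes, so a load-aware scheme may work better when `/trajectory-critic` calls vary widely in length.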

## Files

- `vlac_service.py`: Main service implementation
- `test_vlac_service.py`: Test script with sample requests
- `requirements_vlac_service.txt`: Python dependencies
- `vlac_service_contract.md`: Full API specification
- `guidelines.md`: Integration guidelines for SimpleVLA-RL

## Performance Notes

- GPU memory usage: ~20-30 GB during inference
- Typical latency: 
  - `/healthcheck`: <10ms
  - `/pairwise-critic`: ~200-500ms  
  - `/done`: ~300-800ms (depends on reference images)
  - `/trajectory-critic`: ~1-5s (depends on trajectory length)

The service is optimized for the SimpleVLA-RL use case where GPU memory is shared with the main training process.