# VLAC Service

A minimal HTTP API service that exposes the Vision-Language-Action-Critic (VLAC) model for use in SimpleVLA-RL training.

## Quick Start

### 1. Install Dependencies
```bash
pip install -r requirements_vlac_service.txt
```

### 2. Start the Service
```bash
python vlac_service.py --port 8111 --gpu-ids 0,1,2,3
```

### 3. Test the Service
```bash
python test_vlac_service.py --url http://localhost:8111
```

## Usage

### Command Line Options
```bash
python vlac_service.py --help
```

- `--port`: Port to run on (default: 8111)
- `--host`: Host to bind to (default: 0.0.0.0)  
- `--ckpt-path`: Path to VLAC checkpoint (default: /home/zechen/SimpleVLA-RL/CKPT/VLAC)
- `--gpu-ids`: Comma-separated GPU IDs (default: "0")
- `--workers`: Number of workers (default: 1)
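The real argument handling lives in `vlac_service.py`, which is not shown here. As a rough illustration, the documented flags and defaults could be declared with the standard-library `argparse` module as below. `build_parser` and `parse_gpu_ids` are hypothetical names, and the service may validate or interpret these values differently.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and their defaults."""
    parser = argparse.ArgumentParser(description="VLAC HTTP service (sketch)")
    parser.add_argument("--port", type=int, default=8111, help="Port to run on")
    parser.add_argument("--host", default="0.0.0.0", help="Host to bind to")
    parser.add_argument("--ckpt-path", default="/home/zechen/SimpleVLA-RL/CKPT/VLAC",
                        help="Path to VLAC checkpoint")
    parser.add_argument("--gpu-ids", default="0", help="Comma-separated GPU IDs")
    parser.add_argument("--workers", type=int, default=1, help="Number of workers")
    return parser


def parse_gpu_ids(raw: str) -> List[int]:
    """Turn a string such as "0,1,2,3" into [0, 1, 2, 3]; blank entries are ignored."""
    return [int(part) for part in raw.split(",") if part.strip()]
```

Note that `argparse` exposes `--ckpt-path` and `--gpu-ids` as `args.ckpt_path` and `args.gpu_ids`.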

### Environment Variables
- `VLAC_SAVE_INPUTS=1`: Save decoded images to `/tmp/vlac_debug/` for debugging

## API Endpoints

### Health Check
```bash
curl -X POST http://localhost:8111/healthcheck
```
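When the service is driven from Python (for example from a training script), the same health check can be issued with the standard library. The sketch below assumes that an HTTP 200 status means "healthy" and does not inspect the response body. `wait_until_healthy` is a hypothetical helper for waiting while the checkpoint loads, and its 300 s default deadline is an arbitrary choice.

```python
import time
import urllib.request


def check_health(base_url: str = "http://localhost:8111", timeout: float = 5.0) -> bool:
    """Return True if POST /healthcheck answers with HTTP 200."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/healthcheck",
        data=b"",  # empty body, like the curl example
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, or an HTTP error status
        return False


def wait_until_healthy(base_url: str = "http://localhost:8111",
                       deadline_s: float = 300.0, interval_s: float = 2.0) -> bool:
    """Poll /healthcheck until it succeeds or deadline_s seconds have passed."""
    stop = time.monotonic() + deadline_s
    while time.monotonic() < stop:
        if check_health(base_url, timeout=interval_s):
            return True
        time.sleep(interval_s)
    return False
```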

### Pairwise Critic
```bash
curl -X POST http://localhost:8111/pairwise-critic \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Pick up the bowl and put it in the box",
    "image_a": "<base64_image>",
    "image_b": "<base64_image>",
    "rich": false
  }'
```
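The `<base64_image>` placeholders stand for encoded image files. A minimal Python client for this endpoint might look like the following, assuming the server accepts plain base64 of the raw file bytes (no `data:image/...;base64,` prefix). The helper names are hypothetical; the response fields are defined in `vlac_service_contract.md` and are not parsed here.

```python
import base64
import json
import urllib.request
from pathlib import Path


def encode_image(path: str) -> str:
    """Read an image file (e.g. PNG or JPEG) and return its bytes as base64 text."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def pairwise_payload(task: str, image_a: str, image_b: str, rich: bool = False) -> dict:
    """Build the /pairwise-critic body; field names follow the curl example."""
    return {"task": task, "image_a": image_a, "image_b": image_b, "rich": rich}


def post_json(base_url: str, endpoint: str, payload: dict, timeout: float = 30.0) -> dict:
    """POST payload as JSON and decode the JSON response."""
    req = urllib.request.Request(
        base_url.rstrip("/") + endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Typical use: `post_json("http://localhost:8111", "/pairwise-critic", pairwise_payload(task, encode_image("a.png"), encode_image("b.png")))`.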

### Done Detection
```bash
curl -X POST http://localhost:8111/done \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Pick up the bowl and put it in the box",
    "first_frame": "<base64_image>",
    "prev_frame": "<base64_image>", 
    "curr_frame": "<base64_image>",
    "reference": ["<base64_image>"]
  }'
```
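The three frame fields imply per-step bookkeeping on the caller's side: the episode's first frame, the frame before the current step, and the current frame. A minimal sketch of that bookkeeping follows, assuming frames are already base64 strings. The class name is hypothetical. Reusing the current frame as `first_frame` and `prev_frame` on the very first step is an assumption, not documented behavior.

```python
from typing import List, Optional


class DoneRequestBuilder:
    """Hypothetical helper that tracks first/previous frames for /done requests."""

    def __init__(self, task: str, reference: Optional[List[str]] = None):
        self.task = task
        self.reference = list(reference or [])  # base64 reference images
        self.first_frame: Optional[str] = None
        self.prev_frame: Optional[str] = None

    def next_payload(self, curr_frame: str) -> dict:
        """Build the body for the current step, then remember curr_frame as previous."""
        if self.first_frame is None:
            # Assumption: on step 0 the current frame also serves as first/prev.
            self.first_frame = curr_frame
        prev = self.prev_frame if self.prev_frame is not None else curr_frame
        payload = {
            "task": self.task,
            "first_frame": self.first_frame,
            "prev_frame": prev,
            "curr_frame": curr_frame,
            "reference": self.reference,
        }
        self.prev_frame = curr_frame
        return payload
```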

### Trajectory Critic
```bash
curl -X POST http://localhost:8111/trajectory-critic \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Pick up the bowl and put it in the box",
    "frames": ["<base64_image>", "<base64_image>"],
    "skip": 5,
    "ref_num": 6,
    "batch_size": 10,
    "think": false,
    "return_video": false
  }'
```
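The request below mirrors this curl call from Python. The keyword defaults copy the example values above and are not necessarily the server-side defaults. `trajectory_request` is a hypothetical helper, and rejecting an empty `frames` list is a client-side assumption.

```python
import json
import urllib.request
from typing import List


def trajectory_request(base_url: str, task: str, frames: List[str], skip: int = 5,
                       ref_num: int = 6, batch_size: int = 10, think: bool = False,
                       return_video: bool = False) -> urllib.request.Request:
    """Build a POST /trajectory-critic request; frames are base64 images in time order."""
    if not frames:
        raise ValueError("frames must contain at least one image")
    body = {
        "task": task,
        "frames": list(frames),
        "skip": skip,
        "ref_num": ref_num,
        "batch_size": batch_size,
        "think": think,
        "return_video": return_video,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/trajectory-critic",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Send it with `urllib.request.urlopen(req, timeout=...)`. Because this endpoint typically takes seconds rather than milliseconds, a generous timeout is advisable.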

## Integration with SimpleVLA-RL

The service is designed to be called from the `verl` training framework:

1. **During training**: Call `/done` after each step to decide whether the episode has terminated
2. **At episode end**: Call `/trajectory-critic` to obtain the value estimates used for terminal rewards
3. **During evaluation**: Use the environment's own `done` signal and skip VLAC entirely
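This call order can be sketched as a rollout loop. This is not the `verl` integration itself. `run_episode`, `step_fn`, `vlac_done` and `vlac_values` are hypothetical callables, where the last two would wrap `POST /done` and `POST /trajectory-critic`. Using the final value estimate as the terminal reward is an assumption made only for illustration.

```python
from typing import Callable, List, Optional, Tuple


def run_episode(
    step_fn: Callable[[int], Tuple[str, bool]],
    vlac_done: Callable[[List[str]], bool],
    vlac_values: Callable[[List[str]], List[float]],
    max_steps: int,
    training: bool = True,
) -> Tuple[List[str], Optional[float]]:
    """Roll out one episode and return (frames, terminal_reward).

    step_fn(t) -> (base64 frame, environment done flag)      [hypothetical]
    vlac_done(frames) -> bool, wrapping POST /done             [hypothetical]
    vlac_values(frames) -> values, wrapping /trajectory-critic [hypothetical]
    """
    frames: List[str] = []
    for t in range(max_steps):
        frame, env_done = step_fn(t)
        frames.append(frame)
        # Training: VLAC decides termination. Evaluation: trust the environment.
        done = vlac_done(frames) if training else env_done
        if done:
            break
    if not training:
        return frames, None  # evaluation skips VLAC entirely
    values = vlac_values(frames)
    # Assumption: the last value estimate serves as the terminal reward.
    return frames, (values[-1] if values else 0.0)
```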

See `vlac_service_contract.md` for the full API specification.

## Architecture

- **Single process, single GPU**: Each service instance runs in one process on one GPU, selected automatically from the IDs passed via `--gpu-ids`
- **Automatic batching**: Large requests are split into batches of at most 8 frames
- **Image processing**: Images are sent base64-encoded and automatically resized to 448×448
- **Simple deployment**: No Docker or complex orchestration required
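The batching rule can be pictured as fixed-size slicing. The sketch below assumes frames are split in order into consecutive chunks. `chunk_frames` is a hypothetical name, and the service's own implementation may differ, for example in how it interacts with a request's `batch_size`.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

MAX_BATCH = 8  # documented upper bound on frames per batch


def chunk_frames(frames: Sequence[T], max_batch: int = MAX_BATCH) -> List[List[T]]:
    """Split frames, in order, into consecutive chunks of at most max_batch items."""
    if max_batch < 1:
        raise ValueError("max_batch must be >= 1")
    return [list(frames[i:i + max_batch]) for i in range(0, len(frames), max_batch)]
```

For 20 frames this yields chunks of 8, 8 and 4, so peak memory is bounded by the chunk size rather than the trajectory length.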

## Troubleshooting

### Service won't start
- Check that the checkpoint path exists: `/home/zechen/SimpleVLA-RL/CKPT/VLAC`
- Verify GPU availability with `nvidia-smi`
- Check that all dependencies are installed
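The first two checks can be scripted before launching. This is a minimal sketch using only the standard library; `preflight` is a hypothetical helper, and it does not verify dependencies because the contents of `requirements_vlac_service.txt` are not shown here.

```python
import os
import shutil
from typing import List

DEFAULT_CKPT = "/home/zechen/SimpleVLA-RL/CKPT/VLAC"  # documented default of --ckpt-path


def preflight(ckpt_path: str = DEFAULT_CKPT) -> List[str]:
    """Return a list of problems found; an empty list means both checks passed."""
    problems: List[str] = []
    if not os.path.exists(ckpt_path):
        problems.append(f"checkpoint path does not exist: {ckpt_path}")
    if shutil.which("nvidia-smi") is None:
        problems.append("nvidia-smi is not on PATH; the NVIDIA driver tools may be missing")
    return problems
```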

### Out of memory errors  
- Reduce `batch_size` in `/trajectory-critic` requests
- Use fewer reference images
- Check GPU memory usage with `nvidia-smi`

### Slow responses
- Use fewer reference images in `/done` requests
- Increase the `skip` parameter in `/trajectory-critic` so that fewer frames are evaluated
- Consider running multiple service instances on different GPUs
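Running several instances pairs naturally with client-side load spreading. Below is a minimal round-robin sketch, assuming one instance per GPU on its own port (for example `--port 8111 --gpu-ids 0` and `--port 8112 --gpu-ids 1`, assuming a single ID pins the instance to that GPU). The class is hypothetical and not part of the service.

```python
import itertools
import threading
from typing import Iterable


class RoundRobinEndpoints:
    """Cycle through the base URLs of several VLAC instances in a thread-safe way."""

    def __init__(self, urls: Iterable[str]):
        cleaned = [u.rstrip("/") for u in urls]
        if not cleaned:
            raise ValueError("need at least one service URL")
        self._cycle = itertools.cycle(cleaned)
        self._lock = threading.Lock()  # rollout workers may call next_url concurrently

    def next_url(self) -> str:
        """Return the next base URL in rotation."""
        with self._lock:
            return next(self._cycle)
```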

## Files

- `vlac_service.py`: Main service implementation
- `test_vlac_service.py`: Test script with sample requests
- `requirements_vlac_service.txt`: Python dependencies
- `vlac_service_contract.md`: Full API specification
- `guidelines.md`: Integration guidelines for SimpleVLA-RL

## Performance Notes

- GPU memory usage: ~20-30 GB during inference
- Typical latency: 
  - `/healthcheck`: <10ms
  - `/pairwise-critic`: ~200-500ms  
  - `/done`: ~300-800ms (depends on reference images)
  - `/trajectory-critic`: ~1-5s (depends on trajectory length)

The service is optimized for the SimpleVLA-RL use case where GPU memory is shared with the main training process.
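Latencies depend on the GPU, the request size and contention with training, so measuring on the target machine is worthwhile. A minimal timing sketch follows; `measure_latency` is a hypothetical helper, and `call` would be any zero-argument wrapper around a request to one of the endpoints.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latency(call: Callable[[], object], runs: int = 10) -> Dict[str, float]:
    """Run call() runs times and report min/median/max wall-clock time in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be >= 1")
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "min_ms": min(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }
```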