# verl with SkyPilot

Run verl reinforcement learning training jobs on GPU nodes in Kubernetes clusters or on cloud platforms using [SkyPilot](https://github.com/skypilot-org/skypilot).

## Installation and Configuration

### Step 1: Install SkyPilot

Choose the installation based on your target platform:

```bash
# For Kubernetes only
pip install "skypilot[kubernetes]"

# For AWS
pip install "skypilot[aws]"

# For Google Cloud Platform
pip install "skypilot[gcp]"

# For Azure
pip install "skypilot[azure]"

# For multiple platforms
pip install "skypilot[kubernetes,aws,gcp,azure]"
```

### Step 2: Configure Your Platform

Follow the platform-specific credential setup in the [SkyPilot installation guide](https://docs.skypilot.co/en/latest/getting-started/installation.html), then run `sky check` to verify which platforms SkyPilot can access.

### Step 3: Set Up Environment Variables

Export necessary API keys for experiment tracking:

```bash
# For Weights & Biases tracking
export WANDB_API_KEY="your-wandb-api-key"

# For HuggingFace gated models (if needed)
export HF_TOKEN="your-huggingface-token"
```

## Examples

### PPO Training
```bash
sky launch -c verl-ppo verl-ppo.yaml --secret WANDB_API_KEY -y
```
Runs PPO training on GSM8K dataset using Qwen2.5-0.5B-Instruct model across 2 nodes with H100 GPUs. Based on examples in [`../ppo_trainer/`](../ppo_trainer/).

### GRPO Training  
```bash
sky launch -c verl-grpo verl-grpo.yaml --secret WANDB_API_KEY -y
```
Runs GRPO (Group Relative Policy Optimization) training on MATH dataset using Qwen2.5-7B-Instruct model. Memory-optimized configuration for 2 nodes. Based on examples in [`../grpo_trainer/`](../grpo_trainer/).

### Multi-turn Tool Usage Training
```bash
sky launch -c verl-multiturn verl-multiturn-tools.yaml --secret WANDB_API_KEY --secret HF_TOKEN -y
```
Single-node training with 8xH100 GPUs for multi-turn tool usage with Qwen2.5-3B-Instruct. Includes tool and interaction configurations for GSM8K. Based on examples in [`../sglang_multiturn/`](../sglang_multiturn/) but uses vLLM instead of sglang.

## Configuration

The example YAML files are pre-configured with:

- **Infrastructure**: Kubernetes clusters (`infra: k8s`) - can be changed to `infra: aws` or `infra: gcp`, etc.
- **Docker Image**: verl's official Docker image with CUDA 12.6 support
- **Setup**: Automatically clones and installs verl from source
- **Datasets**: Downloads required datasets during setup phase
- **Ray Cluster**: Configures distributed training across nodes
- **Logging**: Supports Weights & Biases via `--secret WANDB_API_KEY`
- **Models**: Supports gated HuggingFace models via `--secret HF_TOKEN`

## Launch Command Options

- `-c <name>`: Cluster name for managing the job
- `--secret KEY`: Pass secrets for API keys (can be used multiple times)
- `-y`: Skip confirmation prompt

## Monitoring Your Jobs

### Check cluster status
```bash
sky status
```

### View logs
```bash
sky logs verl-ppo  # View logs for the PPO job
```

### SSH into head node
```bash
ssh verl-ppo
```

### Access Ray dashboard
```bash
sky status --endpoint 8265 verl-ppo  # Get dashboard URL
```

### Tear down a cluster
```bash
sky down verl-ppo
```
`sky down` terminates the cluster and releases its resources. On cloud VMs, `sky stop verl-ppo` instead stops the cluster without deleting it, so it can be restarted later.