File size: 3,044 Bytes
030876e | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 | # verl with SkyPilot
Run verl reinforcement learning training jobs on Kubernetes clusters or cloud platforms with GPU nodes using [SkyPilot](https://github.com/skypilot-org/skypilot).
## Installation and Configuration
### Step 1: Install SkyPilot
Choose the installation based on your target platform:
```bash
# For Kubernetes only
pip install "skypilot[kubernetes]"
# For AWS
pip install "skypilot[aws]"
# For Google Cloud Platform
pip install "skypilot[gcp]"
# For Azure
pip install "skypilot[azure]"
# For multiple platforms
pip install "skypilot[kubernetes,aws,gcp,azure]"
```
### Step 2: Configure Your Platform
See https://docs.skypilot.co/en/latest/getting-started/installation.html
### Step 3: Set Up Environment Variables
Export necessary API keys for experiment tracking:
```bash
# For Weights & Biases tracking
export WANDB_API_KEY="your-wandb-api-key"
# For HuggingFace gated models (if needed)
export HF_TOKEN="your-huggingface-token"
```
## Examples
### PPO Training
```bash
sky launch -c verl-ppo verl-ppo.yaml --secret WANDB_API_KEY -y
```
Runs PPO training on GSM8K dataset using Qwen2.5-0.5B-Instruct model across 2 nodes with H100 GPUs. Based on examples in [`../ppo_trainer/`](../ppo_trainer/).
### GRPO Training
```bash
sky launch -c verl-grpo verl-grpo.yaml --secret WANDB_API_KEY -y
```
Runs GRPO (Group Relative Policy Optimization) training on MATH dataset using Qwen2.5-7B-Instruct model. Memory-optimized configuration for 2 nodes. Based on examples in [`../grpo_trainer/`](../grpo_trainer/).
### Multi-turn Tool Usage Training
```bash
sky launch -c verl-multiturn verl-multiturn-tools.yaml --secret WANDB_API_KEY --secret HF_TOKEN -y
```
Single-node training with 8xH100 GPUs for multi-turn tool usage with Qwen2.5-3B-Instruct. Includes tool and interaction configurations for GSM8K. Based on examples in [`../sglang_multiturn/`](../sglang_multiturn/) but uses vLLM instead of sglang.
## Configuration
The example YAML files are pre-configured with:
- **Infrastructure**: Kubernetes clusters (`infra: k8s`) - can be changed to `infra: aws` or `infra: gcp`, etc.
- **Docker Image**: verl's official Docker image with CUDA 12.6 support
- **Setup**: Automatically clones and installs verl from source
- **Datasets**: Downloads required datasets during setup phase
- **Ray Cluster**: Configures distributed training across nodes
- **Logging**: Supports Weights & Biases via `--secret WANDB_API_KEY`
- **Models**: Supports gated HuggingFace models via `--secret HF_TOKEN`
## Launch Command Options
- `-c <name>`: Cluster name for managing the job
- `--secret KEY`: Pass secrets for API keys (can be used multiple times)
- `-y`: Skip confirmation prompt
## Monitoring Your Jobs
### Check cluster status
```bash
sky status
```
### View logs
```bash
sky logs verl-ppo # View logs for the PPO job
```
### SSH into head node
```bash
ssh verl-ppo
```
### Access Ray dashboard
```bash
sky status --endpoint 8265 verl-ppo # Get dashboard URL
```
### Stop a cluster
```bash
sky down verl-ppo
```
|