SkyPilot Examples
=================

Last updated: 09/04/2025.

This guide provides examples of running VERL reinforcement learning training on Kubernetes clusters or cloud platforms with GPU nodes using `SkyPilot <https://github.com/skypilot-org/skypilot>`_.

Installation and Configuration
-------------------------------

Step 1: Install SkyPilot
~~~~~~~~~~~~~~~~~~~~~~~~~

Choose the installation based on your target platform:

.. code-block:: bash

   # For Kubernetes only
   pip install "skypilot[kubernetes]"
   
   # For AWS
   pip install "skypilot[aws]"
   
   # For Google Cloud Platform
   pip install "skypilot[gcp]"
   
   # For Azure
   pip install "skypilot[azure]"
   
   # For multiple platforms
   pip install "skypilot[kubernetes,aws,gcp,azure]"

Step 2: Configure Your Platform
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

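Follow the `SkyPilot installation guide <https://docs.skypilot.co/en/latest/getting-started/installation.html>`_ to set up credentials for your target platform. Afterwards, ``sky check`` reports which platforms are enabled.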

Step 3: Set Up Environment Variables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Export the API keys needed for experiment tracking and gated model downloads:

.. code-block:: bash

   # For Weights & Biases tracking
   export WANDB_API_KEY="your-wandb-api-key"
   
   # For HuggingFace gated models (if needed)
   export HF_TOKEN="your-huggingface-token"
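
Because the launch commands below pass these variables with ``--secret``, it can help to confirm they are set before launching. The following is a minimal sketch, not part of VERL or SkyPilot: ``check_secrets`` is a hypothetical helper name, and the function only tests that each named variable is non-empty in the current shell.

.. code-block:: bash

   # Hypothetical helper (not part of SkyPilot or VERL).
   # Prints each unset or empty variable name to stderr and
   # returns non-zero if any of the named variables is missing.
   check_secrets() {
     missing=0
     for name in "$@"; do
       # Look up the value of the variable whose name is stored in $name.
       eval "value=\${$name-}"
       if [ -z "$value" ]; then
         echo "missing: $name" >&2
         missing=1
       fi
     done
     return "$missing"
   }

   # Example: verify both secrets before calling `sky launch`.
   check_secrets WANDB_API_KEY HF_TOKEN || echo "export the variables listed above first" >&2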

Examples
--------

All example configurations are available in the `examples/skypilot/ <https://github.com/volcengine/verl/tree/main/examples/skypilot>`_ directory on GitHub. See the `README <https://github.com/volcengine/verl/blob/main/examples/skypilot/README.md>`_ for additional details.

PPO Training
~~~~~~~~~~~~

.. code-block:: bash

   sky launch -c verl-ppo verl-ppo.yaml --secret WANDB_API_KEY -y

Runs PPO training on the GSM8K dataset using the Qwen2.5-0.5B-Instruct model across 2 nodes with H100 GPUs. Based on the examples in ``examples/ppo_trainer/``.

`View verl-ppo.yaml on GitHub <https://github.com/volcengine/verl/blob/main/examples/skypilot/verl-ppo.yaml>`_

GRPO Training
~~~~~~~~~~~~~

.. code-block:: bash

   sky launch -c verl-grpo verl-grpo.yaml --secret WANDB_API_KEY -y

Runs GRPO (Group Relative Policy Optimization) training on the MATH dataset using the Qwen2.5-7B-Instruct model, with a memory-optimized configuration for 2 nodes. Based on the examples in ``examples/grpo_trainer/``.

`View verl-grpo.yaml on GitHub <https://github.com/volcengine/verl/blob/main/examples/skypilot/verl-grpo.yaml>`_

Multi-turn Tool Usage Training
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   sky launch -c verl-multiturn verl-multiturn-tools.yaml \
     --secret WANDB_API_KEY --secret HF_TOKEN -y

Runs single-node training on 8xH100 GPUs for multi-turn tool usage with Qwen2.5-3B-Instruct. Includes tool and interaction configurations for GSM8K. Based on the examples in ``examples/sglang_multiturn/``, but uses vLLM instead of SGLang.

`View verl-multiturn-tools.yaml on GitHub <https://github.com/volcengine/verl/blob/main/examples/skypilot/verl-multiturn-tools.yaml>`_

Configuration
-------------

The example YAML files are pre-configured with:

- **Infrastructure**: Kubernetes clusters (``infra: k8s``); change to ``infra: aws``, ``infra: gcp``, or another provider as needed
- **Docker Image**: VERL's official Docker image with CUDA 12.6 support
- **Setup**: Automatically clones and installs VERL from source
- **Datasets**: Downloads required datasets during setup phase
- **Ray Cluster**: Configures distributed training across nodes
- **Logging**: Supports Weights & Biases via ``--secret WANDB_API_KEY``
- **Models**: Supports gated HuggingFace models via ``--secret HF_TOKEN``

Launch Command Options
----------------------

- ``-c <name>``: Cluster name for managing the job
- ``--secret KEY``: Pass secrets for API keys (can be used multiple times)
- ``-y``: Skip confirmation prompt
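
The options above combine in the same way for every example. As an illustration only, the sketch below assembles the launch command from a cluster name, a task YAML, and a list of secret names, and prints it instead of running it. ``launch_cmd`` is a hypothetical helper name; the flags it emits are the ones listed above.

.. code-block:: bash

   # Hypothetical dry-run helper: prints the `sky launch` command
   # for an example without executing it.
   # Usage: launch_cmd <cluster-name> <task-yaml> [SECRET_NAME ...]
   launch_cmd() {
     cluster=$1
     yaml=$2
     shift 2
     cmd="sky launch -c $cluster $yaml"
     # Add one --secret flag per remaining argument.
     for secret in "$@"; do
       cmd="$cmd --secret $secret"
     done
     echo "$cmd -y"
   }

   # Prints the multi-turn launch command shown earlier on a single line.
   launch_cmd verl-multiturn verl-multiturn-tools.yaml WANDB_API_KEY HF_TOKEN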

Monitoring Your Jobs
--------------------

Check Cluster Status
~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   sky status

View Logs
~~~~~~~~~

.. code-block:: bash

   sky logs verl-ppo  # View logs for the PPO job

SSH into Head Node
~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   ssh verl-ppo

Access Ray Dashboard
~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   sky status --endpoint 8265 verl-ppo  # Get dashboard URL

Tear Down a Cluster
~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   sky down verl-ppo  # Terminates the cluster and releases its resources