	SkyPilot Examples
	=================

	Last updated: 09/04/2025.

	This guide provides examples of running VERL reinforcement learning training on Kubernetes clusters or cloud platforms with GPU nodes using `SkyPilot <https://github.com/skypilot-org/skypilot>`_.

	Installation and Configuration
	-------------------------------

	Step 1: Install SkyPilot
	~~~~~~~~~~~~~~~~~~~~~~~~~

	Choose the installation based on your target platform:

	.. code-block:: bash

	    # For Kubernetes only
	    pip install "skypilot[kubernetes]"

	    # For AWS
	    pip install "skypilot[aws]"

	    # For Google Cloud Platform
	    pip install "skypilot[gcp]"

	    # For Azure
	    pip install "skypilot[azure]"

	    # For multiple platforms
	    pip install "skypilot[kubernetes,aws,gcp,azure]"

	Step 2: Configure Your Platform
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

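	Follow the `SkyPilot installation guide <https://docs.skypilot.co/en/latest/getting-started/installation.html>`_ to configure credentials for each platform you installed. Running ``sky check`` afterwards reports which platforms SkyPilot can access.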

	Step 3: Set Up Environment Variables
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	Export the API keys needed for experiment tracking and for downloading gated models:

	.. code-block:: bash

	    # For Weights & Biases tracking
	    export WANDB_API_KEY="your-wandb-api-key"

	    # For HuggingFace gated models (if needed)
	    export HF_TOKEN="your-huggingface-token"
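
	Because the launch commands below read these keys from the local environment, it can help to confirm they are exported before launching. The following is a minimal sketch; ``check_secrets`` is a hypothetical helper written for this guide, not part of SkyPilot or VERL:

	.. code-block:: bash

	    # Hypothetical helper: report which of the named variables are not
	    # exported in the current shell. Returns non-zero if any is missing.
	    check_secrets() {
	        missing=0
	        for name in "$@"; do
	            # printenv only sees exported variables.
	            if [ -z "$(printenv "$name")" ]; then
	                echo "Missing environment variable: $name" >&2
	                missing=1
	            fi
	        done
	        return "$missing"
	    }

	    check_secrets WANDB_API_KEY HF_TOKEN || echo "Export the missing keys before launching."

	Pass only the names an example actually needs. ``HF_TOKEN``, for instance, is required only for gated models.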

	Examples
	--------

	All example configurations are available in the `examples/skypilot/ <https://github.com/volcengine/verl/tree/main/examples/skypilot>`_ directory on GitHub. See the `README <https://github.com/volcengine/verl/blob/main/examples/skypilot/README.md>`_ for additional details.

	PPO Training
	~~~~~~~~~~~~

	.. code-block:: bash

	    sky launch -c verl-ppo verl-ppo.yaml --secret WANDB_API_KEY -y

	Runs PPO training on the GSM8K dataset using the Qwen2.5-0.5B-Instruct model across 2 nodes with H100 GPUs. Based on the examples in ``examples/ppo_trainer/``.

	`View verl-ppo.yaml on GitHub <https://github.com/volcengine/verl/blob/main/examples/skypilot/verl-ppo.yaml>`_

	GRPO Training
	~~~~~~~~~~~~~

	.. code-block:: bash

	    sky launch -c verl-grpo verl-grpo.yaml --secret WANDB_API_KEY -y

	Runs GRPO (Group Relative Policy Optimization) training on the MATH dataset using the Qwen2.5-7B-Instruct model, with a memory-optimized configuration for 2 nodes. Based on the examples in ``examples/grpo_trainer/``.

	`View verl-grpo.yaml on GitHub <https://github.com/volcengine/verl/blob/main/examples/skypilot/verl-grpo.yaml>`_

	Multi-turn Tool Usage Training
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	.. code-block:: bash

	    sky launch -c verl-multiturn verl-multiturn-tools.yaml \
	        --secret WANDB_API_KEY --secret HF_TOKEN -y

	Runs single-node multi-turn tool-usage training with Qwen2.5-3B-Instruct on 8 H100 GPUs. The configuration includes tool and interaction settings for GSM8K. It is based on the examples in ``examples/sglang_multiturn/`` but uses vLLM instead of SGLang.

	`View verl-multiturn-tools.yaml on GitHub <https://github.com/volcengine/verl/blob/main/examples/skypilot/verl-multiturn-tools.yaml>`_

	Configuration
	-------------

	The example YAML files are pre-configured with:

	- Infrastructure: Kubernetes clusters (``infra: k8s``); this can be changed to another platform, such as ``infra: aws`` or ``infra: gcp``
	- Docker Image: VERL's official Docker image with CUDA 12.6 support
	- Setup: Automatically clones and installs VERL from source
	- Datasets: Downloads the required datasets during the setup phase
	- Ray Cluster: Configures distributed training across nodes
	- Logging: Supports Weights & Biases via ``--secret WANDB_API_KEY``
	- Models: Supports gated HuggingFace models via ``--secret HF_TOKEN``
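
	Switching platforms, as the Infrastructure item notes, comes down to editing one field. The sketch below assumes the task YAML contains a single ``infra:`` line; ``set_infra`` is a hypothetical helper written for this guide, not part of SkyPilot or VERL:

	.. code-block:: bash

	    # Hypothetical helper: write a copy of a task YAML with the value of
	    # its `infra:` field replaced, leaving every other line unchanged.
	    # Usage: set_infra <input.yaml> <infra> <output.yaml>
	    set_infra() {
	        sed "s|^\([[:space:]]*infra:\).*|\1 $2|" "$1" > "$3"
	    }

	For example, ``set_infra verl-ppo.yaml aws verl-ppo-aws.yaml`` writes a variant that targets AWS and leaves the original file untouched. Other fields, such as accelerator types, may still need adjusting for the new platform.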

	Launch Command Options
	----------------------

	- ``-c <name>``: Cluster name for managing the job
	- ``--secret KEY``: Pass the locally exported variable ``KEY`` to the job as a secret, for example an API key (can be used multiple times)
	- ``-y``: Skip confirmation prompt
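
	The options above combine the same way in every example. As an illustration, the sketch below assembles a launch command from a cluster name, a YAML file, and any number of secret names. ``launch_cmd`` is a hypothetical helper that only prints the command. It does not run it:

	.. code-block:: bash

	    # Hypothetical helper: print (not run) a launch command.
	    # Usage: launch_cmd <cluster-name> <task.yaml> [SECRET_NAME ...]
	    launch_cmd() {
	        name="$1"
	        yaml="$2"
	        shift 2
	        cmd="sky launch -c $name $yaml"
	        # One --secret flag per remaining argument.
	        for secret in "$@"; do
	            cmd="$cmd --secret $secret"
	        done
	        echo "$cmd -y"
	    }

	    launch_cmd verl-multiturn verl-multiturn-tools.yaml WANDB_API_KEY HF_TOKEN
	    # prints: sky launch -c verl-multiturn verl-multiturn-tools.yaml --secret WANDB_API_KEY --secret HF_TOKEN -y

	Printing the command first makes it easy to review before running, which matters because ``-y`` skips the confirmation prompt.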

	Monitoring Your Jobs
	--------------------

	Check Cluster Status
	~~~~~~~~~~~~~~~~~~~~

	.. code-block:: bash

	    sky status

	View Logs
	~~~~~~~~~

	.. code-block:: bash

	    sky logs verl-ppo  # View logs for the PPO job

	SSH into Head Node
	~~~~~~~~~~~~~~~~~~

	.. code-block:: bash

	    ssh verl-ppo

	Access Ray Dashboard
	~~~~~~~~~~~~~~~~~~~~

	.. code-block:: bash

	    sky status --endpoint 8265 verl-ppo  # Get dashboard URL

	Tear Down a Cluster
	~~~~~~~~~~~~~~~~~~~

	.. code-block:: bash

	    sky down verl-ppo