mshahidul
Initial commit of readCtrl code without large models
030876e

verl with SkyPilot

Run verl reinforcement learning training jobs on Kubernetes clusters or cloud platforms with GPU nodes using SkyPilot.

Installation and Configuration

Step 1: Install SkyPilot

Choose the installation based on your target platform:

# For Kubernetes only
pip install "skypilot[kubernetes]"

# For AWS
pip install "skypilot[aws]"

# For Google Cloud Platform
pip install "skypilot[gcp]"

# For Azure
pip install "skypilot[azure]"

# For multiple platforms
pip install "skypilot[kubernetes,aws,gcp,azure]"

Step 2: Configure Your Platform

See https://docs.skypilot.co/en/latest/getting-started/installation.html

Step 3: Set Up Environment Variables

Export necessary API keys for experiment tracking:

# For Weights & Biases tracking
export WANDB_API_KEY="your-wandb-api-key"

# For HuggingFace gated models (if needed)
export HF_TOKEN="your-huggingface-token"

Examples

PPO Training

sky launch -c verl-ppo verl-ppo.yaml --secret WANDB_API_KEY -y

Runs PPO training on GSM8K dataset using Qwen2.5-0.5B-Instruct model across 2 nodes with H100 GPUs. Based on examples in ../ppo_trainer/.

GRPO Training

sky launch -c verl-grpo verl-grpo.yaml --secret WANDB_API_KEY -y

Runs GRPO (Group Relative Policy Optimization) training on MATH dataset using Qwen2.5-7B-Instruct model. Memory-optimized configuration for 2 nodes. Based on examples in ../grpo_trainer/.

Multi-turn Tool Usage Training

sky launch -c verl-multiturn verl-multiturn-tools.yaml --secret WANDB_API_KEY --secret HF_TOKEN -y

Single-node training with 8xH100 GPUs for multi-turn tool usage with Qwen2.5-3B-Instruct. Includes tool and interaction configurations for GSM8K. Based on examples in ../sglang_multiturn/ but uses vLLM instead of sglang.

Configuration

The example YAML files are pre-configured with:

  • Infrastructure: Kubernetes clusters (infra: k8s) - can be changed to infra: aws or infra: gcp, etc.
  • Docker Image: verl's official Docker image with CUDA 12.6 support
  • Setup: Automatically clones and installs verl from source
  • Datasets: Downloads required datasets during setup phase
  • Ray Cluster: Configures distributed training across nodes
  • Logging: Supports Weights & Biases via --secret WANDB_API_KEY
  • Models: Supports gated HuggingFace models via --secret HF_TOKEN

Launch Command Options

  • -c <name>: Cluster name for managing the job
  • --secret KEY: Pass secrets for API keys (can be used multiple times)
  • -y: Skip confirmation prompt

Monitoring Your Jobs

Check cluster status

sky status

View logs

sky logs verl-ppo  # View logs for the PPO job

SSH into head node

ssh verl-ppo

Access Ray dashboard

sky status --endpoint 8265 verl-ppo  # Get dashboard URL

Stop a cluster

sky down verl-ppo