# PyTorch Profiling in verl

Last updated: 01/13/2026.

This guide explains how to use the native [PyTorch Profiler](https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html) for profiling verl training runs.

## Configuration

Profiling in verl is configured through parameters in the trainer configuration file (e.g., `ppo_trainer.yaml`).

### Global Profiling Control

In `global_profiler`, you can control when and how profiling occurs globally:

* **`global_profiler.steps`**: List of step numbers to profile. E.g., `[1, 2, 5]` profiles steps 1, 2, and 5. Set to `null` to disable profiling.
* **`global_profiler.save_path`**: Directory in which to save the profiling results. Default is `outputs/profile`.

### Role Profiling Control

Each RL role (Actor, Critic, etc.) has its own `profiler` configuration:

* **`enable`**: Whether to enable profiling for this role.
* **`all_ranks`**: If `True`, profiles all ranks.
* **`ranks`**: List of specific ranks to profile if `all_ranks` is `False`.
* **`tool_config.torch`**: Configuration specific to the PyTorch Profiler.

#### PyTorch Profiler Options (`tool_config.torch`)

You can customize the PyTorch Profiler behavior using the following fields under `tool_config.torch` (see the sketch at the end of the Examples section for how these fields map onto the native API):

* **`contents`**: List of contents to profile.
  * **`cpu`**: Profile CPU activities.
  * **`cuda`**: Profile CUDA activities.
  * **`memory`**: Track tensor memory allocations and frees.
  * **`shapes`**: Record the shapes of operator inputs.
  * **`stack`**: Record source file and line number.
* **`discrete`**: If `True`, save a separate trace file for each profiled step instead of a single combined trace (see the examples below).
* **`schedule`**: (Advanced) configuration for the `wait`, `warmup`, `active`, `repeat` cycles.

## Examples

### 1. End-to-End Collection

Collects performance data for all profiled steps in a single trace file.

```yaml
global_profiler:
  steps: [1, 2, 5]
  save_path: ./outputs/profile

actor_rollout_ref:
  actor:
    profiler:
      enable: True
      all_ranks: True
      tool_config:
        torch:
          discrete: False
          contents: [cpu, cuda]
  # rollout & ref follow actor settings
```

### 2. Discrete Mode Collection

Discrete mode saves a separate trace file for each step. This is useful for detailed analysis and is **mandatory** when using Agent Loop.

**Configuration Example**

This configuration supports profiling both Training (Actor) and Inference (Rollout); you can enable or disable them independently.

```yaml
actor_rollout_ref:
  actor:
    profiler:
      enable: True       # Set to True to profile training
      all_ranks: False
      ranks: [0]         # Global Rank 0
      tool_config:
        torch:
          discrete: True
          contents: [cpu, cuda]
  rollout:
    profiler:
      enable: True       # Set to True to profile inference
      all_ranks: False
      ranks: [0]         # In Agent Loop, this is the Replica Rank (e.g., the 0-th instance)
      tool_config:
        torch:
          discrete: True # REQUIRED
  # ref follows actor settings
```

**Agent Loop Mode Description**

When Rollout runs in [Agent Loop](../advance/agent_loop.rst) mode, performance data for the Rollout phase **must be collected using discrete mode**. In this case, the Profiler is triggered by the inference engine backend.

1. **Rank definition**: `ranks` in the Rollout configuration refers to the Replica Rank (inference instance index), not the Global Rank.
2. **Inference engine support**: Currently, the vLLM and SGLang engines are supported without additional settings:
   * **vLLM Engine**: Automatically collects AsyncLLM scheduling stacks and inference-process performance data.
   * **SGLang Engine**: Automatically collects inference-process performance data. Does not support the `memory` option in `contents`.
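The fields under `tool_config.torch` correspond closely to arguments of the native `torch.profiler.profile` API. Below is a minimal standalone sketch of that mapping using only standard `torch.profiler` calls; the exact wiring inside verl may differ, and the output directory, step count, and schedule values are illustrative:

```python
import torch
from torch.profiler import ProfilerActivity, profile, schedule, tensorboard_trace_handler

# Mirrors `contents` from the YAML config above (illustrative values).
contents = ["cpu", "cuda", "shapes", "stack"]

activities = []
if "cpu" in contents:
    activities.append(ProfilerActivity.CPU)
if "cuda" in contents and torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)

prof = profile(
    activities=activities,
    profile_memory="memory" in contents,  # `memory` -> profile_memory
    record_shapes="shapes" in contents,   # `shapes` -> record_shapes
    with_stack="stack" in contents,       # `stack`  -> with_stack
    # The YAML `schedule` block corresponds to torch.profiler.schedule():
    schedule=schedule(wait=1, warmup=1, active=2, repeat=1),
    on_trace_ready=tensorboard_trace_handler("./outputs/profile"),
)

with prof:
    for _ in range(5):  # one prof.step() per training step
        torch.randn(1024, 1024) @ torch.randn(1024, 1024)  # dummy work
        prof.step()  # advances the wait/warmup/active cycle
```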
## Visualization

Collected trace files (usually `.json` or `.json.gz`) are stored in the configured `save_path`. You can visualize them using:

1. **Chrome Tracing**: Open `chrome://tracing` in a Chrome browser and load the JSON file.
2. **Perfetto**: Open [ui.perfetto.dev](https://ui.perfetto.dev/) and load the file (recommended for large traces).
3. **TensorBoard**: With the PyTorch Profiler TensorBoard plugin (`torch-tb-profiler`) installed, point TensorBoard's log directory at the trace files.
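If you only need a quick textual summary rather than a full trace viewer, the standard `torch.profiler` API can print one directly. A minimal self-contained example (not verl-specific):

```python
import torch
from torch.profiler import ProfilerActivity, profile

# Collect a short CPU-only profile of some dummy work.
with profile(activities=[ProfilerActivity.CPU]) as prof:
    torch.randn(512, 512) @ torch.randn(512, 512)

# Per-operator summary table, sorted by total CPU time.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))

# The same data can also be exported as a Chrome/Perfetto-compatible trace.
prof.export_chrome_trace("trace.json")
```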