arithmetic-grpo / docs /workers /trtllm_worker.rst

initial clean commit

1faccd4 about 1 month ago

2.71 kB

	TensorRT-LLM Backend
	====================

	Last updated: 12/31/2025.

	Authored By TensorRT-LLM Team

	Introduction
	------------
	`TensorRT-LLM <https://github.com/NVIDIA/TensorRT-LLM>`_ is a high-performance LLM inference engine with state-of-the-art optimizations for NVIDIA GPUs.
	The verl integration of TensorRT-LLM is based on TensorRT-LLM's `Ray orchestrator <https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/ray_orchestrator>`_. This integration is in its early stage, with more features and optimizations to come.

	The TensorRT-LLM rollout engine primarily targets the colocated mode. Instead of relying purely on standard colocated mode, we adopted a mixed design combining aspects of the hybrid engine and colocated mode.

	Installation
	------------
	We provide ``docker/Dockerfile.stable.trtllm`` for building a docker image with TensorRT-LLM pre-installed. The verl integration is supported from ``nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc6``, and you can choose other TensorRT-LLM versions via ``TRTLLM_BASE_IMAGE`` from the `NGC Catalog <https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tensorrt-llm/containers/release>`_.

	Alternatively, refer to the `TensorRT-LLM installation guide <https://nvidia.github.io/TensorRT-LLM/installation/index.html>`_ for compatible environments if you want to build your own.

	Install verl with TensorRT-LLM:

	.. code-block:: bash

	pip install --upgrade pip
	pip install -e ".[trtllm]"

	.. note::

	Using the TensorRT-LLM rollout requires setting the following environment variables before launching the Ray cluster. These have been included in all the example scripts:

	.. code-block:: bash

	# Clean all SLURM/MPI/PMIx env to avoid PMIx mismatch error.
	for v in $(env \| awk -F= '/^(PMI\|PMIX\|MPI\|OMPI\|SLURM)_/{print $1}'); do
	unset "$v"
	done

	Using TensorRT-LLM as the Rollout Engine for GRPO
	-------------------------------------------------

	We provide the following GRPO recipe scripts for you to test the performance and accuracy curve of TensorRT-LLM as the rollout engine:

	.. code-block:: bash

	## For FSDP training engine
	bash examples/grpo_trainer/run_qwen2-7b_math_trtllm.sh
	## For Megatron-Core training engine
	bash examples/grpo_trainer/run_qwen2-7b_math_megatron_trtllm.sh

	Using TensorRT-LLM as the Rollout Engine for DAPO
	-------------------------------------------------

	We provide a DAPO recipe script ``recipe/dapo/test_dapo_7b_math_trtllm.sh``.

	.. code-block:: bash

	## For FSDP training engine
	bash recipe/dapo/test_dapo_7b_math_trtllm.sh
	## For Megatron-Core training engine
	TRAIN_ENGINE=megatron bash recipe/dapo/test_dapo_7b_math_trtllm.sh