Trainer Interface
================================
Last updated: 06/08/2025 (API docstrings are auto-generated).

Trainers drive the training loop. Introducing a new trainer class is encouraged when a new training paradigm calls for one.
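
As a rough illustration of the shape such a class might take, the sketch below mirrors the three methods documented for ``RayPPOTrainer`` on this page (``__init__``, ``init_workers``, ``fit``). It deliberately does not import verl: ``ToyTrainer``, its fields, its return value, and the check that ``init_workers`` runs before ``fit`` are all hypothetical, and the real trainer's constructor arguments and internals are not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTrainer:
    """Hypothetical stand-in for a trainer class; not part of verl."""

    total_steps: int
    workers_ready: bool = False
    history: list = field(default_factory=list)

    def init_workers(self) -> None:
        # A real trainer would set up its worker processes here.
        self.workers_ready = True

    def fit(self) -> list:
        # Assumption for this sketch: workers must be initialized first.
        if not self.workers_ready:
            raise RuntimeError("call init_workers() before fit()")
        for step in range(1, self.total_steps + 1):
            # Placeholder for one iteration of the training loop.
            self.history.append({"step": step})
        return self.history


trainer = ToyTrainer(total_steps=3)
trainer.init_workers()
history = trainer.fit()
```

Keeping setup (``init_workers``) separate from the loop (``fit``) lets a new paradigm replace the loop body while reusing the same entry points.
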

.. autosummary::
   :nosignatures:

   verl.trainer.ppo.ray_trainer.RayPPOTrainer

Core APIs
~~~~~~~~~~~~~~~~~

.. autoclass:: verl.trainer.ppo.ray_trainer.RayPPOTrainer
   :members: __init__, init_workers, fit

.. automodule:: verl.utils.tokenizer
   :members: hf_tokenizer

.. automodule:: verl.trainer.ppo.core_algos
   :members: agg_loss, kl_penalty, compute_policy_loss

.. automodule:: verl.trainer.ppo.reward
   :members: load_reward_manager, compute_reward, compute_reward_async

.. autoclass:: verl.workers.reward_manager.NaiveRewardManager

.. autoclass:: verl.workers.reward_manager.DAPORewardManager