| Trainer Interface | |
| ================================ | |
| Last updated: 06/08/2025 (API docstrings are auto-generated). | |
| Trainers drive the training loop. Introducing new trainer classes in case of new training paradiam is encouraged. | |
| .. autosummary:: | |
| :nosignatures: | |
| verl.trainer.ppo.ray_trainer.RayPPOTrainer | |
| Core APIs | |
| ~~~~~~~~~~~~~~~~~ | |
| .. autoclass:: verl.trainer.ppo.ray_trainer.RayPPOTrainer | |
| :members: __init__, init_workers, fit | |
| .. automodule:: verl.utils.tokenizer | |
| :members: hf_tokenizer | |
| .. automodule:: verl.trainer.ppo.core_algos | |
| :members: agg_loss, kl_penalty, compute_policy_loss, kl_penalty | |
| .. automodule:: verl.trainer.ppo.reward | |
| :members: load_reward_manager, compute_reward, compute_reward_async | |
| .. autoclass:: verl.workers.reward_manager.NaiveRewardManager | |
| .. autoclass:: verl.workers.reward_manager.DAPORewardManager | |