Reward Functions
This module contains some useful reward functions, primarily intended for use with the [GRPOTrainer] and [RLOOTrainer].
accuracy_reward
[[autodoc]] rewards.accuracy_reward
reasoning_accuracy_reward
[[autodoc]] rewards.reasoning_accuracy_reward
think_format_reward
[[autodoc]] rewards.think_format_reward
get_soft_overlong_punishment
[[autodoc]] rewards.get_soft_overlong_punishment