trl-mcsd / docs /source /rewards.md
ihbkaiser's picture
Implement MCSD for experimental SDPO
1fa3c6c verified
# Reward Functions
This module contains some useful reward functions, primarily intended for use with the [`GRPOTrainer`] and [`RLOOTrainer`].
## accuracy_reward
[[autodoc]] rewards.accuracy_reward
## reasoning_accuracy_reward
[[autodoc]] rewards.reasoning_accuracy_reward
## think_format_reward
[[autodoc]] rewards.think_format_reward
## get_soft_overlong_punishment
[[autodoc]] rewards.get_soft_overlong_punishment