ihbkaiser
/

trl-mcsd

Model card Files Files and versions

trl-mcsd / docs /source /rewards.md

ihbkaiser's picture

Implement MCSD for experimental SDPO

1fa3c6c verified 26 days ago

|

history blame contribute delete

445 Bytes

	# Reward Functions

	This module contains some useful reward functions, primarily intended for use with the [`GRPOTrainer`] and [`RLOOTrainer`].

	## accuracy_reward

	[[autodoc]] rewards.accuracy_reward

	## reasoning_accuracy_reward

	[[autodoc]] rewards.reasoning_accuracy_reward

	## think_format_reward

	[[autodoc]] rewards.think_format_reward

	## get_soft_overlong_punishment

	[[autodoc]] rewards.get_soft_overlong_punishment