Commit History

fix: clamp grader rewards to strictly (0, 1) to pass OpenEnv validation bounds
f3f7bc4

Sibam committed on

feat: PreferenceLab complete - RLHF preference simulation OpenEnv environment
b9664a2

Sibam committed on

fix: conform to OpenEnv base interface contract
7574c9a

Sibam committed on

PreferenceLab OpenEnv environment for RLHF preference simulation
cdf485e

Sibam committed on