rmx denotes PRMs (process reward models) trained on pairs of nodes that share the same parent node; the nodes are not required to be terminal.

rm3 has the best overall test-set accuracy.

rm4.5 and rm6 have higher accuracy at shallow depths and relatively good accuracy on the overall test set.

rm2-x denotes ORMs (outcome reward models) trained on pairs of terminal nodes; the nodes are not required to share the same parent node.

rm2-9.5 has the best overall test-set accuracy.

rm2-2, rm2-9, and rm2-2.5 have relatively good accuracy on the overall test set.

rm3-x denotes PRMs trained on arbitrary node pairs; the nodes are not required to be terminal or to share the same parent node.

rm3-5 has the best overall test-set accuracy.

rm3-2, rm3-8, and rm3-2.5 have relatively good accuracy on the overall test set.
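
The three pairing schemes differ only in which node pairs are eligible for training. The sketch below illustrates that difference on a toy tree. It is not the code used to build the training data: the `Node` class, the `candidate_pairs` function, its two flags, and the choice to leave the root out of all pairs are assumptions made for illustration, and how each pair is labeled with a preference is not covered here.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Node:
    # Hypothetical search-tree node; field names are assumptions.
    name: str
    children: list["Node"] = field(default_factory=list)

    @property
    def is_terminal(self) -> bool:
        # A node with no children is treated as terminal.
        return not self.children


def walk(root):
    """Yield (node, parent) for every node in the tree, in pre-order."""
    stack = [(root, None)]
    while stack:
        current, parent = stack.pop()
        yield current, parent
        for child in reversed(current.children):
            stack.append((child, current))


def candidate_pairs(root, same_parent, terminal_only):
    """Return name pairs of non-root nodes that satisfy the two constraints."""
    # Assumption: the root is excluded because it has no parent.
    nodes = [(n, p) for n, p in walk(root) if p is not None]
    pairs = []
    for (a, pa), (b, pb) in combinations(nodes, 2):
        if same_parent and pa is not pb:
            continue
        if terminal_only and not (a.is_terminal and b.is_terminal):
            continue
        pairs.append((a.name, b.name))
    return pairs


# Toy tree: root -> a, b; a -> a1, a2; b -> b1
root = Node("root", [
    Node("a", [Node("a1"), Node("a2")]),
    Node("b", [Node("b1")]),
])

# rmx-style: same parent required, terminal not required
print(candidate_pairs(root, same_parent=True, terminal_only=False))
# rm2-x-style: terminal required, same parent not required
print(candidate_pairs(root, same_parent=False, terminal_only=True))
# rm3-x-style: neither constraint
print(len(candidate_pairs(root, same_parent=False, terminal_only=False)))
```

On this toy tree the rmx-style filter keeps only sibling pairs, the rm2-x-style filter keeps only leaf pairs, and the rm3-x-style filter keeps every pair of non-root nodes.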