trl-mcsd / trl / experimental
1.48 MB
ihbkaiser
Implement MCSD for experimental SDPO
1fa3c6c verified