461 kB
nihalaninihal
Claude Opus 4.6
Fix critical RL reward function exploits and training hyperparameters
803c93e