grpo: add --eval_best_of (best-of-N eval for SFT and GRPO) 8a2235b verified yashvyasop committed on Apr 26
grpo: add --eval_best_of (best-of-N eval for SFT and GRPO) 97a5555 verified yashvyasop committed on Apr 26
GRPO v2_delta_phase reward + diversified states + tuned smoke schedule da8eeea verified yashvyasop committed on Apr 26
GRPO v2_delta_phase reward + diversified states + tuned smoke schedule 715e7e0 verified yashvyasop committed on Apr 26
Fix notebook: unsloth-first imports, HF spaces clone, GPU check 86e410d verified yashvyasop committed on Apr 26
Patch missing warnings_issued attr for transformers 5.x compat 9eb8e28 verified yashvyasop committed on Apr 26
Tighten pydantic pin to <2.11 to match mergekit ~=2.10.6 1994102 verified yashvyasop committed on Apr 26
Pin pydantic <2.12 after editable install to fix mergekit compat 80f9fa1 verified yashvyasop committed on Apr 26
Add weave dep (another TRL 0.24 hard import in callbacks.py) fdd1260 verified yashvyasop committed on Apr 26