grpo: add --eval_best_of (best-of-N eval for SFT and GRPO) 8a2235b verified yashvyasop committed 22 days ago
grpo: add --eval_best_of (best-of-N eval for SFT and GRPO) 97a5555 verified yashvyasop committed 22 days ago
GRPO v2_delta_phase reward + diversified states + tuned smoke schedule da8eeea verified yashvyasop committed 22 days ago
GRPO v2_delta_phase reward + diversified states + tuned smoke schedule 715e7e0 verified yashvyasop committed 22 days ago
Fix notebook: unsloth-first imports, HF spaces clone, GPU check 86e410d verified yashvyasop committed 22 days ago
Patch missing warnings_issued attr for transformers 5.x compat 9eb8e28 verified yashvyasop committed 22 days ago
Tighten pydantic pin to <2.11 to match mergekit ~=2.10.6 1994102 verified yashvyasop committed 22 days ago
Pin pydantic <2.12 after editable install to fix mergekit compat 80f9fa1 verified yashvyasop committed 22 days ago
Add weave dep (another TRL 0.24 hard import in callbacks.py) fdd1260 verified yashvyasop committed 22 days ago