Fix CUDA device-side assert by adjusting max_prompt_length and disabling use_cache fe123ff mayank1365 committed on Apr 26
fix: resolve tensor size mismatch in reward_function by indexing correctly 134ff83 mayank1365 committed on Apr 26
fix: resolve Gradio JSON error by passing dicts directly instead of serialized strings 731ed55 mayank1365 committed on Apr 26
fix: resolve 422 errors by sanitizing JSON and fix Python syntax error 117c380 mayank1365 committed on Apr 26
fix: improve GRPO learning signal and handle 422 environment errors f5df9dc mayank1365 committed on Apr 26
feat: randomize number of facts (2-4) per episode for better training diversity b676c5d mayank1365 committed on Apr 26
fix: address reward plateau by adding format rewards and improving GRPO logic b588360 mayank1365 committed on Apr 26