Fix CUDA device-side assert by adjusting max_prompt_length and disabling use_cache fe123ff mayank1365 committed 24 days ago
fix: resolve tensor size mismatch in reward_function by indexing correctly 134ff83 mayank1365 committed 24 days ago
fix: resolve Gradio JSON error by passing dicts directly instead of serialized strings 731ed55 mayank1365 committed 24 days ago
fix: resolve 422 errors by sanitizing JSON and fix python syntax error 117c380 mayank1365 committed 24 days ago
fix: explicitly add unsloth_zoo to requirements to resolve ImportError f8ddd90 mayank1365 committed 24 days ago
fix: unpin torch and use simpler unsloth install to resolve HF build cache miss 091e56d mayank1365 committed 24 days ago
fix: downgrade CUDA to 12.1 and pin torch versions for HF Space compatibility 723a53e mayank1365 committed 24 days ago
fix: improve GRPO learning signal and handle 422 environment errors f5df9dc mayank1365 committed 24 days ago
feat: randomize number of facts (2-4) per episode for better training diversity b676c5d mayank1365 committed 24 days ago
fix: address reward plateau by adding format rewards and improving GRPO logic b588360 mayank1365 committed 24 days ago