trainer / app.py

Commit History

Fix CUDA device-side assert by adjusting max_prompt_length and disabling use_cache
fe123ff

mayank1365 committed on

Fix IndentationError and duplicate reward logic
0d40379

mayank1365 committed on

Fix dtype mismatch in training and update blog
463c260

mayank1365 committed on

fix: resolve tensor size mismatch in reward_function by indexing correctly
134ff83

mayank1365 committed on

fix: resolve Gradio JSON error by passing dicts directly instead of serialized strings
731ed55

mayank1365 committed on

fix: resolve 422 errors by sanitizing JSON and fix python syntax error
117c380

mayank1365 committed on

fix: improve GRPO learning signal and handle 422 environment errors
f5df9dc

mayank1365 committed on

feat: randomize number of facts (2-4) per episode for better training diversity
b676c5d

mayank1365 committed on

fix: address reward plateau by adding format rewards and improving GRPO logic
b588360

mayank1365 committed on

fix: resolve GPU detection issue on HF Spaces
ef2c4bf

mayank1365 committed on

Update app.py
69d4d30
verified

ayaan-ai committed on

Rename ap.py to app.py
01057a2
verified

ayaan-ai committed on