Commit History

Fix CUDA device-side assert by adjusting max_prompt_length and disabling use_cache
fe123ff

mayank1365 committed on

Fix IndentationError and duplicate reward logic
0d40379

mayank1365 committed on

Fix dtype mismatch in training and update blog
463c260

mayank1365 committed on

fix: resolve tensor size mismatch in reward_function by indexing correctly
134ff83

mayank1365 committed on

fix: resolve Gradio JSON error by passing dicts directly instead of serialized strings
731ed55

mayank1365 committed on

fix: resolve 422 errors by sanitizing JSON and fix python syntax error
117c380

mayank1365 committed on

fix: explicitly add unsloth_zoo to requirements to resolve ImportError
f8ddd90

mayank1365 committed on

fix: unpin torch and use simpler unsloth install to resolve HF build cache miss
091e56d

mayank1365 committed on

fix: downgrade CUDA to 12.1 and pin torch versions for HF Space compatibility
723a53e

mayank1365 committed on

fix: improve GRPO learning signal and handle 422 environment errors
f5df9dc

mayank1365 committed on

feat: randomize number of facts (2-4) per episode for better training diversity
b676c5d

mayank1365 committed on

fix: address reward plateau by adding format rewards and improving GRPO logic
b588360

mayank1365 committed on

fix: resolve GPU detection issue on HF Spaces
ef2c4bf

mayank1365 committed on

Update app.py
69d4d30
verified

ayaan-ai committed on

Update requirements.txt
b8fb5aa
verified

ayaan-ai committed on

Update requirements.txt
392faa7
verified

ayaan-ai committed on

Update requirements.txt
b4a6032
verified

ayaan-ai committed on

Update requirements.txt
47363ae
verified

ayaan-ai committed on

Create requirements.txt
dad4e20
verified

ayaan-ai committed on

Rename ap.py to app.py
01057a2
verified

ayaan-ai committed on

Rename train_suspect_x.py to ap.py
021896a
verified

ayaan-ai committed on

Upload train_suspect_x.py with huggingface_hub
0355697
verified

ayaan-ai committed on

initial commit
1d47818
verified

ayaan-ai committed on