trainer

Commit History

Fix CUDA device-side assert by adjusting max_prompt_length and disabling use_cache

fe123ff

mayank1365 committed 24 days ago

Fix IndentationError and duplicate reward logic

0d40379

mayank1365 committed 24 days ago

Fix dtype mismatch in training and update blog

463c260

mayank1365 committed 24 days ago

fix: resolve tensor size mismatch in reward_function by indexing correctly

134ff83

mayank1365 committed 24 days ago

fix: resolve Gradio JSON error by passing dicts directly instead of serialized strings

731ed55

mayank1365 committed 24 days ago

fix: resolve 422 errors by sanitizing JSON and fix python syntax error

117c380

mayank1365 committed 24 days ago

fix: explicitly add unsloth_zoo to requirements to resolve ImportError

f8ddd90

mayank1365 committed 24 days ago

fix: unpin torch and use simpler unsloth install to resolve HF build cache miss

091e56d

mayank1365 committed 24 days ago

fix: downgrade CUDA to 12.1 and pin torch versions for HF Space compatibility

723a53e

mayank1365 committed 24 days ago

fix: improve GRPO learning signal and handle 422 environment errors

f5df9dc

mayank1365 committed 24 days ago

feat: randomize number of facts (2-4) per episode for better training diversity

b676c5d

mayank1365 committed 24 days ago

fix: address reward plateau by adding format rewards and improving GRPO logic

b588360

mayank1365 committed 24 days ago

fix: resolve GPU detection issue on HF Spaces

ef2c4bf

mayank1365 committed 24 days ago

Update app.py

69d4d30
verified

ayaan-ai committed 24 days ago

Update requirements.txt

b8fb5aa
verified

ayaan-ai committed 24 days ago

Update requirements.txt

392faa7
verified

ayaan-ai committed 24 days ago

Update requirements.txt

b4a6032
verified

ayaan-ai committed 24 days ago

Update requirements.txt

47363ae
verified

ayaan-ai committed 24 days ago

Create requirements.txt

dad4e20
verified

ayaan-ai committed 24 days ago

Rename ap.py to app.py

01057a2
verified

ayaan-ai committed 24 days ago

Rename train_suspect_x.py to ap.py

021896a
verified

ayaan-ai committed 24 days ago

Upload train_suspect_x.py with huggingface_hub

0355697
verified

ayaan-ai committed 24 days ago

initial commit

1d47818
verified

ayaan-ai committed 24 days ago
