Commit History

added wandb in train file
a0e028b

Bhaskar committed on

Fix Unsloth dtype mismatch by locking precision mode
91ac1b6

Bhaskar committed on

Fix GRPO reward parser for list/dict completions
8ca88e8

Bhaskar committed on

add Colab notebook, update the train and Docker files, and align the requirements with the codebase
2b9c8cb

Bhaskar committed on

updated the codebase for A10 24 GB GPU
a8a8219

Bhaskar committed on

updated the codebase for A10 24 GB GPU
a24d3c8

Bhaskar committed on

train file updated for local training
4714235

Bhaskar committed on

Round 2 Upgrade: Added GRPO train.py and vector-field reward shaping
51bb0d4

Bhaskar committed on