Gridmind / scripts / train_unsloth.py

Commit History

feat: implement Unsloth GRPO training script with diverse reward functions and logging
d2449aa

adityss committed on

fix: update training script with seed variation, fix reward normalization, regenerate training curves showing 0.52->0.67 improvement
bdc9954

adityss committed on

fix: training reward uses 8-step rollout + /grade for genuine episode-level signal
c70e17d

adityss committed on

feat: add baseline evaluation tools and demo scripts for RL performance comparison
c395f6a

adityss committed on