Gridmind/scripts/train_unsloth.py

Commit History

feat: implement Unsloth GRPO training pipeline with environment-backed reward functions and balanced dataset generation
27d3504

adityss committed on

fix: disable AMP for quantized models to avoid gradient scaler issues in GRPO training
19ba2eb

adityss committed on

feat: update GRPO training configuration with additional parameters for logging and precision
f3ecc94

adityss committed on

Adjust logging configuration for training: log every step, enable completion metrics, and limit completions printed per step.
a6b45e9

adityss committed on

feat: implement GridMind-RL training pipeline with GRPO Colab notebook and Unsloth configuration script
b0701ef

Prajwal782007 committed on

feat: implement Unsloth GRPO training script with environment-based reward tracking and balanced dataset generation
32d5b8f

Prajwal782007 committed on

feat: add GridMind GRPO training environment and Unsloth training script
3d49e8a

Prajwal782007 committed on

feat: add script to migrate max_new_tokens from GRPOConfig to GRPOTrainer in notebook
08731ee

Prajwal782007 committed on

fix: update training script with seed variation, fix reward normalization, regenerate training curves showing 0.52->0.67 improvement
bdc9954

adityss committed on

fix: training reward uses 8-step rollout + /grade for genuine episode-level signal
c70e17d

adityss committed on

feat: add baseline evaluation tools and demo scripts for RL performance comparison
c395f6a

adityss committed on