feat: implement Unsloth GRPO training pipeline with environment-backed reward functions and balanced dataset generation 27d3504 adityss committed 23 days ago
fix: disable AMP for quantized models to avoid gradient scaler issues in GRPO training 19ba2eb adityss committed 23 days ago
feat: update GRPO training configuration with additional parameters for logging and precision f3ecc94 adityss committed 23 days ago
Adjust logging configuration for training: log every step, enable completion metrics, and limit completions printed per step. a6b45e9 adityss committed 23 days ago
feat: implement GridMind-RL training pipeline with GRPO Colab notebook and Unsloth configuration script b0701ef Prajwal782007 committed 23 days ago
feat: implement Unsloth GRPO training script with environment-based reward tracking and balanced dataset generation 32d5b8f Prajwal782007 committed 23 days ago
feat: add GridMind GRPO training environment and Unsloth training script 3d49e8a Prajwal782007 committed 23 days ago
feat: add script to migrate max_new_tokens from GRPOConfig to GRPOTrainer in notebook 08731ee Prajwal782007 committed 23 days ago
fix: update training script with seed variation, fix reward normalization, regenerate training curves showing 0.52->0.67 improvement bdc9954 adityss committed 24 days ago
fix: training reward uses 8-step rollout + /grade for genuine episode-level signal c70e17d adityss committed 24 days ago
feat: add baseline evaluation tools and demo scripts for RL performance comparison c395f6a adityss committed 24 days ago