feat: implement Unsloth GRPO training script with diverse reward functions and logging d2449aa adityss committed 25 days ago
fix: update training script with seed variation, fix reward normalization, regenerate training curves showing 0.52->0.67 improvement bdc9954 adityss committed 25 days ago
feat: add scripts/full_demo.py, a unified 10-step demo proving all 4 hackathon themes operational 5636c9d adityss committed 25 days ago
fix: training reward uses 8-step rollout + /grade for genuine episode-level signal c70e17d adityss committed 25 days ago
feat: commit training evidence, update README with real scores, add demo scripts 8204dc0 adityss committed 25 days ago
feat: add baseline evaluation tools and demo scripts for RL performance comparison c395f6a adityss committed 25 days ago
feat: add GridMind GRPO training notebook using Unsloth and HF TRL bdadba1 adityss committed 25 days ago
Add Task 4 instruction following, Curriculum Manager for self-improvement, and world modeling simulation 0af208b adityss committed 28 days ago
feat: add OpenEnv submission validator script to check HF Space status and Docker build e82aa27 adityss committed on Apr 5