blog: expand GRPO mechanics, reward shaping narrative, and lessons learned 2fd537a Mohammed-Altaf Claude Sonnet 4.6 committed on 17 days ago
fix: sync README with current codebase 832ac5b Mohammed-Altaf Claude Sonnet 4.6 committed on 17 days ago
docs: replace README with clean developer-facing doc; blog post in BLOG.md 85118cd Mohammed-Altaf Claude Sonnet 4.6 committed on 17 days ago
fix: add training plots and clean up duplicate README headings 5905982 Mohammed-Altaf Claude Sonnet 4.6 committed on 17 days ago
docs: rewrite README as technical writeup for HF Spaces submission 93702ff Mohammed-Altaf Claude Sonnet 4.6 committed on 18 days ago
chore: refresh training artifacts and rename consume_reward_components to private 7909885 Mohammed-Altaf committed on 18 days ago
feat: add episode trace, refresh training dataset, and update eval metrics a422c8d Mohammed-Altaf committed on 18 days ago
refactor: move training code to scripts/, add train/eval split, tune GRPO hyperparams fad16c9 Mohammed-Altaf committed on 18 days ago
feat: add OpenEnv TRL wrapper, expand dataset, and add W&B eval tracking 6fa4fbd Mohammed-Altaf committed on 18 days ago
feat: add structured pruning action and random baseline policy d064b19 Mohammed-Altaf committed on 18 days ago
refactor: harden imports, add training extras, and rewrite README 5dd60b9 Mohammed-Altaf committed on 18 days ago
fix: resolve merge conflict markers in openenv.yaml and uv.lock c524b25 Mohammed-Altaf committed on 18 days ago
implement NeuralTuner RL environment for Snapdragon quantization 782222a Mohammed-Altaf committed on 18 days ago