refactor: remove Unsloth, use standard transformers + PEFT 355b2d5 RishbhaJain Claude Sonnet 4.6 committed on Mar 8
fix: align Unsloth config with recommended GRPO settings d1c6fd5 RishbhaJain Claude Sonnet 4.6 committed on Mar 8
Merge branch 'main' of https://github.com/ademcodesproducts/OpenEnv-Inventory-Simulations 84565ee ademarteau committed on Mar 8
fix: pipeline-aware ordering, YoY demand signal, reward rebalancing c10dcd0 RishbhaJain Claude Sonnet 4.6 committed on Mar 8
feat: integrate Unsloth into GRPO training pipeline 4d42a14 RishbhaJain Claude Sonnet 4.6 committed on Mar 8
feat: crash-resilient training with dataset caching and iteration resume 9ebd26d Arvind Sreenivas committed on Mar 8
feat: improve GRPO training logging and fix torch_dtype deprecation 7dea3a9 Arvind Sreenivas committed on Mar 8
Merge teammate changes, unify reward via reward.py, add PPO model 043e4e9 ademarteau committed on Mar 8
feat: improve training logging with tqdm, timings, GPU memory, ETA 766dc8c Arvind Sreenivas committed on Mar 8
Add P&L reward function, daily spoilage, stochastic lead time, and reward visualization c041c09 RishbhaJain Claude Sonnet 4.6 committed on Mar 8
Add three-agent system: Claude LLM, PPO RL, and GRPO fine-tuned Qwen de52704 Arvind Sreenivas committed on Mar 7