Commit History

refactor: remove Unsloth, use standard transformers + PEFT
355b2d5

RishbhaJain, Claude Sonnet 4.6 committed on

fix: align Unsloth config with recommended GRPO settings
d1c6fd5

RishbhaJain, Claude Sonnet 4.6 committed on

Merge branch 'main' of https://github.com/ademcodesproducts/OpenEnv-Inventory-Simulations
84565ee

ademarteau committed on

fix: pipeline-aware ordering, YoY demand signal, reward rebalancing
c10dcd0

RishbhaJain, Claude Sonnet 4.6 committed on

feat: integrate Unsloth into GRPO training pipeline
4d42a14

RishbhaJain, Claude Sonnet 4.6 committed on

feat: full-horizon lookahead reward (365 days, <0.5ms)
af5c3c7

Arvind Sreenivas committed on

feat: crash-resilient training with dataset caching and iteration resume
9ebd26d

Arvind Sreenivas committed on

Remove PPO MLP agent, update README
e21ed94

ademarteau committed on

feat: improve GRPO training logging and fix torch_dtype deprecation
7dea3a9

Arvind Sreenivas committed on

Merge teammate changes, unify reward via reward.py, add PPO model
043e4e9

ademarteau committed on

Added PPO model and reward.py
7ed1454

ademarteau committed on

feat: improve training logging with tqdm, timings, GPU memory, ETA
766dc8c

Arvind Sreenivas committed on

Added trained PPO model + app.py UI changes for HF Spaces
3cad082

ademarteau committed on

Add P&L reward function, daily spoilage, stochastic lead time, and reward visualization
c041c09

RishbhaJain, Claude Sonnet 4.6 committed on

Add three-agent system: Claude LLM, PPO RL, and GRPO fine-tuned Qwen
de52704

Arvind Sreenivas committed on