RL-Inventory-Simulations / agent

Commit History

refactor: remove Unsloth, use standard transformers + PEFT

355b2d5

RishbhaJain Claude Sonnet 4.6 committed on Mar 8

fix: align Unsloth config with recommended GRPO settings

d1c6fd5

RishbhaJain Claude Sonnet 4.6 committed on Mar 8

Merge branch 'main' of https://github.com/ademcodesproducts/OpenEnv-Inventory-Simulations

84565ee

ademarteau committed on Mar 8

fix: pipeline-aware ordering, YoY demand signal, reward rebalancing

c10dcd0

RishbhaJain Claude Sonnet 4.6 committed on Mar 8

feat: integrate Unsloth into GRPO training pipeline

4d42a14

RishbhaJain Claude Sonnet 4.6 committed on Mar 8

feat: full-horizon lookahead reward (365 days, <0.5ms)

af5c3c7

Arvind Sreenivas committed on Mar 8

feat: crash-resilient training with dataset caching and iteration resume

9ebd26d

Arvind Sreenivas committed on Mar 8

Remove PPO MLP agent, update README

e21ed94

ademarteau committed on Mar 8

feat: improve GRPO training logging and fix torch_dtype deprecation

7dea3a9

Arvind Sreenivas committed on Mar 8

Merge teammate changes, unify reward via reward.py, add PPO model

043e4e9

ademarteau committed on Mar 8

Added PPO model and reward.py

7ed1454

ademarteau committed on Mar 8

feat: improve training logging with tqdm, timings, GPU memory, ETA

766dc8c

Arvind Sreenivas committed on Mar 8

Added trained PPO model + app.py UI changes for HF Spaces

3cad082

ademarteau committed on Mar 8

Add P&L reward function, daily spoilage, stochastic lead time, and reward visualization

c041c09

RishbhaJain Claude Sonnet 4.6 committed on Mar 8

Add three-agent system: Claude LLM, PPO RL, and GRPO fine-tuned Qwen

de52704

Arvind Sreenivas committed on Mar 7
