@danielhanchen on Hugging Face: "You can now do reinforcement learning training with 7× longer context and no…"

Join the conversation

Join the community of Machine Learners and AI enthusiasts.

posted an update Jan 15

Post

2935

You can now do reinforcement learning training with 7× longer context and no accuracy loss, via our new batching algorithms.

Long reasoning chains in RL are costly, but now we enable you to train gpt-oss with GRPO & reach 380K context on a 192GB GPU.

Blog: https://unsloth.ai/docs/new/grpo-long-context

In this post

danielhanchen Daniel (Unsloth)