Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Sign Up
danielhanchen 
posted an update 10 days ago
Post
2754
You can now do reinforcement learning training with 7× longer context and no accuracy loss, via our new batching algorithms.

Long reasoning chains in RL are costly, but now we enable you to train gpt-oss with GRPO & reach 380K context on a 192GB GPU.

Blog: https://unsloth.ai/docs/new/grpo-long-context