Qwen3 From Scratch GRPO Checkpoints
This repository contains GRPO training checkpoints for the rasbt/qwen3-from-scratch model.
- grpo_original_no_kl: Checkpoints for the original GRPO algorithm without the KL divergence term. For more information, see the code that generated these checkpoints.
- Hyperparameters:
num_rollouts=8, max_new_tokens=512, temperature=0.8, top_p=0.9, lr=1e-5
See chapter 6, section 3.12, for more information on how to load and use these checkpoints.
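To make the `num_rollouts=8` setting concrete: GRPO samples a group of responses per prompt and scores each one relative to the rest of its group. The snippet below is a minimal sketch of that group-relative advantage step, not the code that produced these checkpoints. The function name, the example rewards, the `eps` value, and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize each rollout's reward against its own group.

    Hypothetical helper: subtracts the group mean and divides by the
    group standard deviation (population std is an assumption here;
    implementations differ), with eps guarding against division by zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for the 8 rollouts of one prompt
# (1.0 = correct answer, 0.0 = incorrect answer).
rewards = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.1f}  advantage={advantage:+.3f}")
```

Rollouts that beat the group average receive a positive advantage and the rest a negative one. If every rollout in a group earns the same reward, all advantages are zero and that prompt contributes no learning signal. In the no-KL variant named above, these advantages would weight the policy-gradient objective without an added penalty for drifting from a reference model.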