Qwen3 From Scratch GRPO Checkpoints

This repository contains GRPO training checkpoints for the rasbt/qwen3-from-scratch model.

  • grpo_original_no_kl: Checkpoints trained with the original GRPO algorithm, without the KL divergence term. For more information, see the code that generated these checkpoints.
    • Hyperparameters: num_rollouts=8, max_new_tokens=512, temperature=0.8, top_p=0.9, lr=1e-5
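
As a rough illustration of what "GRPO without the KL divergence term" means, the sketch below computes group-relative advantages for one prompt with 8 rollouts and a plain policy-gradient loss with no KL penalty added. It is not the code that produced these checkpoints: the function names, the example rewards, the use of the population standard deviation, and the sequence-level loss are all assumptions made for illustration.

```python
import math

NUM_ROLLOUTS = 8  # matches num_rollouts=8 listed above


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one prompt's group of rollouts.

    Assumption: population standard deviation; the actual training
    code may normalize differently.
    """
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


def grpo_loss_no_kl(seq_logprobs, advantages):
    """Advantage-weighted negative log-likelihood, averaged over rollouts.

    No KL penalty against a reference model is added, which is the
    only point this sketch is meant to show.
    """
    weighted = [a * lp for a, lp in zip(advantages, seq_logprobs)]
    return -sum(weighted) / len(weighted)


# Hypothetical rewards for 8 rollouts of one prompt (1.0 = correct answer).
rewards = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# Hypothetical summed log-probabilities of each rollout under the policy.
seq_logprobs = [-12.0, -15.0, -14.0, -11.0, -16.0, -13.0, -15.5, -12.5]
loss = grpo_loss_no_kl(seq_logprobs, advantages)
```

Rollouts that score above the group mean receive positive advantages and the others negative ones, so the update raises the likelihood of the better answers relative to the rest of the group.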

See chapter 6, section 3.12, for more information on how to load and use these checkpoints.
