- Base model: Qwen/Qwen2.5-0.5B
- Training method: SoRL (Self-Organizing Reinforcement Learning) post-training
- Dataset: GSM8K (train split)
- Accuracy: 35.6% on the GSM8K test split (vs. 27.1% for the SFT baseline)

| Parameter | Value |
|---|---|
| K (insertion frequency) | 4 |
| abstract_vocab_size | 128 |
| max_iterations | 2 |
| alpha_info_gain | 10.0 |
| alpha_abs | 0.1 |
| alpha_soft_zipf | 5.0 |
| num_rollouts | 4 |
| lr | 1e-5 |
| batch_size | 2 |
| num_epochs | 3 |
| max_length | 512 |
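
For scripting, the table above can be mirrored as a plain Python mapping. This is only a sketch: the dictionary name `SORL_HPARAMS`, the key `"K"` for the insertion frequency, and the helper `diff_config` are hypothetical, and the key names actually used by the `sorl` package or stored in the checkpoint may differ.

```python
# Hyperparameters transcribed from the table above.
# Key names follow the table; they are NOT confirmed to match the
# names used inside the sorl package or the checkpoint's config entry.
SORL_HPARAMS = {
    "K": 4,                      # insertion frequency
    "abstract_vocab_size": 128,
    "max_iterations": 2,
    "alpha_info_gain": 10.0,
    "alpha_abs": 0.1,
    "alpha_soft_zipf": 5.0,
    "num_rollouts": 4,
    "lr": 1e-5,
    "batch_size": 2,
    "num_epochs": 3,
    "max_length": 512,
}


def diff_config(expected, actual):
    """Return {key: (expected_value, actual_value)} for every mismatch.

    A key that is absent from `actual` is reported with an actual value of None.
    """
    missing = object()  # sentinel so that a stored None still compares correctly
    return {
        key: (value, actual.get(key))
        for key, value in expected.items()
        if actual.get(key, missing) != value
    }
```

Calling `diff_config(SORL_HPARAMS, some_mapping)` returns an empty dictionary when every listed value matches, which makes it a quick sanity check before reusing these settings in another run.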
The model is distributed as a PyTorch checkpoint (.pt) with the keys model, optimizer, loss_fn, config, step, and epoch.
Load it with SorlModelWrapper from the sorl package.
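
Before handing the file to SorlModelWrapper, it can help to confirm that the checkpoint has the layout described above. The following is a minimal sketch using only `torch.load`; the helper names `missing_keys` and `load_checkpoint` are hypothetical, and the SorlModelWrapper call itself is not shown because its signature is not documented here.

```python
from pathlib import Path

# Keys listed in this model card.
EXPECTED_KEYS = ("model", "optimizer", "loss_fn", "config", "step", "epoch")


def missing_keys(checkpoint):
    """Return the expected keys that are absent from a loaded checkpoint dict."""
    return [key for key in EXPECTED_KEYS if key not in checkpoint]


def load_checkpoint(path):
    """Load the .pt file on CPU and verify that it has the expected keys."""
    import torch  # imported lazily so missing_keys() works without PyTorch

    # weights_only=False is an assumption: the checkpoint stores an optimizer,
    # a loss function, and a config, which may be arbitrary pickled objects.
    # Only use it on files from a source you trust.
    checkpoint = torch.load(Path(path), map_location="cpu", weights_only=False)
    absent = missing_keys(checkpoint)
    if absent:
        raise KeyError(f"checkpoint is missing keys: {absent}")
    return checkpoint
```

With a local copy of the file (the filename is a placeholder), `ckpt = load_checkpoint("checkpoint.pt")` returns the full dictionary, after which `ckpt["step"]` and `ckpt["epoch"]` show where training stopped.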