# SoRL Post-Trained Qwen2.5-0.5B on GSM8K
- **Base model:** Qwen/Qwen2.5-0.5B
- **Training method:** SoRL (Self-Organizing Reinforcement Learning) post-training
- **Dataset:** GSM8K (train split)
- **Accuracy:** 35.6% on the GSM8K test split (vs. 27.1% for the SFT baseline)
## SoRL Config
| Parameter | Value |
|---|---|
| K (insertion frequency) | 4 |
| abstract_vocab_size | 128 |
| max_iterations | 2 |
| alpha_info_gain | 10.0 |
| alpha_abs | 0.1 |
| alpha_soft_zipf | 5.0 |
| num_rollouts | 4 |
| lr | 1e-5 |
| batch_size | 2 |
| num_epochs | 3 |
| max_length | 512 |
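For convenience, the table above can be expressed as a plain Python dict. This is an illustrative sketch: the key names mirror the parameter names in the table, but the exact config schema expected by the `sorl` package is an assumption.

```python
# SoRL training configuration from the table above (illustrative;
# the comments are informal glosses, not definitions from the card).
sorl_config = {
    "K": 4,                        # insertion frequency
    "abstract_vocab_size": 128,
    "max_iterations": 2,
    "alpha_info_gain": 10.0,
    "alpha_abs": 0.1,
    "alpha_soft_zipf": 5.0,
    "num_rollouts": 4,
    "lr": 1e-5,
    "batch_size": 2,
    "num_epochs": 3,
    "max_length": 512,
}
```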
## Checkpoint Format
PyTorch checkpoint (`.pt`) with keys `model`, `optimizer`, `loss_fn`, `config`, `step`, and `epoch`.
Load it with `SorlModelWrapper` from the `sorl` package.
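A minimal sketch of working with this checkpoint layout. The key names come from the card above; the `torch.load` call and the `wrapper.load_state_dict` line in the comments are assumptions about typical usage, since the exact `SorlModelWrapper` API is not documented here.

```python
# Documented top-level keys of the .pt checkpoint.
EXPECTED_KEYS = {"model", "optimizer", "loss_fn", "config", "step", "epoch"}


def validate_checkpoint(ckpt: dict) -> None:
    """Raise KeyError if the loaded checkpoint dict is missing any documented key."""
    missing = EXPECTED_KEYS - ckpt.keys()
    if missing:
        raise KeyError(f"checkpoint missing keys: {sorted(missing)}")


# Typical usage (requires torch and the sorl package; sketch only):
#   import torch
#   ckpt = torch.load("checkpoint.pt", map_location="cpu")
#   validate_checkpoint(ckpt)
#   # then restore weights via SorlModelWrapper, e.g.:
#   # wrapper.load_state_dict(ckpt["model"])   # hypothetical call
```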