SoRL Post-Trained Qwen2.5-0.5B on GSM8K

- Base model: Qwen/Qwen2.5-0.5B
- Training method: SoRL (Self-Organizing Reinforcement Learning) post-training
- Dataset: GSM8K (train split)
- Accuracy: 35.6% on the GSM8K test set (vs. 27.1% SFT baseline)
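The accuracy figure above is a GSM8K exact-match score. As an illustration (this is a common evaluation convention, not taken from the card), GSM8K gold answers end with `#### <number>`, and a prediction is usually scored by comparing the last number in the model's output against that value:

```python
import re

def extract_gold(answer: str) -> str:
    """Gold answer: the number after the final '####' marker."""
    return answer.split("####")[-1].strip().replace(",", "")

def extract_pred(text: str):
    """Prediction: the last number appearing in the model output, if any."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return nums[-1] if nums else None

gold = extract_gold("Natalia sold 48 clips in April ... #### 72")
pred = extract_pred("So the final answer is 72.")
print(pred == gold)  # → True
```

The exact extraction logic used to produce the 35.6% figure is not specified on the card; this sketch only shows the standard comparison.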

SoRL Config

| Parameter | Value |
|---|---|
| K (insertion frequency) | 4 |
| abstract_vocab_size | 128 |
| max_iterations | 2 |
| alpha_info_gain | 10.0 |
| alpha_abs | 0.1 |
| alpha_soft_zipf | 5.0 |
| num_rollouts | 4 |
| lr | 1e-5 |
| batch_size | 2 |
| num_epochs | 3 |
| max_length | 512 |
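For reference, the hyperparameters above can be written out as a plain Python dict (a sketch only; the actual config object used by the sorl package is not shown on the card):

```python
# SoRL hyperparameters from the table above, as a plain dict (illustrative).
sorl_config = {
    "K": 4,                       # insertion frequency
    "abstract_vocab_size": 128,
    "max_iterations": 2,
    "alpha_info_gain": 10.0,
    "alpha_abs": 0.1,
    "alpha_soft_zipf": 5.0,
    "num_rollouts": 4,
    "lr": 1e-5,
    "batch_size": 2,
    "num_epochs": 3,
    "max_length": 512,
}
print(len(sorl_config))  # → 11
```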

Checkpoint Format

PyTorch checkpoint (.pt) with keys: model, optimizer, loss_fn, config, step, epoch.

Load with SorlModelWrapper from the sorl package.
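To illustrate the documented checkpoint layout, the sketch below round-trips a stand-in dict with the listed keys through `torch.save`/`torch.load`. This is not the real loading path (which goes through `SorlModelWrapper`, whose API is not shown on the card); the field contents are placeholders:

```python
import io
import torch

# Stand-in for the documented .pt checkpoint layout (placeholder values).
checkpoint = {
    "model": {},       # model state_dict
    "optimizer": {},   # optimizer state_dict
    "loss_fn": {},     # loss-function state (SoRL-specific)
    "config": {"K": 4, "abstract_vocab_size": 128},
    "step": 0,
    "epoch": 0,
}

buf = io.BytesIO()
torch.save(checkpoint, buf)
buf.seek(0)
loaded = torch.load(buf, map_location="cpu")
print(sorted(loaded.keys()))
# → ['config', 'epoch', 'loss_fn', 'model', 'optimizer', 'step']
```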

Model repo: Ksgk-fy/sorl_pt, finetuned from Qwen/Qwen2.5-0.5B and trained on GSM8K.