SoRL Post-Trained Qwen2.5-0.5B on GSM8K

Base model: Qwen/Qwen2.5-0.5B
Training method: SoRL (Self-Organizing Reinforcement Learning) post-training
Dataset: GSM8K (train split)
Accuracy: 35.6% on the GSM8K test split (vs. 27.1% for the SFT baseline)

SoRL Config

Parameter	Value
K (insertion frequency)	4
abstract_vocab_size	128
max_iterations	2
alpha_info_gain	10.0
alpha_abs	0.1
alpha_soft_zipf	5.0
num_rollouts	4
lr	1e-5
batch_size	2
num_epochs	3
max_length	512
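
For scripting, the values above can be mirrored in a small typed container. This is only a sketch: the class name SorlConfigSketch is hypothetical, the field names are copied from the table, and whether the sorl package accepts a config in this shape is an assumption, not something this card confirms.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SorlConfigSketch:
    """Hypothetical container mirroring the SoRL Config table in this card."""

    K: int = 4                      # insertion frequency
    abstract_vocab_size: int = 128
    max_iterations: int = 2
    alpha_info_gain: float = 10.0
    alpha_abs: float = 0.1
    alpha_soft_zipf: float = 5.0
    num_rollouts: int = 4
    lr: float = 1e-5
    batch_size: int = 2
    num_epochs: int = 3
    max_length: int = 512


cfg = SorlConfigSketch()
# asdict() gives a plain dictionary, convenient for logging or serialization.
for name, value in asdict(cfg).items():
    print(f"{name}\t{value}")
```

Freezing the dataclass keeps the recorded values from being changed by accident when the object is passed around.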

Checkpoint Format

PyTorch checkpoint (.pt) with keys: model, optimizer, loss_fn, config, step, epoch.

Load with SorlModelWrapper from the sorl package.
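
Before handing the file to SorlModelWrapper, it can help to inspect the raw checkpoint and confirm it has the layout listed above. The sketch below uses only torch.load; it does not show the SorlModelWrapper API, which this card does not document. The helper names missing_keys and load_checkpoint are hypothetical, and passing weights_only=False assumes the loss_fn and config entries are pickled Python objects rather than plain tensors.

```python
from pathlib import Path
from typing import Any, Dict, List

# Keys listed in the "Checkpoint Format" section of this card.
EXPECTED_KEYS = ("model", "optimizer", "loss_fn", "config", "step", "epoch")


def missing_keys(checkpoint: Dict[str, Any]) -> List[str]:
    """Return the expected keys that are absent from a loaded checkpoint."""
    return [key for key in EXPECTED_KEYS if key not in checkpoint]


def load_checkpoint(path: str) -> Dict[str, Any]:
    """Load the raw checkpoint dictionary on CPU and verify its layout."""
    import torch  # imported lazily so missing_keys() works without PyTorch

    # weights_only=False unpickles arbitrary Python objects (assumed necessary
    # for non-tensor entries such as loss_fn and config). Unpickling can
    # execute code, so only do this for files you trust.
    checkpoint = torch.load(Path(path), map_location="cpu", weights_only=False)
    absent = missing_keys(checkpoint)
    if absent:
        raise KeyError(f"checkpoint is missing keys: {absent}")
    return checkpoint
```

A call such as load_checkpoint("sorl_pt.pt"), where the file name is a placeholder, would then return the dictionary, and entries like "step" and "epoch" can be read directly from it.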


