Performed Expert Iteration (EI) on top of michaelbzhu/Qwen2.5-Math-1.5B-GSM8K-SFT.

Used the following hyperparameters:

n_ei_steps = 60
d_batch_size = 250
G = 30
micro_batch_size = 16
gradient_accum_steps = 2
lr = 1e-5
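A minimal sketch of an expert-iteration loop wired to these hyperparameters. It assumes the common reading of the names: each EI step samples `d_batch_size` questions, draws `G` completions per question, keeps only the completions the reward function scores as correct, and runs SFT on the kept set with an effective batch of `micro_batch_size × gradient_accum_steps` = 16 × 2 = 32 sequences. Under that reading one EI step generates 250 × 30 = 7,500 rollouts. The callables `generate`, `reward`, and `sft_step`, and the one-pass-per-step SFT schedule, are hypothetical placeholders; this is not the exact training code used for this model.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class EIConfig:
    n_ei_steps: int = 60
    d_batch_size: int = 250       # assumed: questions sampled per EI step
    G: int = 30                   # assumed: completions sampled per question
    micro_batch_size: int = 16
    gradient_accum_steps: int = 2
    lr: float = 1e-5


def expert_iteration(
    questions: List[Tuple[str, str]],                          # (prompt, reference answer) pairs
    generate: Callable[[str, int], List[str]],                 # hypothetical: sample n completions
    reward: Callable[[str, str], float],                       # hypothetical: 1.0 if correct
    sft_step: Callable[[List[Tuple[str, str]], float], None],  # hypothetical optimizer step
    cfg: EIConfig,
    seed: int = 0,
) -> List[int]:
    """Run EI and return how many completions were kept at each step."""
    rng = random.Random(seed)
    effective_batch = cfg.micro_batch_size * cfg.gradient_accum_steps  # 32 by default
    kept_per_step = []
    for _ in range(cfg.n_ei_steps):
        # 1. Sample a batch of questions without replacement.
        batch = rng.sample(questions, min(cfg.d_batch_size, len(questions)))
        # 2. Generate G completions per question and keep only the correct ones.
        kept = [
            (prompt, completion)
            for prompt, answer in batch
            for completion in generate(prompt, cfg.G)
            if reward(completion, answer) == 1.0
        ]
        # 3. SFT on the filtered set: one shuffled pass (assumed), one optimizer
        #    step per effective batch; sft_step would split it into micro-batches.
        rng.shuffle(kept)
        for i in range(0, len(kept), effective_batch):
            sft_step(kept[i : i + effective_batch], cfg.lr)
        kept_per_step.append(len(kept))
    return kept_per_step
```

Filtering on `reward == 1.0` is what distinguishes expert iteration from plain SFT: the model is trained only on its own successful samples, so the training set is regenerated at every EI step.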

Got these stats on the GSM8K test set (1319 problems):

correct format: 1312/1319 (99.5%)
correct reward: 878/1319 (66.6%)
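The two counts above can be tallied from per-example reward outputs. The sketch below assumes, hypothetically, that the evaluation reward function returns a dict with `format_reward` and `reward` keys (each 0.0 or 1.0) for every test problem; those key names are an assumption, not taken from this model's evaluation code.

```python
from typing import Dict, List


def summarize(rewards: List[Dict[str, float]]) -> Dict[str, str]:
    """Tally format-correct and reward-correct examples as "k/n (p%)" strings.

    Assumes each dict has hypothetical "format_reward" and "reward" keys in {0.0, 1.0}.
    """
    n = len(rewards)
    if n == 0:
        raise ValueError("no examples to summarize")
    n_format = sum(1 for r in rewards if r["format_reward"] == 1.0)
    n_correct = sum(1 for r in rewards if r["reward"] == 1.0)
    return {
        "correct format": f"{n_format}/{n} ({100 * n_format / n:.1f}%)",
        "correct reward": f"{n_correct}/{n} ({100 * n_correct / n:.1f}%)",
    }
```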

Format: Safetensors
Model size: 2B params
Tensor type: BF16

Model tree for michaelbzhu/Qwen2.5-Math-1.5B-GSM8K-EI: Base model → Finetuned → Finetuned → Finetuned → this model
