Performed Expert Iteration (EI) on top of michaelbzhu/Qwen2.5-Math-1.5B-GSM8K-SFT,
using the following hyperparameters:
n_ei_steps = 60
d_batch_size = 250
G = 30
micro_batch_size = 16
gradient_accum_steps = 2
lr = 1e-5
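
As a rough sketch of what these settings imply, the snippet below derives the per-step sampling budget and the effective optimizer batch size, and shows the filtering step at the core of Expert Iteration. The meaning of each name is an assumption, since the card does not define them: `d_batch_size` is read as the number of questions drawn per EI step, `G` as the number of sampled responses per question, and the effective batch size as `micro_batch_size * gradient_accum_steps`. `EIConfig`, `filter_correct`, `sample_fn`, and `reward_fn` are hypothetical names for illustration, not the author's training code.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class EIConfig:
    # Values copied from the hyperparameter list above.
    n_ei_steps: int = 60
    d_batch_size: int = 250          # assumed: questions sampled per EI step
    G: int = 30                      # assumed: responses sampled per question
    micro_batch_size: int = 16
    gradient_accum_steps: int = 2
    lr: float = 1e-5

    @property
    def rollouts_per_step(self) -> int:
        # Upper bound on responses generated in one EI step.
        return self.d_batch_size * self.G

    @property
    def effective_batch_size(self) -> int:
        # Examples contributing to each optimizer update.
        return self.micro_batch_size * self.gradient_accum_steps


def filter_correct(
    questions: List[str],
    sample_fn: Callable[[str], str],
    reward_fn: Callable[[str, str], int],
    G: int,
) -> List[Tuple[str, str]]:
    """Sample G responses per question and keep only those with reward 1.

    The kept (question, response) pairs would form the SFT data
    for the next fine-tuning round.
    """
    kept = []
    for question in questions:
        for _ in range(G):
            response = sample_fn(question)
            if reward_fn(question, response) == 1:
                kept.append((question, response))
    return kept


cfg = EIConfig()
print(cfg.rollouts_per_step)     # 7500
print(cfg.effective_batch_size)  # 32
```

Under these assumptions, each EI step generates at most 7,500 responses, and only the ones the reward function accepts are used for the subsequent supervised update.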
Results on the GSM8K test set:
correct format: 1312/1319
correct reward: 878/1319
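
For easier comparison with other models, the raw counts above can be converted to percentages. This is a small arithmetic sketch; the helper name `rate` is hypothetical.

```python
def rate(count: int, total: int) -> float:
    """Return count / total as a percentage."""
    return 100.0 * count / total


total = 1319          # GSM8K test examples evaluated
format_ok = 1312      # responses with correct format
reward_ok = 878       # responses with correct reward

print(f"correct format: {rate(format_ok, total):.1f}%")  # 99.5%
print(f"correct reward: {rate(reward_ok, total):.1f}%")  # 66.6%
```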
Model tree for michaelbzhu/Qwen2.5-Math-1.5B-GSM8K-EI
Base model: Qwen/Qwen2.5-1.5B
Finetuned: Qwen/Qwen2.5-Math-1.5B
Finetuned: michaelbzhu/Qwen2.5-Math-1.5B-GSM8K-SFT