m1-32b / train_results.json
Initial upload of qwen2.5-32b-instruct_deepseek-reasoner_2004_03-10-21_lr1e-5_wd1e-4_epo5_len32768_tbs1
d55c213 verified
218 Bytes
{
    "epoch": 4.924302788844622,
    "total_flos": 238832327327744.0,
    "train_loss": 0.3164117013254473,
    "train_runtime": 47203.2069,
    "train_samples_per_second": 0.212,
    "train_steps_per_second": 0.007
}
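
The metrics above can be turned into more readable quantities with a few lines of Python. This is a minimal sketch, assuming the keys follow the Hugging Face `Trainer` convention in which `train_runtime` is reported in seconds; the `summarize` helper and its output field names are hypothetical and not part of the file. Because the two throughput fields are rounded to three decimals, the derived sample and step counts are only approximate.

```python
import json

# Metrics copied verbatim from train_results.json.
RAW = """
{
"epoch": 4.924302788844622,
"total_flos": 238832327327744.0,
"train_loss": 0.3164117013254473,
"train_runtime": 47203.2069,
"train_samples_per_second": 0.212,
"train_steps_per_second": 0.007
}
"""


def summarize(metrics: dict) -> dict:
    """Derive rough totals from the reported rates (hypothetical helper).

    Assumes train_runtime is in seconds. The per-second rates are
    rounded in the file, so the products are estimates, not exact counts.
    """
    runtime_s = metrics["train_runtime"]
    return {
        "runtime_hours": runtime_s / 3600.0,
        "approx_samples_seen": metrics["train_samples_per_second"] * runtime_s,
        "approx_optimizer_steps": metrics["train_steps_per_second"] * runtime_s,
    }


summary = summarize(json.loads(RAW))
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Under those assumptions the run lasted a little over 13 hours. The step estimate is especially coarse, since `train_steps_per_second` carries only one significant digit.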