Question about training data

#1
by Michalea - opened

Hello, thank you for the contribution.
I would like to ask whether you regenerated the data using GLM5.1 and then trained the EAGLE head on the regenerated data, or whether regeneration was skipped because it is a costly process.

For GLM-5.1, the training data was not regenerated; we directly reused the regenerated dataset from Kimi k2.5. However, Qwen3.5 397B A22B was trained on a freshly regenerated dataset.

I would like to inquire about the resources used to train GLM5.1. Could you share details such as the data size, GPU models and quantities, training duration, and whether online or offline training mode was used? Additionally, what were the number of epochs and the batch size?
Thank you for your great contribution.
