Recommended hyperparameters?

#27

by zhilinw6 - opened Aug 7, 2024

Aug 7, 2024

Are there any recommendations or insights on effective SFT hyperparameter settings, such as learning rate, batch size, number of epochs, and weight decay?
Any advice on processing the training data?

thenlper

Alibaba-NLP org Aug 14, 2024

You can refer to the training parameter settings described in the mGTE paper. mGTE primarily focuses on encoder-only training, while the GTE-Qwen series models are trained with LoRA. Apart from this difference, the other training hyperparameters and data strategies are similar.

https://arxiv.org/abs/2407.19669
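
For readers unfamiliar with the LoRA difference mentioned above, here is a minimal pure-Python sketch of what a LoRA update on a single linear layer looks like. It is not the training code for either model family: the layer sizes, rank, and alpha below are hypothetical values chosen only for illustration, and the helper names (`lora_param_counts`, `matvec`, `lora_forward`) are made up for this example. For actual hyperparameter values, see the paper linked above.

```python
import random


def lora_param_counts(d_in, d_out, rank):
    """Return (full, lora) trainable-parameter counts for one linear layer."""
    full = d_in * d_out            # full fine-tuning updates every weight
    lora = rank * (d_in + d_out)   # LoRA trains only A (rank x d_in) and B (d_out x rank)
    return full, lora


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha):
    """Compute y = W x + (alpha / rank) * B (A x), with W frozen."""
    rank = len(A)
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


# Hypothetical sizes, for illustration only.
d_in, d_out, rank, alpha = 8, 6, 2, 4
random.seed(0)
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]     # frozen base weights
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]   # trainable, small random init
B = [[0.0] * rank for _ in range(d_out)]                                  # trainable, zero init
x = [1.0] * d_in

full, lora = lora_param_counts(d_in, d_out, rank)
print(f"trainable parameters: full={full}, lora={lora}")
print(lora_forward(W, A, B, x, alpha))
```

Because `B` starts at zero, the adapter initially leaves the base model's output unchanged, and only the two small matrices are updated during training. This is why a LoRA setup can differ from full encoder training in a few settings while keeping the rest of the recipe the same.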
