Tags: Text Classification · Transformers · Safetensors · English · Chinese · qwen2 · feature-extraction · reward model · custom_code · text-embeddings-inference
Instructions for using Qwen/Qwen2.5-Math-PRM-7B with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use Qwen/Qwen2.5-Math-PRM-7B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline(
    "text-classification",
    model="Qwen/Qwen2.5-Math-PRM-7B",
    trust_remote_code=True,
)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Math-PRM-7B", trust_remote_code=True)
model = AutoModel.from_pretrained("Qwen/Qwen2.5-Math-PRM-7B", trust_remote_code=True)
```

A hedged example of scoring reasoning steps with this model follows the list below.

- Notebooks
- Google Colab
- Kaggle
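Beyond the generic snippets above, a process reward model (PRM) is typically queried by inserting a step-separator token between reasoning steps and reading the model's per-token scores at those positions. The sketch below follows that pattern; the `<extra_0>` separator and the two-class (bad step / good step) output head reflect this model's published usage, but treat the details as assumptions to verify against the model card.

```python
# Hedged sketch: scoring reasoning steps with a PRM.
# Assumptions: the model emits a 2-class score per token and "<extra_0>"
# is the step-separator token, per the model card.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_name = "Qwen/Qwen2.5-Math-PRM-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
).eval()

query = "Compute 2 + 3 * 4."
steps = ["First, 3 * 4 = 12.", "Then, 2 + 12 = 14.", "The answer is 14."]

# Join the steps with the separator; the PRM is read out at each separator.
messages = [
    {"role": "system", "content": "Please reason step by step."},
    {"role": "user", "content": query},
    {"role": "assistant", "content": "<extra_0>".join(steps) + "<extra_0>"},
]
conversation = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=False
)
input_ids = tokenizer.encode(conversation, return_tensors="pt").to(model.device)

with torch.no_grad():
    logits = model(input_ids=input_ids)[0]  # (batch, seq_len, 2)

# Probability of the "good step" class at each separator position.
sep_id = tokenizer.encode("<extra_0>")[0]
mask = input_ids == sep_id
probs = F.softmax(logits, dim=-1)[mask]  # (num_steps, 2)
step_rewards = probs[:, 1].tolist()
print(step_rewards)  # one score in [0, 1] per reasoning step
```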
Ask questions about training data construction
#8 · opened by zzzzz2023
Hello, I have read your model's code. I would like to know how the labels are constructed during training, and how to best compute the loss from the process rewards. @Zhenru Thank you for your answer.

The loss in the model code is calculated as `loss_fct(logits.view(-1, self.num_labels), labels.view(-1))`, but here the logits are per-token scores over the assistant response. How should the labels be constructed so that the cross entropy can be computed directly against these logits?
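For readers following along: one common way to make `loss_fct(logits.view(-1, self.num_labels), labels.view(-1))` well defined for a token-level PRM is to build a `labels` tensor the same length as the input, filled with the ignore index (-100) everywhere except at the step-separator positions, which hold the 0/1 correctness of the corresponding step. The sketch below illustrates that construction; it is an assumption about how such labels are typically built, not a confirmed description of the authors' training code, and the separator token id shown is hypothetical.

```python
# Hedged sketch: constructing token-level labels for PRM training.
# Assumptions: CrossEntropyLoss with its default ignore_index=-100, so only
# separator positions contribute to the loss; "<extra_0>" ends each step.
import torch
import torch.nn as nn

def build_prm_labels(input_ids, sep_token_id, step_correctness):
    """input_ids: (seq_len,) token ids of the full conversation.
    step_correctness: list of 0/1 flags, one per reasoning step, in order."""
    labels = torch.full_like(input_ids, -100)  # -100 positions are ignored
    sep_positions = (input_ids == sep_token_id).nonzero(as_tuple=True)[0]
    assert len(sep_positions) == len(step_correctness)
    labels[sep_positions] = torch.tensor(step_correctness, dtype=input_ids.dtype)
    return labels

# Toy example: 10 tokens, with separators at positions 3 and 7.
sep_id = 151651  # hypothetical id for "<extra_0>"; look it up via the tokenizer
input_ids = torch.tensor([11, 12, 13, sep_id, 14, 15, 16, sep_id, 17, 18])
labels = build_prm_labels(input_ids, sep_id, step_correctness=[1, 0])

num_labels = 2
logits = torch.randn(input_ids.size(0), num_labels)  # stand-in for model output
loss_fct = nn.CrossEntropyLoss()  # ignore_index defaults to -100
loss = loss_fct(logits.view(-1, num_labels), labels.view(-1))
print(loss.item())  # cross entropy over the two separator positions only
```

With this construction the logits stay per-token, but every non-separator position is masked out by the ignore index, so the cross entropy effectively compares only the two-class step scores against the per-step correctness labels.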