Number of tokens used to train 1B

#19

by bczhang - opened May 19, 2025

May 19, 2025

Hi! The model card says the 1B model is trained with 3T tokens, but the paper says it used 2T tokens. Which one is the correct number of training tokens?

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment