Batching token length

#144

by mishavee - opened Nov 30, 2022

Discussion

mishavee

Nov 30, 2022

When training BLOOM, how many tokens can the input be? 2048?

A-bhishek-01

Jan 26, 2023

2048
reference: https://arxiv.org/pdf/2211.05100.pdf

TimeRobber

BigScience Workshop org Jan 27, 2023

Actually, it was trained with a sequence length of 2048, but the model supports any length. You can try generating more tokens indefinitely (with some performance degradation as you increase the length). This is linked to our use of ALiBi.
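For readers unfamiliar with ALiBi, here is a minimal sketch of the idea as described in the ALiBi paper: instead of learned position embeddings, each attention head adds a linear penalty to its attention scores that grows with the distance between query and key. Because the penalty is a formula rather than a lookup table of fixed size, nothing in the model caps the sequence length. The function names `alibi_slopes` and `alibi_bias` are made up for illustration, the slope formula assumes the number of heads is a power of two, and this is not BLOOM's actual implementation.

```python
def alibi_slopes(num_heads):
    """Per-head slopes: a geometric sequence starting at 2^(-8/num_heads).

    Assumption: num_heads is a power of two (the simple case in the paper).
    """
    start = 2.0 ** (-8.0 / num_heads)
    return [start ** (h + 1) for h in range(num_heads)]


def alibi_bias(seq_len, slope):
    """Bias added to the attention scores of one head before the softmax.

    bias[i][j] = -slope * (i - j) for keys at or before the query (j <= i).
    Future positions get -inf, which plays the role of the causal mask.
    """
    return [
        [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    slopes = alibi_slopes(8)
    # The same formula works for any seq_len, including ones longer than
    # the 2048 tokens used in training.
    for row in alibi_bias(4, slopes[0]):
        print(row)
```

The bias only depends on the distance `i - j`, so a longer input just produces a larger matrix with the same pattern. How well the model behaves at those longer distances is a separate, empirical question.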

mishavee

Jan 27, 2023

How much degradation? Are you saying I can put 100,000 words in one training example?
Thanks

TimeRobber

BigScience Workshop org Jan 28, 2023

It's specific to your setup.

For more explanation of what ALiBi is: https://arxiv.org/abs/2108.12409
For some plots showing how well it performs on long sequences, we had a preliminary result (on a 1B model) in: https://arxiv.org/abs/2210.15424 (Figure 2)

That's from a modeling perspective. From a pure hardware perspective, a longer sequence means a larger memory footprint, so you might run into out-of-memory errors when using 100_000 words (depending on your setup).
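To get a rough feel for the hardware side, here is a back-of-the-envelope sketch. It assumes a naive attention implementation that materializes the full score matrix for one layer, 2 bytes per value (fp16/bf16), and a hypothetical head count of 32. None of these numbers come from BLOOM's configuration, and real training memory also includes weights, optimizer state, and other activations, so treat this as an illustration of the scaling only.

```python
def attention_scores_bytes(seq_len, num_heads, batch_size=1, bytes_per_value=2):
    """Bytes needed to hold one layer's attention score matrix.

    Assumption: naive attention that stores a full (seq_len x seq_len)
    matrix per head. Memory-efficient attention kernels avoid this.
    """
    return batch_size * num_heads * seq_len * seq_len * bytes_per_value


if __name__ == "__main__":
    NUM_HEADS = 32  # hypothetical value, for illustration only
    for seq_len in (2048, 100_000):
        gib = attention_scores_bytes(seq_len, NUM_HEADS) / 2**30
        print(f"seq_len={seq_len}: {gib:.2f} GiB per layer for attention scores")
```

The quadratic term is what matters: going from 2048 to 100_000 tokens multiplies this particular buffer by roughly 2400, and 100_000 words will usually tokenize to more than 100_000 tokens.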
