Regarding supported image token budgets

#20

by J22 - opened Apr 15

Discussion

J22

Apr 15

edited Apr 15

The supported token budgets are: 70, 140, 280, 560, and 1120.

In theory (mathematically), the reference model implementation in transformers could support any image size, from a minimum of 3x3 patches up to 10240x10240 patches (position_embedding_size), which works out to ~11,648,569 LLM tokens.
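
For reference, here is how I arrive at that upper figure. This is only a minimal sketch of the arithmetic, not the transformers code: it assumes each LLM token is produced by pooling a 3x3 block of patches and that leftover patches are dropped by floor division. The names `POOL`, `MAX_PATCHES_PER_SIDE`, and `llm_tokens` are hypothetical and mine.

```python
POOL = 3  # assumption: 3x3 patches are pooled into one LLM token
MAX_PATCHES_PER_SIDE = 10240  # position_embedding_size quoted above


def llm_tokens(patches_h: int, patches_w: int, pool: int = POOL) -> int:
    """Estimate the LLM token count for a patch grid (hypothetical helper)."""
    for side in (patches_h, patches_w):
        if not pool <= side <= MAX_PATCHES_PER_SIDE:
            raise ValueError(f"side of {side} patches is outside [{pool}, {MAX_PATCHES_PER_SIDE}]")
    # Floor division: patches that do not fill a whole pool x pool block are ignored.
    return (patches_h // pool) * (patches_w // pool)


print(llm_tokens(3, 3))          # smallest grid: 1 token
print(llm_tokens(30, 30))        # a 30x30 grid would give 100 tokens
print(llm_tokens(10240, 10240))  # 3413 * 3413 = 11648569
```

Under these assumptions a budget such as 100 is at least reachable arithmetically (a 30x30 patch grid), which is why I am asking whether it is actually supported.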

So, my questions are:

Are other token budgets (such as 100) truly not supported?
Is position_embedding_table fully trained?
