Regarding supported image token budgets

#20
by J22 - opened

The supported token budgets are: 70, 140, 280, 560, and 1120.

In theory (purely mathematically), the reference model implementation in transformers could support any image size, from a minimum of 3x3 patches
up to 10240x10240 patches (position_embedding_size), which works out to ~11648569 LLM tokens.
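
As a rough sanity check of those bounds, here is a minimal sketch of the arithmetic. It assumes each LLM token corresponds to one 3x3 block of patches, which is my reading of the "at least 3x3 patches" minimum and not something confirmed from the model code. The names `POOL` and `tokens_for_grid` are made up for illustration.

```python
# Assumption: one LLM token is produced per 3x3 block of patches.
POOL = 3
# Patches per side covered by the position embedding (figure from the post).
POSITION_EMBEDDING_SIZE = 10240


def tokens_for_grid(patches_h: int, patches_w: int, pool: int = POOL) -> int:
    """Return the number of LLM tokens for a patch grid under the 3x3 assumption."""
    if min(patches_h, patches_w) < pool:
        raise ValueError("grid must be at least pool x pool patches")
    if max(patches_h, patches_w) > POSITION_EMBEDDING_SIZE:
        raise ValueError("grid exceeds the position embedding size")
    # Leftover patches that do not fill a whole block are dropped.
    return (patches_h // pool) * (patches_w // pool)


print(tokens_for_grid(3, 3))          # smallest grid: 1 token
print(tokens_for_grid(30, 30))        # a 100-token budget is reachable on paper
print(tokens_for_grid(10240, 10240))  # largest grid: 3413 * 3413 = 11648569
```

Under that assumption, a budget such as 100 maps cleanly onto a 30x30 patch grid, which is why I am asking whether the limit to five budgets is about training rather than about the architecture.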

So, my questions are:

  1. Are other token budgets (such as 100) truly not supported?
  2. Is position_embedding_table fully trained?
