repeat input_ids.size(1) in dim1

by kangyang - opened Apr 1

Apr 1

modeling_longcat_next.py, function prepare_inputs_for_generation,

input_ids = input_ids.repeat((2, input_ids.size(1)))
attention_mask = attention_mask.repeat((2, attention_mask.size(1)))

input_ids = input_ids.repeat((2, 1))
attention_mask = attention_mask.repeat((2, 1))

Why do you repeat by input_ids.size(1) along dim 1?

Locke

Apr 2

Actually, we repeat input_ids along dim=0 (the batch dimension).

(To keep the implementation simple, inference currently supports only batch_size=1.) However, image generation uses the CFG (classifier-free guidance) strategy, which normally requires an additional unconditional forward pass. We therefore repeat the batch dimension of input_ids/attention_mask to 2 (masking the condition text tokens of input_ids[1]), which combines the two forward passes into one.
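To make the batching concrete, here is a minimal sketch of the idea, not the repository's actual code. The helper names `build_cfg_batch` and `cfg_combine`, the `cond_len` and `pad_token_id` parameters, the assumption that the condition tokens sit at the start of the sequence, and the way they are masked (overwriting with a pad id and zeroing the attention mask) are all hypothetical. The guidance formula is the commonly used one and is not confirmed by this thread. The last lines compare the shapes produced by the two `repeat` calls quoted in the question.

```python
import torch


def build_cfg_batch(input_ids, attention_mask, cond_len, pad_token_id=0):
    """Turn a batch_size=1 prompt into a [conditional, unconditional] pair."""
    assert input_ids.size(0) == 1, "this sketch assumes batch_size=1"
    # repeat((2, 1)): two copies along dim 0 (batch), one along dim 1 (sequence).
    ids = input_ids.repeat((2, 1))
    mask = attention_mask.repeat((2, 1))
    # Row 1 is the unconditional branch: hide the first cond_len condition tokens.
    # (Assumed masking scheme; the real model may mask differently.)
    ids[1, :cond_len] = pad_token_id
    mask[1, :cond_len] = 0
    return ids, mask


def cfg_combine(logits, guidance_scale):
    """Blend conditional (row 0) and unconditional (row 1) logits."""
    cond, uncond = logits[0], logits[1]
    return uncond + guidance_scale * (cond - uncond)


input_ids = torch.tensor([[11, 12, 13, 14, 15]])
attention_mask = torch.ones_like(input_ids)

ids, mask = build_cfg_batch(input_ids, attention_mask, cond_len=3)
print(tuple(ids.shape))   # (2, 5): batch doubled, sequence length unchanged

# Repeating by size(1) along dim 1 tiles the sequence itself as well.
wide = input_ids.repeat((2, input_ids.size(1)))
print(tuple(wide.shape))  # (2, 25)
```

With `repeat((2, 1))` a single forward pass yields both branches, and `cfg_combine` can then be applied to the resulting pair of logits at each decoding step.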

However, the current implementation is limited: after generating an image, the model cannot go on to output text or other content (because the batch size would need to switch back to 1 and the kv_cache would need to be re-managed). We will leave these challenges for the next iteration.
