Instructions for using microsoft/swin-base-patch4-window7-224-in22k with the Transformers library, inference providers, notebooks, and local apps.
How to use microsoft/swin-base-patch4-window7-224-in22k with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-classification", model="microsoft/swin-base-patch4-window7-224-in22k")
pipe("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png")
```

```python
# Load model directly
from transformers import AutoImageProcessor, AutoModelForImageClassification

processor = AutoImageProcessor.from_pretrained("microsoft/swin-base-patch4-window7-224-in22k")
model = AutoModelForImageClassification.from_pretrained("microsoft/swin-base-patch4-window7-224-in22k")
```
Output attention shape
by Yingshu
I explored the output attentions and found that the output is a tuple of 4 tensors, one per stage, with shapes:
[64,4,49,49]
[16,8,49,49]
[4,16,49,49]
[1,32,49,49]
I know the second dimension (4, 8, 16, 32) is the number of attention heads. What does the first dimension (64, 16, 4, 1) represent?
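A likely explanation, hedged since it comes from the Swin architecture in general rather than this exact checkpoint: Swin computes self-attention inside local windows, and the Transformers implementation folds the windows into the batch dimension. So the leading dimension should be `batch_size * num_windows` at each stage, and the last two dimensions are `window_size**2 = 49` tokens attending to each other. A minimal sketch that reproduces the observed numbers for a 224×224 input, patch size 4, window size 7, and batch size 1:

```python
# Sketch (assumption): attention tensors in Swin have shape
# (batch_size * num_windows, num_heads, window_size**2, window_size**2),
# with the feature map halved in each spatial dimension per stage.
image_size, patch_size, window_size = 224, 4, 7
batch_size = 1

leading_dims = []
tokens_per_side = image_size // patch_size  # 56 tokens per side at stage 1
for stage in range(4):
    side = tokens_per_side // (2 ** stage)       # 56, 28, 14, 7
    num_windows = (side // window_size) ** 2     # 64, 16, 4, 1 windows
    leading_dims.append(batch_size * num_windows)

print(leading_dims)        # [64, 16, 4, 1]
print(window_size ** 2)    # 49
```

This matches the shapes reported above: 64, 16, 4, and 1 windows across the four stages, each with 49 tokens per window.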