Does it contain the Vision-Language Projector MLP?

#2
by MrDojo0 - opened

Does this model contain the MLP that translates the ViT's embedding space into the decoder's embedding space?
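To make the question concrete, here is a minimal sketch of the kind of projector meant. It assumes a LLaVA-style design (two linear layers with a GELU in between); the class name `Projector`, the dimensions, and the random initialization are all hypothetical and are not taken from this model's checkpoint or code.

```python
import math
import random


def gelu(x: float) -> float:
    # Tanh approximation of the GELU activation.
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def linear(x, weight, bias):
    # x has length in_dim; weight is out_dim x in_dim; bias has length out_dim.
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def init_linear(in_dim, out_dim, rng):
    # Small random weights and zero bias, purely for illustration.
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


class Projector:
    """Hypothetical two-layer MLP mapping ViT patch embeddings to decoder width."""

    def __init__(self, vit_dim, decoder_dim, seed=0):
        rng = random.Random(seed)
        self.w1, self.b1 = init_linear(vit_dim, decoder_dim, rng)
        self.w2, self.b2 = init_linear(decoder_dim, decoder_dim, rng)

    def __call__(self, patch_embeddings):
        # Each patch embedding is projected independently of the others.
        out = []
        for patch in patch_embeddings:
            hidden = [gelu(h) for h in linear(patch, self.w1, self.b1)]
            out.append(linear(hidden, self.w2, self.b2))
        return out


# Assumed toy sizes: 4 patches, ViT width 8, decoder width 16.
VIT_DIM, DECODER_DIM, NUM_PATCHES = 8, 16, 4
projector = Projector(VIT_DIM, DECODER_DIM)
patches = [[0.1 * (i + j) for j in range(VIT_DIM)] for i in range(NUM_PATCHES)]
tokens = projector(patches)
print(len(tokens), len(tokens[0]))
```

The output has one vector per input patch, each of decoder width, which is the form the decoder would consume as image tokens. Whether this repository ships weights for such a layer is exactly what I am asking.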

