Virchow output PRISM input latent dims mismatch

#21
by lkuhn - opened

Hello,

This might be a stupid question, but I can't seem to find the answer.
I want to use PRISM to compute slide-level embeddings and perform content-based image retrieval based on embedding similarity.
As your documentation specifies, I am using Virchow to compute tile-level embeddings. However, Virchow outputs tile embeddings with a latent dimension of 1280, while PRISM expects 2560, according to both the example and my error message.
The example says to consult the Virchow repo for guidance on how to compute embeddings, but I can't find how to obtain 2560-d embeddings. Is there another Virchow model that produces them?

Thank you for your help!

Paige AI org

Yes, 1280 is the embedding dimension of the Virchow architecture for both the class token and the patch tokens. However, in the Virchow repo documentation, the full tile embedding is the concatenation of the 1280-d class token and the 1280-d average of the patch tokens.

This is how we arrive at a 2560-d input for PRISM.
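
For anyone landing here later, the concatenation described above can be sketched roughly as follows. This is only an illustration, not the exact code from the Virchow repo: it uses NumPy with random values standing in for the model output, and it assumes the output has shape (batch, 257, 1280) with the class token at index 0 followed by 256 patch tokens. The function name `tile_embedding` is made up for this example.

```python
import numpy as np


def tile_embedding(tokens: np.ndarray) -> np.ndarray:
    """Build a tile embedding from per-token outputs.

    tokens: array of shape (batch, 1 + num_patches, dim), where index 0 along
    axis 1 is assumed to be the class token and the rest are patch tokens.
    Returns an array of shape (batch, 2 * dim).
    """
    class_token = tokens[:, 0]            # (batch, dim)
    patch_tokens = tokens[:, 1:]          # (batch, num_patches, dim)
    patch_mean = patch_tokens.mean(axis=1)  # (batch, dim)
    # Concatenate the class token with the mean patch token along the feature axis.
    return np.concatenate([class_token, patch_mean], axis=-1)


# Random stand-in for the model output (assumed shape: 2 tiles x 257 tokens x 1280 dims).
rng = np.random.default_rng(0)
tokens = rng.standard_normal((2, 257, 1280)).astype(np.float32)

emb = tile_embedding(tokens)
print(emb.shape)  # (2, 2560)
```

With PyTorch tensors the same steps would use `torch.cat` and `.mean(1)`. If the model variant you load emits extra prefix tokens besides the class token, the patch-token slice would need to start later, so check the model card for the exact token layout.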

Oh, I see. I assumed that only the class token was used. Thanks for your answer!

lkuhn changed discussion status to closed
