Upload a food image to retrieve the most visually similar dishes. Similarity is computed from image embeddings generated by a Vision Transformer (ViT) pretrained as a Masked Autoencoder (MAE).
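The retrieval step can be illustrated with a minimal sketch. It assumes that dishes are ranked by cosine similarity between embedding vectors, which this page does not confirm. The function names (`cosine_similarity`, `top_k_similar`), the toy 3-dimensional vectors, and the dish labels are all hypothetical. In the real app the embeddings would come from the ViT encoder and have far more dimensions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(
    query: Sequence[float],
    gallery: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k gallery items most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, emb)) for name, emb in gallery.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings standing in for encoder outputs.
gallery = {
    "ramen": [0.9, 0.1, 0.0],
    "pho": [0.8, 0.2, 0.1],
    "cheesecake": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]  # embedding of the uploaded image (made up)

results = top_k_similar(query, gallery, k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

For a large gallery, a linear scan like this is usually replaced by a vectorized matrix product or an approximate nearest-neighbor index, but the ranking idea is the same.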