---
license: apache-2.0
base_model:
- openai/clip-vit-base-patch32
---
# Multimodal Learning for Autoencoders

Repository for my SIGGRAPH Asia publication.

In a multimodal autoencoder, the image is reconstructed from both image and text inputs rather than from the image input alone.

https://dl.acm.org/doi/10.1145/3681756.3697974
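
The idea of reconstructing an image from both image and text inputs can be illustrated with a minimal NumPy sketch. This is not the architecture from the publication: the class name `MultimodalAutoencoderSketch`, the linear layers, and the fusion by concatenation are all assumptions made for illustration, and random vectors stand in for text embeddings that would in practice come from a text encoder such as the one in `openai/clip-vit-base-patch32`.

```python
import numpy as np


class MultimodalAutoencoderSketch:
    """Hypothetical toy model: decodes an image from [image latent, text embedding]."""

    def __init__(self, image_dim, text_dim, latent_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Encoder maps a flattened image to a latent vector.
        self.w_enc = rng.normal(0.0, 0.02, (image_dim, latent_dim))
        # Decoder sees the latent AND the text embedding (assumed fusion: concatenation).
        self.w_dec = rng.normal(0.0, 0.02, (latent_dim + text_dim, image_dim))

    def encode(self, images):
        # images: (batch, image_dim) -> latent: (batch, latent_dim)
        return np.tanh(images @ self.w_enc)

    def decode(self, latent, text_emb):
        # Concatenate along the feature axis, then project back to image space.
        fused = np.concatenate([latent, text_emb], axis=1)
        return fused @ self.w_dec

    def reconstruct(self, images, text_emb):
        return self.decode(self.encode(images), text_emb)


def reconstruction_loss(model, images, text_emb):
    """Mean squared error between the input images and their reconstructions."""
    recon = model.reconstruct(images, text_emb)
    return float(np.mean((recon - images) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    images = rng.random((4, 32 * 32 * 3))      # four flattened 32x32 RGB images
    text_emb = rng.normal(size=(4, 512))       # placeholder for text embeddings
    model = MultimodalAutoencoderSketch(32 * 32 * 3, 512, latent_dim=64)
    print(model.reconstruct(images, text_emb).shape)
    print(reconstruction_loss(model, images, text_emb))
```

Because the decoder receives the text embedding alongside the image latent, changing the caption changes the reconstruction, which is the difference from an image-only autoencoder. The weights here are untrained, so the printed loss only shows that the pieces fit together.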