license: mit

Details on the code used to produce and use this model are available at:

https://github.com/schrum2/MarioDiffusion

That repo has instructions to check out this model and apply it to the generation of Super Mario Bros. level scenes. There is also an interactive GUI for constructing complete levels out of model-generated scenes.

This model uses https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1 as its text embedding model, which provides the text conditioning for the diffusion model that generates Mario levels. Mario scene captions contain multiple period-separated phrases, and this model embeds each phrase as its own sentence embedding vector for the diffusion model to use as text conditioning. It also makes use of negative guidance during diffusion training. Unfortunately, its performance is not great, and it is made available mainly for full transparency.
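The per-phrase embedding described above can be sketched roughly as follows. This is a minimal illustration, not the code from the MarioDiffusion repo: the helper names `split_caption` and `embed_phrases` and the example caption are hypothetical, and the sketch assumes the `sentence-transformers` package is installed if `embed_phrases` is called.

```python
from typing import List

# Embedding model named in this README.
MODEL_NAME = "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"


def split_caption(caption: str) -> List[str]:
    """Split a caption into its period-separated phrases (hypothetical helper)."""
    # Drop empty pieces, such as the one left after a trailing period.
    return [part.strip() for part in caption.split(".") if part.strip()]


def embed_phrases(caption: str):
    """Return one sentence embedding per phrase (hypothetical helper).

    The import is deferred so the splitting logic can be used without
    sentence-transformers installed. Calling this downloads the model.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_NAME)
    # encode() accepts a list of strings and returns one vector per string.
    return model.encode(split_caption(caption))


# Hypothetical caption, for illustration only.
example_caption = "full floor. two enemies. one pipe."
phrases = split_caption(example_caption)
print(phrases)
```

Calling `embed_phrases(example_caption)` would then produce three vectors, one per phrase, rather than a single vector for the whole caption.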

A better-performing model that uses sentence-transformers/multi-qa-MiniLM-L6-cos-v1 with regular captions and multiple sentence embeddings is https://huggingface.co/schrum2/MarioDiffusion-MiniLM-multiple-regular0. For a model that uses the same text embedding model, but embeds the entire caption as a single vector and also uses negative guidance, see https://huggingface.co/schrum2/MarioDiffusion-MiniLM-single-negative0. For a model that uses a simple token-based transformer for text embedding with negative guidance, see https://huggingface.co/schrum2/MarioDiffusion-MLM-negative0.