---
license: openrail
metrics:
- accuracy
- bertscore
- bleu
- bleurt
- brier_score
- cer
- character
- charcut_mt
- chrf
- code_eval
tags:
- text-to-image
- sygil-devs
- Muse
- Sygil-Muse
pipeline_tag: text-to-image
---
# Model Card for Sygil-Muse
This model is based on Muse and trained using the code hosted on ZeroCool940711/muse-maskgit-pytorch, which is based on lucidrains/muse-maskgit-pytorch.
## Model Details
This model is a new model trained from scratch based on Muse, trained on the Imaginary Network Expanded Dataset, with the big advantage of allowing the use of multiple namespaces (labeled tags) to control various parts of the final generation. The use of namespaces (e.g. `species:seal` or `studio:dc`) prevents the model from misinterpreting a seal as the singer Seal, or DC Comics as Washington, D.C.
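As a rough illustration, namespaced tags like the ones above could be parsed out of a comma-separated prompt as follows. This is only a sketch: the helper name `split_namespaced_tags`, the comma-separated prompt convention, and the `general` fallback bucket are assumptions, not part of the model's actual pipeline; only the `namespace:value` tag format comes from this card.

```python
# Hypothetical sketch of parsing namespaced tags from a prompt.
# Only the "namespace:value" format is from the model card; the helper
# name and the comma-separated convention are assumptions.

def split_namespaced_tags(prompt: str) -> dict[str, list[str]]:
    """Group comma-separated tags by namespace; bare tags go under 'general'."""
    groups: dict[str, list[str]] = {}
    for raw in prompt.split(","):
        tag = raw.strip()
        if not tag:
            continue
        namespace, _, value = tag.partition(":")
        if value:  # tag had an explicit "namespace:" prefix
            groups.setdefault(namespace, []).append(value)
        else:  # plain tag without a namespace
            groups.setdefault("general", []).append(tag)
    return groups

tags = split_namespaced_tags("species:seal, studio:dc, beach, sunny day")
print(tags)
# {'species': ['seal'], 'studio': ['dc'], 'general': ['beach', 'sunny day']}
```

Grouping tags this way is what lets a prompt disambiguate `species:seal` from the singer without extra context.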
Note: As of right now, only the first VAE and MaskGit have been trained; we still need to train the Super Resolution VAE for the model to be fully usable, although we might be able to reuse the first VAE depending on its quality once training progresses further.
If you find my work useful, please consider supporting me on GitHub Sponsors!
This model is still in its infancy and is meant to be constantly updated and trained with more and more data as time goes by, so feel free to give us feedback on our Discord Server or in the Discussions section on Hugging Face. We plan to improve it with more and better tags in the future, so any help is always welcome.
## Available Checkpoints
Stable:
- vae.sygil_muse_v0.1.pt: Trained from scratch for 3.0M steps with `dim: 128` and `vq_codebook_size: 256`.
- maskgit.sygil_muse_v0.1.pt: MaskGit trained from the VAE for 3.46M steps.
- vae.sygil_muse_v0.5.pt: Trained from scratch for 1.99M steps with `dim: 128` and `vq_codebook_size: 8192`.
Beta:
- vae.87000.pt: Trained from scratch for 87K steps with a higher `vq_codebook_dim` and `vq_codebook_size` than before.
- maskgit.39000.pt: MaskGit trained from the VAE for 39K steps using the hyperparameters `heads: 16` and `depth: 22` for testing. These values have a huge effect on performance and also increased VRAM usage, so this checkpoint is for testing only. Quality improved a lot with much less training, which is something we want, but we still need to find a balance between quality and performance.
Note: Checkpoints under the Beta section are updated daily, or at least 3-4 times a week. While the beta checkpoints can be used as-is, only the latest version is kept on the repo; older checkpoints are removed when a new one is uploaded to keep the repo clean.
## Training
Training Data: The model was trained on the following dataset:
- Imaginary Network Expanded Dataset.
## Hardware and others
- Hardware: 1 x Nvidia RTX 3050 GPU
- Hours Trained: NaN.
- Gradient Accumulations: 20
- Batch: 1
- Learning Rate: 1e-5
- Learning Rate Scheduler: `constant_with_warmup`
- Scheduler Power: 0.5
- Optimizer: Adam
- Warmup Steps: 10,000
- Number of Cycles: 1
- Resolution/Image Size: First trained at a resolution of 64x64, then increased to 256x256 and then to 512x512. Check the notes down below for more details on this.
- Dimension: 128
- `vq_codebook_dim`: 4096
- `vq_codebook_size`: 16384
- `heads`: 8
- `depth`: 4
- Random Crop: True
- Total Training Steps: 87,000
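The `constant_with_warmup` schedule listed above can be sketched as a plain function of the step count. This is an illustrative sketch, not the training code: the function name is hypothetical, but the behaviour (a linear ramp over the 10,000 warmup steps up to the base rate of 1e-5, then a constant rate) follows the usual definition of this scheduler.

```python
# Sketch of the constant_with_warmup learning-rate schedule used above.
# Function name is hypothetical; warmup_steps and base_lr come from the
# hyperparameters listed in this section.

def constant_with_warmup(step: int, base_lr: float = 1e-5,
                         warmup_steps: int = 10_000) -> float:
    if step < warmup_steps:
        return base_lr * step / warmup_steps  # linear warmup from 0
    return base_lr  # constant rate after warmup

print(constant_with_warmup(0))       # 0.0
print(constant_with_warmup(5_000))   # 5e-06 (halfway through warmup)
print(constant_with_warmup(50_000))  # 1e-05
```

Note that with `Batch: 1` and `Gradient Accumulations: 20`, the optimizer steps that drive this schedule each see an effective batch of 20 samples.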
Note: On Muse we can change the `image_size` (resolution) at any time without having to train the model from scratch again. This allows us to first train the model at low resolution with the same `dim` and `vq_codebook_size` to train faster, and then increase the `image_size` to a higher resolution once the model has trained enough.
Developed by: ZeroCool at Sygil-Dev
## License
This model is open access and available to all, with a CreativeML Open RAIL++-M License further specifying rights and usage.