Disty0
/

sote-diffusion-cascade_pre-alpha0

+---
+pipeline_tag: text-to-image
+license: other
+license_name: stable-cascade-nc-community
+license_link: LICENSE
+---
+# SoteDiffusion Cascade
+Anime finetune of Stable Cascade.
+Currently in a very early stage of training.
+Commercial use is not allowed due to the StabilityAI license.
+## Code Example
+```shell
+pip install diffusers transformers accelerate
+```
+```python
+import torch
+from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline
+prompt = "(extremely aesthetic, best quality, newest), 1girl, solo, cat ears, looking at viewer, blush, light smile, upper body,"
+negative_prompt = "very displeasing, worst quality, monochrome, sketch, blurry, fat, child,"
+prior = StableCascadePriorPipeline.from_pretrained("Disty0/SoteDiffusion-Cascade_pre-alpha0", torch_dtype=torch.float16)
+decoder = StableCascadeDecoderPipeline.from_pretrained("SoteDiffusion-Cascade_Decoder", torch_dtype=torch.float16)
+prior.enable_model_cpu_offload()
+prior_output = prior(
+    prompt=prompt,
+    height=1024,
+    width=1024,
+    negative_prompt=negative_prompt,
+    guidance_scale=6.0,
+    num_images_per_prompt=1,
+    num_inference_steps=30
+)
+decoder.enable_model_cpu_offload()
+decoder_output = decoder(
+    image_embeddings=prior_output.image_embeddings.to(torch.float16),
+    prompt=prompt,
+    negative_prompt=negative_prompt,
+    guidance_scale=1.0,
+    output_type="pil",
+    num_inference_steps=10
+).images[0]
+decoder_output.save("cascade.png")
+```
+## Training Status:
+**GPU used for training**: 1x AMD RX 7900 XTX 24GB
+| dataset name | training done | remaining |
+|---|---|---|
+| **newest** | 002 | 218 |
+| **late** | 002 | 204 |
+| **mid** | 002 | 199 |
+| **early** | 002 | 053 |
+| **oldest** | 002 | 014 |
+| **pixiv** | 002 | 072 |
+| **visual novel cg** | 002 | 068 |
+| **anime wallpaper** | 002 | 011 |
+| **Total** | 24 | 839 |
+**Note**: Chunk numbering starts from 0, and each chunk contains 8000 images.
+## Dataset:
+**GPU used for captioning**: 1x Intel ARC A770 16GB
+**Model used for captioning**: SmilingWolf/wd-v1-4-convnextv2-tagger-v2
+| dataset name | total images | total chunk |
+|---|---|---|
+| **newest** | 1.75M | 221 |
+| **late** | 1.65M | 207 |
+| **mid** | 1.60M | 202 |
+| **early** | 442K | 056 |
+| **oldest** | 128K | 017 |
+| **pixiv** | 594K | 075 |
+| **visual novel cg** | 560K | 071 |
+| **anime wallpaper** | 106K | 014 |
+| **Total** | 6,860,873 | 863 |
+**Note**: The smallest image size is 1280x600 (768,000 pixels).
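The chunk counts in the two tables above can be cross-checked with a few lines of arithmetic. This is a minimal sketch, assuming that a "training done" value of 002 means chunks 0 through 2 (three chunks) are finished for every dataset, which follows from the note that chunk numbering starts from 0. The variable names are illustrative and not taken from the training scripts.

```python
# Total chunks per dataset, copied from the Dataset table.
total_chunks = {
    "newest": 221,
    "late": 207,
    "mid": 202,
    "early": 56,
    "oldest": 17,
    "pixiv": 75,
    "visual novel cg": 71,
    "anime wallpaper": 14,
}

# Assumption: index of the last trained chunk is 2 for every dataset,
# so chunks 0, 1 and 2 are done.
last_trained_chunk = 2
done = {name: last_trained_chunk + 1 for name in total_chunks}
remaining = {name: total_chunks[name] - done[name] for name in total_chunks}

print(sum(total_chunks.values()))  # 863
print(sum(done.values()))          # 24
print(sum(remaining.values()))     # 839
```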
+## Tags:
+### Tag Format:
+```
+aesthetic tags, quality tags, custom tags, date tags, rest of the tags
+```
+### Date:
+| tag | date |
+|---|---|
+| **newest** | 2022 to 2024 |
+| **late** | 2019 to 2021 |
+| **mid** | 2015 to 2018 |
+| **early** | 2011 to 2014 |
+| **oldest** | 2005 to 2010 |
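The date table above amounts to a lookup from year to tag. The sketch below shows one way to express it; `year_to_date_tag` is a hypothetical helper, and returning `None` for years outside 2005 to 2024 is an assumption, since the table does not say how such images were handled.

```python
# (first year, last year, tag), copied from the Date table.
DATE_TAGS = [
    (2022, 2024, "newest"),
    (2019, 2021, "late"),
    (2015, 2018, "mid"),
    (2011, 2014, "early"),
    (2005, 2010, "oldest"),
]


def year_to_date_tag(year):
    """Return the date tag for a year, or None if the year is not covered."""
    for first, last, tag in DATE_TAGS:
        if first <= year <= last:
            return tag
    return None  # assumption: years outside the table get no date tag


print(year_to_date_tag(2023))  # newest
print(year_to_date_tag(2016))  # mid
```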
+### Aesthetic Tags:
+**Model used**: shadowlilac/aesthetic-shadow
+| score greater than | tag |
+|---|---|
+| **0.980** | extremely aesthetic |
+| **0.900** | very aesthetic |
+| **0.750** | aesthetic |
+| **0.500** | slightly aesthetic |
+| **0.350** | not displeasing |
+| **0.250** | not aesthetic |
+| **0.125** | slightly displeasing |
+| **0.025** | displeasing |
+| **rest of them** | very displeasing |
+### Quality Tags:
+**Model used**: https://huggingface.co/hakurei/waifu-diffusion-v1-4/blob/main/models/aes-B32-v0.pth
+| score greater than | tag |
+|---|---|
+| **0.980** | best quality |
+| **0.900** | high quality |
+| **0.750** | great quality |
+| **0.500** | medium quality |
+| **0.250** | normal quality |
+| **0.125** | bad quality |
+| **0.025** | low quality |
+| **rest of them** | worst quality |
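Both threshold tables above (aesthetic and quality) follow the same pattern: walk the thresholds from highest to lowest and take the first one the score exceeds. The following is a rough sketch of that rule, not the actual tagging script. It assumes scores are floats between 0 and 1 and that "greater than" is strict; `score_to_tag` is a hypothetical helper.

```python
# (threshold, tag) pairs, highest first, copied from the tables.
AESTHETIC_THRESHOLDS = [
    (0.980, "extremely aesthetic"),
    (0.900, "very aesthetic"),
    (0.750, "aesthetic"),
    (0.500, "slightly aesthetic"),
    (0.350, "not displeasing"),
    (0.250, "not aesthetic"),
    (0.125, "slightly displeasing"),
    (0.025, "displeasing"),
]
QUALITY_THRESHOLDS = [
    (0.980, "best quality"),
    (0.900, "high quality"),
    (0.750, "great quality"),
    (0.500, "medium quality"),
    (0.250, "normal quality"),
    (0.125, "bad quality"),
    (0.025, "low quality"),
]


def score_to_tag(score, thresholds, fallback):
    """Return the tag of the first threshold the score strictly exceeds."""
    for threshold, tag in thresholds:
        if score > threshold:
            return tag
    return fallback  # the "rest of them" row


print(score_to_tag(0.95, AESTHETIC_THRESHOLDS, "very displeasing"))  # very aesthetic
print(score_to_tag(0.30, QUALITY_THRESHOLDS, "worst quality"))       # normal quality
```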
+## Custom Tags:
+| dataset name | custom tag |
+|---|---|
+| **booru** | date, |
+| **pixiv** | art by Display_Name, |
+| **visual novel cg** | Full_VN_Name (short_3_letter_name), visual novel cg, |
+| **anime wallpaper** | anime wallpaper, |
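Putting the pieces together, a caption or prompt in the documented tag order (aesthetic, quality, custom, date, rest) could be assembled as sketched below. `build_caption` is a hypothetical helper for illustration only. Treating the booru date as the date tag rather than as a separate custom tag is an assumption, since the model card does not say how the dataset scripts combine the two.

```python
def build_caption(aesthetic_tag, quality_tag, custom_tags, date_tag, other_tags):
    """Join tags in the order given under Tag Format."""
    parts = [aesthetic_tag, quality_tag, *custom_tags]
    if date_tag:  # non-booru sources may have no date tag (assumption)
        parts.append(date_tag)
    parts.extend(other_tags)
    return ", ".join(parts)


# Booru-style image: no custom tags, date tag only.
print(build_caption("extremely aesthetic", "best quality", [], "newest",
                    ["1girl", "solo", "cat ears"]))
# extremely aesthetic, best quality, newest, 1girl, solo, cat ears

# Anime wallpaper image: custom tag, no date tag.
print(build_caption("very aesthetic", "high quality", ["anime wallpaper"], None,
                    ["scenery", "sky"]))
# very aesthetic, high quality, anime wallpaper, scenery, sky
```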
+## Training Params:
+**Software used**: Kohya SD-Scripts with Stable Cascade branch
+**Base model**: KBlueLeaf/Stable-Cascade-FP16-fixed
+### Command:
+```shell
+accelerate launch --mixed_precision fp16 --num_cpu_threads_per_process 1 stable_cascade_train_stage_c.py \
+--mixed_precision fp16 \
+--save_precision fp16 \
+--full_fp16 \
+--sdpa \
+--gradient_checkpointing \
+--resolution "1024,1024" \
+--train_batch_size 2 \
+--gradient_accumulation_steps 32 \
+--adaptive_loss_weight \
+--learning_rate 4e-6 \
+--lr_scheduler constant_with_warmup \
+--lr_warmup_steps 100 \
+--optimizer_type adafactor \
+--optimizer_args "scale_parameter=False" "relative_step=False" "warmup_init=False" \
+--max_grad_norm 0 \
+--token_warmup_min 1 \
+--token_warmup_step 0 \
+--shuffle_caption \
+--caption_dropout_rate 0 \
+--caption_tag_dropout_rate 0 \
+--caption_dropout_every_n_epochs 0 \
+--dataset_repeats 1 \
+--save_state \
+--save_every_n_steps 128 \
+--sample_every_n_steps 32 \
+--max_token_length 225 \
+--max_train_epochs 1 \
+--caption_extension ".txt" \
+--max_data_loader_n_workers 2 \
+--persistent_data_loader_workers \
+--enable_bucket \
+--min_bucket_reso 256 \
+--max_bucket_reso 4096 \
+--bucket_reso_steps 64 \
+--bucket_no_upscale \
+--log_with tensorboard \
+--output_name sotediffusion-sc_3b \
+--train_data_dir /mnt/DataSSD/AI/anime_image_dataset/combined/combined-0002 \
+--in_json /mnt/DataSSD/AI/anime_image_dataset/combined/combined-0002.json \
+--output_dir /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-2 \
+--logging_dir /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-2/logs \
+--resume /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-1/sotediffusion-sc_3b-1-state \
+--stage_c_checkpoint_path /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-1/sotediffusion-sc_3b-1.safetensors \
+--effnet_checkpoint_path /mnt/DataSSD/AI/models/sd-cascade/effnet_encoder.safetensors \
+--previewer_checkpoint_path /mnt/DataSSD/AI/models/sd-cascade/previewer.safetensors \
+--sample_prompts /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-prompt.txt
+```
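As a rough sanity check on the command above, the effective batch size and the approximate number of optimizer updates per chunk can be worked out from `--train_batch_size`, `--gradient_accumulation_steps`, and the 8000 images per chunk noted under Training Status. This sketch assumes a single GPU (as listed there) and ignores partially filled resolution buckets, so the real count may differ slightly.

```python
train_batch_size = 2              # --train_batch_size
gradient_accumulation_steps = 32  # --gradient_accumulation_steps
num_gpus = 1                      # assumption: the single GPU listed above
images_per_chunk = 8000           # from the Training Status note

# Images that contribute to one optimizer update.
effective_batch_size = train_batch_size * gradient_accumulation_steps * num_gpus

# Approximate optimizer updates for one pass over one chunk.
updates_per_chunk = images_per_chunk // effective_batch_size

print(effective_batch_size)  # 64
print(updates_per_chunk)     # 125
```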
+## Limitations and Bias
+### Bias
+- This model is intended for anime illustrations.
+  Realistic capabilities have not been tested at all.
+- The current version is biased toward older anime styles.
+### Limitations
+- Can fall back to a realistic style.
+  Use the "anime illustration" tag to steer it in the right direction.
+- Eyes in far shots are rendered poorly because of the heavy latent compression.