gustproof
/

sd-models

gustproof committed on Apr 4, 2023

Commit

3250b65

1 Parent(s): 177b4d2

Create posts/1to2.md

posts/1to2.md +102 -0

	@@ -0,0 +1,102 @@

+# 1to2: Training Multiple-Subject Models using only Single-Subject Data (Experimental)
+Updates will be mirrored on both Hugging Face and Civitai.
+## Introduction
+[It has been shown that multiple characters can be trained into a single model](https://civitai.com/models/23476/the-idolmster-cinderella-girls-starlight-stage-style-90-characters). A harder task is to create a model that can generate multiple characters simultaneously without modifying the generation pipeline. This document describes a simple technique that has been shown to help generate multiple characters in the same image.
+## Method
+```
+Requirement: Sets of single-character images
+Steps:
+1. Train a multi-concept model using the original dataset
+2. Create an augmentation dataset of joined image pairs from the original dataset
+3. Train on the augmentation dataset
+```
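Step 2 can be sketched roughly as follows. This is a minimal illustration, not the author's script: it assumes that pairs are drawn only across different characters and that the two images are joined side by side. The names `make_pairs` and `join_side_by_side` are hypothetical. Images are modelled as lists of pixel rows so that the sketch needs no imaging library; in practice the same concatenation would be done with an image library.

```python
import itertools
import random


def make_pairs(dataset, n_pairs, seed=0):
    """Sample (left, right) image pairs whose characters differ.

    `dataset` maps a character name to a list of image identifiers.
    Restricting pairs to different characters is an assumption.
    """
    items = [(char, img) for char, imgs in dataset.items() for img in imgs]
    # Ordered pairs, so each character can appear on the left or on the right.
    candidates = [
        (a, b) for a, b in itertools.permutations(items, 2) if a[0] != b[0]
    ]
    rng = random.Random(seed)
    return rng.sample(candidates, min(n_pairs, len(candidates)))


def join_side_by_side(left, right):
    """Concatenate two equal-height images (lists of pixel rows) horizontally."""
    if len(left) != len(right):
        raise ValueError("images must have the same height")
    return [l_row + r_row for l_row, r_row in zip(left, right)]


# Hypothetical toy data: file names and pixel values are placeholders.
dataset = {"charA": ["a1.png", "a2.png"], "charB": ["b1.png"], "charC": ["c1.png"]}
pairs = make_pairs(dataset, n_pairs=4)
joined = join_side_by_side([[0, 0], [0, 0]], [[1], [1]])
```

Fixing the random seed keeps the augmentation dataset reproducible between runs.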
+## Experiment
+### Setup
+Three characters from the game Cinderella Girls were chosen for the experiment. The base model is `animefull-final-pruned`. The base model was checked to have minimal knowledge of the trained characters.
+For the captions of the joined images, the template format `CharLeft/CharRight/COMPOSITE, TagsLeft, TagsRight` is used.
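A caption following this template could be assembled as in the sketch below. The template leaves the role of the slashes open; this sketch assumes they are literal, so that the first comma-separated tag is a single combined token. The function name `composite_caption` and the example tags are hypothetical.

```python
def composite_caption(char_left, char_right, tags_left, tags_right):
    """Build a caption of the form `CharLeft/CharRight/COMPOSITE, TagsLeft, TagsRight`."""
    # Assumption: the slashes are literal, giving one combined leading tag.
    head = f"{char_left}/{char_right}/COMPOSITE"
    # Tags of the left image come first, then tags of the right image.
    return ", ".join([head, *tags_left, *tags_right])


caption = composite_caption("charA", "charB", ["smile", "long hair"], ["hat"])
```

Under these assumptions, `caption` is `charA/charB/COMPOSITE, smile, long hair, hat`.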
+A LoHa network (a LoRA variant based on the Hadamard product) is trained using the config file below:
+```
+[model_arguments]
+v2 = false
+v_parameterization = false
+pretrained_model_name_or_path = "Animefull-final-pruned.ckpt"
+[additional_network_arguments]
+no_metadata = false
+unet_lr = 0.0005
+text_encoder_lr = 0.0005
+network_module = "lycoris.kohya"
+network_dim = 8
+network_alpha = 1
+network_args = [ "conv_dim=0", "conv_alpha=16", "algo=loha",]
+network_train_unet_only = false
+network_train_text_encoder_only = false
+[optimizer_arguments]
+optimizer_type = "AdamW8bit"
+learning_rate = 0.0005
+max_grad_norm = 1.0
+lr_scheduler = "cosine"
+lr_warmup_steps = 0
+[dataset_arguments]
+debug_dataset = false
+# keep token 1
+[training_arguments]
+output_name = "cg3comp"
+save_precision = "fp16"
+save_every_n_epochs = 1
+train_batch_size = 2
+max_token_length = 225
+mem_eff_attn = false
+xformers = true
+max_train_epochs = 40
+max_data_loader_n_workers = 8
+persistent_data_loader_workers = true
+gradient_checkpointing = false
+gradient_accumulation_steps = 1
+mixed_precision = "fp16"
+clip_skip = 2
+lowram = true
+[sample_prompt_arguments]
+sample_every_n_epochs = 1
+sample_sampler = "k_euler_a"
+[saving_arguments]
+save_model_as = "safetensors"
+```
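For readers unfamiliar with the `algo=loha` setting in the config above, the sketch below illustrates the general idea of a Hadamard-product weight update: two low-rank products are multiplied element-wise and scaled by `alpha / dim`. It is a toy, pure-Python illustration with made-up matrices and hypothetical function names, not the LyCORIS implementation.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def hadamard(a, b):
    """Element-wise product of two matrices of the same shape."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def loha_delta(w1a, w1b, w2a, w2b, alpha, dim):
    """Weight update (alpha / dim) * (w1a @ w1b) * (w2a @ w2b), `*` being element-wise."""
    scale = alpha / dim
    product = hadamard(matmul(w1a, w1b), matmul(w2a, w2b))
    return [[scale * value for value in row] for row in product]


# Toy rank-1 factors for a 2x2 weight; the values are placeholders.
delta = loha_delta(
    w1a=[[1], [2]], w1b=[[3, 4]],
    w2a=[[1], [1]], w2b=[[2, 0.5]],
    alpha=0.5, dim=1,
)
```

Assuming the same `alpha / dim` convention, `network_dim = 8` and `network_alpha = 1` would give a scale of 1/8.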
+For the second stage of training, the batch size was reduced to 2 while keeping other settings identical.
+The training took less than 2 hours on a T4 GPU.
+### Results
+(see preview images)
+## Limitations
+* This technique doubles the memory/compute requirement
+* Composites can still be generated despite negative prompting
+* Cloned characters seem to become the primary failure mode in place of blended characters
+## Related Works
+Models trained on datasets based on anime shows have [demonstrated](https://civitai.com/models/21305/) multi-subject capability.
+Simply using sufficiently distant concepts such as `1girl, 1boy` [has also been shown to be effective](https://civitai.com/models/17640/).
+## Future Work
+Below is a list of ideas yet to be explored:
+* Synthetic datasets
+* Regularization
+* Joint training instead of sequential