Create posts/1to2.md
# 1to2: Training Multiple-Subject Models using only Single-Subject Data (Experimental)

Updates will be mirrored on both Hugging Face and Civitai.

## Introduction

[It has been shown that multiple characters can be trained into the model](https://civitai.com/models/23476/the-idolmster-cinderella-girls-starlight-stage-style-90-characters). A harder task is to create a model that can generate multiple characters simultaneously without modifying the generation pipeline. This document describes a simple technique that has been shown to help generate multiple characters in the same image.

## Method

```
Requirement: Sets of single-character images
Steps:
1. Train a multi-concept model using the original dataset
2. Create an augmentation dataset of joined image pairs from the original dataset
3. Train on the augmentation dataset
```

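Step 2 of the method could be sketched roughly as follows. This is only an illustration under several assumptions that the steps themselves do not state: each pair is drawn from two different characters, the two images are joined side by side at a common height of 512 pixels, and Pillow is used for the image work. The names `make_pairs` and `join_pair` are hypothetical helpers, not part of any training tool.

```python
import itertools
import random


def make_pairs(images_by_char, pairs_per_combo, seed=0):
    """Sample (left, right) image paths for every ordered pair of characters.

    Assumption: the two halves of a joined image show different characters.
    """
    rng = random.Random(seed)
    pairs = []
    # Ordered permutations, so each character appears on both sides.
    for left_char, right_char in itertools.permutations(sorted(images_by_char), 2):
        for _ in range(pairs_per_combo):
            left = rng.choice(images_by_char[left_char])
            right = rng.choice(images_by_char[right_char])
            pairs.append((left, right))
    return pairs


def join_pair(left_path, right_path, out_path, height=512):
    """Join two images side by side and save the result (requires Pillow)."""
    # Imported here so the pairing logic above works without Pillow installed.
    from PIL import Image

    def fit(img):
        # Scale to the common height while preserving the aspect ratio.
        width = max(1, round(img.width * height / img.height))
        return img.resize((width, height))

    left = fit(Image.open(left_path).convert("RGB"))
    right = fit(Image.open(right_path).convert("RGB"))

    canvas = Image.new("RGB", (left.width + right.width, height))
    canvas.paste(left, (0, 0))
    canvas.paste(right, (left.width, 0))
    canvas.save(out_path)
```

`make_pairs` takes a mapping from character name to a list of image paths; each sampled pair can then be passed to `join_pair` to write one image of the augmentation dataset.
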
## Experiment

### Setup

Three characters from the game Cinderella Girls were chosen for the experiment. The base model is `animefull-final-pruned`. The base model was checked and found to have minimal knowledge of the trained characters.

For the captions of the joined images, the template format `CharLeft/CharRight/COMPOSITE, TagsLeft, TagsRight` is used.

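As a rough illustration of the caption template, the sketch below builds a caption for one joined image. It assumes the slashes in the template are literal, so that `CharLeft/CharRight/COMPOSITE` forms a single leading tag; if the template is meant as three separate tags, the separator would need to change. The function name `build_caption` and the example tags are hypothetical.

```python
def build_caption(char_left, char_right, tags_left, tags_right, marker="COMPOSITE"):
    """Build a caption of the form CharLeft/CharRight/COMPOSITE, TagsLeft, TagsRight.

    Assumption: the first comma-separated tag is the slash-joined string.
    """
    head = "/".join([char_left, char_right, marker])
    # Tags of the left image come first, then tags of the right image.
    return ", ".join([head, *tags_left, *tags_right])


if __name__ == "__main__":
    print(build_caption("alice", "bob", ["smile"], ["hat", "outdoors"]))
```

The tag lists are kept in order and are not deduplicated, which matches the template literally; whether duplicate tags should be merged is left open here.
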
A LoHa (a LoRA variant based on the Hadamard product) is trained using the config file below:
```
[model_arguments]
v2 = false
v_parameterization = false
pretrained_model_name_or_path = "Animefull-final-pruned.ckpt"

[additional_network_arguments]
no_metadata = false
unet_lr = 0.0005
text_encoder_lr = 0.0005
network_module = "lycoris.kohya"
network_dim = 8
network_alpha = 1
network_args = [ "conv_dim=0", "conv_alpha=16", "algo=loha",]
network_train_unet_only = false
network_train_text_encoder_only = false

[optimizer_arguments]
optimizer_type = "AdamW8bit"
learning_rate = 0.0005
max_grad_norm = 1.0
lr_scheduler = "cosine"
lr_warmup_steps = 0

[dataset_arguments]
debug_dataset = false
# keep token 1

[training_arguments]
output_name = "cg3comp"
save_precision = "fp16"
save_every_n_epochs = 1
train_batch_size = 2
max_token_length = 225
mem_eff_attn = false
xformers = true
max_train_epochs = 40
max_data_loader_n_workers = 8
persistent_data_loader_workers = true
gradient_checkpointing = false
gradient_accumulation_steps = 1
mixed_precision = "fp16"
clip_skip = 2
lowram = true

[sample_prompt_arguments]
sample_every_n_epochs = 1
sample_sampler = "k_euler_a"

[saving_arguments]
save_model_as = "safetensors"
```
For the second stage of training, the batch size was reduced to 2 while keeping other settings identical.
The training took less than 2 hours on a T4 GPU.

### Results
(see preview images)

## Limitations
* This technique doubles the memory/compute requirement
* Composites can still be generated despite negative prompting
* Cloned characters seem to become the primary failure mode in place of blended characters

## Related Works

Models trained on datasets based on anime shows have [demonstrated](https://civitai.com/models/21305/) multi-subject capability.
Simply using sufficiently distant concepts, such as `1girl, 1boy`, [has also been shown to be effective](https://civitai.com/models/17640/).

## Future work

Below is a list of ideas yet to be explored:
* Synthetic datasets
* Regularization
* Joint training instead of sequential