gustproof/sd-models


	# 1to2: Training Multiple-Subject Models using only Single-Subject Data (Experimental)

	Updates will be mirrored on both Hugging Face and Civitai.

	## Introduction

	[It has been shown that multiple characters can be trained into a single model](https://civitai.com/models/23476/the-idolmster-cinderella-girls-starlight-stage-style-90-characters). A harder task is to create a model that can generate multiple characters in the same image without modifying the generation pipeline. This document describes a simple technique that has been shown to help with that task, using only single-character training images.

	## Method

	```
	Requirement: Sets of single-character images
	Steps:
	1. Train a multi-concept model using the original dataset
	2. Create an augmentation dataset of joined image pairs from the original dataset
	3. Train on the augmentation dataset
	```
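Step 2 can be sketched as follows. This is a minimal illustration of how image pairs might be selected, not the procedure actually used for the experiment. The function name `sample_pairs`, the `pairs_per_combo` parameter, the file names, and the rule that a pair always combines two *different* characters are all assumptions. The actual joining of the two images (for example, side-by-side concatenation with an image library) is not shown.

```python
import itertools
import random


def sample_pairs(images_by_character, pairs_per_combo, seed=0):
    """Pick (left, right) image pairs, each mixing two different characters.

    images_by_character maps a character name to its single-character images.
    Assumption: a pair never uses the same character on both sides.
    """
    rng = random.Random(seed)  # fixed seed so the augmentation set is reproducible
    pairs = []
    # permutations() yields both (A, B) and (B, A), so every character
    # appears on the left and on the right.
    for left_char, right_char in itertools.permutations(sorted(images_by_character), 2):
        for _ in range(pairs_per_combo):
            left_image = rng.choice(images_by_character[left_char])
            right_image = rng.choice(images_by_character[right_char])
            pairs.append(((left_char, left_image), (right_char, right_image)))
    return pairs


# Hypothetical dataset with three characters.
dataset = {
    "char_a": ["a1.png", "a2.png"],
    "char_b": ["b1.png"],
    "char_c": ["c1.png", "c2.png"],
}
pairs = sample_pairs(dataset, pairs_per_combo=2)
```

With three characters there are six ordered character combinations, so `pairs_per_combo=2` yields twelve pairs here. Sampling with replacement means the same pair can occur more than once when a character has few images.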

	## Experiment


	### Setup

	Three characters from the game Cinderella Girls were chosen for the experiment. The base model is `animefull-final-pruned`. The base model was checked beforehand and found to have minimal knowledge of the chosen characters.

	For the captions of the joined images, the template format `CharLeft/CharRight/COMPOSITE, TagsLeft, TagsRight` is used.
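The caption template can be illustrated with a short sketch. The function `composite_caption` and the example tags are hypothetical. The sketch also assumes that the slash-joined part is written literally as one leading tag, with `COMPOSITE` as a fixed marker word. The template itself does not settle this, so treat it as one possible reading.

```python
def composite_caption(char_left, char_right, tags_left, tags_right, marker="COMPOSITE"):
    """Build a caption of the form CharLeft/CharRight/COMPOSITE, TagsLeft, TagsRight."""
    # Assumption: the first comma-separated tag is the slash-joined triple.
    head = "/".join([char_left, char_right, marker])
    # Tags are concatenated as-is: left image first, then right image.
    return ", ".join([head, *tags_left, *tags_right])


caption = composite_caption(
    "char_a",
    "char_b",
    ["smile", "blue dress"],
    ["short hair"],
)
print(caption)  # char_a/char_b/COMPOSITE, smile, blue dress, short hair
```

Tags shared by both images would appear twice under this reading. Whether duplicates should be merged is left open.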

	A LoRA (Hadamard product) is trained using the config file below:
	```
	[model_arguments]
	v2 = false
	v_parameterization = false
	pretrained_model_name_or_path = "Animefull-final-pruned.ckpt"

	[additional_network_arguments]
	no_metadata = false
	unet_lr = 0.0005
	text_encoder_lr = 0.0005
	network_module = "lycoris.kohya"
	network_dim = 8
	network_alpha = 1
	network_args = [ "conv_dim=0", "conv_alpha=16", "algo=loha",]
	network_train_unet_only = false
	network_train_text_encoder_only = false

	[optimizer_arguments]
	optimizer_type = "AdamW8bit"
	learning_rate = 0.0005
	max_grad_norm = 1.0
	lr_scheduler = "cosine"
	lr_warmup_steps = 0

	[dataset_arguments]
	debug_dataset = false
	# keep token 1

	[training_arguments]
	output_name = "cg3comp"
	save_precision = "fp16"
	save_every_n_epochs = 1
	train_batch_size = 2
	max_token_length = 225
	mem_eff_attn = false
	xformers = true
	max_train_epochs = 40
	max_data_loader_n_workers = 8
	persistent_data_loader_workers = true
	gradient_checkpointing = false
	gradient_accumulation_steps = 1
	mixed_precision = "fp16"
	clip_skip = 2
	lowram = true

	[sample_prompt_arguments]
	sample_every_n_epochs = 1
	sample_sampler = "k_euler_a"

	[saving_arguments]
	save_model_as = "safetensors"
	```
	For the second stage of training, the batch size was reduced to 2; all other settings were kept identical.
	The training took less than 2 hours on a T4 GPU.

	### Results
	(see preview images)

	## Limitations
	* This technique doubles the memory/compute requirement
	* Composites can still be generated despite negative prompting
	* Cloned characters seem to become the primary failure mode in place of blended characters

	## Related Works

	Models trained on datasets based on anime shows have [demonstrated](https://civitai.com/models/21305/) multi-subject capability.
	Simply using sufficiently distant concepts, such as `1girl, 1boy`, [has also been shown to be effective](https://civitai.com/models/17640/).

	## Future Work

	Below is a list of ideas yet to be explored:
	* Synthetic datasets
	* Regularization
	* Joint training instead of sequential