---
license: apache-2.0
datasets:
- KBlueLeaf/danbooru2023-webp-4Mpixel
- KBlueLeaf/danbooru2023-metadata-database
base_model:
- lodestones/Chroma
- black-forest-labs/FLUX.1-schnell
pipeline_tag: text-to-image
tags:
- anime
- girls
---

![sample_image](./sample_images/1.webp)

# Model Information

**Note:** This is a Schnell-based model, but it requires a CFG scale of 5 or higher (not the distilled guidance scale) and 20 or more steps.
**At this time, this model cannot be used for generation in WebUI Forge. Please use it with ComfyUI.**

My English is terrible, so I use translation tools.

## Description
kohada_flux6B is an experimental anime model designed to test whether the attention mechanism from `Chroma (v29)` can be transferred to `Flux.1 Schnell`. It also serves as a lightweight variant, with the parameter count reduced to `6B` (8 double blocks / 22 single blocks).

## Usage
- **Resolution:** Up to 1MP
- **(Distilled) Guidance Scale:** 0 (has no effect, since this is a Schnell-based model)
- **CFG Scale:** 5 ~ 7 (6 recommended; a scale of 1 does not produce decent outputs; see the sketch below)
- **Sampling Shift (ModelSamplingFlux: max_shift):** 1.15 ~ 2 (1.75 recommended)
- **Steps:** 20 ~ 30
- **Sampler:** Euler
- **Scheduler:** Simple, Beta

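To clarify the distinction above: the CFG scale is true classifier-free guidance, which combines a conditional and an unconditional model prediction at every sampling step, whereas the (distilled) guidance scale is the extra guidance input baked into FLUX models and is left at 0 here. A minimal sketch of the CFG combination, with a hypothetical `model(...)` denoiser standing in for whatever your pipeline calls:

```python
def cfg_predict(model, x, t, cond, uncond, cfg_scale=6.0):
    """True classifier-free guidance: two forward passes per sampling step.

    `model`, `cond`, and `uncond` are placeholders for this sketch; in
    ComfyUI the CFG value on the sampler node performs the same combination.
    """
    pred_cond = model(x, t, cond)      # prompt-conditioned prediction
    pred_uncond = model(x, t, uncond)  # negative/empty-prompt prediction
    # At cfg_scale = 1 this reduces to the conditional prediction alone,
    # which is why this model needs a CFG scale of 5 or higher.
    return pred_uncond + cfg_scale * (pred_cond - pred_uncond)
```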

## Prompt Format
Same as the Chroma model.

## Training

### Dataset Preparation
I used [hakubooru](https://github.com/KohakuBlueleaf/HakuBooru)-based custom scripts; a simplified sketch of the filtering logic follows the list below.

- **Exclude Tags:** `traditional_media, photo_(medium), scan, animated, animated_gif, lowres, non-web_source, variant_set, tall image, duplicate, pixel-perfect_duplicate`
- **Minimum Post ID:** 1,000,000

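The sketch below is illustrative only and is not the actual hakubooru-based script; it assumes `posts` is an iterable of metadata records with `id` and `tags` fields.

```python
# Hypothetical filter mirroring the settings above; the real pipeline
# queries the danbooru2023 metadata database via hakubooru-based scripts.
EXCLUDE_TAGS = {
    "traditional_media", "photo_(medium)", "scan", "animated", "animated_gif",
    "lowres", "non-web_source", "variant_set", "tall image", "duplicate",
    "pixel-perfect_duplicate",
}
MIN_POST_ID = 1_000_000

def keep_post(post: dict) -> bool:
    """Return True if a post passes the ID threshold and has no excluded tags."""
    if post["id"] < MIN_POST_ID:
        return False
    return not (set(post["tags"]) & EXCLUDE_TAGS)

# selected = [p for p in posts if keep_post(p)]
```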
### Parameter Transfer
I copied the weights from `chroma-unlocked-v29` into `Flux.1 schnell` for the keys that the two models share within the `double blocks` and `single blocks`.
`distilled_guidance_layer`, `final_layer`, `img_in`, and `txt_in` from Chroma were not copied.

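A minimal sketch of that transfer step, assuming single-file safetensors checkpoints; the file names are placeholders, and the skip list comes from the modules listed above:

```python
from safetensors.torch import load_file, save_file

chroma = load_file("chroma-unlocked-v29.safetensors")  # placeholder path
schnell = load_file("flux1-schnell.safetensors")        # placeholder path

SKIP_PREFIXES = ("distilled_guidance_layer", "final_layer", "img_in", "txt_in")

copied = 0
for key, tensor in chroma.items():
    if key.startswith(SKIP_PREFIXES):
        continue  # keep Schnell's own weights for these modules
    if key in schnell and schnell[key].shape == tensor.shape:
        # transfer only keys present in both models with matching shapes
        # (in practice, the double-block / single-block attention weights)
        schnell[key] = tensor.clone()
        copied += 1

print(f"copied {copied} tensors")
save_file(schnell, "kohada_flux_transferred.safetensors")  # placeholder path
```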
### Blocks Reduction
I reduced the number of parameters by removing the later blocks from both the `double blocks` and `single blocks`. I also tried uniformly removing blocks across the entire network, but this led to severe degradation in the generated outputs. Therefore, I instead removed the blocks mainly responsible for high-frequency features and retrained the model to reassign those high-frequency functions to the remaining blocks.

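As a rough sketch of the pruning step (the key patterns follow the usual Flux naming, `double_blocks.N.` / `single_blocks.N.`, and the kept counts match this model; this is not the exact script used):

```python
import re

KEEP_DOUBLE = 8    # keep double blocks 0-7, drop 8-18
KEEP_SINGLE = 22   # keep single blocks 0-21, drop 22-37

def prune_blocks(state_dict: dict) -> dict:
    """Drop the later transformer blocks from a Flux-style state dict."""
    pruned = {}
    for key, tensor in state_dict.items():
        m = re.match(r"(double|single)_blocks\.(\d+)\.", key)
        if m:
            limit = KEEP_DOUBLE if m.group(1) == "double" else KEEP_SINGLE
            if int(m.group(2)) >= limit:
                continue  # later blocks are removed, then retrained around
        pruned[key] = tensor
    return pruned
```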
- **Training Hardware:** A single RTX 4090
- **Method:** LoRA training and merging the results
- **Training Script:** [sd-scripts](https://github.com/kohya-ss/sd-scripts)
- **Basic Settings:**
```powershell
accelerate launch --num_cpu_threads_per_process 4 flux_train_network.py ^
--network_module networks.lora_flux ^
--xformers --gradient_checkpointing --cache_latents --cache_latents_to_disk --cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk ^
--max_data_loader_n_workers 1 --save_model_as "safetensors" ^
--mixed_precision "bf16" ^
--save_precision "bf16" ^
--min_bucket_reso 64 --max_bucket_reso 1536 --seed 1 ^
--save_every_n_epochs 1 --max_train_epochs 1 --keep_tokens_separator "|||" ^
--network_dim 16 --network_alpha 16 ^
--unet_lr 1e-4 --text_encoder_lr 0 --train_batch_size 1 --gradient_accumulation_steps 4 ^
--optimizer_type adamw8bit ^
--lr_scheduler="constant_with_warmup" --lr_warmup_steps 100 ^
--vae_batch_size 4 --cache_info ^
--guidance_scale 1 ^
--timestep_sampling shift --model_prediction_type raw --sigmoid_scale 1.2 --discrete_flow_shift 1 --apply_t5_attn_mask --network_train_unet_only
```

The overall training sequence was as follows (the LoRA merge step is sketched after the list):

1. Copied the attention weights from Chroma v29 to Schnell.
2. Removed double blocks 8 ~ 18 and single blocks 22 ~ 37.
3. 36,000 images (res512, bs4, acc2, `--timestep_sampling uniform`), 3 epochs.
4. 36,000 images (res1024, bs1, acc8, `--timestep_sampling uniform`), 1 epoch.
5. Merged the LoRA into the model.
6. 36,000 images (res1024, bs2, acc8, `--timestep_sampling uniform --full_bf16 --fp8_base_unet`), 1 epoch.
7. Merged the LoRA into the model.
8. 36,000 images (res1024, bs1, acc4, `--timestep_sampling shift --model_prediction_type raw --sigmoid_scale 1.2 --discrete_flow_shift 1`), 1 epoch.
9. Merged the LoRA into the model.

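Each "Merged the LoRA into the model" step folds the trained LoRA delta back into the base weights. A minimal sketch of that merge for a single linear layer, assuming standard down/up LoRA matrices (names are illustrative; sd-scripts provides its own merge utilities):

```python
import torch

def merge_lora_weight(base_weight: torch.Tensor,
                      lora_down: torch.Tensor,
                      lora_up: torch.Tensor,
                      alpha: float,
                      rank: int) -> torch.Tensor:
    """Return W' = W + (alpha / rank) * (up @ down) for one linear layer."""
    scale = alpha / rank
    return base_weight + scale * (lora_up @ lora_down)
```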
## Resources (License)
- **Chroma (Apache 2.0)**
- **FLUX.1-schnell (Apache 2.0)**
- **danbooru2023-webp-4Mpixel (MIT)**
- **danbooru2023-metadata-database (MIT)**

## Acknowledgements
- **lodestones:** Thanks for extensively training Chroma and making it publicly available as an open-source model.
- **black-forest-labs:** Thanks for publishing a great open-source model.
- **kohya-ss:** Thanks for publishing the essential training scripts and for the quick updates.
- **Kohaku-Blueleaf:** Thanks for publishing the dataset scripts and extensive information on various training conditions.