---
license: apache-2.0
datasets:
- KBlueLeaf/danbooru2023-webp-4Mpixel
- KBlueLeaf/danbooru2023-metadata-database
base_model:
- lodestones/Chroma
- black-forest-labs/FLUX.1-schnell
pipeline_tag: text-to-image
tags:
- anime
- girls
---

# Model Information

**Note:** Although this model is Schnell-based, it requires a CFG scale of 5 or higher (not the guidance scale) and 20 or more steps.

**At this time, this model cannot generate images in WebUI Forge. Please use it with ComfyUI.**

My English is not good, so I use translation tools.

## Description

kohada_flux6B is an experimental anime model built to test whether the attention mechanism from `Chroma (v29)` can be transferred to `Flux.1 Schnell`. It also serves as a lightweight variant, with the parameter count reduced to `6B` (8 double blocks / 22 single blocks).

## Usage

- **Resolution:** Up to 1MP
- **(Distilled) Guidance Scale:** 0 (has no effect, since this is a Schnell-based model)
- **CFG Scale:** 5 ~ 7 (6 recommended; a scale of 1 does not produce decent outputs)
- **Sampling Shift (ModelSamplingFlux: Max_shift):** 1.15 ~ 2 (1.75 recommended)
- **Steps:** 20 ~ 30
- **Sampler:** Euler
- **Scheduler:** Simple, Beta

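Since a real CFG scale (rather than the distilled guidance) is what this model responds to, a minimal sketch of the standard classifier-free guidance rule may help; `cfg_denoise` is a hypothetical helper for illustration, not part of any sampler API.

```python
def cfg_denoise(cond_pred, uncond_pred, cfg_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one by cfg_scale."""
    return uncond_pred + cfg_scale * (cond_pred - uncond_pred)

# At cfg_scale 1 this reduces to cond_pred alone (no amplification of the
# conditional direction), consistent with scale 1 giving poor outputs here.
```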
## Prompt Format

Same as the Chroma model.

## Training

### Dataset Preparation

I used [hakubooru](https://github.com/KohakuBlueleaf/HakuBooru)-based custom scripts.

- **Exclude Tags:** `traditional_media, photo_(medium), scan, animated, animated_gif, lowres, non-web_source, variant_set, tall image, duplicate, pixel-perfect_duplicate`
- **Minimum Post ID:** 1,000,000

### Parameter Transfer

For the `double blocks` and `single blocks`, I copied the weights of every key shared between `chroma-unlocked-v29` and `Flux.1 schnell`.
`distilled_guidance_layer`, `final_layer`, `img_in`, and `txt_in` from Chroma were not copied.

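The transfer step can be sketched as a filter over two state dicts. This is a hypothetical illustration (plain dicts stand in for real checkpoints, and Flux-style key names like `double_blocks.0.*` are assumed), not the actual conversion script.

```python
# Layers that stay Schnell-native; the Chroma versions are skipped.
SKIP_PREFIXES = ("distilled_guidance_layer", "final_layer", "img_in", "txt_in")

def transfer_shared_weights(schnell_sd, chroma_sd):
    """Copy block weights from a Chroma state dict into a Schnell state
    dict for every shared key, returning the list of copied keys."""
    copied = []
    for key, tensor in chroma_sd.items():
        if key.startswith(SKIP_PREFIXES):
            continue  # Chroma-only layers are not transferred
        if not key.startswith(("double_blocks.", "single_blocks.")):
            continue  # only the transformer blocks are transferred
        if key in schnell_sd:
            schnell_sd[key] = tensor
            copied.append(key)
    return copied
```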
### Blocks Reduction

I reduced the parameter count by removing the later blocks from both the `double blocks` and `single blocks`. I also tried removing blocks uniformly across the entire network, but that caused severe degradation in the generated outputs. Instead, I removed the blocks mainly responsible for high-frequency features and retrained the model to reassign those high-frequency functions to the remaining blocks.

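The tail-block removal can be sketched as a state-dict filter; this is a hypothetical illustration assuming Flux-style key names (`double_blocks.N.*`, `single_blocks.N.*`). Because only the tail is dropped (keeping double blocks 0 ~ 7 and single blocks 0 ~ 21), no index renumbering is needed.

```python
import re

def prune_blocks(sd, keep_double=8, keep_single=22):
    """Drop state-dict entries for double blocks >= keep_double and
    single blocks >= keep_single; all other keys pass through."""
    pruned = {}
    for key, tensor in sd.items():
        m = re.match(r"(double_blocks|single_blocks)\.(\d+)\.", key)
        if m:
            idx = int(m.group(2))
            keep = keep_double if m.group(1) == "double_blocks" else keep_single
            if idx >= keep:
                continue  # tail block: removed
        pruned[key] = tensor
    return pruned
```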
- **Training Hardware:** A single RTX 4090
- **Method:** LoRA training, then merging the results
- **Training Script:** [sd-scripts](https://github.com/kohya-ss/sd-scripts)
- **Basic Settings:**
```powershell
accelerate launch --num_cpu_threads_per_process 4 flux_train_network.py ^
--network_module networks.lora_flux ^
--xformers --gradient_checkpointing --cache_latents --cache_latents_to_disk --cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk ^
--max_data_loader_n_workers 1 --save_model_as "safetensors" ^
--mixed_precision "bf16" ^
--save_precision "bf16" ^
--min_bucket_reso 64 --max_bucket_reso 1536 --seed 1 ^
--save_every_n_epochs 1 --max_train_epochs 1 --keep_tokens_separator "|||" ^
--network_dim 16 --network_alpha 16 ^
--unet_lr 1e-4 --text_encoder_lr 0 --train_batch_size 1 --gradient_accumulation_steps 4 ^
--optimizer_type adamw8bit ^
--lr_scheduler="constant_with_warmup" --lr_warmup_steps 100 ^
--vae_batch_size 4 --cache_info ^
--guidance_scale 1 ^
--timestep_sampling shift --model_prediction_type raw --sigmoid_scale 1.2 --discrete_flow_shift 1 --apply_t5_attn_mask --network_train_unet_only
```

1. Copied attention weights from Chroma v29 to Schnell
2. Removed double blocks 8 ~ 18 / single blocks 22 ~ 37
3. 36,000 images (res512 bs4 acc2 --timestep_sampling uniform), 3 epochs
4. 36,000 images (res1024 bs1 acc8 --timestep_sampling uniform), 1 epoch
5. Merged into the model
6. 36,000 images (res1024 bs2 acc8 --timestep_sampling uniform --full_bf16 --fp8_base_unet), 1 epoch
7. Merged into the model
8. 36,000 images (res1024 bs1 acc4 --timestep_sampling shift --model_prediction_type raw --sigmoid_scale 1.2 --discrete_flow_shift 1), 1 epoch
9. Merged into the model

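The "merged into the model" steps fold each trained LoRA back into the base weights. A minimal sketch of the common merge rule `W' = W + (alpha / rank) * up @ down` follows; with `network_dim 16` and `network_alpha 16` the scale works out to 1.0. `merge_lora` is a hypothetical helper for illustration, not sd-scripts' actual merge utility.

```python
import numpy as np

def merge_lora(base_weight, lora_down, lora_up, alpha, rank):
    # Fold the low-rank update into the base weight:
    # W' = W + (alpha / rank) * (up @ down)
    scale = alpha / rank
    return base_weight + scale * (lora_up @ lora_down)
```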
## Resources (License)

- **Chroma** (Apache 2.0)
- **FLUX.1-schnell** (Apache 2.0)
- **danbooru2023-webp-4Mpixel** (MIT)
- **danbooru2023-metadata-database** (MIT)

| 92 |
+
## Acknowledgements
|
| 93 |
+
- **lodestones** Thanks for extensively training and making the Chroma open-source model publicly available.
|
| 94 |
+
- **black-forest-labs:** Thanks for publishing a great open source model.
|
| 95 |
+
- **kohya-ss:** Thanks for publishing the essential training scripts and for the quick updates.
|
| 96 |
+
- **Kohaku-Blueleaf:** Thanks for the extensive publication of the scripts for the dataset and the various training conditions.
|