---
license: apache-2.0
datasets:
- KBlueLeaf/danbooru2023-webp-4Mpixel
- KBlueLeaf/danbooru2023-metadata-database
base_model:
- lodestones/Chroma
- black-forest-labs/FLUX.1-schnell
pipeline_tag: text-to-image
tags:
- anime
- girls
---

# Model Information

**Note:** Although this model is Schnell-based, it requires a CFG scale of 5 or higher (not the guidance scale) and 20 or more steps.

**At this time, this model cannot generate images in WebUI Forge. Please use it with ComfyUI.**

My English is not good, so I use translation tools.

## Description

kohada_flux6B is an experimental anime model built to test whether the attention mechanism from `Chroma (v29)` can be transferred to `Flux.1 Schnell`. It also serves as a lightweight variant, with the parameter count reduced to `6B` (8 double blocks / 22 single blocks).

## Usage

- **Resolution:** Up to 1MP
- **(Distilled) Guidance Scale:** 0 (has no effect, since this is a Schnell-based model)
- **CFG Scale:** 5 ~ 7 (6 recommended; a scale of 1 does not produce decent outputs)
- **Sampling Shift (ModelSamplingFlux: Max_shift):** 1.15 ~ 2 (1.75 recommended)
- **Steps:** 20 ~ 30
- **Sampler:** Euler
- **Scheduler:** Simple, Beta

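Since a real CFG scale (rather than the distilled guidance) is what this model responds to, a minimal sketch of the standard classifier-free guidance rule may help; `cfg_denoise` is a hypothetical helper for illustration, not part of any sampler API.

```python
def cfg_denoise(cond_pred, uncond_pred, cfg_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one by cfg_scale."""
    return uncond_pred + cfg_scale * (cond_pred - uncond_pred)

# At cfg_scale 1 this reduces to cond_pred alone (no amplification of the
# conditional direction), consistent with scale 1 giving poor outputs here.
```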
## Prompt Format

Same as the Chroma model.

## Training

### Dataset Preparation

I used [hakubooru](https://github.com/KohakuBlueleaf/HakuBooru)-based custom scripts.

- **Exclude Tags:** `traditional_media, photo_(medium), scan, animated, animated_gif, lowres, non-web_source, variant_set, tall image, duplicate, pixel-perfect_duplicate`
- **Minimum Post ID:** 1,000,000

### Parameter Transfer

For the `double blocks` and `single blocks`, I copied the weights of every key shared between `chroma-unlocked-v29` and `Flux.1 schnell`.
`distilled_guidance_layer`, `final_layer`, `img_in`, and `txt_in` from Chroma were not copied.

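The transfer step can be sketched as a filter over two state dicts. This is a hypothetical illustration (plain dicts stand in for real checkpoints, and Flux-style key names like `double_blocks.0.*` are assumed), not the actual conversion script.

```python
# Layers that stay Schnell-native; the Chroma versions are skipped.
SKIP_PREFIXES = ("distilled_guidance_layer", "final_layer", "img_in", "txt_in")

def transfer_shared_weights(schnell_sd, chroma_sd):
    """Copy block weights from a Chroma state dict into a Schnell state
    dict for every shared key, returning the list of copied keys."""
    copied = []
    for key, tensor in chroma_sd.items():
        if key.startswith(SKIP_PREFIXES):
            continue  # Chroma-only layers are not transferred
        if not key.startswith(("double_blocks.", "single_blocks.")):
            continue  # only the transformer blocks are transferred
        if key in schnell_sd:
            schnell_sd[key] = tensor
            copied.append(key)
    return copied
```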
### Blocks Reduction

I reduced the parameter count by removing the later blocks from both the `double blocks` and `single blocks`. I also tried removing blocks uniformly across the entire network, but that caused severe degradation in the generated outputs. Instead, I removed the blocks mainly responsible for high-frequency features and retrained the model to reassign those high-frequency functions to the remaining blocks.

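The tail-block removal can be sketched as a state-dict filter; this is a hypothetical illustration assuming Flux-style key names (`double_blocks.N.*`, `single_blocks.N.*`). Because only the tail is dropped (keeping double blocks 0 ~ 7 and single blocks 0 ~ 21), no index renumbering is needed.

```python
import re

def prune_blocks(sd, keep_double=8, keep_single=22):
    """Drop state-dict entries for double blocks >= keep_double and
    single blocks >= keep_single; all other keys pass through."""
    pruned = {}
    for key, tensor in sd.items():
        m = re.match(r"(double_blocks|single_blocks)\.(\d+)\.", key)
        if m:
            idx = int(m.group(2))
            keep = keep_double if m.group(1) == "double_blocks" else keep_single
            if idx >= keep:
                continue  # tail block: removed
        pruned[key] = tensor
    return pruned
```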
- **Training Hardware:** A single RTX 4090
- **Method:** LoRA training, then merging the results
- **Training Script:** [sd-scripts](https://github.com/kohya-ss/sd-scripts)
- **Basic Settings:**
```powershell
accelerate launch --num_cpu_threads_per_process 4 flux_train_network.py ^
--network_module networks.lora_flux ^
--xformers --gradient_checkpointing --cache_latents --cache_latents_to_disk --cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk ^
--max_data_loader_n_workers 1 --save_model_as "safetensors" ^
--mixed_precision "bf16" ^
--save_precision "bf16" ^
--min_bucket_reso 64 --max_bucket_reso 1536 --seed 1 ^
--save_every_n_epochs 1 --max_train_epochs 1 --keep_tokens_separator "|||" ^
--network_dim 16 --network_alpha 16 ^
--unet_lr 1e-4 --text_encoder_lr 0 --train_batch_size 1 --gradient_accumulation_steps 4 ^
--optimizer_type adamw8bit ^
--lr_scheduler="constant_with_warmup" --lr_warmup_steps 100 ^
--vae_batch_size 4 --cache_info ^
--guidance_scale 1 ^
--timestep_sampling shift --model_prediction_type raw --sigmoid_scale 1.2 --discrete_flow_shift 1 --apply_t5_attn_mask --network_train_unet_only
```

1. Copied attention weights from Chroma v29 to Schnell
2. Removed double blocks 8 ~ 18 / single blocks 22 ~ 37
3. 36,000 images (res512 bs4 acc2 --timestep_sampling uniform), 3 epochs
4. 36,000 images (res1024 bs1 acc8 --timestep_sampling uniform), 1 epoch
5. Merged into the model
6. 36,000 images (res1024 bs2 acc8 --timestep_sampling uniform --full_bf16 --fp8_base_unet), 1 epoch
7. Merged into the model
8. 36,000 images (res1024 bs1 acc4 --timestep_sampling shift --model_prediction_type raw --sigmoid_scale 1.2 --discrete_flow_shift 1), 1 epoch
9. Merged into the model

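The "merged into the model" steps fold each trained LoRA back into the base weights. A minimal sketch of the common merge rule `W' = W + (alpha / rank) * up @ down` follows; with `network_dim 16` and `network_alpha 16` the scale works out to 1.0. `merge_lora` is a hypothetical helper for illustration, not sd-scripts' actual merge utility.

```python
import numpy as np

def merge_lora(base_weight, lora_down, lora_up, alpha, rank):
    # Fold the low-rank update into the base weight:
    # W' = W + (alpha / rank) * (up @ down)
    scale = alpha / rank
    return base_weight + scale * (lora_up @ lora_down)
```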
## Resources (License)

- **Chroma** (Apache 2.0)
- **FLUX.1-schnell** (Apache 2.0)
- **danbooru2023-webp-4Mpixel** (MIT)
- **danbooru2023-metadata-database** (MIT)

| 92 |
+
## Acknowledgements
|
| 93 |
+
- **lodestones** Thanks for extensively training and making the Chroma open-source model publicly available.
|
| 94 |
+
- **black-forest-labs:** Thanks for publishing a great open source model.
|
| 95 |
+
- **kohya-ss:** Thanks for publishing the essential training scripts and for the quick updates.
|
| 96 |
+
- **Kohaku-Blueleaf:** Thanks for the extensive publication of the scripts for the dataset and the various training conditions.
|