---
license: other
license_name: fair-ai-public-license-1.0-sd
license_link: https://freedevproject.org/faipl-1.0-sd/
base_model:
- CabalResearch/NoobAI-RectifiedFlow-Experimental
library_name: diffusers
---
## Model Details
Experimental conversion of our [NoobAI-RF](https://huggingface.co/CabalResearch/NoobAI-RectifiedFlow-Experimental) model to the Flux2 VAE.
<u>We have observed that the model can adapt to the Flux2 VAE, and current trends suggest that significant improvements are possible with a larger training run, which could potentially allow it to compete with bigger models.
By supporting us, you could make it a reality.</u>
More info on supporting us: [click me](https://huggingface.co/CabalResearch/NoobAI-Flux2VAE-RectifiedFlow#potential-future)
### Model Description
This is a native training of the SDXL UNet in combination with the Flux2 VAE. Essentially, we've adapted a previously 4-channel model to work with the 32 complex channels of Flux 2. No adapters or tricks, fully native.
The Danbooru dataset of NoobAI was used for this.


Due to limited compute, we were not able to fully converge it, so expect output on the level of very early anime models. We hope the community will find this interesting enough to support us.
We observed steady convergence throughout the whole training process, and believe that further training will result in a new standard for fast local anime generation.
Please take this model as a proof of concept, not as a final product.
We used Rectified Flow for training, with a staged approach for adapting to the Flux2 VAE.
Most of the knowledge seems to be preserved, but it is significantly weakened due to the completely new latent space.
- **Developed by:** Cabal Research (Bluvoll, Anzhc)
- **Funded by:** Community, Bluvoll
- **License:** [fair-ai-public-license-1.0-sd](https://freedevproject.org/faipl-1.0-sd/)
- **Finetuned from model:** [NoobAI-RF](https://huggingface.co/CabalResearch/NoobAI-RectifiedFlow-Experimental)
## Bias and Limitations
Once again, we are limited in budget for this fundamental task. We have adapted the model enough for it to output somewhat acceptable images (closer to a theoretical NoobAI 0.1's knowledge using the Flux 2 VAE), but further progress would require large compute: we are in territory where the model is seeing a new level of detail for the first time (as well as the old level of detail in a new way), and that is hard.
Most biases of the official dataset will apply (Blue Archive, etc.).
Expect noise, fuzzy details, poor performance in landscape aspect ratios, bad hands, and general issues with composition as a whole.
## Model Output Examples
One of the benefits we have achieved is color:

Because it is a native flow model, it achieves strong colors without making them acidic or otherwise unstable.
Generally, as already stated, expect at least some grain and fuzziness in all gens, as we have not converged to the juicy details yet.









One area where it currently does relatively well is scenery:








We also provide an Aesthetic Tune, which improves details in general:


# Recommendations
### Inference
#### Comfy

(The workflow is available alongside the model in the repo)
We provide a temporary ComfyUI fork, and hope the support will be adopted in the main repo:
**https://github.com/Anzhc/ComfyUI-sdxl-flux2vae-support**
Inference is the same as usual, but with the addition of the SD3 sampling node, as this model is Flow-based.
Recommended Parameters:
**Sampler**: Euler, Euler A, DPM++ SDE, etc.
**Steps**: 20-28
**CFG**: 6-9
**Schedule**: Normal/Simple/SGM Uniform/Quadratic
**Positive Quality Tags**: `masterpiece, best quality`
**Negative Tags**: `worst quality, normal quality, bad anatomy`
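
To illustrate why a Flow-based model needs its own sampling setup, here is a minimal toy sketch of Euler sampling for a rectified-flow model. It is not the ComfyUI or ReForge implementation. The names `shifted_sigmas` and `euler_flow_sample` and the toy velocity function are hypothetical, the linear base schedule is an assumption, and the shift of 2.5 simply mirrors the training value listed in this card (use whatever the provided workflow sets for inference).

```python
def shifted_sigmas(steps, shift=2.5):
    """Linear sigmas from 1 to 0 with an SD3-style shift applied (assumed schedule)."""
    sigmas = []
    for i in range(steps + 1):
        s = 1.0 - i / steps
        sigmas.append(shift * s / (1.0 + (shift - 1.0) * s))
    return sigmas


def euler_flow_sample(velocity_fn, noise, steps=24, shift=2.5):
    """Integrate dx/dsigma = v(x, sigma) from sigma=1 (noise) down to sigma=0."""
    sigmas = shifted_sigmas(steps, shift)
    x = list(noise)
    for s, s_next in zip(sigmas, sigmas[1:]):
        v = velocity_fn(x, s)  # the model predicts a velocity, not epsilon
        x = [xi + (s_next - s) * vi for xi, vi in zip(x, v)]
    return x


# Toy stand-in for the UNet: the exact velocity toward a fixed target.
# With x_sigma = (1 - sigma) * target + sigma * noise, v = (x_sigma - target) / sigma.
target = [0.25, -0.5]


def toy_velocity(x, sigma):
    return [(xi - ti) / sigma for xi, ti in zip(x, target)]


result = euler_flow_sample(toy_velocity, noise=[1.3, 0.7])
print(result)
```

Because the model output is treated as a velocity along a straight noise-to-data path, a sampler that assumes epsilon prediction would integrate the wrong quantity, which is why the flow-aware sampling node is needed.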
#### A1111 WebUI
(All screenshots are reused from our RF release, as there is no difference in setup)
Recommended WebUI: [ReForge](https://github.com/Panchovix/stable-diffusion-webui-reForge). It has native support for Flow models, and we've PR'd our native support for the Flux2 VAE-based SDXL modification.
**How to use in ReForge**:

(Ignore the Sigma max field at the top; it is not used in RF.)
Support for RF in ReForge is being implemented through a built-in extension:


Set the parameters to match, and you're good to go.
Flux2VAE does not currently have an appropriate high-quality preview method. Please use the Approx Cheap option, which lets you see a simple PCA projection (ReForge).
Recommended Parameters:
**Sampler**: Euler A Comfy RF, Euler, DPM++ SDE Comfy, etc. **ALL VARIANTS MUST BE RF OR COMFY, IF AVAILABLE. In ComfyUI, routing is automatic, but in WebUI it is not.**
**Steps**: 20-28
**CFG**: 6-9
**Schedule**: Normal/Simple/SGM Uniform
**Positive Quality Tags**: `masterpiece, best quality`
**Negative Tags**: `worst quality, normal quality, bad anatomy`
**ADETAILER FIX FOR RF**:
By default, Adetailer discards the Advanced Model Sampling (AMS) extension, which breaks RF. You need to add AMS to this part of the settings:

Add `advanced_model_sampling_script,advanced_model_sampling_script_backported` there.
If that does not work, go into the Adetailer extension folder, open args.py, and replace `_builtin_script` like this:

Here is the same snippet for easy copying:
```
# The two AMS entries keep Advanced Model Sampling active inside Adetailer.
_builtin_script = (
    "advanced_model_sampling_script",
    "advanced_model_sampling_script_backported",
    "hypertile_script",
    "soft_inpainting",
)
```
Or use my fork of Adetailer - https://github.com/Anzhc/aadetailer-reforge
## Training
### Model Composition
(Relative to base it's trained from)
Unet: Same
CLIP L: Same, Frozen
CLIP G: Same, Frozen
VAE: [Flux2 VAE](https://huggingface.co/black-forest-labs/FLUX.2-dev/tree/main/vae)
### Training Details
(Main Stage Training)
**Samples seen** (unbatched steps): ~18.5 million
**Learning Rate**: 5e-5
**Effective Batch size**: 1472 (92 Batch Size * 2 Accumulation * 8 GPUs)
**Precision**: Full BF16
**Optimizer**: AdamW8bit with Kahan Summation
**Weight Decay**: 0.01
**Schedule**: Constant with warmup
**Timestep Sampling Strategy**: Logit-Normal -0.2 1.5 (sometimes referred to as Lognorm), Shift 2.5
**Text Encoders**: Frozen
**Keep Token**: False
**Tag Dropout**: 10%
**Uncond Dropout**: 10%
**Shuffle**: True
**VAE Conv Padding**: False
**VAE Shift**: 0.0760
**VAE Scale**: 0.6043
**Additional Features used**: Protected Tags, Cosine Optimal Transport.
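
As a rough illustration of the timestep sampling strategy listed above, the sketch below draws timesteps from a logit-normal distribution and then applies a shift. It rests on assumptions that the card does not confirm: that `-0.2` and `1.5` are the mean and standard deviation of the underlying normal, that the shift uses the SD3-style formula `shift * t / (1 + (shift - 1) * t)`, and that `t = 1` corresponds to pure noise. The function names are hypothetical and are not taken from the training scripts.

```python
import math
import random


def shift_timestep(t, shift=2.5):
    """SD3-style timestep shift (assumed form); maps (0, 1) onto (0, 1)."""
    return shift * t / (1.0 + (shift - 1.0) * t)


def sample_timestep(rng, mean=-0.2, std=1.5, shift=2.5):
    """Draw one training timestep: normal -> sigmoid (logit-normal) -> shift."""
    u = rng.gauss(mean, std)
    t = 1.0 / (1.0 + math.exp(-u))
    return shift_timestep(t, shift)


rng = random.Random(0)
samples = [sample_timestep(rng) for _ in range(10000)]
print("min/mean/max:", min(samples), sum(samples) / len(samples), max(samples))
```

Under these assumptions, the shift is equivalent to adding `ln(shift)` to the logit before the sigmoid, so a shift above 1 moves the whole distribution toward noisier timesteps.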
#### Training Data
2 epochs of the original NoobAI dataset, including images up to October 2024, minus the screencap data (which was not shared).
### LoRA Training
The current stage is trainable, but it is hard to achieve accurate reproduction if the subject or content depends on small details, as the base model has not converged to them yet.
My current style training settings (Anzhc):
**Learning Rate**: tested up to **7.5e-4**
**Batch Size**: 144 (6 real * 24 accum), using SGA (Stochastic Gradient Accumulation). Without SGA, I would probably lower accum to 4-8.
**Optimizer**: AdamW8bit with Kahan Summation
**Schedule**: ReREX (Use REX for simplicity, or Cosine annealing)
**Precision**: Full BF16
**Weight Decay**: 0.02
**Timestep Sampling Strategy**: Logit-Normal(either 0.0 1.0, or -0.2 1.5), Shift 2.5
**Dim/Alpha/Conv Dim/Conv Alpha**: 24/24/24/24 (LyCORIS/LoCon)
**Text Encoders**: Frozen
**Optimal Transport**: True
**Expected Dataset Size**: 100 images (Can be even 10, but balance with repeats to roughly this target.)
**Epochs**: 50
### Hardware
The model was trained on a cloud 8xH200 node.
### Software
Custom fork of [SD-Scripts](https://github.com/bluvoll/sd-scripts)(maintained by Bluvoll)
## Acknowledgements
### Special Thanks
**To a special supporter who single-handedly sponsored the whole run and preferred to stay anonymous**
---
# Support
If you wish to support our continuous effort of making waifus 0.2% better, you can do it here:
**https://ko-fi.com/bluvoll**
Crypto link pending.
# Potential future
**Expected Compute Needed**: We theorize that the model needs at the very least 20 epochs on the full data, ideally 35. Each epoch cost about 460 USD with the provider we use. Each time we receive enough donations to train 2 epochs, we'll resume training. If we receive enough donations, we will also update the dataset to the most recent data.
Why not do this now? Caching with the Flux 2 VAE takes a whopping 15 hours and roughly 20 TB, since each latent is 2 MB, which in itself costs 180 USD of compute time.
At the time of this release, we are working on further improvements to the pipeline and components, and we plan to upgrade this architecture further.
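
For a sense of scale, the figures above can be tallied in a few lines. This is only back-of-the-envelope arithmetic on the numbers quoted in this card (460 USD per epoch, ~18.5 million samples over 2 epochs, 2 MB per latent). It assumes decimal units (1 TB = 1,000,000 MB) and that the per-epoch price stays constant; the variable names are illustrative.

```python
COST_PER_EPOCH_USD = 460   # quoted per-epoch price
SAMPLES_SEEN = 18.5e6      # samples seen over the main training stage
EPOCHS_TRAINED = 2
LATENT_SIZE_MB = 2         # quoted size of one cached latent


def training_cost(epochs):
    """Estimated training cost in USD, assuming a constant per-epoch price."""
    return epochs * COST_PER_EPOCH_USD


samples_per_epoch = SAMPLES_SEEN / EPOCHS_TRAINED
cache_tb = samples_per_epoch * LATENT_SIZE_MB / 1e6  # MB -> TB (decimal)

print("Resume threshold (2 epochs):", training_cost(2), "USD")
print("Minimum target (20 epochs):", training_cost(20), "USD")
print("Ideal target (35 epochs):", training_cost(35), "USD")
print("Estimated latent cache:", cache_tb, "TB")
```

The cache estimate of about 18.5 TB is consistent with the roughly 20 TB quoted above, and the one-time 180 USD caching cost would come on top of the per-epoch totals.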