---
license: other
license_name: fair-ai-public-license-1.0-sd
license_link: https://freedevproject.org/faipl-1.0-sd/
base_model:
- CabalResearch/NoobAI-RectifiedFlow-Experimental
library_name: diffusers
---
## Model Details

An experimental conversion of our [NoobAI-RF](https://huggingface.co/CabalResearch/NoobAI-RectifiedFlow-Experimental) model to the Flux2 VAE.

<u>We have observed the model adapting to the Flux2 VAE, and current trends suggest that significant improvements are possible with larger-scale training, which could potentially allow it to compete with bigger models.  
By supporting us you can make that a reality.</u>

More info on supporting us: [click me](https://huggingface.co/CabalResearch/NoobAI-Flux2VAE-RectifiedFlow#potential-future) 

### Model Description

This is a native training of the SDXL UNet in combination with the Flux2 VAE. Essentially, we've adapted a previously 4-channel model to work with the 32 latent channels of Flux 2. No adapters or tricks, fully native.  
The Danbooru dataset of NoobAI was used for this.
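
As a rough, hypothetical illustration (not our training code), the only config-level change this implies for a standard diffusers `UNet2DConditionModel` is widening the latent interface; the rest of the architecture stays SDXL:

```python
from diffusers import UNet2DConditionModel

# Hypothetical sketch: start from the stock SDXL UNet config and widen the
# latent interface from the 4 channels of the SDXL VAE to the 32 channels
# of the Flux2 VAE. The SDXL base repo is used purely to fetch a config.
config = dict(UNet2DConditionModel.load_config(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet"
))
config["in_channels"] = 32   # was 4
config["out_channels"] = 32  # was 4
unet = UNet2DConditionModel.from_config(config)  # fresh model with the widened interface
```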



![training-flux2vae-sdxl-progress](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/_cObCxad4c07LBrqiMEwp.jpeg)

![training-flux2vae-sdxl-progress-crop](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/0syF9FpKvzRmBsTCz5Wsd.png)

Due to limited compute we were not able to fully converge it, so expect output on the level of very early anime models. We hope the community will find this interesting enough to support us.  
We observed steady convergence throughout the whole training process, and believe that further training will result in a new standard for fast local anime generation.  

Please treat this model as a proof of concept, not as a final product.  

We used Rectified Flow for training, with a staged approach for adapting to the Flux2 VAE.  
Most of the knowledge seems to be preserved, but it is significantly weakened due to the completely new latent space.


- **Developed by:** Cabal Research (Bluvoll, Anzhc)
- **Funded by:** Community, Bluvoll
- **License:** [fair-ai-public-license-1.0-sd](https://freedevproject.org/faipl-1.0-sd/)
- **Finetuned from model:** [NoobAI-RF](https://huggingface.co/CabalResearch/NoobAI-RectifiedFlow-Experimental)


## Bias and Limitations

Once again, we are limited in budget for this fundamental task. We have adapted the model enough to output somewhat acceptable images (closer to the knowledge of a theoretical NoobAI 0.1 using the Flux 2 VAE), but further progress would require large compute: we are in territory where the model is seeing the new level of detail for the first time (as well as the old level of detail in a new way), and that is hard.

Most biases of the official dataset will apply (Blue Archive, etc.).

Expect noise, fuzzy details, low performance in landscape aspect ratios, bad hands, and general issues with composition as a whole.


## Model Output Examples

One of the benefits we have achieved is color:

![00439-3595667584-small](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/Q2b7tx4gE0UAE-YOKpxfX.png)
Being a native flow model, it achieves strong colors without making them acidic or otherwise unstable.  

Generally, as already stated, expect at least some grain and fuzziness in all gens, as we have not yet converged to the juicy details.
![00448-1663643003](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/mCRQ-_GXSNsdoaPT5-_a7.png)

![00443-1298253947](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/6U1jBuPztozBTWvr9okgm.png)

![00445-1953653215](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/Eqj6oJcENpoEynKWjLZLo.png)

![00442-1298253945](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/eS-oPhXzyfWTIm3ioorfd.png)

![00440-2105742435](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/y5AoeomwLgTZJmqHF5RZz.png)

![00446-1068409742](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/m6aKB22LRvMUQKouxHWj4.png)

![channel_castation](https://cdn-uploads.huggingface.co/production/uploads/634cf025fe861cc73a2e7dd0/dUNpIpug6HB-2MHSow3Dd.png)

![moshimoshibe_1024x1536](https://cdn-uploads.huggingface.co/production/uploads/634cf025fe861cc73a2e7dd0/U8QSOog8W6xL9KdFj2VFH.png)

![00444-3136499438](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/7t5IfAY_KDWWaAe13zEfQ.png)

One area where it is currently relatively strong is scenery:

![00453-1064929183](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/kDJSqS8YGRiFCB4GiuXbW.png)

![00452-1179534666](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/A4dDCqKjvSn0zxSvF-eKt.png)

![00457-4116373402](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/Lk223iTutucibmKyanaA4.png)

![00458-2531505134](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/dBrFFWq8OspcWKbh5lxpI.png)

![00461-286542451](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/YnhE0aqs54YQT22m6z8n9.png)

![00456-2654901363](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/4by6eRO649-tNknjarYE6.png)

![00460-286542448](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/YjJ66eZKVI0ZQPuQLQ075.png)

![00463-286542453](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/iA52BagsAAB9FVNItVfnQ.png)

We also provide an Aesthetic Tune that improves details in general:  

![00464-286542453](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/HcfQWfUcuSZYzHWPOPArB.png)

![00468-2602842402](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/dQtpye4HvisGbCIOP1QdA.png)

# Recommendations

### Inference

#### Comfy

![Screenshot 2025-12-19 112806](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/iyZaU9E5CqZkQoyfQRIQy.png)

(The workflow is available alongside the model in the repo.)
We provide a temporary ComfyUI fork, and hope support will be adopted in the main repo:  
**https://github.com/Anzhc/ComfyUI-sdxl-flux2vae-support**

Inference is the same as usual, but with the addition of an SD3 sampling node, as this model is flow-based.

Recommended Parameters:  
**Sampler**: Euler, Euler A, DPM++ SDE, etc.  
**Steps**: 20-28  
**CFG**: 6-9  
**Schedule**: Normal/Simple/SGM Uniform/Quadratic  
**Positive Quality Tags**: `masterpiece, best quality`  
**Negative Tags**: `worst quality, normal quality, bad anatomy`  
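
To make the SD3 sampling requirement concrete, here is a minimal, self-contained sketch of the rectified-flow Euler sampling that node enables. `model` and `cond` are placeholders rather than a real ComfyUI API; shift 2.5 matches the value we trained with:

```python
import torch

# Minimal sketch of rectified-flow (RF) Euler sampling, assuming the model
# predicts velocity (noise minus data), as flow-matching models do.
def rf_euler_sample(model, x, cond, steps=24, shift=2.5):
    t = torch.linspace(1.0, 0.0, steps + 1)
    # SD3-style timestep shift: concentrates steps at high-noise sigmas.
    sigmas = shift * t / (1.0 + (shift - 1.0) * t)
    for i in range(steps):
        v = model(x, sigmas[i], cond)            # predicted velocity
        x = x + (sigmas[i + 1] - sigmas[i]) * v  # Euler step toward data
    return x
```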


#### A1111 WebUI

(All screenshots repeat those from our RF release, as there is no difference in setup.)


Recommended WebUI: [ReForge](https://github.com/Panchovix/stable-diffusion-webui-reForge) - it has native support for flow models, and we have PR'd native support for the Flux2VAE-based SDXL modification.


**How to use in ReForge**:

![изображение](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/UV5Yp66H7YlccdQqborPf.png)
(ignore the Sigma max field at the top; it is not used in RF)

Support for RF in ReForge is being implemented through a built-in extension:

![изображение](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/LpMF0lmC96X001Au9fFU_.png)


![imagen](https://cdn-uploads.huggingface.co/production/uploads/634cf025fe861cc73a2e7dd0/siEBfX41E6D_DUEqaXMgs.png)

Set the parameters as shown, and you're good to go.

The Flux2 VAE does not currently have an appropriate high-quality preview method; please use the Approx Cheap option, which shows a simple PCA projection (ReForge).
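
For intuition, a PCA preview of a 32-channel latent amounts to projecting the channels down to 3 pseudo-RGB components. A toy version (our illustration of the idea, not ReForge's actual code, which would use precomputed components) looks like this:

```python
import torch

# Toy PCA preview: collapse a (32, H, W) latent to a viewable (3, H, W)
# image by projecting onto the top-3 principal channel directions.
def cheap_preview(latent):
    flat = latent.detach().reshape(32, -1).float()
    flat = flat - flat.mean(dim=1, keepdim=True)
    u, s, vh = torch.linalg.svd(flat, full_matrices=False)
    rgb = (u[:, :3].T @ flat).reshape(3, *latent.shape[1:])
    rgb = (rgb - rgb.min()) / (rgb.max() - rgb.min() + 1e-8)
    return rgb  # values in [0, 1], viewable as an RGB image
```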

Recommended Parameters:  
**Sampler**: Euler A Comfy RF, Euler, DPM++ SDE Comfy, etc. **ALL VARIANTS MUST BE RF OR COMFY, IF AVAILABLE. In ComfyUI the routing is automatic, but in the WebUI it is not.**  
**Steps**: 20-28  
**CFG**: 6-9  
**Schedule**: Normal/Simple/SGM Uniform  
**Positive Quality Tags**: `masterpiece, best quality`  
**Negative Tags**: `worst quality, normal quality, bad anatomy`  

**ADETAILER FIX FOR RF**:  
By default, ADetailer discards the Advanced Model Sampling (AMS) extension, which breaks RF. You need to add AMS to this part of the settings:

![изображение](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/RQMtfm5Xi3V7oNsqXoZJN.png)

Add `advanced_model_sampling_script,advanced_model_sampling_script_backported` there.

If that does not work, go into the ADetailer extension, find `args.py`, open it, and replace `_builtin_script` like this:

![изображение](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/rmnS-i_kciJzTZmeR-mGP.png)

Here it is as text, for easy copying:
```python
_builtin_script = (
    "advanced_model_sampling_script",
    "advanced_model_sampling_script_backported",
    "hypertile_script",
    "soft_inpainting",
)
```

Or use my fork of ADetailer: https://github.com/Anzhc/aadetailer-reforge

## Training

### Model Composition
(Relative to the base it was trained from)

**UNet**: Same  
**CLIP L**: Same, frozen  
**CLIP G**: Same, frozen  
**VAE**: [Flux2 VAE](https://huggingface.co/black-forest-labs/FLUX.2-dev/tree/main/vae)


### Training Details
(Main Stage Training)

**Samples seen** (unbatched steps): ~18.5 million  
**Learning Rate**: 5e-5  
**Effective Batch size**: 1472 (92 Batch Size * 2 Accumulation * 8 GPUs)  
**Precision**: Full BF16  
**Optimizer**: AdamW8bit with Kahan Summation  
**Weight Decay**: 0.01  
**Schedule**: Constant with warmup  
**Timestep Sampling Strategy**: Logit-Normal (mean -0.2, std 1.5; sometimes referred to as Lognorm), Shift 2.5  
**Text Encoders**: Frozen  
**Keep Token**: False  
**Tag Dropout**: 10%  
**Uncond Dropout**: 10%  
**Shuffle**: True  

**VAE Conv Padding**: False  
**VAE Shift**: 0.0760  
**VAE Scale**: 0.6043

**Additional Features used**: Protected Tags, Cosine Optimal Transport.
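
Below is a sketch of two training-specific transforms from the list above: the logit-normal timestep draw with shift, and the VAE shift/scale latent normalization. The mean/std reading of the logit-normal parameters and the diffusers-style normalization convention are our assumptions, not a dump of the actual training code:

```python
import torch

# Logit-normal timestep sampling with SD3-style shift (assumed reading of
# "Logit-Normal -0.2 1.5, Shift 2.5").
def sample_timesteps(batch_size, mean=-0.2, std=1.5, shift=2.5):
    t = torch.sigmoid(torch.randn(batch_size) * std + mean)  # in (0, 1)
    return shift * t / (1.0 + (shift - 1.0) * t)

# VAE latent normalization with the listed shift/scale, following the usual
# diffusers convention: subtract shift and multiply by scale before the
# diffusion model, invert before decoding.
VAE_SHIFT, VAE_SCALE = 0.0760, 0.6043

def normalize_latents(latents):
    return (latents - VAE_SHIFT) * VAE_SCALE

def denormalize_latents(latents):
    return latents / VAE_SCALE + VAE_SHIFT
```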

#### Training Data

2 epochs of the original NoobAI dataset, including images up to October 2024, minus the screencap data (which was not shared).


### LoRA Training

The current stage is trainable, but it is hard to achieve accurate reproduction if the subject/content depends on small details, as the base model has not converged to them yet.  
My current style training settings (Anzhc):

**Learning Rate**: tested up to **7.5e-4**  
**Batch Size**: 144 (6 real * 24 accum), using SGA (Stochastic Gradient Accumulation) - without SGA I would probably lower accum to 4-8.  
**Optimizer**: AdamW8bit with Kahan summation  
**Schedule**: ReREX (Use REX for simplicity, or Cosine annealing)  
**Precision**: Full BF16  
**Weight Decay**: 0.02  
**Timestep Sampling Strategy**: Logit-Normal(either 0.0 1.0, or -0.2 1.5), Shift 2.5  

**Dim/Alpha/Conv/Alpha**: 24/24/24/24 (Lycoris/Locon)  

**Text Encoders**: Frozen  

**Optimal Transport**: True (see the sketch after this list)  

**Expected Dataset Size**: 100 images (can be as few as 10, but balance with repeats to roughly this target)  
**Epochs**: 50  
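
For the Optimal Transport option, here is a rough sketch of what minibatch optimal-transport noise pairing with a cosine cost can look like; this is our illustration of the general idea, and the fork's actual implementation may differ:

```python
import torch
from scipy.optimize import linear_sum_assignment

# Minibatch OT pairing: re-order the noise batch so each latent is matched
# with the noise sample nearest in cosine distance, which straightens the
# flow trajectories the model has to learn.
def ot_pair_noise(latents):                      # latents: (B, C, H, W)
    noise = torch.randn_like(latents)
    a = torch.nn.functional.normalize(latents.flatten(1), dim=1)
    b = torch.nn.functional.normalize(noise.flatten(1), dim=1)
    cost = 1.0 - a @ b.T                         # (B, B) cosine distances
    _, cols = linear_sum_assignment(cost.detach().cpu().numpy())
    return noise[torch.as_tensor(cols)]          # paired noise, same order as latents
```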

### Hardware

The model was trained on a cloud 8xH200 node.

### Software

A custom fork of [SD-Scripts](https://github.com/bluvoll/sd-scripts), maintained by Bluvoll.

## Acknowledgements

### Special Thanks

**To a special supporter who single-handedly sponsored the whole run and preferred to stay anonymous**

---

# Support
If you wish to support our continuous effort of making waifus 0.2% better, you can do so here:  

**https://ko-fi.com/bluvoll**

Crypto link pending.

# Potential future

**Expected Compute Needed**: We theorize that the model needs at the very least 20 epochs on the full data, ideally 35. Each epoch costs about 460 USD with the provider we use; each time we collect enough donations to train 2 more epochs, we'll resume and train further. If we receive enough donations, we will also update the dataset with the most recent data.  
Why not do this now? Caching with the Flux 2 VAE takes a whopping 15 hours and roughly 20 TB (about 10 million latents at 2 MB each), which in itself costs 180 USD of compute time.

At the time of this model's release we are working on further improvements to the pipeline and its components, and we plan to upgrade this architecture further.