---
license: other
license_name: fair-ai-public-license-1.0-sd
license_link: https://freedevproject.org/faipl-1.0-sd/
base_model:
- CabalResearch/NoobAI-RectifiedFlow-Experimental
library_name: diffusers
---
## Model Details

An experimental conversion of our [NoobAI-RF](https://huggingface.co/CabalResearch/NoobAI-RectifiedFlow-Experimental) model to the Flux2 VAE.

<u>We have observed the model adapting to the Flux2 VAE, and current trends suggest that significant improvements are possible with larger-scale training, which could potentially allow it to compete with bigger models.  
By supporting us you can make that a reality.</u>

More info on supporting us: [click me](https://huggingface.co/CabalResearch/NoobAI-Flux2VAE-RectifiedFlow#potential-future) 

### Model Description

This is a native training of the SDXL UNet in combination with the Flux2 VAE. Essentially, we've adapted a previously 4-channel model to work with the 32 latent channels of Flux 2. No adapters or tricks, fully native.  
The Danbooru dataset of NoobAI was used for this.
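
As a rough, hypothetical illustration (not our training code), the only config-level change this implies for a standard diffusers `UNet2DConditionModel` is widening the latent interface; the rest of the architecture stays SDXL:

```python
from diffusers import UNet2DConditionModel

# Hypothetical sketch: start from the stock SDXL UNet config and widen the
# latent interface from the 4 channels of the SDXL VAE to the 32 channels
# of the Flux2 VAE. The SDXL base repo is used purely to fetch a config.
config = dict(UNet2DConditionModel.load_config(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet"
))
config["in_channels"] = 32   # was 4
config["out_channels"] = 32  # was 4
unet = UNet2DConditionModel.from_config(config)  # fresh model with the widened interface
```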



![training-flux2vae-sdxl-progress](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/_cObCxad4c07LBrqiMEwp.jpeg)

![training-flux2vae-sdxl-progress-crop](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/0syF9FpKvzRmBsTCz5Wsd.png)

Due to limited compute we were not able to fully converge it, so expect output on the level of very early anime models. We hope the community will find this interesting enough to support us.  
We observed steady convergence throughout the whole training process, and believe that further training will result in a new standard for fast local anime generation.  

Please treat this model as a proof of concept, not as a final product.  

We used Rectified Flow for training, with a staged approach for adapting to the Flux2 VAE.  
Most of the knowledge seems to be preserved, but it is significantly weakened due to the completely new latent space.


- **Developed by:** Cabal Research (Bluvoll, Anzhc)
- **Funded by:** Community, Bluvoll
- **License:** [fair-ai-public-license-1.0-sd](https://freedevproject.org/faipl-1.0-sd/)
- **Finetuned from model:** [NoobAI-RF](https://huggingface.co/CabalResearch/NoobAI-RectifiedFlow-Experimental)


## Bias and Limitations

Once again, we are limited in budget for this fundamental task. We have adapted the model enough to output somewhat acceptable images (closer to the knowledge of a theoretical NoobAI 0.1 using the Flux 2 VAE), but further progress would require large compute: we are in territory where the model is seeing the new level of detail for the first time (as well as the old level of detail in a new way), and that is hard.

Most biases of the official dataset will apply (Blue Archive, etc.).

Expect noise, fuzzy details, low performance in landscape aspect ratios, bad hands, and general issues with composition as a whole.


## Model Output Examples

One of the benefits we have achieved is color:

![00439-3595667584-small](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/Q2b7tx4gE0UAE-YOKpxfX.png)
Being a native flow model, it achieves strong colors without making them acidic or otherwise unstable.  

Generally, as already stated, expect at least some grain and fuzziness in all gens, as we have not yet converged to the juicy details.
![00448-1663643003](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/mCRQ-_GXSNsdoaPT5-_a7.png)

![00443-1298253947](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/6U1jBuPztozBTWvr9okgm.png)

![00445-1953653215](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/Eqj6oJcENpoEynKWjLZLo.png)

![00442-1298253945](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/eS-oPhXzyfWTIm3ioorfd.png)

![00440-2105742435](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/y5AoeomwLgTZJmqHF5RZz.png)

![00446-1068409742](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/m6aKB22LRvMUQKouxHWj4.png)

![channel_castation](https://cdn-uploads.huggingface.co/production/uploads/634cf025fe861cc73a2e7dd0/dUNpIpug6HB-2MHSow3Dd.png)

![moshimoshibe_1024x1536](https://cdn-uploads.huggingface.co/production/uploads/634cf025fe861cc73a2e7dd0/U8QSOog8W6xL9KdFj2VFH.png)

![00444-3136499438](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/7t5IfAY_KDWWaAe13zEfQ.png)

One area where it is currently relatively strong is scenery:

![00453-1064929183](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/kDJSqS8YGRiFCB4GiuXbW.png)

![00452-1179534666](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/A4dDCqKjvSn0zxSvF-eKt.png)

![00457-4116373402](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/Lk223iTutucibmKyanaA4.png)

![00458-2531505134](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/dBrFFWq8OspcWKbh5lxpI.png)

![00461-286542451](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/YnhE0aqs54YQT22m6z8n9.png)

![00456-2654901363](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/4by6eRO649-tNknjarYE6.png)

![00460-286542448](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/YjJ66eZKVI0ZQPuQLQ075.png)

![00463-286542453](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/iA52BagsAAB9FVNItVfnQ.png)

We also provide an Aesthetic Tune that improves details in general:  

![00464-286542453](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/HcfQWfUcuSZYzHWPOPArB.png)

![00468-2602842402](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/dQtpye4HvisGbCIOP1QdA.png)

# Recommendations

### Inference

#### Comfy

![Screenshot 2025-12-19 112806](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/iyZaU9E5CqZkQoyfQRIQy.png)

(The workflow is available alongside the model in the repo.)
We provide a temporary ComfyUI fork, and hope support will be adopted in the main repo:  
**https://github.com/Anzhc/ComfyUI-sdxl-flux2vae-support**

Inference is the same as usual, but with the addition of an SD3 sampling node, as this model is flow-based.

Recommended Parameters:  
**Sampler**: Euler, Euler A, DPM++ SDE, etc.  
**Steps**: 20-28  
**CFG**: 6-9  
**Schedule**: Normal/Simple/SGM Uniform/Quadratic  
**Positive Quality Tags**: `masterpiece, best quality`  
**Negative Tags**: `worst quality, normal quality, bad anatomy`  
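
To make the SD3 sampling requirement concrete, here is a minimal, self-contained sketch of the rectified-flow Euler sampling that node enables. `model` and `cond` are placeholders rather than a real ComfyUI API; shift 2.5 matches the value we trained with:

```python
import torch

# Minimal sketch of rectified-flow (RF) Euler sampling, assuming the model
# predicts velocity (noise minus data), as flow-matching models do.
def rf_euler_sample(model, x, cond, steps=24, shift=2.5):
    t = torch.linspace(1.0, 0.0, steps + 1)
    # SD3-style timestep shift: concentrates steps at high-noise sigmas.
    sigmas = shift * t / (1.0 + (shift - 1.0) * t)
    for i in range(steps):
        v = model(x, sigmas[i], cond)            # predicted velocity
        x = x + (sigmas[i + 1] - sigmas[i]) * v  # Euler step toward data
    return x
```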


#### A1111 WebUI

(All screenshots repeat those from our RF release, as there is no difference in setup.)


Recommended WebUI: [ReForge](https://github.com/Panchovix/stable-diffusion-webui-reForge) - it has native support for flow models, and we have PR'd native support for the Flux2VAE-based SDXL modification.


**How to use in ReForge**:

![изображение](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/UV5Yp66H7YlccdQqborPf.png)
(ignore the Sigma max field at the top; it is not used in RF)

Support for RF in ReForge is being implemented through a built-in extension:

![изображение](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/LpMF0lmC96X001Au9fFU_.png)


![imagen](https://cdn-uploads.huggingface.co/production/uploads/634cf025fe861cc73a2e7dd0/siEBfX41E6D_DUEqaXMgs.png)

Set the parameters as shown, and you're good to go.

The Flux2 VAE does not currently have an appropriate high-quality preview method; please use the Approx Cheap option, which shows a simple PCA projection (ReForge).
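
For intuition, a PCA preview of a 32-channel latent amounts to projecting the channels down to 3 pseudo-RGB components. A toy version (our illustration of the idea, not ReForge's actual code, which would use precomputed components) looks like this:

```python
import torch

# Toy PCA preview: collapse a (32, H, W) latent to a viewable (3, H, W)
# image by projecting onto the top-3 principal channel directions.
def cheap_preview(latent):
    flat = latent.detach().reshape(32, -1).float()
    flat = flat - flat.mean(dim=1, keepdim=True)
    u, s, vh = torch.linalg.svd(flat, full_matrices=False)
    rgb = (u[:, :3].T @ flat).reshape(3, *latent.shape[1:])
    rgb = (rgb - rgb.min()) / (rgb.max() - rgb.min() + 1e-8)
    return rgb  # values in [0, 1], viewable as an RGB image
```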

Recommended Parameters:  
**Sampler**: Euler A Comfy RF, Euler, DPM++ SDE Comfy, etc. **ALL VARIANTS MUST BE RF OR COMFY, IF AVAILABLE. In ComfyUI the routing is automatic, but in the WebUI it is not.**  
**Steps**: 20-28  
**CFG**: 6-9  
**Schedule**: Normal/Simple/SGM Uniform  
**Positive Quality Tags**: `masterpiece, best quality`  
**Negative Tags**: `worst quality, normal quality, bad anatomy`  

**ADETAILER FIX FOR RF**:  
By default, ADetailer discards the Advanced Model Sampling (AMS) extension, which breaks RF. You need to add AMS to this part of the settings:

![изображение](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/RQMtfm5Xi3V7oNsqXoZJN.png)

Add `advanced_model_sampling_script,advanced_model_sampling_script_backported` there.

If that does not work, go into the ADetailer extension, find `args.py`, open it, and replace `_builtin_script` like this:

![изображение](https://cdn-uploads.huggingface.co/production/uploads/633b43d29fe04b13f46c8988/rmnS-i_kciJzTZmeR-mGP.png)

Here it is as text, for easy copying:
```python
_builtin_script = (
    "advanced_model_sampling_script",
    "advanced_model_sampling_script_backported",
    "hypertile_script",
    "soft_inpainting",
)
```

Or use my fork of ADetailer: https://github.com/Anzhc/aadetailer-reforge

## Training

### Model Composition
(Relative to the base it was trained from)

**UNet**: Same  
**CLIP L**: Same, frozen  
**CLIP G**: Same, frozen  
**VAE**: [Flux2 VAE](https://huggingface.co/black-forest-labs/FLUX.2-dev/tree/main/vae)


### Training Details
(Main Stage Training)

**Samples seen** (unbatched steps): ~18.5 million  
**Learning Rate**: 5e-5  
**Effective Batch size**: 1472 (92 Batch Size * 2 Accumulation * 8 GPUs)  
**Precision**: Full BF16  
**Optimizer**: AdamW8bit with Kahan Summation  
**Weight Decay**: 0.01  
**Schedule**: Constant with warmup  
**Timestep Sampling Strategy**: Logit-Normal (mean -0.2, std 1.5; sometimes referred to as Lognorm), Shift 2.5  
**Text Encoders**: Frozen  
**Keep Token**: False  
**Tag Dropout**: 10%  
**Uncond Dropout**: 10%  
**Shuffle**: True  

**VAE Conv Padding**: False  
**VAE Shift**: 0.0760  
**VAE Scale**: 0.6043

**Additional Features used**: Protected Tags, Cosine Optimal Transport.
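
Below is a sketch of two training-specific transforms from the list above: the logit-normal timestep draw with shift, and the VAE shift/scale latent normalization. The mean/std reading of the logit-normal parameters and the diffusers-style normalization convention are our assumptions, not a dump of the actual training code:

```python
import torch

# Logit-normal timestep sampling with SD3-style shift (assumed reading of
# "Logit-Normal -0.2 1.5, Shift 2.5").
def sample_timesteps(batch_size, mean=-0.2, std=1.5, shift=2.5):
    t = torch.sigmoid(torch.randn(batch_size) * std + mean)  # in (0, 1)
    return shift * t / (1.0 + (shift - 1.0) * t)

# VAE latent normalization with the listed shift/scale, following the usual
# diffusers convention: subtract shift and multiply by scale before the
# diffusion model, invert before decoding.
VAE_SHIFT, VAE_SCALE = 0.0760, 0.6043

def normalize_latents(latents):
    return (latents - VAE_SHIFT) * VAE_SCALE

def denormalize_latents(latents):
    return latents / VAE_SCALE + VAE_SHIFT
```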

#### Training Data

2 epochs of the original NoobAI dataset, including images up to October 2024, minus the screencap data (which was not shared).


### LoRA Training

The current stage is trainable, but it is hard to achieve accurate reproduction if the subject/content depends on small details, as the base model has not converged to them yet.  
My current style training settings (Anzhc):

**Learning Rate**: tested up to **7.5e-4**  
**Batch Size**: 144 (6 real * 24 accum), using SGA (Stochastic Gradient Accumulation) - without SGA I would probably lower accum to 4-8.  
**Optimizer**: AdamW8bit with Kahan summation  
**Schedule**: ReREX (Use REX for simplicity, or Cosine annealing)  
**Precision**: Full BF16  
**Weight Decay**: 0.02  
**Timestep Sampling Strategy**: Logit-Normal(either 0.0 1.0, or -0.2 1.5), Shift 2.5  

**Dim/Alpha/Conv/Alpha**: 24/24/24/24 (Lycoris/Locon)  

**Text Encoders**: Frozen  

**Optimal Transport**: True (see the sketch after this list)  

**Expected Dataset Size**: 100 images (can be as few as 10, but balance with repeats to roughly this target)  
**Epochs**: 50  
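
For the Optimal Transport option, here is a rough sketch of what minibatch optimal-transport noise pairing with a cosine cost can look like; this is our illustration of the general idea, and the fork's actual implementation may differ:

```python
import torch
from scipy.optimize import linear_sum_assignment

# Minibatch OT pairing: re-order the noise batch so each latent is matched
# with the noise sample nearest in cosine distance, which straightens the
# flow trajectories the model has to learn.
def ot_pair_noise(latents):                      # latents: (B, C, H, W)
    noise = torch.randn_like(latents)
    a = torch.nn.functional.normalize(latents.flatten(1), dim=1)
    b = torch.nn.functional.normalize(noise.flatten(1), dim=1)
    cost = 1.0 - a @ b.T                         # (B, B) cosine distances
    _, cols = linear_sum_assignment(cost.detach().cpu().numpy())
    return noise[torch.as_tensor(cols)]          # paired noise, same order as latents
```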

### Hardware

The model was trained on a cloud 8xH200 node.

### Software

A custom fork of [SD-Scripts](https://github.com/bluvoll/sd-scripts), maintained by Bluvoll.

## Acknowledgements

### Special Thanks

**To a special supporter who single-handedly sponsored the whole run and preferred to stay anonymous**

---

# Support
If you wish to support our continuous effort of making waifus 0.2% better, you can do so here:  

**https://ko-fi.com/bluvoll**

Crypto link pending.

# Potential future

**Expected Compute Needed**: We theorize that the model needs at the very least 20 epochs on the full data, ideally 35. Each epoch costs about 460 USD with the provider we use; each time we collect enough donations to train 2 more epochs, we'll resume and train further. If we receive enough donations, we will also update the dataset with the most recent data.  
Why not do this now? Caching with the Flux 2 VAE takes a whopping 15 hours and roughly 20 TB (about 10 million latents at 2 MB each), which in itself costs 180 USD of compute time.

At the time of this model's release we are working on further improvements to the pipeline and its components, and we plan to upgrade this architecture further.