---
license: unlicense
tags:
- flux2
- flux
- image-to-image
- image-upscaling
- latent-upscaling
- super-resolution
- diffusion
- flow-matching
- rectified-flow
- generative-models
- image-generation
- pytorch
library_name: pytorch
base_model:
- black-forest-labs/FLUX.2-klein-4B
pipeline_tag: image-to-image
---
# Flow Upscaler
**Flow Upscaler** is a fast latent upscaler model that works in the [Flux.2](https://bfl.ai/models/flux-2) latent space.
Under the hood, it is a lightweight **Rectified Flow** model with **59M** parameters that generates upscaled latents in a single denoising step.
**[ComfyUI Node](https://github.com/TensorForger/comfyui-flow-upscaler)**
Features:
* Upscaling from **512x512** to **1024x1024** takes **8 ms**\*
* The model is trained for **2X** upscaling, but multiple passes can be chained to reach up to **8K** resolution
* A full pipeline with Flux generation, upscaling to **8K**, and decoding runs in just **25 seconds** on an RTX 5090
* The training process uses **Flow Distillation** with Flux.2 as a teacher, forcing the model to learn strong image semantics
\*Measured on an RTX 5090 in latent space, without decoding. See the benchmark [here](https://github.com/tensorforger/CTGMWorkshop).
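
To make the pass chaining concrete, here is a small sketch of the arithmetic. The helper names are hypothetical (they are not part of the model's code), and it assumes "8K" means 8192 px on a side.

```python
def passes_needed(start_px: int, target_px: int, factor: int = 2) -> int:
    """Count chained upscaling passes needed to reach at least target_px per side."""
    if start_px <= 0 or target_px < start_px:
        raise ValueError("expected 0 < start_px <= target_px")
    size, passes = start_px, 0
    while size < target_px:
        size *= factor
        passes += 1
    return passes


def resolution_schedule(start_px: int, passes: int, factor: int = 2) -> list:
    """Side length after each pass, starting with the input resolution."""
    return [start_px * factor**i for i in range(passes + 1)]


# Assumption: "8K" is taken as 8192 px on a side.
print(passes_needed(1024, 8192))     # 3
print(resolution_schedule(1024, 3))  # [1024, 2048, 4096, 8192]
```

Under these assumptions, a 1024x1024 generation needs three 2X passes to reach 8K, and a 4X result corresponds to two passes.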
Here is one **4X** upscaled image (two passes):
![comparison](https://raw.githubusercontent.com/tensorforger/tensorforger/main/assets/upscaler_comparison.png)
## How it works
Architecturally, Flow Upscaler is a U-Net with SDXL-style ResNet blocks. It takes a noisy sample as input and predicts velocity as output. The generation process happens directly in high-resolution latent space.
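
As an illustration of single-step generation with a velocity-predicting model, the sketch below takes one Euler step across the whole time interval. It uses NumPy and a toy stand-in for the network. The time convention (noise at `t = 0`, data at `t = 1`), the function names, and the tensor shapes are assumptions for the example, not the model's actual interface.

```python
import numpy as np


def single_step_sample(velocity_fn, noise, cond):
    """One Euler step from t=0 to t=1: x1 = x0 + 1.0 * v(x0, t=0, cond)."""
    t_start, step = 0.0, 1.0
    return noise + step * velocity_fn(noise, t_start, cond)


def toy_velocity(x_t, t, cond):
    """Toy stand-in for the U-Net (hypothetical): points straight at a
    nearest-neighbour 2X upsample of the conditioning latents."""
    target = np.repeat(np.repeat(cond, 2, axis=-2), 2, axis=-1)
    return (target - x_t) / (1.0 - t)


rng = np.random.default_rng(0)
noise = rng.standard_normal((4, 8, 8))  # (C, H, W) high-resolution noise
cond = rng.standard_normal((4, 4, 4))   # (C, H/2, W/2) low-resolution latents
upscaled = single_step_sample(toy_velocity, noise, cond)
print(upscaled.shape)  # (4, 8, 8)
```

With a perfectly straight flow, a single step lands exactly on the target, which is why a well-trained rectified flow model can skip multi-step sampling.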
The low-resolution latents are passed through a separate conditioning encoder that produces control signals, which are injected into the main U-Net encoder using FiLM conditioning.
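
For readers unfamiliar with FiLM, the sketch below shows the basic operation in NumPy: features are scaled and shifted by conditioning-derived parameters. It uses the standard form `gamma * features + beta`. Whether the model uses this exact form, and whether its control signals are per-channel or per-pixel, is not stated here, so treat the shapes and values as assumptions.

```python
import numpy as np


def film(features, gamma, beta):
    """Feature-wise linear modulation: scale and shift the features.

    gamma and beta broadcast against features, so they can be
    per-channel (C, 1, 1) or spatial (C, H, W).
    """
    return gamma * features + beta


features = np.ones((3, 2, 2))                        # (C, H, W) encoder features
gamma = np.array([1.0, 2.0, 0.5]).reshape(3, 1, 1)   # per-channel scale
beta = np.array([0.0, -1.0, 1.0]).reshape(3, 1, 1)   # per-channel shift
out = film(features, gamma, beta)
print(out[:, 0, 0].tolist())  # [1.0, 1.0, 1.5]
```

In a real network, `gamma` and `beta` would be produced by the conditioning encoder rather than hard-coded.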
No attention layers are used, so compute scales linearly with image area. This makes generation at **8K** resolution possible.
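
A rough back-of-the-envelope comparison shows why this matters. The sketch assumes convolutional cost grows in proportion to image area and that global self-attention would grow with the square of the token count. Constants are ignored and the function name is hypothetical.

```python
def relative_cost(side_px: int, base_side_px: int = 1024) -> dict:
    """Cost relative to the base resolution, ignoring constant factors."""
    area_ratio = (side_px / base_side_px) ** 2
    return {
        "conv": area_ratio,            # linear in area
        "attention": area_ratio ** 2,  # quadratic in token count
    }


for side in (1024, 2048, 4096, 8192):
    cost = relative_cost(side)
    print(side, cost["conv"], cost["attention"])
```

Going from 1024 to 8192 px per side multiplies the area by 64, so a convolution-only network does about 64 times the work, while global attention would do about 4096 times.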
![example](https://raw.githubusercontent.com/tensorforger/tensorforger/main/assets/flow_upscaler_architecture.PNG)
The model is trained using **Flow Distillation** with Flux.2-klein-4B as a teacher. We generated **20K** diverse images with Flux, storing the initial noise, generated latents, and downscaled latents used for conditioning.
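
One plausible way to turn a stored (noise, teacher latent) pair into a training example is the usual rectified flow construction sketched below. This is an assumption about the general technique, not a description of the actual training notebook: the linear interpolation, the time convention, and the MSE loss are all illustrative choices.

```python
import numpy as np


def make_training_example(noise, teacher_latents, t):
    """Build the noisy input and velocity target for a time t in [0, 1].

    Assumed convention: t=0 is pure noise, t=1 is the teacher's latents.
    """
    x_t = (1.0 - t) * noise + t * teacher_latents
    v_target = teacher_latents - noise  # constant along the straight path
    return x_t, v_target


def mse(pred, target):
    """Mean squared error between predicted and target velocity."""
    return float(np.mean((pred - target) ** 2))


rng = np.random.default_rng(0)
noise = rng.standard_normal((4, 8, 8))
teacher_latents = rng.standard_normal((4, 8, 8))
x_t, v_target = make_training_example(noise, teacher_latents, t=0.3)
print(mse(v_target, v_target))  # 0.0
```

Because each stored noise sample is paired with the latents the teacher produced from it, the student sees a fixed noise-to-image coupling instead of random pairings.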
The downscaled latents are created by decoding the high-resolution latents, downscaling the result in pixel space, and encoding it back into latents. We avoid downscaling the latents directly because it introduces artifacts and breaks latent patterns, which results in blurry decoded images.
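
The decode, downscale, encode round trip can be sketched as follows. Here `decode` and `encode` are hypothetical callables standing in for the Flux.2 VAE, and box averaging is an assumed resampling filter, since the filter actually used is not specified.

```python
import numpy as np


def downscale_area(image, factor=2):
    """Average non-overlapping factor x factor blocks of a (C, H, W) array."""
    c, h, w = image.shape
    if h % factor or w % factor:
        raise ValueError("height and width must be divisible by factor")
    blocks = image.reshape(c, h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(2, 4))


def make_conditioning_latents(hr_latents, decode, encode, factor=2):
    """Decode to pixels, downscale in pixel space, then encode back to latents."""
    pixels = decode(hr_latents)
    small_pixels = downscale_area(pixels, factor)
    return encode(small_pixels)


# Identity functions stand in for the VAE so the sketch runs on its own.
hr = np.arange(16, dtype=float).reshape(1, 4, 4)
lr = make_conditioning_latents(hr, decode=lambda z: z, encode=lambda x: x)
print(lr.tolist())  # [[[2.5, 4.5], [10.5, 12.5]]]
```

With a real VAE, the resizing happens on decoded images, so the conditioning latents come from the same encoder as any other image.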
![example](https://raw.githubusercontent.com/tensorforger/tensorforger/main/assets/flow_upscaler_training_approach.PNG)
## Training code
If you want to explore the training code or use the model outside ComfyUI, see `notebooks/flow_upscaler` in [https://github.com/tensorforger/CTGMWorkshop](https://github.com/tensorforger/CTGMWorkshop).