| --- |
| license: unlicense |
| tags: |
| - flux2 |
| - flux |
| - image-to-image |
| - image-upscaling |
| - latent-upscaling |
| - super-resolution |
| - diffusion |
| - flow-matching |
| - rectified-flow |
| - generative-models |
| - image-generation |
| - pytorch |
| library_name: pytorch |
| base_model: |
| - black-forest-labs/FLUX.2-klein-4B |
| pipeline_tag: image-to-image |
| --- |
| |
|
|
| # Flow Upscaler |
|
|
|
|
|
|
| **Flow Upscaler** is a fast latent upscaler model that works in the [Flux.2](https://bfl.ai/models/flux-2) latent space. |
|
|
| Under the hood, it is a lightweight **Rectified Flow** model with **59M** parameters that generates upscaled latents in a single denoising step. |
|
|
| **[ComfyUI Node](https://github.com/TensorForger/comfyui-flow-upscaler)** |
|
|
| Features: |
|
|
| * Upscaling from **512x512** to **1024x1024** takes **8ms*** |
| * The model is trained for **2X** upscaling, but multiple passes can be chained to reach up to **8K** resolution |
| * A full pipeline with Flux generation, upscaling to **8K**, and decoding runs in just **25 seconds** (on RTX 5090) |
| * The training process uses **Flow Distillation** with Flux.2 as a teacher, forcing the model to learn strong image semantics |
|
|
| *On RTX 5090, in latent space, without decoding, see benchmark [here](https://github.com/tensorforger/CTGMWorkshop). |
| |
| Here is one **4X** upscaled image (two passes): |
| |
|  |
| |
| ## How it works |
| |
| Architecturally, Flow Upscaler is a U-Net with SDXL-style ResNet blocks. It takes a noisy sample as input and predicts velocity as output. The generation process happens directly in high-resolution latent space. |
| |
| The low-resolution latents are passed through a separate conditioning encoder that produces control signals, which are injected into the main U-Net encoder using FiLM conditioning. |
| |
| No attention layers are used, so compute scales linearly with image area. This makes generation at **8K** resolution possible. |
| |
|  |
| |
| The model is trained using **Flow Distillation** with Flux.2-klein-4B as a teacher. We generated **20K** diverse images with Flux, storing the initial noise, generated latents, and downscaled latents used for conditioning. |
| |
| The downscaled latents are created by decoding high-resolution latents, downscaling them in pixel space, and encoding them back into latents. Direct latent downscaling introduces artifacts and breaks latent patterns, resulting in blurry decoded images. |
| |
|  |
| |
| ## Training code |
| |
| If you want to explore the training code or use the model outside ComfyUI, see: |
| |
| `notebooks/flow_upscaler` in [https://github.com/tensorforger/CTGMWorkshop](https://github.com/tensorforger/CTGMWorkshop) |