	---
	license: unlicense
	tags:
	- flux2
	- flux
	- image-to-image
	- image-upscaling
	- latent-upscaling
	- super-resolution
	- diffusion
	- flow-matching
	- rectified-flow
	- generative-models
	- image-generation
	- pytorch
	library_name: pytorch
	base_model:
	- black-forest-labs/FLUX.2-klein-4B
	pipeline_tag: image-to-image
	---


	# Flow Upscaler



	Flow Upscaler is a fast latent upscaler model that works in the [Flux.2](https://bfl.ai/models/flux-2) latent space.

	Under the hood, it is a lightweight Rectified Flow model with 59M parameters that generates upscaled latents in a single denoising step.

	[ComfyUI Node](https://github.com/TensorForger/comfyui-flow-upscaler)

	Features:

	* Upscaling from 512x512 to 1024x1024 takes 8 ms\*
	* The model is trained for 2X upscaling, but multiple passes can be chained to reach up to 8K resolution
	* A full pipeline with Flux generation, upscaling to 8K, and decoding runs in just 25 seconds on an RTX 5090
	* The training process uses Flow Distillation with Flux.2 as a teacher, forcing the model to learn strong image semantics

	\*Measured on an RTX 5090, in latent space, without decoding. See the benchmark [here](https://github.com/tensorforger/CTGMWorkshop).
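
Because each pass doubles the side length, the number of chained passes follows directly from the start and target sizes. A minimal sketch; the helper name `passes_needed` is illustrative and not part of the released code, and "8K" is assumed here to mean a side length of 8192 pixels:

```python
def passes_needed(start_side: int, target_side: int, scale: int = 2) -> int:
    """Count how many 2X passes grow start_side to at least target_side."""
    passes = 0
    side = start_side
    while side < target_side:
        side *= scale
        passes += 1
    return passes

print(passes_needed(1024, 8192))  # 3 passes: 1024 -> 2048 -> 4096 -> 8192
print(passes_needed(512, 2048))   # 2 passes, i.e. 4X overall
```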

	Here is one 4X upscaled image (two passes):

	![comparison](https://raw.githubusercontent.com/tensorforger/tensorforger/main/assets/upscaler_comparison.png)

	## How it works

	Architecturally, Flow Upscaler is a U-Net with SDXL-style ResNet blocks. It takes a noisy sample as input and predicts velocity as output. The generation process happens directly in high-resolution latent space.
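
Single-step generation with a velocity-predicting model amounts to one Euler step of the flow ODE. The sketch below assumes the convention that t=0 is pure noise and t=1 is the clean latent; the released model may use the opposite sign or time direction. `velocity_fn` is a hypothetical stand-in for the U-Net, and the tensor shapes are arbitrary:

```python
import numpy as np

def one_step_sample(velocity_fn, noise, cond):
    """Single Euler step of the flow ODE from t=0 (noise) to t=1 (data)."""
    t = 0.0
    v = velocity_fn(noise, t, cond)  # predicted velocity at the noisy sample
    return noise + (1.0 - t) * v     # step size covers the whole interval [0, 1]

# Toy stand-in for the U-Net: the exact velocity of a straight path to `target`.
target = np.ones((1, 4, 8, 8), dtype=np.float32)
toy_velocity = lambda x, t, cond: (target - x) / (1.0 - t)

rng = np.random.default_rng(0)
noise = rng.standard_normal(target.shape).astype(np.float32)
out = one_step_sample(toy_velocity, noise, cond=None)
print(np.allclose(out, target, atol=1e-5))  # True
```

With a perfectly straight path, one step is exact; a trained network only approximates this, which is why straightening the path during training matters.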

	The low-resolution latents are passed through a separate conditioning encoder that produces control signals, which are injected into the main U-Net encoder using FiLM conditioning.
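
FiLM (feature-wise linear modulation) scales and shifts feature maps using values derived from the conditioning input. A minimal NumPy sketch of the operation itself; whether the control signals in Flow Upscaler are per-channel or spatial is not stated here, so the per-channel shapes below are an assumption:

```python
import numpy as np

def film(features, gamma, beta):
    """Feature-wise linear modulation: scale by gamma, then shift by beta."""
    return gamma * features + beta

# Features in (N, C, H, W) layout; gamma and beta broadcast over H and W.
features = np.ones((1, 3, 2, 2))
gamma = np.array([0.5, 1.0, 2.0]).reshape(1, 3, 1, 1)
beta = np.array([0.0, 1.0, -1.0]).reshape(1, 3, 1, 1)

out = film(features, gamma, beta)
print(out[0, :, 0, 0])  # per-channel values: 0.5, 2.0, 1.0
```

In the model, `gamma` and `beta` would come from the conditioning encoder rather than being constants.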

	No attention layers are used, so compute scales linearly with image area. This makes generation at 8K resolution possible.
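
To see why this matters, compare how cost grows with resolution for convolutions (constant work per position) and for full self-attention (every position attends to every other). The numbers below are relative growth factors only, not measured timings, and `relative_cost` is an illustrative helper:

```python
def relative_cost(side: int, base_side: int = 1024) -> tuple[float, float]:
    """Return (conv_cost, attention_cost) relative to base_side."""
    area_ratio = (side / base_side) ** 2
    conv = area_ratio            # linear in the number of positions
    attention = area_ratio ** 2  # quadratic in the number of positions
    return conv, attention

for side in (1024, 2048, 4096, 8192):
    conv, attn = relative_cost(side)
    print(f"{side:>5}px  conv x{conv:.0f}  attention x{attn:.0f}")
```

Going from 1024 to 8192 pixels per side multiplies the area by 64, so a convolutional network does about 64 times the work, while full attention would need about 4096 times.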

	![architecture](https://raw.githubusercontent.com/tensorforger/tensorforger/main/assets/flow_upscaler_architecture.PNG)

	The model is trained using Flow Distillation with Flux.2-klein-4B as a teacher. We generated 20K diverse images with Flux, storing the initial noise, generated latents, and downscaled latents used for conditioning.
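
The stored (noise, latent) pairs are what a rectified-flow regression target needs. The sketch below shows the standard straight-path construction under the assumed convention that t=0 is noise and t=1 is the teacher latent; the actual loss and time sampling used in the training notebook may differ:

```python
import numpy as np

def flow_matching_pair(noise, latent, t):
    """Build the network input x_t and velocity target for one stored pair."""
    x_t = (1.0 - t) * noise + t * latent  # point on the straight path at time t
    v_target = latent - noise             # constant velocity along that path
    return x_t, v_target

rng = np.random.default_rng(0)
noise = rng.standard_normal((1, 4, 8, 8))
latent = rng.standard_normal((1, 4, 8, 8))  # stand-in for a teacher-generated latent
x_t, v_target = flow_matching_pair(noise, latent, t=0.25)

# Following the target velocity for the remaining time lands on the teacher latent.
print(np.allclose(x_t + (1.0 - 0.25) * v_target, latent))  # True
```

The student would be trained to predict `v_target` from `x_t`, `t`, and the downscaled conditioning latents.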

	The downscaled latents are created by decoding high-resolution latents, downscaling them in pixel space, and encoding them back into latents. Downscaling the latents directly is avoided because it introduces artifacts and breaks latent patterns, resulting in blurry decoded images.
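
The decode, downscale, encode round trip can be sketched as follows. `decode` and `encode` are hypothetical callables standing in for the autoencoder, and 2x2 area averaging is an assumed resampling filter, since the one actually used is not stated here:

```python
import numpy as np

def downscale_pixels_2x(image):
    """Halve height and width of an (N, C, H, W) image by 2x2 area averaging."""
    n, c, h, w = image.shape  # H and W must be even
    return image.reshape(n, c, h // 2, 2, w // 2, 2).mean(axis=(3, 5))

def make_conditioning_latent(latent_hr, decode, encode):
    """Decode to pixels, downscale there, then encode back to latents."""
    pixels_hr = decode(latent_hr)
    pixels_lr = downscale_pixels_2x(pixels_hr)
    return encode(pixels_lr)

# Identity stand-ins so the sketch runs without a real autoencoder.
latent_hr = np.arange(16, dtype=np.float64).reshape(1, 1, 4, 4)
latent_lr = make_conditioning_latent(latent_hr, decode=lambda z: z, encode=lambda x: x)
print(latent_lr.shape)  # (1, 1, 2, 2)
```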

	![training approach](https://raw.githubusercontent.com/tensorforger/tensorforger/main/assets/flow_upscaler_training_approach.PNG)

	## Training code

	If you want to explore the training code or use the model outside ComfyUI, see `notebooks/flow_upscaler` in [https://github.com/tensorforger/CTGMWorkshop](https://github.com/tensorforger/CTGMWorkshop).