TensorForger committed on
Commit 1dbc168 · 1 Parent(s): 1afd1cf

README cleanup

  1. README.md +18 -11
README.md CHANGED
@@ -11,36 +11,43 @@ pipeline_tag: image-to-image
  ---


- # Flow Upscaler

- **Flow Upscaler** is a fast Latent Upscaler model that works in [Flux.2](https://bfl.ai/models/flux-2) latent space.

- Under the hood, it is a lightweight **Rectified flow** model with **59M** parameters generating upscaled latents in just one denoising step.

  **[ComfyUI Node](https://github.com/TensorForger/comfyui-flow-upscaler)**

  Features:

- * Upscaling latents for image from **512x512** to **1024x1024** on RTX 5090 takes **7ms**
- * The model is trained only for **2X** upscaling, but you can chain it many times up to **8K** resolution
- * The training process involves **Flow Distillation** with Flux.2 as a teacher what forces it to understand image semantic very well

  Here is one **4X** upscaled image (two passes):
  ![example](https://raw.githubusercontent.com/tensorforger/tensorforger/main/assets/upscaled_cat.png)

  ## How it works

- Architecturally, Flow Upscaler is a Unet with SDXL-style ResNet blocks. It takes the noisy sample on input and predicts velocity on output. This generation process happens in high resolution space. The low resolution latents are passed in a separate conditioning encoder that emits control signals that are passed to main Unet encoder through FiLM conditioning.

- No attention is used, so compute scales linearly with image area. This makes generation in 8K possible.

  ![example](https://raw.githubusercontent.com/tensorforger/tensorforger/main/assets/flow_upscaler_architecture.PNG)

- The model is trained through Flow Distillation with Flux.2-klein-4B as a teacher. We generated 20K various images with Flux storing initial noise, generated latents and downscaled latents for conditioning. The downscaled latents are generated throgh decoding high resolution latents, downscaling in pixel space and encoding back to latents because downscaling directly in latents breaks some "latent patterns" that makes image blurry if you decode it.

- ![example](https://raw.githubusercontent.com/tensorforger/tensorforger/main/assets/flow_upscaler_training_approach.PNG)


  ## Training code

- If you want to explore training code or use model outside of ComfyUI directly from code, see `notebooks/flow_upscaler` in [https://github.com/tensorforger/CTGMWorkshop](https://github.com/tensorforger/CTGMWorkshop)
 
  ---


+ # Flow Upscaler ComfyUI Nodes

+ **Flow Upscaler** is a fast latent upscaler model that works in the [Flux.2](https://bfl.ai/models/flux-2) latent space.

+ Under the hood, it is a lightweight **Rectified Flow** model with **59M** parameters that generates upscaled latents in a single denoising step.

  **[ComfyUI Node](https://github.com/TensorForger/comfyui-flow-upscaler)**

  Features:

+ * Upscaling latents from **512x512** to **1024x1024** takes **7ms** on an RTX 5090
+ * The model is trained for **2X** upscaling, but multiple passes can be chained to reach up to **8K** resolution
+ * A full pipeline with Flux generation, upscaling to **8K**, and decoding runs in just **25 seconds** (on RTX 5090)
+ * The training process uses **Flow Distillation** with Flux.2 as a teacher, forcing the model to learn strong image semantics

  Here is one **4X** upscaled image (two passes):
+
  ![example](https://raw.githubusercontent.com/tensorforger/tensorforger/main/assets/upscaled_cat.png)

  ## How it works

+ Architecturally, Flow Upscaler is a U-Net with SDXL-style ResNet blocks. It takes a noisy sample as input and predicts velocity as output. The generation process happens directly in high-resolution latent space.
+
+ The low-resolution latents are passed through a separate conditioning encoder that produces control signals, which are injected into the main U-Net encoder using FiLM conditioning.

+ No attention layers are used, so compute scales linearly with image area. This makes generation at **8K** resolution possible.

  ![example](https://raw.githubusercontent.com/tensorforger/tensorforger/main/assets/flow_upscaler_architecture.PNG)

+ The model is trained using **Flow Distillation** with Flux.2-klein-4B as a teacher. We generated **20K** diverse images with Flux, storing the initial noise, generated latents, and downscaled latents used for conditioning.

+ The downscaled latents are created by decoding high-resolution latents, downscaling them in pixel space, and encoding them back into latents. Direct latent downscaling introduces artifacts and breaks latent patterns, resulting in blurry decoded images.

+ ![example](https://raw.githubusercontent.com/tensorforger/tensorforger/main/assets/flow_upscaler_training_approach.PNG)

  ## Training code

+ If you want to explore the training code or use the model outside ComfyUI, see:
+
+ `notebooks/flow_upscaler` in [https://github.com/tensorforger/CTGMWorkshop](https://github.com/tensorforger/CTGMWorkshop)
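
The "How it works" text in this README describes two mechanisms that a toy example may make easier to picture: FiLM conditioning (a per-channel scale and shift derived from the low-resolution latents) and single-step sampling (one Euler step along the predicted velocity). The sketch below is not the repository's code. The function names `film`, `conditioning_encoder`, `predict_velocity`, and `one_step_sample` are hypothetical, the "encoder" and "U-Net" are stand-in arithmetic on nested lists, and the time convention (noise at t=0, sample at t=1) is an assumption that may differ from the actual implementation.

```python
def film(features, gamma, beta):
    """Feature-wise linear modulation: scale and shift every value of each channel."""
    return [[g * v + b for v in channel]
            for channel, g, b in zip(features, gamma, beta)]


def conditioning_encoder(low_res):
    """Hypothetical stand-in for the conditioning encoder.

    Derives one (gamma, beta) pair per channel from the low-resolution latents.
    A real encoder would be a learned network; this uses channel means only.
    """
    means = [sum(channel) / len(channel) for channel in low_res]
    gamma = [1.0 + 0.1 * m for m in means]
    beta = list(means)
    return gamma, beta


def predict_velocity(noisy, low_res):
    """Toy velocity predictor (assumption): FiLM-modulated input minus the input."""
    gamma, beta = conditioning_encoder(low_res)
    modulated = film(noisy, gamma, beta)
    return [[m - x for m, x in zip(m_ch, x_ch)]
            for m_ch, x_ch in zip(modulated, noisy)]


def one_step_sample(noise, low_res):
    """Single Euler step over the whole time interval: x = noise + 1.0 * velocity."""
    velocity = predict_velocity(noise, low_res)
    return [[x + v for x, v in zip(x_ch, v_ch)]
            for x_ch, v_ch in zip(noise, velocity)]


if __name__ == "__main__":
    noise = [[0.3, -0.7, 1.1, 0.2]]   # one channel of "high-resolution" noise
    low_res = [[0.5, 1.5]]            # one channel of low-resolution conditioning
    print(one_step_sample(noise, low_res))
```

In this toy, the conditioning only enters through `gamma` and `beta`, which mirrors the idea that the low-resolution latents steer the main network without being concatenated to its input. Chaining several 2X passes would amount to feeding the output of one call in as the `low_res` argument of the next, with fresh noise at the larger size.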