---
license: apache-2.0
language:
- en
datasets:
- bitmind/AFHQ
- ILSVRC/imagenet-1k
pipeline_tag: unconditional-image-generation
---

# Transformer AutoRegressive Flow Model

The TarFlow model, proposed by Zhai et al. [1], builds its affine coupling layers from stacks of autoregressive Transformer blocks (similar to MAF), making the flow non-volume-preserving. Combined with guidance and denoising, it achieves state-of-the-art results across multiple benchmarks.

Let $z$ denote the noise and $x$ the image, both of size $(B, T, C)$, where $B$, $T$, and $C$ are the batch size, patchified sequence length, and feature dimension, respectively. An autoregressive block of the TarFlow model can be written as:

$$
\begin{aligned}
\text{Forward:}\quad z_t &= \exp(-s(x_{<t}))\,(x_t - u(x_{<t})),\\
\text{Inverse:}\quad x_t &= \exp(s(x_{<t}))\,z_t + u(x_{<t}).
\end{aligned}
$$
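As a sketch of how one such coupling block behaves, here is a minimal NumPy implementation in which a toy causal function stands in for the Transformer networks $s$ and $u$ (the names `causal_su`, `forward`, and `inverse` are illustrative, not from the TarFlow codebase). The forward map is computed in parallel over $t$, while the inverse must fill in tokens one at a time:

```python
import numpy as np

def causal_su(x, T, C):
    """Toy stand-in for the Transformer producing scale s and shift u.

    Position t of the output depends only on x[:, :t] (strictly causal),
    which is the property the real autoregressive Transformer guarantees.
    """
    B = x.shape[0]
    # Running mean of the sequence, then shifted right by one token so that
    # position t only sees x_{<t}; position 0 sees nothing (zeros).
    mean = np.cumsum(x, axis=1) / np.arange(1, T + 1).reshape(1, T, 1)
    prev = np.concatenate([np.zeros((B, 1, C)), mean[:, :-1]], axis=1)
    s = 0.1 * prev  # toy "scale network"
    u = 0.5 * prev  # toy "shift network"
    return s, u

def forward(x):
    """Image -> noise. One pass, parallel over all t."""
    B, T, C = x.shape
    s, u = causal_su(x, T, C)
    return np.exp(-s) * (x - u)

def inverse(z):
    """Noise -> image. Sequential: x_t needs the already-generated x_{<t}."""
    B, T, C = z.shape
    x = np.zeros_like(z)
    for t in range(T):
        # Only x[:, :t] is valid here, and causal_su uses only that at position t.
        s, u = causal_su(x, T, C)
        x[:, t] = np.exp(s[:, t]) * z[:, t] + u[:, t]
    return x
```

Because `inverse` re-evaluates the (here toy, in practice Transformer) networks once per token, sampling costs $T$ sequential network calls per block, whereas `forward` costs one parallel call; `inverse(forward(x))` recovers `x` exactly by induction over $t$.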

The forward pass is parallel over $t$, but the inverse pass is sequential: each $x_t$ depends on all previously generated tokens $x_{<t}$, so sampling is extremely slow. We want to accelerate it in []. In experiments, we found that the
[1] Zhai S., Zhang R., Nakkiran P., et al. Normalizing Flows are Capable Generative Models. arXiv preprint arXiv:2412.06329, 2024.