Limitation on the image resolution supported by the UNet

#36
by Tonax - opened

I changed height: int = 576, width: int = 1024 to height: int = 800, width: int = 800, and then I encountered the following error:
anaconda3/envs/xxx/lib/python3.9/site-packages/diffusers/models/unet_3d_blocks.py", line 2353, in forward
hidden_states = torch.cat([hidden_states, res_hidden_states], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 26 but got size 25 for tensor number 1 in the list.

In lines 458-479 of anaconda3/envs/xxx/lib/python3.9/site-packages/diffusers/models/unet_spatio_temporal_condition.py, I added two print statements to inspect the shapes:
for i, upsample_block in enumerate(self.up_blocks):
    res_samples = down_block_res_samples[-len(upsample_block.resnets) :]
    down_block_res_samples = down_block_res_samples[: -len(upsample_block.resnets)]

    print("sample", sample.shape)
    print("res_samples", res_samples[-1].shape)

    if hasattr(upsample_block, "has_cross_attention") and upsample_block.has_cross_attention:
        sample = upsample_block(
            hidden_states=sample,
            temb=emb,
            res_hidden_states_tuple=res_samples,
            encoder_hidden_states=encoder_hidden_states,
            image_only_indicator=image_only_indicator,
        )
    else:
        sample = upsample_block(
            hidden_states=sample,
            temb=emb,
            res_hidden_states_tuple=res_samples,
            image_only_indicator=image_only_indicator,
        )

The output is:
sample torch.Size([16, 1280, 13, 13])
res_samples torch.Size([16, 1280, 13, 13])
sample torch.Size([16, 1280, 26, 26])
res_samples torch.Size([16, 1280, 25, 25])
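
The printed sizes can be reproduced with plain arithmetic. The sketch below rests on assumptions rather than on the diffusers source: the VAE downscales by 8, the UNet has three downsamplers that each behave like a 3x3 convolution with stride 2 and padding 1, and each upsampler exactly doubles the size. Under those assumptions an 800-pixel side becomes a 100-pixel latent that shrinks 100 -> 50 -> 25 -> 13, and doubling 13 gives 26, which cannot be concatenated with the 25-pixel skip connection. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def downsample(n: int) -> int:
    # Output size of an (assumed) 3x3 conv with stride 2 and padding 1.
    return (n + 2 * 1 - 3) // 2 + 1


def encoder_sizes(image_size: int, vae_factor: int = 8, num_downsamples: int = 3) -> List[int]:
    # Spatial size at each UNet level, starting from the latent.
    sizes = [image_size // vae_factor]
    for _ in range(num_downsamples):
        sizes.append(downsample(sizes[-1]))
    return sizes


def first_mismatch(image_size: int, **kwargs) -> Optional[Tuple[int, int]]:
    # Walk the decoder from the deepest level: each 2x upsample must match
    # the size of the skip connection it is concatenated with.
    sizes = encoder_sizes(image_size, **kwargs)
    for coarse, skip in zip(reversed(sizes), reversed(sizes[:-1])):
        if 2 * coarse != skip:
            return (2 * coarse, skip)  # (upsampled size, skip size)
    return None


for size in (576, 1024, 800):
    print(size, encoder_sizes(size), first_mismatch(size))
# 576 [72, 36, 18, 9] None
# 1024 [128, 64, 32, 16] None
# 800 [100, 50, 25, 13] (26, 25)
```

The (26, 25) pair for 800 matches the "Expected size 26 but got size 25" in the traceback, while the original 576 and 1024 sides pass through without a mismatch.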

Is this because of the shapes of the UNet layers? Do they not support my 800x800 resolution?
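
Assuming a VAE downscaling factor of 8 and three 2x downsampling stages in the UNet (an assumption consistent with the 13 and 25 sizes printed above), a side length is safe when it is a multiple of 8 x 2^3 = 64. 576 = 9 x 64 and 1024 = 16 x 64 satisfy this, while 800 = 12.5 x 64 does not. The hypothetical helper below picks the nearest valid sizes on either side.

```python
from typing import Tuple


def snap_to_multiple(x: int, multiple: int = 64) -> Tuple[int, int]:
    # Nearest multiples of `multiple` at or below x and at or above x.
    # `multiple=64` assumes VAE factor 8 times three 2x downsamples.
    lower = (x // multiple) * multiple
    upper = -(-x // multiple) * multiple  # ceiling division
    return lower, upper


print(snap_to_multiple(800))   # (768, 832)
print(snap_to_multiple(1024))  # (1024, 1024)
```

This only chooses a size; the input image would still need to be resized or cropped to it, e.g. 768x768 or 832x832 instead of 800x800.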
