Limitation on the image resolution supported by the UNet
I changed `height: int = 576, width: int = 1024` to `height: int = 800, width: int = 800`, and then encountered the following error:
```
anaconda3/envs/xxx/lib/python3.9/site-packages/diffusers/models/unet_3d_blocks.py", line 2353, in forward
    hidden_states = torch.cat([hidden_states, res_hidden_states], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 26 but got size 25 for tensor number 1 in the list.
```
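The error is the shape rule that `torch.cat` enforces: every dimension except the concatenation dimension (here `dim=1`, the channels) must match. The pure-Python sketch below uses a hypothetical helper `check_cat_shapes` (not part of PyTorch or diffusers) to apply that rule to the upsampled sample and skip-connection shapes from the second pair of debug prints in this post:

```python
def check_cat_shapes(shapes, dim):
    """Return None if `shapes` can be concatenated along `dim`,
    otherwise a message describing the first mismatching dimension.
    Hypothetical helper that mimics the check torch.cat performs."""
    reference = shapes[0]
    for index, shape in enumerate(shapes[1:], start=1):
        if len(shape) != len(reference):
            return f"tensor {index} has {len(shape)} dims, expected {len(reference)}"
        for axis, (expected, got) in enumerate(zip(reference, shape)):
            # Only the concatenation dimension is allowed to differ.
            if axis != dim and expected != got:
                return (f"dim {axis}: expected size {expected} "
                        f"but got size {got} for tensor number {index}")
    return None


# Upsampled decoder feature map vs. the skip connection it is joined with.
hidden_states = (16, 1280, 26, 26)
res_hidden_states = (16, 1280, 25, 25)
print(check_cat_shapes([hidden_states, res_hidden_states], dim=1))
# dim 2: expected size 26 but got size 25 for tensor number 1
```

The channel counts agree, so the failure is purely spatial: height (dim 2) and width (dim 3) differ by one.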
The corresponding code is lines 458-479 of `anaconda3/envs/xxx/lib/python3.9/site-packages/diffusers/models/unet_spatio_temporal_condition.py` (with two `print` calls added for debugging):
```python
for i, upsample_block in enumerate(self.up_blocks):
    res_samples = down_block_res_samples[-len(upsample_block.resnets) :]
    down_block_res_samples = down_block_res_samples[: -len(upsample_block.resnets)]
    print("sample", sample.shape)
    print("res_samples", res_samples[-1].shape)
    if hasattr(upsample_block, "has_cross_attention") and upsample_block.has_cross_attention:
        sample = upsample_block(
            hidden_states=sample,
            temb=emb,
            res_hidden_states_tuple=res_samples,
            encoder_hidden_states=encoder_hidden_states,
            image_only_indicator=image_only_indicator,
        )
    else:
        sample = upsample_block(
            hidden_states=sample,
            temb=emb,
            res_hidden_states_tuple=res_samples,
            image_only_indicator=image_only_indicator,
        )
```
The printed output is:
```
sample torch.Size([16, 1280, 13, 13])
res_samples torch.Size([16, 1280, 13, 13])
sample torch.Size([16, 1280, 26, 26])
res_samples torch.Size([16, 1280, 25, 25])
```
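The 26 vs. 25 mismatch can be reproduced with plain arithmetic. The sketch below assumes the VAE downsamples by a factor of 8 and that the UNet has three downsamplers, each a 3x3 convolution with stride 2 and padding 1 (output size `(n - 1) // 2 + 1`, i.e. `ceil(n / 2)`), while each upsampler exactly doubles the size. These are assumptions about the configuration, although they agree with the 13x13 bottleneck printed above. The names `VAE_FACTOR`, `NUM_DOWNSAMPLERS`, `conv_stride2`, and `trace` are hypothetical:

```python
VAE_FACTOR = 8          # assumption: latent is 1/8 of the pixel size
NUM_DOWNSAMPLERS = 3    # assumption: three stride-2 downsamplers in the UNet


def conv_stride2(n):
    # 3x3 conv, stride 2, padding 1: floor((n + 2 - 3) / 2) + 1 == ceil(n / 2)
    return (n - 1) // 2 + 1


def trace(pixels):
    """Return (skip sizes on the way down, first (upsampled, skip) mismatch or None)."""
    size = pixels // VAE_FACTOR
    sizes_down = [size]
    for _ in range(NUM_DOWNSAMPLERS):
        size = conv_stride2(size)
        sizes_down.append(size)
    # On the way up, each upsampler doubles the size; the result must then
    # equal the spatial size of the skip connection saved at that level.
    for skip in reversed(sizes_down[:-1]):
        size *= 2
        if size != skip:
            return sizes_down, (size, skip)
    return sizes_down, None


for side in (576, 1024, 800):
    print(side, *trace(side))
# 576 [72, 36, 18, 9] None
# 1024 [128, 64, 32, 16] None
# 800 [100, 50, 25, 13] (26, 25)
```

For 800 pixels the latent is 100, and halving 25 rounds up to 13; doubling 13 gives 26, which no longer matches the 25x25 skip connection.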
Is this caused by the shapes of the UNet layers, i.e. do they not support my 800x800 input resolution?
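Assuming 8x VAE downsampling and three 2x downsamplers in the UNet, one safe choice is to keep height and width at multiples of 8 * 2**3 = 64, since every halving is then exact and every upsampled map lines up with its skip connection. The default 576 (9 * 64) and 1024 (16 * 64) satisfy this; 800 (12.5 * 64) does not. A minimal sketch of a hypothetical helper `round_to_multiple` for picking a nearby valid size:

```python
MULTIPLE = 8 * 2 ** 3  # assumption: VAE factor 8 times three 2x downsamplers = 64


def round_to_multiple(value, multiple=MULTIPLE):
    """Round `value` to the nearest multiple of `multiple` (ties round up),
    never returning less than `multiple` itself. Hypothetical helper."""
    return max(multiple, ((value + multiple // 2) // multiple) * multiple)


print(round_to_multiple(800))   # 832
print(round_to_multiple(576))   # 576
print(round_to_multiple(1024))  # 1024
```

Rounding down instead (for example to 768) would also satisfy the same constraint; which direction is preferable depends on the desired output size.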