The output from this VAE is very blurry. Example:
I think something is wrong.
Double-checking the conversion code, it looks like the original implementation has an attention block at the final stage of the encoder and the first stage of the decoder, which diffusers doesn't support ATM.
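One way to confirm this kind of gap is to list which keys of the original checkpoint the conversion never consumed. Below is a rough sketch in plain Python. The key names are made up for illustration (I have not copied them from the real checkpoint), and `find_unconverted_keys` / `group_by_block` are hypothetical helpers, not part of diffusers.

```python
def find_unconverted_keys(original_keys, consumed_keys):
    """Return original checkpoint keys that the conversion never read."""
    consumed = set(consumed_keys)
    return sorted(k for k in original_keys if k not in consumed)


def group_by_block(keys, depth=2):
    """Group parameter names by their first `depth` dotted components."""
    groups = {}
    for key in keys:
        prefix = ".".join(key.split(".")[:depth])
        groups.setdefault(prefix, []).append(key)
    return groups


# Hypothetical key names, for illustration only.
original = [
    "encoder.conv_in.weight",
    "encoder.final_attn.q.weight",
    "decoder.first_attn.q.weight",
    "decoder.conv_out.weight",
]
# Keys the conversion script actually mapped (also hypothetical).
consumed = ["encoder.conv_in.weight", "decoder.conv_out.weight"]

missing = find_unconverted_keys(original, consumed)
for block, names in group_by_block(missing).items():
    print(block, len(names))
```

If whole attention blocks show up in `missing`, those weights are being dropped during conversion, which would be consistent with the blurry output.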