Inference Time
Hi,
Great work on this project,
I used the weights for the PixArt-alpha T5 model from this repo ("pixart_t5_base_iter150000.pth") and observed a slight degradation in inference time, around 3 extra seconds on a T4 GPU compared to the original encoder shipped with the model (and also in the generated image's adherence to the prompt).
Is it possible that the projection layers have impacted this, given that I used the same T5 encoder code from your GitHub repo: https://github.com/LifuWang-66/DistillT5/blob/main/models/T5_encoder.py?
Thanks
Is the 3 seconds for processing the text with T5, or for generating the final image? If it is the latter, it would help to profile the time for T5, the DiT, and the VAE separately. The time taken by T5 should be negligible compared to the other two components.
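A minimal sketch of such per-stage profiling, using only the standard library (the pipeline calls in the comments are hypothetical placeholders, not the actual DistillT5 or diffusers API; on a real GPU run you would also call `torch.cuda.synchronize()` before reading the clock so queued kernels are counted):

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def stage(name):
    # Accumulate wall-clock time for one named pipeline stage.
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - start

# Hypothetical usage around a PixArt-alpha style pipeline:
#   with stage("t5"):
#       prompt_embeds = text_encoder(input_ids)   # text encoding
#   with stage("dit"):
#       latents = run_denoising_loop(prompt_embeds)  # denoising steps
#   with stage("vae"):
#       image = vae.decode(latents)               # latent decode
# print(timings)  # compare the three stages to find the slow one
```

Comparing the three numbers makes it immediately clear whether the extra seconds come from the text encoder or from the denoising loop.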
Thanks for the reply. After debugging, I found that the denoising steps were taking the time. I updated the diffusers library and it now works much better.