Can this run on FLAN t5?

#5
by ljhwild - opened

I'm just reading the paper and it appears LongT5 is built on T5 and not on FLAN-T5.
Is there any reason why?

Hello! T5 and FLAN-T5 share the same model architecture. You can see in flan-t5's config.json that it uses the T5 architecture under the hood: https://huggingface.co/google/flan-t5-xxl/blob/main/config.json#L3

However, long-t5 has a slightly different architecture to enable it to scale to longer sequences: its encoder replaces T5's full self-attention with local or transient-global attention.
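
If you want to check this kind of thing yourself, here is a minimal sketch of comparing the fields in `config.json` that identify the architecture. The dictionaries are hand-trimmed excerpts that I'm assuming mirror the real files (please verify against the actual configs on the Hub), and `same_architecture` is just a hypothetical helper name, not part of any library.

```python
# Illustrative config.json excerpts (assumed values, trimmed to the fields
# that identify the architecture; check the real files on the Hub).
T5_CONFIG = {
    "architectures": ["T5ForConditionalGeneration"],
    "model_type": "t5",
}
FLAN_T5_CONFIG = {
    "architectures": ["T5ForConditionalGeneration"],
    "model_type": "t5",
}
LONG_T5_CONFIG = {
    "architectures": ["LongT5ForConditionalGeneration"],
    "model_type": "longt5",
    "encoder_attention_type": "transient-global",
}


def same_architecture(config_a: dict, config_b: dict) -> bool:
    """Return True when both configs name the same model type and classes."""
    return (
        config_a.get("model_type") == config_b.get("model_type")
        and config_a.get("architectures") == config_b.get("architectures")
    )


if __name__ == "__main__":
    print("t5 vs flan-t5:", same_architecture(T5_CONFIG, FLAN_T5_CONFIG))
    print("flan-t5 vs long-t5:", same_architecture(FLAN_T5_CONFIG, LONG_T5_CONFIG))
```

Matching fields only tell you that two checkpoints use the same model class, not that they were trained the same way.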

Hope that helps!
