Order of the LayerNorm in T5 Model
#28
by dkarthikeyan1 - opened
Hi all,
I was just going through the T5 paper and noticed that the authors say their LayerNorm placement differs from the Vaswani et al. 2017 "Attention Is All You Need" (AAYN) paper. AAYN applies LayerNorm to the outputs of the multi-headed attention (MHA) and FFN sublayers, so each sublayer computes LayerNorm(x + SubLayer(x)), whereas T5 applies it to the inputs of the MHA and FFN, so the residual connection becomes x + SubLayer(LayerNorm(x)). However, when I looked at the T5 model I noticed that the T5LayerNorm comes after the T5Attention. Is this just how the model architecture is printed, or a potential deviation from the paper?
Thanks!
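To make the two orderings in the question concrete, here is a minimal pure-Python sketch. It is not the Transformers implementation: `sublayer` is a hypothetical stand-in for MHA or FFN, and `layer_norm` is a plain mean/variance normalization with no learned parameters, chosen only for illustration.

```python
import math

def layer_norm(x, eps=1e-6):
    """Normalize a vector to zero mean and unit variance (no learned gain or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def sublayer(x):
    """Hypothetical stand-in for MHA or FFN: a fixed elementwise map."""
    return [2.0 * v + 1.0 for v in x]

def post_ln_block(x):
    """AAYN-style ordering: LayerNorm(x + SubLayer(x))."""
    summed = [a + b for a, b in zip(x, sublayer(x))]
    return layer_norm(summed)

def pre_ln_block(x):
    """Ordering described in the post for T5: x + SubLayer(LayerNorm(x))."""
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]

x = [1.0, 2.0, 4.0]
post = post_ln_block(x)
pre = pre_ln_block(x)
print("post-LN:", post)
print("pre-LN: ", pre)
```

In the post-LN variant the block output is itself normalized, while in the pre-LN variant the unnormalized input x passes straight through the residual path. Separately, printing a PyTorch module lists submodules in the order they are assigned in `__init__`, not the order `forward` calls them, so a printed T5LayerNorm after T5Attention does not by itself show which one runs first.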
dkarthikeyan1 changed discussion status to closed