Commit 3d6744b · Parent: b8ecf29 · Update README.md

README.md CHANGED
@@ -11,7 +11,7 @@ license: apache-2.0

# T5-Efficient-XL (Deep-Narrow version)

T5-Efficient-XL is a variation of [Google's original T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) following the [T5 model architecture](https://huggingface.co/docs/transformers/model_doc/t5).
It is a *pretrained-only* checkpoint and was released with the
paper **[Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers](https://arxiv.org/abs/2109.10686)**
by *Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler*.
@@ -39,8 +39,8 @@ A sequence of word embeddings is therefore processed sequentially by each transformer block.

## Detailed model architecture

This model checkpoint - **t5-efficient-xl** - is of model type **XL** with **no** variations.
It has **2852** million parameters and thus requires **11406 MB** of memory in full precision (*fp32*)
or **5703 MB** of memory in half precision (*fp16* or *bf16*).
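The figures above are plain bytes-per-parameter arithmetic: four bytes per weight in *fp32*, two in *fp16*/*bf16*, counting the weights only (no activations or optimizer state). A minimal sketch of that arithmetic; note that the card's own **11406**/**5703** MB figures imply roughly 2851.5 million parameters, which the text rounds to 2852:

```python
def checkpoint_size_mb(n_params_millions: float, bytes_per_param: int) -> float:
    """Size of the raw weights in MB (1 MB = 10**6 bytes), weights only."""
    return n_params_millions * bytes_per_param

# ~2851.5M parameters is what the card's MB figures imply
print(checkpoint_size_mb(2851.5, 4))  # fp32 -> 11406.0
print(checkpoint_size_mb(2851.5, 2))  # fp16/bf16 -> 5703.0
```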
The *conventional* T5 architectures are summarized as follows:

@@ -54,7 +54,7 @@ The *conventional* T5 architectures are summarized as follows:

| **XL** | **24/24** | **16384** | **1024** | **128** | **32** | **3B** |
| XXL | 24/24 | 65536 | 1024 | 128 | 128 | 11B |
whereas the following abbreviations are used:

| Abbreviation | Definition |
| ---- | ---- |
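One detail worth pulling out of the XL row: T5 decouples the per-head key/value dimension from the model dimension, so the attention inner width (`nh * kv` in our naming, reading the abbreviations as model dim, key/value dim, and head count, since the definition table is truncated in this view) is four times `dm` rather than equal to it:

```python
# XL row from the table above (our reading of the abbreviations):
# dm = model dimension, kv = key/value dim per head, nh = number of heads
dm, kv, nh = 1024, 128, 32

inner_dim = nh * kv  # width of the concatenated attention heads
print(inner_dim)        # 4096
print(inner_dim == dm)  # False: unlike vanilla Transformers, T5's
                        # attention width is independent of the model dim
```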
@@ -99,12 +99,14 @@ You can follow one of the following examples on how to fine-tune the model:

## Downstream Performance

TODO: Add table if available

## Computational Complexity

TODO: Add table if available
## More information

We strongly recommend that the reader go carefully through the original paper **[Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers](https://arxiv.org/abs/2109.10686)** to get a more nuanced understanding of this model checkpoint.
As explained in the following [issue](https://github.com/google-research/google-research/issues/986#issuecomment-1035051145), checkpoints including the *sh* or *skv*
model architecture variations have *not* been ported to Transformers, as they are probably of limited practical use and lack a more detailed description. Those checkpoints are kept [here](https://huggingface.co/NewT5SharedHeadsSharedKeyValues) and might be ported in the future.