Commit 3d6744b · Parent: b8ecf29 · Update README.md

README.md CHANGED
@@ -11,7 +11,7 @@ license: apache-2.0

# T5-Efficient-XL (Deep-Narrow version)

T5-Efficient-XL is a variation of [Google's original T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) following the [T5 model architecture](https://huggingface.co/docs/transformers/model_doc/t5).
It is a *pretrained-only* checkpoint and was released with the
paper **[Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers](https://arxiv.org/abs/2109.10686)**
by *Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler*.
@@ -39,8 +39,8 @@ A sequence of word embeddings is therefore processed sequentially by each transformer block.

## Detailed model architecture

This model checkpoint - **t5-efficient-xl** - is of model type **XL** with **no** variations.
It has **2852** million parameters and thus requires **11406 MB** of memory in full precision (*fp32*)
or **5703 MB** of memory in half precision (*fp16* or *bf16*).
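The figures above are plain bytes-per-parameter arithmetic: four bytes per weight in *fp32*, two in *fp16*/*bf16*, counting the weights only (no activations or optimizer state). A minimal sketch of that arithmetic; note that the card's own **11406**/**5703** MB figures imply roughly 2851.5 million parameters, which the text rounds to 2852:

```python
def checkpoint_size_mb(n_params_millions: float, bytes_per_param: int) -> float:
    """Size of the raw weights in MB (1 MB = 10**6 bytes), weights only."""
    return n_params_millions * bytes_per_param

# ~2851.5M parameters is what the card's MB figures imply
print(checkpoint_size_mb(2851.5, 4))  # fp32 -> 11406.0
print(checkpoint_size_mb(2851.5, 2))  # fp16/bf16 -> 5703.0
```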
The *conventional* T5 architectures are summarized as follows:

@@ -54,7 +54,7 @@ The *conventional* T5 architectures are summarized as follows:

| **XL** | **24/24** | **16384** | **1024** | **128** | **32** | **3B** |
| XXL | 24/24 | 65536 | 1024 | 128 | 128 | 11B |
whereas the following abbreviations are used:

| Abbreviation | Definition |
| ---- | ---- |
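One detail worth pulling out of the XL row: T5 decouples the per-head key/value dimension from the model dimension, so the attention inner width (`nh * kv` in our naming, reading the abbreviations as model dim, key/value dim, and head count, since the definition table is truncated in this view) is four times `dm` rather than equal to it:

```python
# XL row from the table above (our reading of the abbreviations):
# dm = model dimension, kv = key/value dim per head, nh = number of heads
dm, kv, nh = 1024, 128, 32

inner_dim = nh * kv  # width of the concatenated attention heads
print(inner_dim)        # 4096
print(inner_dim == dm)  # False: unlike vanilla Transformers, T5's
                        # attention width is independent of the model dim
```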
@@ -99,12 +99,14 @@ You can follow one of the following examples on how to fine-tune the model:

## Downstream Performance

TODO: Add table if available

## Computational Complexity

TODO: Add table if available
## More information

We strongly recommend that the reader go carefully through the original paper **[Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers](https://arxiv.org/abs/2109.10686)** to get a more nuanced understanding of this model checkpoint.
As explained in the following [issue](https://github.com/google-research/google-research/issues/986#issuecomment-1035051145), checkpoints including the *sh* or *skv*
model architecture variations have *not* been ported to Transformers, as they are probably of limited practical use and lack a more detailed description. Those checkpoints are kept [here](https://huggingface.co/NewT5SharedHeadsSharedKeyValues) and might be ported in the future.