Commit 63dfd57 · Parent: 685fc6b

Update README.md

README.md CHANGED
@@ -10,14 +10,14 @@ license: apache-2.0
 ---
 
 # T5-Efficient-XL
-
+### *One of T5's Deep-Narrow checkpoints*
 
 T5-Efficient-XL is a variation of the original [T5-3B](https://huggingface.co/t5-3b) checkpoint and follows the [T5 model architecture](https://huggingface.co/docs/transformers/model_doc/t5).
 It is a *pretrained-only* checkpoint and was released with the
 paper **[Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers](https://arxiv.org/abs/2109.10686)**
 by *Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler*.
 
-In a nutshell, the paper indicates that a **
+In a nutshell, the paper indicates that a **Deep-Narrow** model architecture is favorable for **downstream** performance compared to other model architectures
 of similar parameter count.
 
 To quote the paper: