Commit b8ecf29 (parent: db69d5e): Update README.md

README.md (changed):

## Details of the model architecture

This model checkpoint - **t5-efficient-xl** - is of model type **XL** with **no** variations.
It has **2852** million parameters and thus requires **11406** MB of memory in full precision (*fp32*)
or **5703** MB of memory in half precision (*fp16* or *bf16*).
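
The memory figures follow directly from bytes-per-parameter arithmetic (4 bytes in fp32, 2 bytes in fp16/bf16). A minimal sketch, taking 1 MB as 10^6 bytes; the helper name `checkpoint_memory_mb` is illustrative, not from any library:

```python
def checkpoint_memory_mb(n_params_million: float, bytes_per_param: int) -> float:
    """Memory to hold all weights: parameters x bytes per parameter.
    n_params_million * 1e6 params * bytes / 1e6 bytes-per-MB simplifies to
    n_params_million * bytes_per_param."""
    return n_params_million * bytes_per_param

print(checkpoint_memory_mb(2852, 4))  # fp32: 11408.0 MB
print(checkpoint_memory_mb(2852, 2))  # fp16/bf16: 5704.0 MB
```

The rounded count of 2852 million parameters yields 11408/5704 MB; the card's 11406/5703 MB come from the unrounded parameter count.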
The *conventional* T5 architectures are summarized as follows:
| Model | nl (el/dl) | ff | dm | kv | nh | #Params|
| ----| ---- | ---- | ---- | ---- | ---- | ----|
| **XL** | **24/24** | **16384** | **1024** | **128** | **32** | **3B**|
| XXL | 24/24 | 65536 | 1024 | 128 | 128 | 11B|
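
The #Params column can be cross-checked against the other columns with a back-of-the-envelope count. The sketch below assumes the original (non-gated, ReLU) T5 feed-forward block, T5's 32128-token shared embedding matrix, and ignores layer norms and relative-position biases (well under a million parameters); `t5_param_estimate` is an illustrative helper, not a library function:

```python
def t5_param_estimate(el, dl, dm, ff, kv, nh, vocab=32128):
    """Approximate T5 parameter count from the table's dimensions."""
    attn = 4 * dm * nh * kv          # Q, K, V and output projections of one attention block
    ffn = 2 * dm * ff                # wi and wo of a non-gated ReLU feed-forward block
    encoder = el * (attn + ffn)
    decoder = dl * (2 * attn + ffn)  # each decoder layer adds a cross-attention block
    return vocab * dm + encoder + decoder  # shared embedding matrix + all layers

# XL row: nl 24/24, ff 16384, dm 1024, kv 128, nh 32
print(round(t5_param_estimate(24, 24, 1024, 16384, 128, 32) / 1e6, 1))  # 2851.5
```

This lands within a million of the card's stated 2852 million parameters, which suggests the simplifying assumptions are close to the actual configuration.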

The following abbreviations are used:

| Abbreviation | Definition |
| ----| ---- |
If a model checkpoint specifies no *el* or *dl*, then the number of encoder layers and the number of decoder layers both correspond to *nl*.
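
This defaulting rule can be written as a tiny helper (illustrative only, not part of any library):

```python
def layer_counts(nl, el=None, dl=None):
    """Encoder/decoder depth: el and dl fall back to nl when not specified."""
    return (el if el is not None else nl,
            dl if dl is not None else nl)

# t5-efficient-xl specifies neither el nor dl, so both equal nl = 24:
print(layer_counts(24))  # (24, 24)
```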
## Pre-Training
The checkpoint was pretrained on the [Colossal, Cleaned version of Common Crawl (C4)](https://huggingface.co/datasets/c4) for 524288 steps using