Commit b8ecf29 (parent: db69d5e): Update README.md

README.md (changed):

## Details of the model architecture

This model checkpoint - **t5-efficient-xl** - is of model type **XL** with **no** variations.
It has **2852** million parameters and thus requires **11406** MB of memory in full precision (*fp32*)
or **5703** MB of memory in half precision (*fp16* or *bf16*).
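
The memory figures follow directly from bytes-per-parameter arithmetic (4 bytes in fp32, 2 bytes in fp16/bf16). A minimal sketch, taking 1 MB as 10^6 bytes; the helper name `checkpoint_memory_mb` is illustrative, not from any library:

```python
def checkpoint_memory_mb(n_params_million: float, bytes_per_param: int) -> float:
    """Memory to hold all weights: parameters x bytes per parameter.
    n_params_million * 1e6 params * bytes / 1e6 bytes-per-MB simplifies to
    n_params_million * bytes_per_param."""
    return n_params_million * bytes_per_param

print(checkpoint_memory_mb(2852, 4))  # fp32: 11408.0 MB
print(checkpoint_memory_mb(2852, 2))  # fp16/bf16: 5704.0 MB
```

The rounded count of 2852 million parameters yields 11408/5704 MB; the card's 11406/5703 MB come from the unrounded parameter count.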
The *conventional* T5 architectures are summarized as follows:
| Model | nl (el/dl) | ff | dm | kv | nh | #Params|
| ----| ---- | ---- | ---- | ---- | ---- | ----|
| **XL** | **24/24** | **16384** | **1024** | **128** | **32** | **3B**|
| XXL | 24/24 | 65536 | 1024 | 128 | 128 | 11B|
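
The #Params column can be cross-checked against the other columns with a back-of-the-envelope count. The sketch below assumes the original (non-gated, ReLU) T5 feed-forward block, T5's 32128-token shared embedding matrix, and ignores layer norms and relative-position biases (well under a million parameters); `t5_param_estimate` is an illustrative helper, not a library function:

```python
def t5_param_estimate(el, dl, dm, ff, kv, nh, vocab=32128):
    """Approximate T5 parameter count from the table's dimensions."""
    attn = 4 * dm * nh * kv          # Q, K, V and output projections of one attention block
    ffn = 2 * dm * ff                # wi and wo of a non-gated ReLU feed-forward block
    encoder = el * (attn + ffn)
    decoder = dl * (2 * attn + ffn)  # each decoder layer adds a cross-attention block
    return vocab * dm + encoder + decoder  # shared embedding matrix + all layers

# XL row: nl 24/24, ff 16384, dm 1024, kv 128, nh 32
print(round(t5_param_estimate(24, 24, 1024, 16384, 128, 32) / 1e6, 1))  # 2851.5
```

This lands within a million of the card's stated 2852 million parameters, which suggests the simplifying assumptions are close to the actual configuration.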

The following abbreviations are used:

| Abbreviation | Definition |
| ----| ---- |
If a model checkpoint specifies no *el* or *dl*, then the number of encoder layers and the number of decoder layers both correspond to *nl*.
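
This defaulting rule can be written as a tiny helper (illustrative only, not part of any library):

```python
def layer_counts(nl, el=None, dl=None):
    """Encoder/decoder depth: el and dl fall back to nl when not specified."""
    return (el if el is not None else nl,
            dl if dl is not None else nl)

# t5-efficient-xl specifies neither el nor dl, so both equal nl = 24:
print(layer_counts(24))  # (24, 24)
```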
## Pre-Training
The checkpoint was pretrained on the [Colossal, Cleaned version of Common Crawl (C4)](https://huggingface.co/datasets/c4) for 524288 steps using