Commit b954a57 · Parent: 6f9f453
Update README.md

README.md CHANGED
@@ -175,8 +175,6 @@ Multilingual model capable of following user instructions in a variety of languages
 <details>
 <summary>Click to expand</summary>
 
-Please see [the BLOOM training README](https://github.com/bigscience-workshop/bigscience/tree/master/train/tr11-176B-ml#readme) for full details on replicating training.
-
 ### Model Architecture and Objective
 
 * Same architecture as [mt5](https://arxiv.org/abs/2010.11934)
@@ -187,11 +185,14 @@ Please see [the BLOOM training README](https://github.com/bigscience-workshop/bi
 
 ### Compute infrastructure
 
-
-
-
-
-
+Models were finetuned on [TPUv4](https://cloud.google.com/tpu/docs/system-architecture-tpu-vm#tpu_v4):
+- `mt0-small` was finetuned on TPUv4-64
+- `mt0-base` was finetuned on TPUv4-64
+- `mt0-large` was finetuned on TPUv4-64
+- `mt0-xl` was finetuned on TPUv4-128
+- `mt0-xxl` was finetuned on TPUv4-256
+- `mt0-mt-xxl` was finetuned on TPUv4-256
+- `mt0-p3-xxl` was finetuned on TPUv4-256
 
 #### Software
 
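The per-variant TPUv4 slices listed in this commit can be captured in a small lookup table; the sketch below is illustrative only (the dict name and helper are not part of the model card, and the values simply mirror the diff):

```python
# Per-variant TPUv4 slice sizes, transcribed from the compute-infrastructure hunk.
TPU_SLICES = {
    "mt0-small":  "TPUv4-64",
    "mt0-base":   "TPUv4-64",
    "mt0-large":  "TPUv4-64",
    "mt0-xl":     "TPUv4-128",
    "mt0-xxl":    "TPUv4-256",
    "mt0-mt-xxl": "TPUv4-256",
    "mt0-p3-xxl": "TPUv4-256",
}

def slice_for(variant: str) -> str:
    """Return the TPUv4 slice a given mt0 variant was finetuned on."""
    return TPU_SLICES[variant]
```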
@@ -220,7 +221,7 @@ It was pretrained on mC4 and then finetuned on xP3, P3 or xP3mt.
 ## Speeds, Sizes, Times
 
 // TODO @adarob: Maybe we can push tensorboard on this repo as well
-Training logs:
+Training logs:
 
 - Checkpoint size:
 
@@ -228,9 +229,6 @@ Training logs: [Tensorboard link](https://huggingface.co/tensorboard/bigscience/
 
 - Number of epochs: 1
 
-// TODO @adarob: Can you share where the server is?
-- Server training location:
-
 
 ## Environmental Impact
 
@@ -269,7 +267,7 @@ print(tokenizer.decode(outputs[0]))
 
 ## Intended Use
 
-This model is being created in order to enable public research on large language models (LLMs). LLMs are intended to be used for language generation or as a pretrained base model that can be further
+This model is being created in order to enable public research on large language models (LLMs). LLMs are intended to be used for language generation or as a pretrained base model that can be further finetuned for specific tasks. Use cases below are not exhaustive.
 
 ### Direct Use
 