Commit b954a57 · Parent: 6f9f453
Update README.md

README.md CHANGED
@@ -175,8 +175,6 @@ Multilingual model capable of following user instructions in a variety of languages
 <details>
 <summary>Click to expand</summary>
 
-Please see [the BLOOM training README](https://github.com/bigscience-workshop/bigscience/tree/master/train/tr11-176B-ml#readme) for full details on replicating training.
-
 ### Model Architecture and Objective
 
 * Same architecture as [mt5](https://arxiv.org/abs/2010.11934)
@@ -187,11 +185,14 @@ Please see [the BLOOM training README](https://github.com/bigscience-workshop/bi
 
 ### Compute infrastructure
 
-
-
-
-
-
+Models were finetuned on [TPUv4](https://cloud.google.com/tpu/docs/system-architecture-tpu-vm#tpu_v4):
+- `mt0-small` was finetuned on TPUv4-64
+- `mt0-base` was finetuned on TPUv4-64
+- `mt0-large` was finetuned on TPUv4-64
+- `mt0-xl` was finetuned on TPUv4-128
+- `mt0-xxl` was finetuned on TPUv4-256
+- `mt0-mt-xxl` was finetuned on TPUv4-256
+- `mt0-p3-xxl` was finetuned on TPUv4-256
 
 #### Software
 
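The per-variant TPUv4 slices listed in this commit can be captured in a small lookup table; the sketch below is illustrative only (the dict name and helper are not part of the model card, and the values simply mirror the diff):

```python
# Per-variant TPUv4 slice sizes, transcribed from the compute-infrastructure hunk.
TPU_SLICES = {
    "mt0-small":  "TPUv4-64",
    "mt0-base":   "TPUv4-64",
    "mt0-large":  "TPUv4-64",
    "mt0-xl":     "TPUv4-128",
    "mt0-xxl":    "TPUv4-256",
    "mt0-mt-xxl": "TPUv4-256",
    "mt0-p3-xxl": "TPUv4-256",
}

def slice_for(variant: str) -> str:
    """Return the TPUv4 slice a given mt0 variant was finetuned on."""
    return TPU_SLICES[variant]
```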
@@ -220,7 +221,7 @@ It was pretrained on mC4 and then finetuned on xP3, P3 or xP3mt.
 ## Speeds, Sizes, Times
 
 // TODO @adarob: Maybe we can push tensorboard on this repo as well
-Training logs:
+Training logs:
 
 - Checkpoint size:
 
@@ -228,9 +229,6 @@ Training logs: [Tensorboard link](https://huggingface.co/tensorboard/bigscience/
 
 - Number of epochs: 1
 
-// TODO @adarob: Can you share where the server is?
-- Server training location:
-
 
 ## Environmental Impact
 
@@ -269,7 +267,7 @@ print(tokenizer.decode(outputs[0]))
 
 ## Intended Use
 
-This model is being created in order to enable public research on large language models (LLMs). LLMs are intended to be used for language generation or as a pretrained base model that can be further
+This model is being created in order to enable public research on large language models (LLMs). LLMs are intended to be used for language generation or as a pretrained base model that can be further finetuned for specific tasks. Use cases below are not exhaustive.
 
 ### Direct Use
 