bigscience/bloom-3b


Muennighoff committed on Aug 8, 2022

Commit 1c5992f · 1 parent: 68331cd

Update README.md

Files changed (1): README.md +5 -14

@@ -1662,7 +1662,9 @@ Please see [the BLOOM training README](https://github.com/bigscience-workshop/bi
 * ALiBI positional encodings (see [paper](https://arxiv.org/pdf/2108.12409.pdf)), with GeLU activation functions
-* 2.5 billion parameters:
+* 3,002,557,440 parameters:
+    * 642,252,800 embedding parameters
     * 30 layers, 32 attention heads
@@ -1705,18 +1707,7 @@ Please see [the BLOOM training README](https://github.com/bigscience-workshop/bi
 #### **Training**
-_In progress._
-Current training logs: [Tensorboard link](https://huggingface.co/tensorboard/bigscience/tr11-176B-ml-logs/)
-- Checkpoint size:
-    - Bf16 weights: 329GB
-    - Full checkpoint with optimizer states: 2.3TB
-- Training throughput: About 150 TFLOP per GPU per second
+Training logs: [Tensorboard link](https://huggingface.co/tensorboard/bigscience/tr11c-2B5-logs)
 - Number of epochs: 1 (*current target*)
@@ -1724,7 +1715,7 @@ Current training logs: [Tensorboard link](https://huggingface.co/tensorboard/big
     - Started 11th March, 2022 11:42am PST
-    - Estimated end: 5th July, 2022
+    - Ended 5th July, 2022
 - Estimated cost of training: Equivalent of $2-5M in cloud computing (including preliminary experiments)
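The new parameter figures can be cross-checked with a short Python sketch. It is a back-of-the-envelope count under stated assumptions, not the model's own code: a hidden size of 2560 and a padded vocabulary of 250,880 tokens (assumed from the published BLOOM-3b configuration; neither value appears in this diff), a fused QKV projection, a 4x MLP, biases on every linear layer, one LayerNorm after the embeddings plus a final one, and an output head tied to the input embeddings so it adds no parameters.

```python
# Back-of-the-envelope parameter count for a BLOOM-style decoder.
# ASSUMPTIONS: HIDDEN_SIZE and VOCAB_SIZE are taken from the published
# BLOOM-3b config, not from this diff; N_LAYERS comes from the README.
HIDDEN_SIZE = 2560
VOCAB_SIZE = 250_880
N_LAYERS = 30


def embedding_params(vocab_size: int, hidden: int) -> int:
    # Token embedding matrix; the LM head is assumed tied to it.
    return vocab_size * hidden


def block_params(hidden: int) -> int:
    """Weights + biases of one decoder block (fused QKV, 4x MLP assumed)."""
    layer_norm = 2 * hidden                       # scale + bias
    qkv = hidden * 3 * hidden + 3 * hidden        # fused query/key/value
    attn_out = hidden * hidden + hidden           # attention output projection
    mlp_up = hidden * 4 * hidden + 4 * hidden     # h -> 4h
    mlp_down = 4 * hidden * hidden + hidden       # 4h -> h
    # Two LayerNorms per block: before attention and before the MLP.
    return 2 * layer_norm + qkv + attn_out + mlp_up + mlp_down


def total_params(vocab_size: int, hidden: int, n_layers: int) -> int:
    emb = embedding_params(vocab_size, hidden)
    # Assumed extra LayerNorms: one after the embeddings, one final.
    extra_norms = 2 * (2 * hidden)
    return emb + n_layers * block_params(hidden) + extra_norms


if __name__ == "__main__":
    print(f"{embedding_params(VOCAB_SIZE, HIDDEN_SIZE):,}")        # 642,252,800
    print(f"{total_params(VOCAB_SIZE, HIDDEN_SIZE, N_LAYERS):,}")  # 3,002,557,440
```

Under these assumptions each block holds 12h² + 13h parameters, and the totals match the updated README exactly. The 32 attention heads do not enter the count: splitting the hidden size across heads changes how the fused projection is viewed, not its size.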