Update README.md
CodeCarbon wasn't ready until the training was over, so we did an additional 10h measurement run, which we can then extrapolate to the whole training.

This set of records captures the startup time and 2499 iterations in 2 records per gpu, since an intermediary checkpoint was also saved half-way and we flush the CC records on each checkpoint save.

The training had 168000 iterations, so multiply the reported data by ~67 (168000 / 2499). This is quite approximate, since we were using 16 nodes during the ramp-up, then 64, and only for the last 3 weeks 128 nodes.

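Back-of-the-envelope, the extrapolation factor above works out as follows (the energy figure is a placeholder, not a real measurement):

```python
# Extrapolate the 10h measured run to the full training, as described above:
# the measured run covered ~2499 iterations out of 168000 total.
measured_iterations = 2499
total_iterations = 168_000

scale = total_iterations / measured_iterations
print(round(scale))  # ~67, the multiplier used above

# e.g. scaling a measured energy total (placeholder value, not real data):
measured_energy_kwh = 123.0
estimated_total_kwh = measured_energy_kwh * scale
```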
Caveat emptor: I'm not sure whether the CC reports overlap, since each report is per gpu yet they may all be measuring the same shared resources (everything other than the gpu itself), so this requires further research.

Each csv file contains a report for a single gpu.
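A minimal sketch for summing the per-gpu csv reports, assuming the standard CodeCarbon column names `emissions` (kg CO2eq) and `energy_consumed` (kWh); adjust the names if your CodeCarbon version differs. Note the overlap caveat above: if the reports double-count shared resources, a plain sum will overestimate.

```python
import csv
from pathlib import Path


def sum_reports(report_dir: str) -> dict:
    """Sum selected columns across all per-gpu CodeCarbon csv files in a directory."""
    totals = {"emissions": 0.0, "energy_consumed": 0.0}
    for path in sorted(Path(report_dir).glob("*.csv")):
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                for key in totals:
                    totals[key] += float(row[key])
    return totals
```

Multiply the result by the ~67 extrapolation factor to estimate the whole training, with the approximations noted above.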