Update README.md
CodeCarbon wasn't ready until the training was over, so we did an additional 10h measurement run, which we can then extrapolate to the whole training.

This set of records captures the startup time and 2499 iterations in 2 records per gpu, since an intermediary checkpoint was also saved half-way and we flush the CC records on each checkpoint save.

The training had 168000 iterations, so multiply the reported data by ~67 (168000 / 2499). This is quite approximate, since we were using 16 nodes during the ramp-up, then 64, and only for the last 3 weeks 128 nodes.

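Back-of-the-envelope, the extrapolation factor above works out as follows (the energy figure is a placeholder, not a real measurement):

```python
# Extrapolate the 10h measured run to the full training, as described above:
# the measured run covered ~2499 iterations out of 168000 total.
measured_iterations = 2499
total_iterations = 168_000

scale = total_iterations / measured_iterations
print(round(scale))  # ~67, the multiplier used above

# e.g. scaling a measured energy total (placeholder value, not real data):
measured_energy_kwh = 123.0
estimated_total_kwh = measured_energy_kwh * scale
```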
Caveat emptor: I'm not sure whether the CC reports overlap, since each report is per gpu yet they may all be measuring the same shared resources (everything other than the gpu itself), so this requires further research.

Each csv file contains a report for a single gpu.
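A minimal sketch for summing the per-gpu csv reports, assuming the standard CodeCarbon column names `emissions` (kg CO2eq) and `energy_consumed` (kWh); adjust the names if your CodeCarbon version differs. Note the overlap caveat above: if the reports double-count shared resources, a plain sum will overestimate.

```python
import csv
from pathlib import Path


def sum_reports(report_dir: str) -> dict:
    """Sum selected columns across all per-gpu CodeCarbon csv files in a directory."""
    totals = {"emissions": 0.0, "energy_consumed": 0.0}
    for path in sorted(Path(report_dir).glob("*.csv")):
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                for key in totals:
                    totals[key] += float(row[key])
    return totals
```

Multiply the result by the ~67 extrapolation factor to estimate the whole training, with the approximations noted above.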