| CodeCarbon wasn't ready until the training was over, so we did an additional 10h run to measure with; those measurements can then be extrapolated to the whole training. | |
| This set of records captures the startup time and 2499 iterations in 2 records per GPU, since an intermediate checkpoint was also saved halfway through and we flush the CC | |
| records each time a checkpoint is saved. | |
| The training had 168000 iterations, so multiply the reported data by 67 (168000 / 2499 ≈ 67). This is only a rough approximation, since we were using 16 nodes during | |
| the ramp-up, then 64, and 128 nodes only for the last 3 weeks. | |
| Caveat emptor: I'm not sure whether the CC reports overlap. Each report is per GPU, and I think they may all be measuring the same thing, other than the GPU itself. | |
| So this requires research. | |
| Each csv file contains a report for a single gpu. | |
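
The extrapolation described above can be sketched roughly as follows. This is a minimal illustration, not the procedure actually used: the helper names are hypothetical, and the `energy_consumed` column name is an assumption about the CSV layout that should be checked against the real files. It also sums naively across GPUs, so if the per-GPU reports do overlap (the open question noted above), the result will overcount.

```python
import csv
import io

TOTAL_ITERATIONS = 168_000     # full training, from the notes above
MEASURED_ITERATIONS = 2_499    # iterations covered by the 10h measurement run


def extrapolation_factor(total=TOTAL_ITERATIONS, measured=MEASURED_ITERATIONS):
    """Ratio of full-training iterations to measured iterations, rounded."""
    return round(total / measured)


def sum_column(csv_text, column="energy_consumed"):
    """Sum one numeric column over all records of a single per-GPU report.

    The column name is an assumption; adjust it to match the actual CSV header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(float(row[column]) for row in reader)


def extrapolate(per_gpu_reports, column="energy_consumed"):
    """Sum the column over all per-GPU reports and scale to the full training.

    Naive sum: if reports overlap (e.g. shared CPU/RAM readings), this overcounts.
    """
    measured = sum(sum_column(text, column) for text in per_gpu_reports)
    return measured * extrapolation_factor()
```

To use it, read each CSV file into a string and pass the list of strings to `extrapolate`. Because the node count changed during training (16, then 64, then 128), a single multiplier remains a rough estimate at best.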