GPU requirements for training

#37

by mintglobe - opened May 23, 2023

May 23, 2023

Hello, I want to retrain StarCoderBase on C, just as StarCoderBase was retrained on Python to get StarCoder.
How many GPUs do I need to run the training? Right now I have a node with 2 A100s, and it reports "CUDA out of memory". If I get a node with 8 A100s (that's my limit), can I afford the training? Thank you.

nib12345

May 25, 2023

edited Jun 5, 2023

I may be wrong, but:

According to the model documentation, training this model used 512 A100 GPUs and took 24 days.
Link: https://huggingface.co/bigcode/starcoder#hardware

So that means:
One A100 40 GB costs $1.10 per hour on Lambda Labs.
It would cost $1.10 per GPU-hour * 24 hours * 24 days * 512 A100s = $324,403.20

One A100 80 GB costs $1.50 per hour on Lambda Labs.
It would cost $1.50 per GPU-hour * 24 hours * 24 days * 512 A100s = $442,368.00

They did not say which A100 they used, the 40 GB or the 80 GB version.

So I believe most people cannot afford to spend that much to train the model.
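
For anyone who wants to redo this arithmetic with other prices, here is a minimal Python sketch. The GPU count and duration are the figures from the model card; the hourly prices are the ones quoted above and should be treated as assumptions, since cloud pricing changes. The function name `training_cost` is only illustrative.

```python
# Back-of-the-envelope cost of the full training run.
# NUM_GPUS and DAYS come from the model card; prices are assumptions.
NUM_GPUS = 512
DAYS = 24
HOURS_PER_DAY = 24


def training_cost(price_per_gpu_hour: float) -> float:
    """Return the total cost in dollars for the whole run."""
    gpu_hours = NUM_GPUS * DAYS * HOURS_PER_DAY  # 294,912 GPU-hours
    return gpu_hours * price_per_gpu_hour


for label, price in [("A100 40 GB", 1.10), ("A100 80 GB", 1.50)]:
    print(f"{label}: ${training_cost(price):,.2f}")
```

This ignores storage, networking, failed runs, and any volume discounts, so it is only a rough lower bound.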

loubnabnl

BigCode org May 25, 2023

edited May 25, 2023

You can fine-tune StarCoderBase on C, as we did with Python to get StarCoder, instead of training from scratch. However, with only 8 GPUs you probably won't be able to go through the full C dataset in a short period of time. For reference, the Python fine-tuning for 2 epochs on 35B tokens took ~10k GPU hours. Check this repo for some fine-tuning code: https://github.com/bigcode-project/starcoder
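
To put the ~10k GPU-hour figure in terms of wall-clock time, here is a rough Python sketch. It assumes the C fine-tuning would need about the same GPU hours as the Python one and that throughput scales linearly with GPU count; neither is guaranteed, and the C dataset size is not stated here. The function name `wall_clock_days` is only illustrative.

```python
# Rough wall-clock estimate for a fine-tuning run on a small node.
# Assumptions: ~10k GPU-hours total (the Python fine-tuning figure)
# and linear scaling with the number of GPUs.
PYTHON_FINETUNE_GPU_HOURS = 10_000


def wall_clock_days(total_gpu_hours: float, num_gpus: int) -> float:
    """Convert a GPU-hour budget into days of continuous running."""
    return total_gpu_hours / num_gpus / 24


for gpus in (2, 8):
    days = wall_clock_days(PYTHON_FINETUNE_GPU_HOURS, gpus)
    print(f"{gpus} GPUs: ~{days:.0f} days")
```

Under these assumptions, an 8-GPU node would need roughly 52 days of continuous training, and a 2-GPU node roughly 208 days.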

loubnabnl changed discussion status to closed Jun 6, 2023
