Training memory requirements.

by thanhdaonguyen - opened Jun 19, 2024

Jun 19, 2024

What is the minimum memory required to train Llama-3-70B with DPO at a context length of 4096? And what config achieves that?

odelalleau

NVIDIA org Jun 19, 2024

edited Jun 19, 2024

Got this answer from my colleague who worked on it:

The minimum VRAM necessary is 80GB per GPU, with 8 GPUs per node.
With TP=8 (tensor parallelism) and PP=2 (pipeline parallelism), one copy of the model spans 8 × 2 = 16 GPUs, so you will need a minimum of 2 nodes to host it, and this will allow you to train with DP=1 (data parallelism).
We used 32 nodes (256 GPUs) for training, which gave us DP=16. These numbers assume you're using NeMo's bf16-mixed mode.
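
For anyone sizing a different cluster, here is a rough sketch of the arithmetic behind those numbers. It assumes the usual Megatron-style layout where total GPUs = TP × PP × DP and no other parallelism dimension is in use. The function names `min_nodes` and `data_parallel_size` are made up for illustration and are not part of NeMo.

```python
def min_nodes(gpus_per_node: int, tp: int, pp: int) -> int:
    """Smallest number of nodes that can hold one model replica (DP=1).

    Hypothetical helper: assumes one replica occupies tp * pp GPUs.
    """
    gpus_per_replica = tp * pp
    # Ceiling division: round up to whole nodes.
    return (gpus_per_replica + gpus_per_node - 1) // gpus_per_node


def data_parallel_size(num_nodes: int, gpus_per_node: int, tp: int, pp: int) -> int:
    """Data-parallel size implied by a cluster size and a TP/PP split.

    Hypothetical helper: assumes total GPUs = tp * pp * dp.
    """
    total_gpus = num_nodes * gpus_per_node
    gpus_per_replica = tp * pp
    if total_gpus % gpus_per_replica != 0:
        raise ValueError(
            f"{total_gpus} GPUs cannot be split evenly into replicas "
            f"of {gpus_per_replica} GPUs (TP={tp}, PP={pp})"
        )
    return total_gpus // gpus_per_replica


if __name__ == "__main__":
    # Numbers from the answer above: 8 GPUs per node, TP=8, PP=2.
    print("min nodes:", min_nodes(gpus_per_node=8, tp=8, pp=2))
    print("DP on 2 nodes:", data_parallel_size(2, 8, tp=8, pp=2))
    print("DP on 32 nodes:", data_parallel_size(32, 8, tp=8, pp=2))
```

This only covers GPU counts. Whether a given TP/PP split actually fits in 80GB per GPU still depends on sequence length, micro-batch size, and precision, so treat the numbers in the answer as the tested baseline.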

thanhdaonguyen

Jun 21, 2024

Thank you!
