Training memory requirements.
#1
by thanhdaonguyen - opened
What is the minimum memory needed to train Llama-3-70B with DPO at a context length of 4096? And what config achieves that?
Got this answer from my colleague who worked on it:
You need at least 80GB of VRAM per GPU, and 8 GPUs per node.
With TP=8 and PP=2, one copy of the model spans 8 x 2 = 16 GPUs, so you will need a minimum of 2 nodes to host the model, and this will allow you to train with DP=1.
We used 32 nodes (256 GPUs) for training, which gave us DP=16. These numbers assume you're using NeMo's bf16-mixed mode.
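The arithmetic behind those numbers can be sketched in a few lines of Python. This is only an illustration, assuming the usual relation total GPUs = TP x PP x DP with no other parallelism dimensions in play; the function names are hypothetical helpers, not part of NeMo.

```python
def min_nodes(tp: int, pp: int, gpus_per_node: int = 8) -> int:
    """Smallest node count that can hold one model replica (DP=1)."""
    gpus_per_replica = tp * pp
    # Ceiling division: a partially used node still counts as a node.
    return (gpus_per_replica + gpus_per_node - 1) // gpus_per_node


def data_parallel_size(nodes: int, tp: int, pp: int, gpus_per_node: int = 8) -> int:
    """Data-parallel size, assuming total GPUs = TP * PP * DP."""
    total_gpus = nodes * gpus_per_node
    gpus_per_replica = tp * pp
    if total_gpus % gpus_per_replica != 0:
        raise ValueError("total GPU count must be a multiple of TP * PP")
    return total_gpus // gpus_per_replica


if __name__ == "__main__":
    print(min_nodes(tp=8, pp=2))                     # nodes needed to host the model
    print(data_parallel_size(nodes=2, tp=8, pp=2))   # DP on the minimum setup
    print(data_parallel_size(nodes=32, tp=8, pp=2))  # DP on the 32-node run
```

With TP=8, PP=2 and 8 GPUs per node, this gives 2 nodes for the minimum setup and DP=16 on 32 nodes, matching the figures above.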
Thank you!