# distributed package

This package contains various utilities to finalize model weight gradients
on each rank before the optimizer step. This includes a distributed data
parallelism wrapper to all-reduce or reduce-scatter the gradients across
data-parallel replicas, and a `finalize_model_grads` method to
synchronize gradients across different parallelism modes (e.g., 'tied'
layers on different pipeline stages, or gradients for experts in a MoE on
different ranks due to expert parallelism).
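
To illustrate the difference between the two collectives mentioned above, here is a minimal pure-Python sketch that simulates data-parallel ranks as plain lists. It is not this package's API: the function names `all_reduce_mean` and `reduce_scatter_mean` are hypothetical, and it assumes gradients are averaged across replicas and that the gradient length divides evenly by the number of ranks.

```python
from typing import List

Grads = List[float]


def _mean_across_ranks(grads_per_rank: List[Grads]) -> Grads:
    # Element-wise mean of the gradient vectors held by each simulated rank.
    world_size = len(grads_per_rank)
    return [sum(vals) / world_size for vals in zip(*grads_per_rank)]


def all_reduce_mean(grads_per_rank: List[Grads]) -> List[Grads]:
    # Hypothetical helper: after an all-reduce, every rank holds the
    # full averaged gradient.
    mean = _mean_across_ranks(grads_per_rank)
    return [list(mean) for _ in grads_per_rank]


def reduce_scatter_mean(grads_per_rank: List[Grads]) -> List[Grads]:
    # Hypothetical helper: after a reduce-scatter, rank r holds only the
    # r-th contiguous shard of the averaged gradient.
    world_size = len(grads_per_rank)
    n = len(grads_per_rank[0])
    if n % world_size != 0:
        raise ValueError("gradient length must be divisible by world size")
    shard = n // world_size
    mean = _mean_across_ranks(grads_per_rank)
    return [mean[r * shard:(r + 1) * shard] for r in range(world_size)]


# Two simulated data-parallel replicas, each with a local gradient.
local_grads = [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0]]
print(all_reduce_mean(local_grads))      # [[2.0, 3.0, 4.0, 5.0], [2.0, 3.0, 4.0, 5.0]]
print(reduce_scatter_mean(local_grads))  # [[2.0, 3.0], [4.0, 5.0]]
```

In this sketch, all-reduce leaves each rank with the complete result, while reduce-scatter leaves each rank with one shard, which is the form that suits an optimizer whose state is sharded across data-parallel ranks.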