DETAILS = """
### Motivation

Existing tools like the [Hugging Face Model Memory Estimator](https://huggingface.co/spaces/hf-accelerate/model-memory-usage), [DeepSpeed Calculator](https://huggingface.co/spaces/andstor/deepspeed-model-memory-usage), and [DeepSpeed Native Utility](https://deepspeed.readthedocs.io/en/latest/memory.html) are valuable but don't support the full range of modern training configurations. This tool adds:

- Arbitrary model configurations beyond preset architectures
- FSDP and 5D parallelism support
- Interactive memory breakdowns by category to inform configuration decisions

### References

Helpful resources used while building this:

- [The Ultra-Scale Playbook](https://huggingface.co/spaces/nanotron/ultrascale-playbook)
- [Reducing Activation Recomputation in Large Transformer Models](https://arxiv.org/abs/2205.05198)
- [Transformer Math - Michael Wornow](https://michaelwornow.net/2024/01/18/counting-params-in-transformer)
- [Transformer Math 101](https://blog.eleuther.ai/transformer-math/)
"""

INSTRUCTIONS = """
This calculator estimates the memory used per GPU during training (excluding intermediates).

## How to Use

1. Use a preset, or adjust the parallelism, model, and training panels to match your run.
2. Press **Calculate** to refresh the memory breakdown chart.
3. Review the details and references below for context on the estimates.
"""

LIMITATIONS = """
### Key Assumptions

- Standard transformer architecture with homogeneous layers
- Adam optimizer
- Mixed precision keeps a master weights copy
- Tensor parallelism includes sequence parallelism
- Pipeline parallelism maintains consistent activation memory due to the schedule

### Not Currently Supported

- Non-standard architectures (alternating dense/sparse layers, custom attention)
- Multi-modal models with vision layers
- Non-homogeneous parameter dtypes (e.g. BF16 & MXFP4 in GPT-OSS); mixed precision itself is supported
- Kernel/framework overhead and intermediate memory

For advanced configurations, results should be validated against profiling.
"""
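The assumptions above (Adam, mixed precision with a master weights copy) pin down the standard per-parameter accounting: 2 bytes for bf16 weights, 2 for bf16 gradients, 4 for fp32 master weights, and 8 for fp32 Adam momentum and variance, i.e. 16 bytes per parameter before activations. A minimal sketch of that arithmetic, assuming full sharding of all states across `dp_shards` data-parallel ranks (the function name and arguments are illustrative, not the calculator's actual API):

```python
def training_memory_gib(n_params: float, dp_shards: int = 1) -> dict:
    """Rough per-GPU static memory for Adam + bf16 mixed precision.

    Assumes fp32 master weights and fp32 optimizer states are kept,
    and that all states shard evenly across dp_shards (ZeRO-3/FSDP-style).
    Excludes activations, kernels, and framework overhead.
    """
    GIB = 1024**3
    weights = 2 * n_params   # bf16 parameter copy
    grads = 2 * n_params     # bf16 gradients
    master = 4 * n_params    # fp32 master weights
    optim = 8 * n_params     # fp32 Adam momentum + variance
    total = (weights + grads + master + optim) / dp_shards
    return {
        "weights_gib": weights / dp_shards / GIB,
        "grads_gib": grads / dp_shards / GIB,
        "master_gib": master / dp_shards / GIB,
        "optimizer_gib": optim / dp_shards / GIB,
        "total_gib": total / GIB,
    }

# e.g. a 7B-parameter model fully sharded across 8 GPUs:
# training_memory_gib(7e9, dp_shards=8)
```

This matches the 16 bytes/parameter figure from Transformer Math 101; real runs add activation memory on top, which is where the parallelism schedule and recomputation settings dominate.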