DETAILS = """
### Resources I found helpful while building this tool
- [The Ultra-Scale Playbook](https://huggingface.co/spaces/nanotron/ultrascale-playbook)
- [Reducing Activation Recomputation in Large Transformer Models](https://arxiv.org/abs/2205.05198)
- [Transformer Math - Michael Wornow](https://michaelwornow.net/2024/01/18/counting-params-in-transformer)
- [Transformer Math 101](https://blog.eleuther.ai/transformer-math/)

### Why this tool?
There are already some good tools out there:
- [Hugging Face Model Memory Estimator](https://huggingface.co/spaces/hf-accelerate/model-memory-usage)
- [DeepSpeed Model Memory Calculator](https://huggingface.co/spaces/andstor/deepspeed-model-memory-usage)
- [DeepSpeed Native Utility](https://deepspeed.readthedocs.io/en/latest/memory.html)

But none of them had all the features I wanted in one place. I wanted a tool that could:
- Accept arbitrary model configurations
- Support FSDP
- Support 5D parallelism
- Be interactive and break memory usage down by category, to better inform configuration choices
"""
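As a rough illustration of the kind of estimate such a tool produces, here is a minimal sketch based on the well-known approximations from the Transformer Math resources above: a transformer block holds roughly 12·d_model² parameters, and mixed-precision Adam training costs about 16 bytes per parameter (fp16 weights and grads plus fp32 master weights and two optimizer states). The function names are hypothetical, and this ignores biases, layer norms, and activation memory, which the actual tool breaks down separately.

```python
def transformer_param_count(n_layers: int, d_model: int, vocab_size: int) -> int:
    """Approximate parameter count of a decoder-only transformer."""
    # Per block: attention projections (~4 * d^2) + MLP (~8 * d^2) ≈ 12 * d^2,
    # ignoring biases and layer-norm parameters.
    block_params = 12 * d_model ** 2
    # Token embedding table (assumed tied with the output head).
    embed_params = vocab_size * d_model
    return n_layers * block_params + embed_params

def training_memory_bytes(n_params: int, bytes_per_param: int = 16) -> int:
    """Model-state memory for mixed-precision Adam (~16 bytes/param)."""
    return n_params * bytes_per_param

# GPT-2-small-like config: 12 layers, d_model=768, vocab 50257
params = transformer_param_count(12, 768, 50257)
print(f"{params / 1e6:.0f}M params")               # ~124M params
print(f"{training_memory_bytes(params) / 1e9:.1f} GB model states")
```

This is only the model-state term; activations, gradients during recomputation, and sharding across FSDP/parallelism ranks change the per-device picture considerably, which is exactly what the interactive breakdown is for.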