Spaces:
Sleeping
Sleeping
File size: 1,064 Bytes
f45427d |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
DETAILS = """
### Resources I found helpful while building this tool:
- [The Ultra-Scale Playbook](https://huggingface.co/spaces/nanotron/ultrascale-playbook)
- [Reducing Activation Recomputation in Large Transformer Models](https://arxiv.org/abs/2205.05198)
- [Transformer Math - Michael Wornow](https://michaelwornow.net/2024/01/18/counting-params-in-transformer)
- [Transformer Math 101](https://blog.eleuther.ai/transformer-math/)
### Why this tool?
While there are some good tools out there already:
- [Hugging Face Model Memory Estimator](https://huggingface.co/spaces/hf-accelerate/model-memory-usage)
- [DeepSpeed Model Memory Calculator](https://huggingface.co/spaces/andstor/deepspeed-model-memory-usage)
- [DeepSpeed Native Utility](https://deepspeed.readthedocs.io/en/latest/memory.html)
None of them had all the features I wanted in one place. I wanted a tool that could:
- Accept arbitrary model configurations
- Support FSDP
- Support 5d parallelism
- Be interactive and break down memory usage by category, to better inform configurations.
"""
|