DETAILS = """
### Motivation
Existing tools like the [Hugging Face Model Memory Estimator](https://huggingface.co/spaces/hf-accelerate/model-memory-usage), [DeepSpeed Calculator](https://huggingface.co/spaces/andstor/deepspeed-model-memory-usage), and [DeepSpeed Native Utility](https://deepspeed.readthedocs.io/en/latest/memory.html) are valuable but don't support the full range of modern training configurations.

This tool adds:
- Arbitrary model configurations beyond preset architectures
- FSDP and 5D parallelism support
- Interactive memory breakdowns by category to inform configuration decisions

### References
Resources consulted while building this tool:
- [The Ultra-Scale Playbook](https://huggingface.co/spaces/nanotron/ultrascale-playbook)
- [Reducing Activation Recomputation in Large Transformer Models](https://arxiv.org/abs/2205.05198)
- [Transformer Math - Michael Wornow](https://michaelwornow.net/2024/01/18/counting-params-in-transformer)
- [Transformer Math 101](https://blog.eleuther.ai/transformer-math/)
"""

INSTRUCTIONS = """ """