George Yang committed on
Commit dd750c0 · 1 Parent(s): 60d4d37

SEO: Optimize Space README for discoverability and traffic


- Add comprehensive tags (llm, gpu, deep-learning, pytorch, etc.)
- Set pinned: true for better visibility
- Add detailed feature descriptions and use cases
- Include model comparison table
- Add multiple CTAs to GitHub repo
- Add technical details and authoritative references
- Improve formatting with emojis and badges
- Add 'Made with ❤️ by the AI community' footer

Files changed (1)
  1. README.md +83 -38
README.md CHANGED
@@ -4,59 +4,104 @@ emoji: 🎮
  colorFrom: blue
  colorTo: purple
  sdk: docker
- pinned: false
  license: mit
  ---

- # GPU Memory Calculator

- Calculate GPU memory requirements for training and running Large Language Models (LLMs). Supports multiple training engines (PyTorch DDP, DeepSpeed ZeRO, Megatron-LM, FSDP), inference engines (HuggingFace, vLLM, TGI, TensorRT-LLM, SGLang), and multi-node training configurations.

- ## Features

- - **Training Memory Calculation**: Calculate memory for PyTorch DDP, DeepSpeed ZeRO (0-3), Megatron-LM, FSDP, and hybrid approaches
- - **Inference Memory Calculation**: Estimate memory requirements for HuggingFace Transformers, vLLM, TGI, TensorRT-LLM, and SGLang
- - **Multi-Node Support**: Calculate network overhead for distributed training across multiple nodes
- - **Model Presets**: Pre-configured settings for popular models (LLaMA 2, GPT-3, Mixtral, GLM, Qwen, DeepSeek-MoE)
- - **Configuration Export**: Generate configs for Accelerate, Lightning, Axolotl, DeepSpeed, YAML, and JSON
- - **Batch Size Optimization**: Automatically find the maximum batch size that fits in GPU memory

- ## Supported Training Engines

- - PyTorch DDP (Distributed Data Parallel)
- - DeepSpeed ZeRO (Stages 0-3) with CPU/NVMe offloading
- - Megatron-LM (Tensor + Pipeline Parallelism)
- - PyTorch FSDP (Fully Sharded Data Parallel)
- - Megatron-LM + DeepSpeed (Hybrid)

- ## Supported Inference Engines

- - HuggingFace Transformers
- - vLLM (PagedAttention)
- - Text Generation Inference (TGI)
- - TensorRT-LLM
- - SGLang (RadixAttention)

- ## How to Use

- 1. **Select a preset model** or configure your own
- 2. **Choose training/inference engine** and adjust parameters
- 3. **Calculate** memory requirements instantly
- 4. **Export** configurations to your preferred framework

- ## Example Use Cases

- - Planning GPU requirements for LLM training
- - Optimizing batch sizes for your hardware
- - Comparing memory efficiency across engines
- - Estimating KV cache memory for inference
- - Calculating multi-node network overhead

- ## Links

- - [GitHub Repository](https://github.com/George614/gpu-mem-calculator)
- - [Documentation](https://github.com/George614/gpu-mem-calculator/blob/main/README.md)

- ## License

- MIT License - see [LICENSE](https://github.com/George614/gpu-mem-calculator/blob/main/LICENSE) for details.
 
  colorFrom: blue
  colorTo: purple
  sdk: docker
+ pinned: true
  license: mit
+ tags: [llm, gpu, deep-learning, pytorch, training, inference, memory-calculator, deepspeed, megatron, fsdp, vllm, quantization, machine-learning, ai, tools]
  ---

+ # 🎮 GPU Memory Calculator for LLM Training & Inference

+ **Instantly calculate GPU memory requirements for training and running Large Language Models.** Plan your infrastructure, avoid OOM errors, and optimize costs before you start.

+ [![GitHub Stars](https://img.shields.io/github/stars/George614/gpu-mem-calculator?style=social)](https://github.com/George614/gpu-mem-calculator)
+ [![GitHub Issues](https://img.shields.io/github/issues/George614/gpu-mem-calculator)](https://github.com/George614/gpu-mem-calculator/issues)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

+ ## 🚀 Why Use This Tool?

+ - **💰 Save Money** - Know exactly what GPUs you need before spending thousands
+ - **⚡ Avoid OOM** - Validate that your config fits in memory before training
+ - **📊 Compare Strategies** - DeepSpeed vs Megatron vs FSDP at a glance
+ - **🎯 Plan Infrastructure** - From 7B to 175B+ parameter models
+ - **⚙️ Export Configs** - Generate working configs for your training framework

+ ## ✨ Features

+ ### Training Memory Calculation
+ Calculate memory for all major training frameworks:
+ - **PyTorch DDP** - Baseline distributed training
+ - **DeepSpeed ZeRO** (Stages 0-3) with CPU/NVMe offloading
+ - **Megatron-LM** - Tensor + pipeline parallelism
+ - **PyTorch FSDP** - Fully sharded data parallel
+ - **Megatron + DeepSpeed** - Hybrid approach

+ ### Inference Memory Estimation
+ Optimize your deployment with:
+ - **HuggingFace Transformers** - Baseline inference
+ - **vLLM** - PagedAttention optimization
+ - **TGI** - Text Generation Inference
+ - **TensorRT-LLM** - Maximum throughput
+ - **SGLang** - RadixAttention caching

+ ### Smart Features
+ - 🎯 **Model Presets** - LLaMA 2, GPT-3, Mixtral, GLM, Qwen, DeepSeek-MoE
+ - 📦 **Export Configs** - Accelerate, Lightning, Axolotl, DeepSpeed, YAML, JSON
+ - 🔢 **Batch Optimizer** - Auto-find the max batch size for your hardware
+ - 🌐 **Multi-Node** - Calculate network overhead for distributed training
+ - 💾 **KV Cache** - Quantization options (INT4/INT8/FP8/None)
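The KV-cache quantization options above matter because the cache grows linearly with sequence length, batch size, and layer count. A back-of-envelope sketch of the standard sizing rule (assumes a dense model where every attention head is cached, i.e. no grouped-query sharing; the function name and shapes are illustrative, not this Space's API):

```python
def kv_cache_bytes(num_layers: int, hidden_size: int, seq_len: int,
                   batch_size: int, bytes_per_elem: float) -> float:
    # K and V are cached per layer: 2 * layers * batch * seq * hidden * element size
    return 2.0 * num_layers * batch_size * seq_len * hidden_size * bytes_per_elem

# LLaMA-2-7B-like shapes: 32 layers, hidden size 4096, 4k context, batch 1
gib = 2.0 ** 30
print(f"FP16 KV cache: {kv_cache_bytes(32, 4096, 4096, 1, 2.0) / gib:.1f} GiB")  # 2.0 GiB
print(f"INT8 KV cache: {kv_cache_bytes(32, 4096, 4096, 1, 1.0) / gib:.1f} GiB")  # 1.0 GiB
```

Halving the element size (FP16 → INT8) halves the cache, which is why the quantization knob has such a direct effect on serving capacity.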

+ ## 🎯 Supported Models

+ | Model | Parameters | Use Case |
+ |-------|-----------|----------|
+ | LLaMA 2 | 7B, 13B, 70B | General purpose |
+ | GPT-3 | 175B | Large-scale training |
+ | Mixtral 8x7B | 47B | Mixture of Experts |
+ | GLM-4 | 9B - 355B | Chinese/English |
+ | Qwen MoE | 2.7B | Efficient inference |
+ | DeepSeek-MoE | 16B | Sparse training |

+ ## 📖 How to Use

+ 1. **Select a Model** - Choose from presets or enter custom parameters
+ 2. **Pick Your Engine** - Training (DeepSpeed/Megatron/FSDP) or Inference (vLLM/TGI/SGLang)
+ 3. **Configure** - Adjust batch size, GPUs, precision, offloading
+ 4. **Calculate** - Get an instant memory breakdown
+ 5. **Export** - Generate working configs for your framework
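For a sense of what the export step produces, a DeepSpeed config for ZeRO-3 with CPU optimizer offload typically looks like the fragment below (the keys are standard DeepSpeed config fields; the values are made-up examples, not output from this Space):

```json
{
  "train_micro_batch_size_per_gpu": 4,
  "gradient_accumulation_steps": 8,
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu", "pin_memory": true }
  }
}
```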

+ ## 💡 Example Use Cases

+ - **"Can I train a 7B model on 4x A100s?"** → Calculate and find out
+ - **"What's the max batch size for DeepSpeed ZeRO-3?"** → Batch optimizer tells you
+ - **"vLLM vs TGI - which uses less memory?"** → Compare instantly
+ - **"How many GPUs for 175B with Megatron?"** → Plan your cluster
+
+ ## 🔗 Links & Resources
+
+ - **[GitHub Repository](https://github.com/George614/gpu-mem-calculator)** - Star us on GitHub! ⭐
+ - **[Full Documentation](https://github.com/George614/gpu-mem-calculator#readme)** - Complete guide
+ - **[Report Issues](https://github.com/George614/gpu-mem-calculator/issues)** - Bug reports & feature requests
+ - **[Contributing Guide](https://github.com/George614/gpu-mem-calculator/blob/main/CONTRIBUTING.md)** - Pull requests welcome!
+
+ ## 📚 Technical Details
+
+ Built with:
+ - **FastAPI** - High-performance web framework
+ - **Pydantic** - Data validation and settings
+ - **Python 3.12** - Latest Python for maximum performance
+
+ Formulas verified against:
+ - [EleutherAI Transformer Math](https://blog.eleuther.ai/transformer-math/)
+ - [Microsoft DeepSpeed ZeRO](https://www.microsoft.com/en-us/research/blog/zero-deepspeed/)
+ - [NVIDIA Megatron-LM](https://github.com/NVIDIA/Megatron-LM)
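For intuition about what those references derive: mixed-precision Adam training needs roughly 16 bytes per parameter (2 for fp16 weights, 2 for fp16 gradients, 12 for fp32 master weights plus Adam's two moment buffers), and each ZeRO stage shards one more of those buckets across GPUs. A rough sketch of that rule of thumb (activations, fragmentation, and communication buffers excluded; this is an illustration, not this Space's actual code):

```python
def train_mem_gib(n_params: float, zero_stage: int = 0, n_gpus: int = 1) -> float:
    # Per-parameter bytes for mixed-precision Adam:
    # 2 (fp16 weights) + 2 (fp16 grads) + 12 (fp32 master weights + Adam m and v)
    weights, grads, optim = 2.0, 2.0, 12.0
    if zero_stage >= 1:   # ZeRO-1 shards optimizer states
        optim /= n_gpus
    if zero_stage >= 2:   # ZeRO-2 also shards gradients
        grads /= n_gpus
    if zero_stage >= 3:   # ZeRO-3 also shards the weights themselves
        weights /= n_gpus
    return n_params * (weights + grads + optim) / 2**30

print(f"7B, DDP:            {train_mem_gib(7e9):.0f} GiB/GPU")        # ~104 GiB
print(f"7B, ZeRO-3 x8 GPUs: {train_mem_gib(7e9, 3, 8):.0f} GiB/GPU")  # ~13 GiB
```

The DDP number is why a 7B model does not fit on a single 80 GB card without sharding or offloading, even before activations are counted.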
+
+ ## 📊 License
+
+ MIT License - Free for commercial and personal use.
+
+ ---
+
+ **Made with ❤️ by the AI community**
+
+ [![GitHub stars](https://img.shields.io/github/stars/George614/gpu-mem-calculator?style=flat-square&logo=github&label=Star%20on%20GitHub)](https://github.com/George614/gpu-mem-calculator)
107