George Yang committed
Commit · dd750c0
Parent(s): 60d4d37
SEO: Optimize Space README for discoverability and traffic

- Add comprehensive tags (llm, gpu, deep-learning, pytorch, etc.)
- Set pinned: true for better visibility
- Add detailed feature descriptions and use cases
- Include model comparison table
- Add multiple CTAs to GitHub repo
- Add technical details and authoritative references
- Improve formatting with emojis and badges
- Add "Made with ❤️ by the AI community" footer
README.md
CHANGED
@@ -4,59 +4,104 @@ emoji: 🔮
 colorFrom: blue
 colorTo: purple
 sdk: docker
-pinned:
 license: mit
 ---

-# GPU Memory Calculator
-
-
-- **Inference Memory Calculation**: Estimate memory requirements for HuggingFace Transformers, vLLM, TGI, TensorRT-LLM, and SGLang
-- **Multi-Node Support**: Calculate network overhead for distributed training across multiple nodes
-- **Model Presets**: Pre-configured settings for popular models (LLaMA 2, GPT-3, Mixtral, GLM, Qwen, DeepSeek-MoE)
-- **Configuration Export**: Generate configs for Accelerate, Lightning, Axolotl, DeepSpeed, YAML, and JSON
-- **Batch Size Optimization**: Automatically find the maximum batch size that fits in GPU memory
-
-- DeepSpeed ZeRO (Stages 0-3) with CPU/NVMe offloading
-- Megatron-LM (Tensor + Pipeline Parallelism)
-- PyTorch FSDP (Fully Sharded Data Parallel)
-- Megatron-LM + DeepSpeed (Hybrid)

-##
-
-
-
-

-##
-
-2. **Choose training/inference engine** and adjust parameters
-3. **Calculate** memory requirements instantly
-4. **Export** configurations to your preferred framework

-
-- Optimizing batch sizes for your hardware
-- Comparing memory efficiency across engines
-- Estimating KV cache memory for inference
-- Calculating multi-node network overhead

-
-- [Documentation](https://github.com/George614/gpu-mem-calculator/blob/main/README.md)

-

-MIT License - see [LICENSE](https://github.com/George614/gpu-mem-calculator/blob/main/LICENSE) for details.
 colorFrom: blue
 colorTo: purple
 sdk: docker
+pinned: true
 license: mit
+tags: [llm, gpu, deep-learning, pytorch, training, inference, memory-calculator, deepspeed, megatron, fsdp, vllm, quantization, machine-learning, ai, tools]
 ---

+# 🔮 GPU Memory Calculator for LLM Training & Inference

+**Instantly calculate GPU memory requirements for training and running Large Language Models.** Plan your infrastructure, avoid OOM errors, and optimize costs before you start.

+[](https://github.com/George614/gpu-mem-calculator)
+[](https://github.com/George614/gpu-mem-calculator/issues)
+[](https://opensource.org/licenses/MIT)
+## 🚀 Why Use This Tool?

+- **💰 Save Money** - Know exactly which GPUs you need before spending thousands
+- **⚡ Avoid OOM** - Validate that your config fits in memory before training
+- **📊 Compare Strategies** - DeepSpeed vs. Megatron vs. FSDP at a glance
+- **🎯 Plan Infrastructure** - From 7B to 175B+ parameter models
+- **⚙️ Export Configs** - Generate working configs for your training framework
+## ✨ Features

+### Training Memory Calculation

+Calculate memory for all major training frameworks:

+- **PyTorch DDP** - Baseline distributed training
+- **DeepSpeed ZeRO** (Stages 0-3) with CPU/NVMe offloading
+- **Megatron-LM** - Tensor + pipeline parallelism
+- **PyTorch FSDP** - Fully sharded data parallel
+- **Megatron + DeepSpeed** - Hybrid approach
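The frameworks above differ mainly in how model state is sharded across GPUs. As a rough sketch of the accounting involved (my own illustrative estimator, in the spirit of the EleutherAI Transformer Math post cited below, not the Space's actual code), per-GPU training state under mixed-precision Adam looks like:

```python
def training_mem_gib(params_b: float, zero_stage: int = 0, dp_degree: int = 1) -> float:
    """Rough per-GPU model-state memory for mixed-precision Adam training.

    Per parameter: 2 B fp16/bf16 weights + 2 B gradients
    + 12 B optimizer state (fp32 master weights + Adam m and v).
    ZeRO-1 shards optimizer state, ZeRO-2 also gradients,
    ZeRO-3 also weights, each across dp_degree GPUs.
    Activations are workload-dependent and excluded here.
    """
    p = params_b * 1e9
    weights, grads, optim = 2 * p, 2 * p, 12 * p
    if zero_stage >= 1:
        optim /= dp_degree
    if zero_stage >= 2:
        grads /= dp_degree
    if zero_stage >= 3:
        weights /= dp_degree
    return (weights + grads + optim) / 1024**3

# A 7B model under plain DDP needs ~104 GiB of state per GPU;
# ZeRO-3 across 8 GPUs shards the same state down to ~13 GiB each.
```

This is why a model that OOMs under DDP can fit comfortably under ZeRO-3 or FSDP with no change to the model itself.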
+### Inference Memory Estimation

+Optimize your deployment with:

+- **HuggingFace Transformers** - Baseline inference
+- **vLLM** - PagedAttention optimization
+- **TGI** - Text Generation Inference
+- **TensorRT-LLM** - Maximum throughput
+- **SGLang** - RadixAttention caching
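These engines differ mostly in how they manage the KV cache, whose raw size follows directly from the model shape. A back-of-envelope helper (the function name and defaults are mine, not the Space's API):

```python
def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 seq_len: int, batch: int, bytes_per_elem: float = 2) -> float:
    """KV cache size in GiB: 2 (K and V) * layers * kv_heads * head_dim
    * seq_len * batch * bytes per element (2 for fp16, 1 for INT8,
    0.5 for INT4)."""
    elems = 2 * layers * kv_heads * head_dim * seq_len * batch
    return elems * bytes_per_elem / 1024**3

# A LLaMA-2-7B-like shape (32 layers, 32 KV heads, head_dim 128)
# at 4k context and batch 8 in fp16 needs exactly 16.0 GiB of KV cache;
# INT8 quantization halves that.
```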
+### Smart Features

+- 🎯 **Model Presets** - LLaMA 2, GPT-3, Mixtral, GLM, Qwen, DeepSeek-MoE
+- 📦 **Export Configs** - Accelerate, Lightning, Axolotl, DeepSpeed, YAML, JSON
+- 🔢 **Batch Optimizer** - Auto-find the max batch size for your hardware
+- 🌐 **Multi-Node** - Calculate network overhead for distributed training
+- 💾 **KV Cache** - Quantization options (INT4/INT8/FP8/None)
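The batch optimizer idea reduces to a search over an analytic memory model: since estimated memory grows monotonically with batch size, the largest fitting batch can be found by binary search. A sketch of the idea (not the Space's implementation; `mem_fn` here is a stand-in estimator):

```python
def max_batch_size(mem_fn, budget_gib: float, upper: int = 1 << 16) -> int:
    """Largest batch size whose estimated memory fits the budget,
    found by binary search over a monotonically increasing mem_fn."""
    lo, hi = 0, upper
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mem_fn(mid) <= budget_gib:
            lo = mid
        else:
            hi = mid - 1
    return lo

# Toy estimator: 14 GiB of fixed model state + 0.5 GiB of activations
# per sample, against an 80 GB A100 budget:
print(max_batch_size(lambda b: 14 + 0.5 * b, budget_gib=80))  # → 132
```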
+## 🎯 Supported Models

+| Model | Parameters | Use Case |
+|-------|-----------|----------|
+| LLaMA 2 | 7B, 13B, 70B | General purpose |
+| GPT-3 | 175B | Large-scale training |
+| Mixtral 8x7B | 47B | Mixture of Experts |
+| GLM-4 | 9B - 355B | Chinese/English |
+| Qwen MoE | 2.7B | Efficient inference |
+| DeepSeek-MoE | 16B | Sparse training |
+## 📖 How to Use

+1. **Select a Model** - Choose from presets or enter custom parameters
+2. **Pick Your Engine** - Training (DeepSpeed/Megatron/FSDP) or inference (vLLM/TGI/SGLang)
+3. **Configure** - Adjust batch size, GPUs, precision, and offloading
+4. **Calculate** - Get an instant memory breakdown
+5. **Export** - Generate working configs for your framework
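What step 5 emits depends on the framework you pick; for DeepSpeed, an exported config is a JSON file along these lines (an illustrative fragment using standard DeepSpeed keys, not literal output from this tool):

```json
{
  "train_micro_batch_size_per_gpu": 4,
  "gradient_accumulation_steps": 8,
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu" }
  }
}
```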
+## 💡 Example Use Cases

+- **"Can I train a 7B model on 4x A100s?"** → Calculate and find out
+- **"What's the max batch size for DeepSpeed ZeRO-3?"** → The batch optimizer tells you
+- **"vLLM vs. TGI - which uses less memory?"** → Compare instantly
+- **"How many GPUs for 175B with Megatron?"** → Plan your cluster
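The first question above has a quick back-of-envelope answer using the usual mixed-precision Adam constants (my own arithmetic, not output from the tool):

```python
# 7B parameters; mixed-precision Adam keeps ~16 bytes of state per parameter:
# 2 B bf16 weights + 2 B gradients + 12 B fp32 master weights and Adam moments.
state_gib = 7e9 * (2 + 2 + 12) / 1024**3  # ≈ 104.3 GiB of model state
per_gpu = state_gib / 4                   # fully sharded (ZeRO-3/FSDP) over 4 GPUs
print(round(per_gpu, 1))                  # → 26.1 GiB per 80 GB A100,
                                          # leaving headroom for activations
```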
+## 🔗 Links & Resources

+- **[GitHub Repository](https://github.com/George614/gpu-mem-calculator)** - Star us on GitHub! ⭐
+- **[Full Documentation](https://github.com/George614/gpu-mem-calculator#readme)** - Complete guide
+- **[Report Issues](https://github.com/George614/gpu-mem-calculator/issues)** - Bug reports & feature requests
+- **[Contributing Guide](https://github.com/George614/gpu-mem-calculator/blob/main/CONTRIBUTING.md)** - Pull requests welcome!
+## 📊 Technical Details

+Built with:

+- **FastAPI** - High-performance web framework
+- **Pydantic** - Data validation and settings
+- **Python 3.12** - The latest Python, for maximum performance

+Formulas verified against:

+- [EleutherAI Transformer Math](https://blog.eleuther.ai/transformer-math/)
+- [Microsoft DeepSpeed ZeRO](https://www.microsoft.com/en-us/research/blog/zero-deepspeed/)
+- [NVIDIA Megatron-LM](https://github.com/NVIDIA/Megatron-LM)
+## 📄 License

+MIT License - free for commercial and personal use.

+---

+**Made with ❤️ by the AI community**

+[](https://github.com/George614/gpu-mem-calculator)