
Training Configuration


Model Settings

Enter the model size as a number with an optional suffix: 7B, 7000M, or 7000000000.
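A minimal sketch of how that size string could be parsed, covering exactly the three documented spellings; `parse_model_size` is a hypothetical name, not the calculator's actual function:

```python
def parse_model_size(text: str) -> int:
    """Parse '7B', '7000M', or '7000000000' into a parameter count."""
    suffixes = {"M": 10**6, "B": 10**9}
    text = text.strip().upper()
    if text and text[-1] in suffixes:
        return int(float(text[:-1]) * suffixes[text[-1]])
    return int(text)

# All three documented spellings resolve to the same count:
assert (parse_model_size("7B")
        == parse_model_size("7000M")
        == parse_model_size("7000000000")
        == 7_000_000_000)
```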

Mixture of Experts (MoE)

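For MoE models, a memory estimate has to distinguish total parameters (every expert stays resident in GPU memory) from active parameters (only the top-k routed experts run per token). A sketch of that accounting, with all names and numbers as illustrative assumptions rather than the calculator's internals:

```python
def moe_param_counts(dense_params: int, expert_params: int,
                     num_experts: int, top_k: int) -> tuple[int, int]:
    total = dense_params + num_experts * expert_params   # resident in memory
    active = dense_params + top_k * expert_params        # used per token
    return total, active

# Example values only:
total, active = moe_param_counts(dense_params=2_000_000_000,
                                 expert_params=500_000_000,
                                 num_experts=8, top_k=2)
print(f"total={total/1e9:.1f}B active={active/1e9:.1f}B")
# total=6.0B active=3.0B
```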

Training Settings


Parallelism


Effective GPUs: 8

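By the standard Megatron-style convention, the effective GPU count is the product of the parallelism degrees. Assuming the calculator follows that convention, the default of 8 could come from any factorization such as:

```python
# Example degrees chosen to reproduce "Effective GPUs: 8"; the actual
# defaults behind that number are not shown in the form.
tensor_parallel = 2    # shards each layer's matrices across GPUs
pipeline_parallel = 2  # splits the layer stack into stages
data_parallel = 2      # replicates the model over input shards

effective_gpus = tensor_parallel * pipeline_parallel * data_parallel
print(effective_gpus)  # 8
```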

Training Engine

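The engine choice matters mainly through sharding and offload: a ZeRO-style engine partitions optimizer states (stage 1), gradients (stage 2), and parameters (stage 3) across data-parallel ranks, and CPU offload moves optimizer states into host RAM, which is what the CPU Memory line under Results would reflect. A sketch of the divisor logic, following the ZeRO paper's stage semantics (the dict layout is my assumption):

```python
def zero_shard_divisors(stage: int, dp: int) -> dict[str, int]:
    """How many ranks each memory component is divided across."""
    return {
        "params":    dp if stage >= 3 else 1,
        "grads":     dp if stage >= 2 else 1,
        "optimizer": dp if stage >= 1 else 1,
    }

print(zero_shard_divisors(stage=2, dp=4))
# {'params': 1, 'grads': 4, 'optimizer': 4}
```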

Hardware

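Feasibility ultimately compares the per-GPU requirement against the selected device's capacity. For reference, typical publicly documented capacities look like the following (the calculator's own hardware table may differ):

```python
# Commonly documented per-GPU memory capacities, in GB.
GPU_MEMORY_GB = {
    "V100": 32,   # also sold as a 16 GB variant
    "A100": 80,   # also sold as a 40 GB variant
    "H100": 80,
}

print(GPU_MEMORY_GB["A100"])  # 80
```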

Results


Memory Breakdown

Per GPU: -- GB
Total All GPUs: -- GB
CPU Memory: -- GB

Component Breakdown

Model Parameters: -- GB
Gradients: -- GB
Optimizer States: -- GB
Activations: -- GB
Overhead: -- GB

[Chart: Params + Grads + Opt + Act]
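For mixed-precision Adam, the first three components follow a standard accounting (as in the ZeRO paper): 2 bytes/param each for half-precision weights and gradients, and 12 bytes/param of fp32 optimizer state (master weights plus the two Adam moments). Activations and overhead depend on batch size, sequence length, and implementation, so the sketch below covers only the fixed-cost components; nothing here is the tool's exact formula:

```python
def component_breakdown_gb(params: int) -> dict[str, float]:
    """Standard mixed-precision Adam costs, in GiB per component."""
    GiB = 1024**3
    return {
        "model_parameters": 2 * params / GiB,   # bf16/fp16 weights
        "gradients":        2 * params / GiB,   # bf16/fp16 grads
        "optimizer_states": 12 * params / GiB,  # fp32 master + Adam m, v
    }

for name, gb in component_breakdown_gb(7_000_000_000).items():
    print(f"{name}: {gb:.1f} GB")
# model_parameters: 13.0 GB
# gradients: 13.0 GB
# optimizer_states: 78.2 GB
```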

Feasibility

Status: --
Utilization: --%
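A plausible reading of these two fields, assuming utilization is required per-GPU memory over device capacity and the status flips once it passes 100%:

```python
def feasibility(required_gb: float, capacity_gb: float) -> tuple[str, float]:
    """Return (status, utilization %) for one GPU's memory budget."""
    utilization = round(100 * required_gb / capacity_gb, 1)
    status = "OK" if utilization <= 100 else "Out of memory"
    return status, utilization

print(feasibility(required_gb=35.2, capacity_gb=80))  # ('OK', 44.0)
```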

Formula Explanation


Run a calculation to see the formula breakdown.
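Until a calculation is run, the general shape of the breakdown can still be sketched: the per-GPU total sums the five components, with the sharded ones divided across data-parallel ranks. Everything below, values included, is illustrative rather than the tool's displayed formula:

```python
def per_gpu_total_gb(params_gb: float, grads_gb: float, opt_gb: float,
                     act_gb: float, overhead_gb: float,
                     dp: int = 1, zero_stage: int = 0) -> float:
    """Assemble the per-GPU total under ZeRO-style sharding assumptions."""
    p_div = dp if zero_stage >= 3 else 1
    g_div = dp if zero_stage >= 2 else 1
    o_div = dp if zero_stage >= 1 else 1
    return (params_gb / p_div + grads_gb / g_div + opt_gb / o_div
            + act_gb + overhead_gb)

# 7B example with ZeRO-1 over 8 data-parallel ranks; activation and
# overhead figures are placeholders, not computed results:
print(round(per_gpu_total_gb(13.0, 13.0, 78.2, 12.0, 5.0,
                             dp=8, zero_stage=1), 1))  # 52.8
```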
