joeddav committed
Commit 4d1b204 · 1 Parent(s): 133beb7

Polish README for public repo

Files changed (1)
  1. README.md +15 -4
README.md CHANGED
@@ -13,6 +13,21 @@ short_description: "[WIP] Interactive visualization of an LLM training cluster"
 
 Interactive workbench for exploring how large-model training layouts map onto GPU clusters.
 
+Live demo: https://huggingface.co/spaces/joeddav/illustrated-cluster
+
+This project is meant to make training-parallelism tradeoffs legible:
+
+- per-GPU memory pressure
+- tensor / pipeline / context / expert communication
+- pipeline bubbles and throughput estimates
+- physical placement across nodes and racks
+
+Status:
+
+- estimates are directional, not production-grade
+- the app is still a WIP and may contain bugs or logical errors
+- the Llama 3.1 405B example is temporarily hidden while its training recipe is being reworked
+
 Current WIP scope:
 
 - compute-backed memory, communication, and throughput estimates
@@ -20,10 +35,6 @@ Current WIP scope:
 - editable model, cluster, training, and parallelism controls
 - built-in OLMo 3 32B and Trinity Large 400B starting points
 
-Temporary note:
-
-- the Llama 3.1 405B example is hidden from the UI while its training recipe is being reworked
-
 ## Stack
 
 - React 19 + TypeScript
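
For context on the "pipeline bubbles and throughput estimates" the new README text mentions, here is a minimal sketch of the standard 1F1B/GPipe-style bubble estimate in TypeScript (the project's stack). The names, interface, and throughput scaling are illustrative assumptions, not code from this repo, whose estimates may use a different formulation.

```ts
// Illustrative sketch only: the textbook pipeline-bubble estimate,
// bubble = (p - 1) / (m + p - 1), for p pipeline stages and m microbatches.

interface PipelineConfig {
  pipelineStages: number; // p: number of pipeline-parallel stages
  microbatches: number;   // m: microbatches per global batch
}

// Fraction of each training step spent idle in pipeline "bubbles".
function bubbleFraction({ pipelineStages: p, microbatches: m }: PipelineConfig): number {
  if (p <= 1) return 0; // no pipeline parallelism, no bubble
  return (p - 1) / (m + p - 1);
}

// Directional throughput estimate: ideal tokens/s scaled by pipeline utilization.
function estimatedThroughput(idealTokensPerSec: number, cfg: PipelineConfig): number {
  return idealTokensPerSec * (1 - bubbleFraction(cfg));
}

// Example: 8 stages with 32 microbatches idle roughly 18% of the time.
console.log(bubbleFraction({ pipelineStages: 8, microbatches: 32 })); // ≈ 0.179
```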