vikramp committed on
Commit ada01d1 · verified · 1 Parent(s): 69a94b7

Upload folder using huggingface_hub

logs/output_run_20260202_exfigure_disannul.log ADDED
@@ -0,0 +1,143 @@
+ 2026-02-02 06:24:39,213 - root - INFO - Run: run_20260202_exfigure_disannul
+ 2026-02-02 06:24:39,213 - root - INFO - Log directory: /root/tiny_moe/training_runs/Tiny_MoE/logs
+ 2026-02-02 06:24:39,213 - root - INFO - Output dir: /root/tiny_moe/training_runs
+ 2026-02-02 06:24:41,045 - jax._src.xla_bridge - INFO - Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
+ 2026-02-02 06:24:44,761 - root - INFO - Flax version: 0.11.1
+ 2026-02-02 06:24:44,761 - root - INFO - Optax version: 0.2.6
+ 2026-02-02 06:24:44,761 - root - INFO - Platform: gpu
+ 2026-02-02 06:24:44,761 - root - INFO - Num Devices: 8
+ 2026-02-02 06:24:44,762 - root - INFO - Devices: [CudaDevice(id=0), CudaDevice(id=1), CudaDevice(id=2), CudaDevice(id=3), CudaDevice(id=4), CudaDevice(id=5), CudaDevice(id=6), CudaDevice(id=7)]
+ 2026-02-02 06:24:46,241 - root - INFO - Model config:
+ Config(name='Tiny_MoE',
+ dtype=<class 'jax.numpy.bfloat16'>,
+ vocab_size=50304,
+ block_size=2048,
+ n_layer=30,
+ n_embed=672,
+ n_glu_hidden=2048,
+ n_head=12,
+ n_kv_head=4,
+ n_experts=8,
+ init_stddev=0.02,
+ expert_load_factor=1.25,
+ aux_loss_coeff=0.01,
+ moe_bias=True,
+ mlp_bias=False,
+ attention_bias=False,
+ load_balance_loss_coeff=0.01,
+ z_loss_coeff=0.0005,
+ expert_top_k=2,
+ ln_epsilon=1e-05,
+ rope_theta=0.0001,
+ expert_partition_spec=PartitionSpec('devices',),
+ sdpa_implementation='cudnn',
+ value_residual_init=0.5,
+ logit_softcap=30.0)
+ 2026-02-02 06:26:34,591 - root - INFO - Parameter Count: 1,062,185,550
+ 2026-02-02 06:26:34,591 - root - INFO - Sharded / MoE Parameter Count: 992,210,160
+ 2026-02-02 06:26:34,591 - root - INFO - Replicated Parameter Count: 69,975,390
+ 2026-02-02 06:26:36,146 - root - INFO - Weight decay param count: 1,062,140,928
+ 2026-02-02 06:26:36,146 - root - INFO - Training config:
+ TrainerConfig(num_tokens=100000000000,
+ num_tokens_per_batch=262144,
+ mB=128,
+ T=2048,
+ max_steps=381469,
+ max_lr=0.008,
+ min_lr=0.0008,
+ max_grad_norm=1.0,
+ weight_decay=0.1,
+ adam_b1=0.9,
+ adam_b2=0.95,
+ warmup_steps=3814,
+ print_interval=100,
+ val=True,
+ val_interval=5000,
+ val_batches=50,
+ checkpoint_model=False,
+ checkpoint_optimizer=False,
+ checkpoint_interval=10000)
+ 2026-02-02 06:26:36,146 - root - INFO - Effective batch size per device: 16
+ 2026-02-02 06:26:40,697 - root - INFO - HuggingfaceDataLoader: 1030 shards (train)
+ 2026-02-02 06:26:40,837 - root - INFO - HuggingfaceDataLoader initialized:
+ ------------------------
+ label: train
+ shards: 1,030
+ shard size: 100,000,000
+ batch size: 128
+ block size: 2048
+ device rank: 1
+ start shard: 0
+ start pos: 0
+ ------------------------
+ 2026-02-02 06:26:40,837 - root - INFO - HuggingfaceDataLoader: 1 shards (val)
+ 2026-02-02 06:26:40,982 - root - INFO - Starting from step: 0
+ 2026-02-02 06:31:37,533 - root - INFO - 100 | lr: 0.0002 | loss: 7.0053 | logits loss: 6.6875 | load balance loss: 30.6791 | z loss: 20.6250 | avg iter time: 0.00ms | avg tok/sec: 0.00 | tokens processed: 26,214,400 | ETA: calculating...
+ 2026-02-02 06:33:03,813 - root - INFO - 200 | lr: 0.0004 | loss: 6.0123 | logits loss: 5.6875 | load balance loss: 30.3247 | z loss: 13.4375 | avg iter time: 862.73ms | avg tok/sec: 303,854.46 | tokens processed: 52,428,800 | ETA: 91h 22m
+ 2026-02-02 06:34:30,060 - root - INFO - 300 | lr: 0.0006 | loss: 5.4304 | logits loss: 5.1250 | load balance loss: 30.1869 | z loss: 8.1875 | avg iter time: 862.40ms | avg tok/sec: 303,970.94 | tokens processed: 78,643,200 | ETA: 91h 18m
+ 2026-02-02 06:35:56,300 - root - INFO - 400 | lr: 0.0008 | loss: 5.2696 | logits loss: 4.9688 | load balance loss: 30.2513 | z loss: 4.2500 | avg iter time: 862.30ms | avg tok/sec: 304,006.05 | tokens processed: 104,857,600 | ETA: 91h 16m
+ 2026-02-02 06:37:22,585 - root - INFO - 500 | lr: 0.0011 | loss: 4.8607 | logits loss: 4.5625 | load balance loss: 30.2636 | z loss: 3.2656 | avg iter time: 862.77ms | avg tok/sec: 303,839.54 | tokens processed: 131,072,000 | ETA: 91h 18m
+ 2026-02-02 06:38:48,698 - root - INFO - 600 | lr: 0.0013 | loss: 4.7315 | logits loss: 4.4375 | load balance loss: 30.1463 | z loss: 2.7031 | avg iter time: 861.03ms | avg tok/sec: 304,452.36 | tokens processed: 157,286,400 | ETA: 91h 5m
+ 2026-02-02 06:40:14,880 - root - INFO - 700 | lr: 0.0015 | loss: 4.6053 | logits loss: 4.3125 | load balance loss: 30.2221 | z loss: 2.1406 | avg iter time: 861.75ms | avg tok/sec: 304,199.79 | tokens processed: 183,500,800 | ETA: 91h 8m
+ 2026-02-02 06:41:04,649 - root - INFO - Downloading fineweb_train_000003.bin from kjj0/fineweb100B-gpt2...
+ 2026-02-02 06:41:04,886 - httpx - INFO - HTTP Request: HEAD https://huggingface.co/datasets/kjj0/fineweb100B-gpt2/resolve/main/fineweb_train_000003.bin "HTTP/1.1 302 Found"
+ 2026-02-02 06:41:04,950 - httpx - INFO - HTTP Request: GET https://huggingface.co/api/datasets/kjj0/fineweb100B-gpt2/xet-read-token/50d1422b27e1a928440c26a8829f3f827f44ac56 "HTTP/1.1 200 OK"
+ 2026-02-02 06:41:41,041 - root - INFO - 800 | lr: 0.0017 | loss: 4.4269 | logits loss: 4.1250 | load balance loss: 30.1905 | z loss: 1.6875 | avg iter time: 861.54ms | avg tok/sec: 304,274.88 | tokens processed: 209,715,200 | ETA: 91h 6m
+ 2026-02-02 06:43:07,288 - root - INFO - 900 | lr: 0.0019 | loss: 4.4167 | logits loss: 4.1250 | load balance loss: 30.1692 | z loss: 1.4297 | avg iter time: 862.35ms | avg tok/sec: 303,987.30 | tokens processed: 235,929,600 | ETA: 91h 9m
+ 2026-02-02 06:44:33,285 - root - INFO - 1000 | lr: 0.0021 | loss: 4.2574 | logits loss: 3.9531 | load balance loss: 30.1724 | z loss: 0.9453 | avg iter time: 859.88ms | avg tok/sec: 304,861.78 | tokens processed: 262,144,000 | ETA: 90h 52m
+ 2026-02-02 06:45:59,372 - root - INFO - 1100 | lr: 0.0023 | loss: 4.3696 | logits loss: 4.0625 | load balance loss: 30.1162 | z loss: 1.0156 | avg iter time: 860.78ms | avg tok/sec: 304,544.09 | tokens processed: 288,358,400 | ETA: 90h 56m
+ 2026-02-02 06:46:33,587 - root - INFO - Downloading fineweb_train_000004.bin from kjj0/fineweb100B-gpt2...
+ 2026-02-02 06:46:33,715 - httpx - INFO - HTTP Request: HEAD https://huggingface.co/datasets/kjj0/fineweb100B-gpt2/resolve/main/fineweb_train_000004.bin "HTTP/1.1 302 Found"
+ 2026-02-02 06:47:25,407 - root - INFO - 1200 | lr: 0.0025 | loss: 4.1615 | logits loss: 3.8594 | load balance loss: 30.1078 | z loss: 0.9727 | avg iter time: 860.27ms | avg tok/sec: 304,721.82 | tokens processed: 314,572,800 | ETA: 90h 52m
+ 2026-02-02 06:48:51,451 - root - INFO - 1300 | lr: 0.0027 | loss: 4.1895 | logits loss: 3.8906 | load balance loss: 30.1048 | z loss: 0.8711 | avg iter time: 860.34ms | avg tok/sec: 304,697.47 | tokens processed: 340,787,200 | ETA: 90h 51m
+ 2026-02-02 06:50:17,573 - root - INFO - 1400 | lr: 0.0029 | loss: 4.1330 | logits loss: 3.8281 | load balance loss: 30.1491 | z loss: 0.8711 | avg iter time: 861.16ms | avg tok/sec: 304,409.64 | tokens processed: 367,001,600 | ETA: 90h 54m
+ 2026-02-02 06:51:43,574 - root - INFO - 1500 | lr: 0.0031 | loss: 4.0840 | logits loss: 3.7812 | load balance loss: 30.1333 | z loss: 0.8672 | avg iter time: 859.94ms | avg tok/sec: 304,841.02 | tokens processed: 393,216,000 | ETA: 90h 45m
+ 2026-02-02 06:52:01,879 - root - INFO - Downloading fineweb_train_000005.bin from kjj0/fineweb100B-gpt2...
+ 2026-02-02 06:52:01,990 - httpx - INFO - HTTP Request: HEAD https://huggingface.co/datasets/kjj0/fineweb100B-gpt2/resolve/main/fineweb_train_000005.bin "HTTP/1.1 302 Found"
+ 2026-02-02 06:53:10,223 - root - INFO - 1600 | lr: 0.0034 | loss: 4.2628 | logits loss: 3.9531 | load balance loss: 30.1823 | z loss: 1.0859 | avg iter time: 866.41ms | avg tok/sec: 302,563.25 | tokens processed: 419,430,400 | ETA: 91h 25m
+ 2026-02-02 06:54:36,398 - root - INFO - 1700 | lr: 0.0036 | loss: 4.0582 | logits loss: 3.7500 | load balance loss: 30.1749 | z loss: 0.9375 | avg iter time: 861.68ms | avg tok/sec: 304,224.60 | tokens processed: 445,644,800 | ETA: 90h 53m
+ 2026-02-02 06:56:02,551 - root - INFO - 1800 | lr: 0.0038 | loss: 4.0431 | logits loss: 3.7344 | load balance loss: 30.0975 | z loss: 1.0078 | avg iter time: 861.44ms | avg tok/sec: 304,308.60 | tokens processed: 471,859,200 | ETA: 90h 51m
+ 2026-02-02 06:57:28,720 - root - INFO - 1900 | lr: 0.0040 | loss: 4.0118 | logits loss: 3.7031 | load balance loss: 30.1188 | z loss: 0.9062 | avg iter time: 861.61ms | avg tok/sec: 304,248.34 | tokens processed: 498,073,600 | ETA: 90h 50m
+ 2026-02-02 06:57:31,151 - root - INFO - Downloading fineweb_train_000006.bin from kjj0/fineweb100B-gpt2...
+ 2026-02-02 06:57:31,285 - httpx - INFO - HTTP Request: HEAD https://huggingface.co/datasets/kjj0/fineweb100B-gpt2/resolve/main/fineweb_train_000006.bin "HTTP/1.1 302 Found"
+ 2026-02-02 06:57:31,356 - httpx - INFO - HTTP Request: GET https://huggingface.co/api/datasets/kjj0/fineweb100B-gpt2/xet-read-token/50d1422b27e1a928440c26a8829f3f827f44ac56 "HTTP/1.1 200 OK"
+ 2026-02-02 06:58:54,835 - root - INFO - 2000 | lr: 0.0042 | loss: 3.9879 | logits loss: 3.6875 | load balance loss: 30.1081 | z loss: 1.0703 | avg iter time: 861.08ms | avg tok/sec: 304,436.01 | tokens processed: 524,288,000 | ETA: 90h 45m
+ 2026-02-02 07:00:21,038 - root - INFO - 2100 | lr: 0.0044 | loss: 4.0795 | logits loss: 3.7812 | load balance loss: 30.1146 | z loss: 0.9023 | avg iter time: 861.96ms | avg tok/sec: 304,124.49 | tokens processed: 550,502,400 | ETA: 90h 50m
+ 2026-02-02 07:01:47,171 - root - INFO - 2200 | lr: 0.0046 | loss: 4.0241 | logits loss: 3.7188 | load balance loss: 30.1241 | z loss: 0.9141 | avg iter time: 861.26ms | avg tok/sec: 304,372.88 | tokens processed: 576,716,800 | ETA: 90h 44m
+ 2026-02-02 07:02:59,359 - root - INFO - Downloading fineweb_train_000007.bin from kjj0/fineweb100B-gpt2...
+ 2026-02-02 07:02:59,472 - httpx - INFO - HTTP Request: HEAD https://huggingface.co/datasets/kjj0/fineweb100B-gpt2/resolve/main/fineweb_train_000007.bin "HTTP/1.1 302 Found"
+ 2026-02-02 07:03:13,337 - root - INFO - 2300 | lr: 0.0048 | loss: 3.9629 | logits loss: 3.6562 | load balance loss: 30.1136 | z loss: 0.9219 | avg iter time: 861.56ms | avg tok/sec: 304,267.24 | tokens processed: 602,931,200 | ETA: 90h 44m
+ 2026-02-02 07:04:39,464 - root - INFO - 2400 | lr: 0.0050 | loss: 4.0699 | logits loss: 3.7656 | load balance loss: 30.1341 | z loss: 0.9180 | avg iter time: 861.20ms | avg tok/sec: 304,392.81 | tokens processed: 629,145,600 | ETA: 90h 40m
+ 2026-02-02 07:06:05,605 - root - INFO - 2500 | lr: 0.0052 | loss: 3.9481 | logits loss: 3.6406 | load balance loss: 30.1059 | z loss: 0.9219 | avg iter time: 861.32ms | avg tok/sec: 304,349.99 | tokens processed: 655,360,000 | ETA: 90h 40m
+ 2026-02-02 07:07:31,907 - root - INFO - 2600 | lr: 0.0055 | loss: 4.0153 | logits loss: 3.7188 | load balance loss: 30.1015 | z loss: 0.9727 | avg iter time: 862.94ms | avg tok/sec: 303,781.32 | tokens processed: 681,574,400 | ETA: 90h 48m
+ 2026-02-02 07:08:29,202 - root - INFO - Downloading fineweb_train_000008.bin from kjj0/fineweb100B-gpt2...
+ 2026-02-02 07:08:29,311 - httpx - INFO - HTTP Request: HEAD https://huggingface.co/datasets/kjj0/fineweb100B-gpt2/resolve/main/fineweb_train_000008.bin "HTTP/1.1 302 Found"
+ 2026-02-02 07:08:58,705 - root - INFO - 2700 | lr: 0.0057 | loss: 3.9821 | logits loss: 3.6875 | load balance loss: 30.0977 | z loss: 0.9805 | avg iter time: 867.86ms | avg tok/sec: 302,057.22 | tokens processed: 707,788,800 | ETA: 91h 18m
+ 2026-02-02 07:10:24,933 - root - INFO - 2800 | lr: 0.0059 | loss: 3.9728 | logits loss: 3.6719 | load balance loss: 30.0864 | z loss: 0.9492 | avg iter time: 862.21ms | avg tok/sec: 304,036.26 | tokens processed: 734,003,200 | ETA: 90h 41m
+ 2026-02-02 07:11:51,073 - root - INFO - 2900 | lr: 0.0061 | loss: 3.9217 | logits loss: 3.6250 | load balance loss: 30.1051 | z loss: 0.9062 | avg iter time: 861.31ms | avg tok/sec: 304,353.24 | tokens processed: 760,217,600 | ETA: 90h 34m
+ 2026-02-02 07:13:17,294 - root - INFO - 3000 | lr: 0.0063 | loss: 3.8670 | logits loss: 3.5625 | load balance loss: 30.0905 | z loss: 0.8789 | avg iter time: 862.13ms | avg tok/sec: 304,065.25 | tokens processed: 786,432,000 | ETA: 90h 38m
+ 2026-02-02 07:13:57,622 - root - INFO - Downloading fineweb_train_000009.bin from kjj0/fineweb100B-gpt2...
+ 2026-02-02 07:13:57,722 - httpx - INFO - HTTP Request: HEAD https://huggingface.co/datasets/kjj0/fineweb100B-gpt2/resolve/main/fineweb_train_000009.bin "HTTP/1.1 302 Found"
+ 2026-02-02 07:13:57,773 - httpx - INFO - HTTP Request: GET https://huggingface.co/api/datasets/kjj0/fineweb100B-gpt2/xet-read-token/50d1422b27e1a928440c26a8829f3f827f44ac56 "HTTP/1.1 200 OK"
+ 2026-02-02 07:14:43,498 - root - INFO - 3100 | lr: 0.0065 | loss: 3.9544 | logits loss: 3.6562 | load balance loss: 30.1180 | z loss: 0.9336 | avg iter time: 861.94ms | avg tok/sec: 304,132.93 | tokens processed: 812,646,400 | ETA: 90h 35m
+ 2026-02-02 07:16:09,652 - root - INFO - 3200 | lr: 0.0067 | loss: 3.9347 | logits loss: 3.6250 | load balance loss: 30.1562 | z loss: 1.1406 | avg iter time: 861.47ms | avg tok/sec: 304,300.05 | tokens processed: 838,860,800 | ETA: 90h 31m
+ 2026-02-02 07:17:35,835 - root - INFO - 3300 | lr: 0.0069 | loss: 3.9468 | logits loss: 3.6406 | load balance loss: 30.0809 | z loss: 1.0000 | avg iter time: 861.76ms | avg tok/sec: 304,196.86 | tokens processed: 865,075,200 | ETA: 90h 31m
+ 2026-02-02 07:19:01,981 - root - INFO - 3400 | lr: 0.0071 | loss: 3.9490 | logits loss: 3.6406 | load balance loss: 30.1709 | z loss: 1.0547 | avg iter time: 861.35ms | avg tok/sec: 304,341.10 | tokens processed: 891,289,600 | ETA: 90h 27m
+ 2026-02-02 07:19:26,778 - root - INFO - Downloading fineweb_train_000010.bin from kjj0/fineweb100B-gpt2...
+ 2026-02-02 07:19:26,869 - httpx - INFO - HTTP Request: HEAD https://huggingface.co/datasets/kjj0/fineweb100B-gpt2/resolve/main/fineweb_train_000010.bin "HTTP/1.1 302 Found"
+ 2026-02-02 07:20:28,102 - root - INFO - 3500 | lr: 0.0073 | loss: 3.9102 | logits loss: 3.6094 | load balance loss: 30.1205 | z loss: 0.8867 | avg iter time: 861.10ms | avg tok/sec: 304,429.10 | tokens processed: 917,504,000 | ETA: 90h 24m
+ 2026-02-02 07:21:54,251 - root - INFO - 3600 | lr: 0.0076 | loss: 3.9131 | logits loss: 3.6094 | load balance loss: 30.2483 | z loss: 1.2969 | avg iter time: 861.41ms | avg tok/sec: 304,321.26 | tokens processed: 943,718,400 | ETA: 90h 24m
+ 2026-02-02 07:23:20,374 - root - INFO - 3700 | lr: 0.0078 | loss: 3.8370 | logits loss: 3.5312 | load balance loss: 30.1589 | z loss: 0.9492 | avg iter time: 861.13ms | avg tok/sec: 304,418.04 | tokens processed: 969,932,800 | ETA: 90h 21m
+ 2026-02-02 07:24:47,103 - root - INFO - 3800 | lr: 0.0080 | loss: 3.7491 | logits loss: 3.4531 | load balance loss: 30.1284 | z loss: 1.0391 | avg iter time: 867.18ms | avg tok/sec: 302,294.53 | tokens processed: 996,147,200 | ETA: 90h 58m
+ 2026-02-02 07:24:55,611 - root - INFO - Downloading fineweb_train_000011.bin from kjj0/fineweb100B-gpt2...
+ 2026-02-02 07:24:55,825 - httpx - INFO - HTTP Request: HEAD https://huggingface.co/datasets/kjj0/fineweb100B-gpt2/resolve/main/fineweb_train_000011.bin "HTTP/1.1 302 Found"
+ 2026-02-02 07:26:13,249 - root - INFO - 3900 | lr: 0.0079 | loss: 3.9064 | logits loss: 3.6094 | load balance loss: 30.1117 | z loss: 0.9336 | avg iter time: 861.36ms | avg tok/sec: 304,337.52 | tokens processed: 1,022,361,600 | ETA: 90h 20m
+ 2026-02-02 07:27:39,357 - root - INFO - 4000 | lr: 0.0078 | loss: 3.8833 | logits loss: 3.5781 | load balance loss: 30.1254 | z loss: 0.9961 | avg iter time: 861.00ms | avg tok/sec: 304,463.13 | tokens processed: 1,048,576,000 | ETA: 90h 16m
+ 2026-02-02 07:29:05,551 - root - INFO - 4100 | lr: 0.0077 | loss: 3.8721 | logits loss: 3.5781 | load balance loss: 30.1071 | z loss: 0.9375 | avg iter time: 861.83ms | avg tok/sec: 304,170.29 | tokens processed: 1,074,790,400 | ETA: 90h 20m
+ 2026-02-02 07:30:24,687 - root - INFO - Downloading fineweb_train_000012.bin from kjj0/fineweb100B-gpt2...
+ 2026-02-02 07:30:24,783 - httpx - INFO - HTTP Request: HEAD https://huggingface.co/datasets/kjj0/fineweb100B-gpt2/resolve/main/fineweb_train_000012.bin "HTTP/1.1 302 Found"
+ 2026-02-02 07:30:24,838 - httpx - INFO - HTTP Request: GET https://huggingface.co/api/datasets/kjj0/fineweb100B-gpt2/xet-read-token/50d1422b27e1a928440c26a8829f3f827f44ac56 "HTTP/1.1 200 OK"
+ 2026-02-02 07:30:31,800 - root - INFO - 4200 | lr: 0.0076 | loss: 3.9316 | logits loss: 3.6250 | load balance loss: 30.1386 | z loss: 0.9453 | avg iter time: 862.36ms | avg tok/sec: 303,983.13 | tokens processed: 1,101,004,800 | ETA: 90h 22m
+ 2026-02-02 07:31:58,034 - root - INFO - 4300 | lr: 0.0075 | loss: 3.9004 | logits loss: 3.5938 | load balance loss: 30.1101 | z loss: 0.9062 | avg iter time: 862.23ms | avg tok/sec: 304,031.50 | tokens processed: 1,127,219,200 | ETA: 90h 20m
+ 2026-02-02 07:32:14,249 - root - WARNING - Received KeyboardInterrupt. Exiting...
+ 2026-02-02 07:32:14,528 - root - INFO - Training completed.
logs/run_20260202_exfigure_disannul_train.csv ADDED
@@ -0,0 +1,44 @@
+ step,lr,loss,load_balance_loss,z_loss,time,tokens_processed,tokens_per_sec
+ 100,0.00021185107,7.005314350128174,30.67913818359375,20.625,0,26214400,0
+ 200,0.0004216046,6.012259006500244,30.324657440185547,13.4375,862.7288317680359,52428800,303854.45617109427
+ 300,0.0006313581,5.430443286895752,30.186866760253906,8.1875,862.3982191085815,78643200,303970.94311136834
+ 400,0.0008411117,5.269588947296143,30.251304626464844,4.25,862.2986221313477,104857600,304006.05227926426
+ 500,0.0010508653,4.860705375671387,30.26357650756836,3.265625,862.7711868286133,131072000,303839.53938424005
+ 600,0.0012606188,4.731544017791748,30.146270751953125,2.703125,861.0345578193665,157286400,304452.3563187743
+ 700,0.0014703723,4.605316638946533,30.22212791442871,2.140625,861.7494535446167,183500800,304199.78674976627
+ 800,0.0016801258,4.426919937133789,30.190507888793945,1.6875,861.5367913246155,209715200,304274.8755940565
+ 900,0.0018898793,4.4166951179504395,30.16924285888672,1.4296875,862.3518109321594,235929600,303987.30155925034
+ 1000,0.002099633,4.257418155670166,30.172353744506836,0.9453125,859.8781967163086,262144000,304861.7827514083
+ 1100,0.0023093864,4.369624614715576,30.116167068481445,1.015625,860.7752108573914,288358400,304544.08618353045
+ 1200,0.00251914,4.1615118980407715,30.10776710510254,0.97265625,860.2731609344482,314572800,304721.8161673825
+ 1300,0.0027288937,4.189517974853516,30.104774475097656,0.87109375,860.341899394989,340787200,304697.4699062609
+ 1400,0.002938647,4.132986545562744,30.14911651611328,0.87109375,861.155378818512,367001600,304409.641335175
+ 1500,0.0031484007,4.084044456481934,30.133291244506836,0.8671875,859.9367547035217,393216000,304841.0229777639
+ 1600,0.003358154,4.262795448303223,30.182254791259766,1.0859375,866.4105868339539,419430400,302563.2465525718
+ 1700,0.0035679077,4.05819845199585,30.174877166748047,0.9375,861.6791534423828,445644800,304224.6048923691
+ 1800,0.0037776614,4.043099403381348,30.09752082824707,1.0078125,861.4413237571716,471859200,304308.5962682407
+ 1900,0.0039874148,4.011845111846924,30.118757247924805,0.90625,861.6119241714478,498073600,304248.3427235361
+ 2000,0.004197168,3.987912178039551,30.10809898376465,1.0703125,861.0807991027832,524288000,304436.0067872203
+ 2100,0.004406922,4.079455375671387,30.11456871032715,0.90234375,861.9628071784973,550502400,304124.4910068545
+ 2200,0.0046166754,4.024138450622559,30.12409210205078,0.9140625,861.2593936920166,576716800,304372.87757901865
+ 2300,0.004826429,3.962876558303833,30.113624572753906,0.921875,861.5584063529968,602931200,304267.2418573032
+ 2400,0.0050361827,4.06994104385376,30.13414764404297,0.91796875,861.2029957771301,629145600,304392.81015673565
+ 2500,0.005245936,3.9481170177459717,30.105880737304688,0.921875,861.3241577148438,655360000,304349.99140798196
+ 2600,0.0054556895,4.015290260314941,30.101520538330078,0.97265625,862.9365372657776,681574400,303781.31957490835
+ 2700,0.005665443,3.982128620147705,30.097726821899414,0.98046875,867.8620505332947,707788800,302057.2219270499
+ 2800,0.005875197,3.9728376865386963,30.086423873901367,0.94921875,862.2129440307617,734003200,304036.26136079826
+ 2900,0.00608495,3.921653985977173,30.10512351989746,0.90625,861.3149642944336,760217600,304353.2399495014
+ 3000,0.0062947036,3.8670413494110107,30.090465545654297,0.87890625,862.1307301521301,786432000,304065.2546438549
+ 3100,0.0065044574,3.9544262886047363,30.117971420288086,0.93359375,861.938886642456,812646400,304132.93107257254
+ 3200,0.006714211,3.9347493648529053,30.15625,1.140625,861.4655065536499,838860800,304300.0538103081
+ 3300,0.0069239642,3.9467906951904297,30.080856323242188,1.0,861.7577314376831,865075200,304196.86466016533
+ 3400,0.0071337176,3.948972463607788,30.170944213867188,1.0546875,861.3493299484253,891289600,304341.09702702885
+ 3500,0.0073434715,3.9102299213409424,30.120487213134766,0.88671875,861.100344657898,917504000,304429.09659285506
+ 3600,0.007553225,3.913080930709839,30.248336791992188,1.296875,861.4054799079895,943718400,304321.2588199471
+ 3700,0.0077629783,3.8369827270507812,30.158864974975586,0.94921875,861.1316132545471,969932800,304418.0424514403
+ 3800,0.007972732,3.7490692138671875,30.12843132019043,1.0390625,867.1807551383972,996147200,302294.5313842479
+ 3900,0.00791029,3.906449317932129,30.111663818359375,0.93359375,861.359453201294,1022361600,304337.52021380403
+ 4000,0.0078108106,3.8832788467407227,30.12540054321289,0.99609375,861.0040879249573,1048576000,304463.13051982597
+ 4100,0.0077149924,3.872138023376465,30.10710334777832,0.9375,861.8330144882202,1074790400,304170.29238044244
+ 4200,0.007622615,3.9315664768218994,30.138641357421875,0.9453125,862.3636531829834,1101004800,303983.1271093427
+ 4300,0.00753348,3.9004430770874023,30.1101131439209,0.90625,862.2264409065247,1127219200,304031.5021241844
logs/run_20260202_exfigure_disannul_train.png ADDED
logs/run_20260202_exfigure_disannul_val.csv ADDED
@@ -0,0 +1 @@
+ step,loss,logits_loss