vikramp committed
Commit c5804a2 · verified · 1 Parent(s): af24e34

Upload folder using huggingface_hub

logs/output_run_20260201_scabinus_rufflike.log ADDED
@@ -0,0 +1,84 @@
+ 2026-02-01 18:25:14,635 - root - INFO - Run: run_20260201_scabinus_rufflike
+ 2026-02-01 18:25:14,635 - root - INFO - Log directory: /root/tiny_moe/training_runs/Tiny_MoE/logs
+ 2026-02-01 18:25:14,635 - root - INFO - Output dir: /root/tiny_moe/training_runs
+ 2026-02-01 18:25:16,928 - jax._src.xla_bridge - INFO - Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
+ 2026-02-01 18:25:20,480 - root - INFO - Flax version: 0.11.1
+ 2026-02-01 18:25:20,481 - root - INFO - Optax version: 0.2.6
+ 2026-02-01 18:25:20,481 - root - INFO - Platform: gpu
+ 2026-02-01 18:25:20,481 - root - INFO - Num Devices: 8
+ 2026-02-01 18:25:20,481 - root - INFO - Devices: [CudaDevice(id=0), CudaDevice(id=1), CudaDevice(id=2), CudaDevice(id=3), CudaDevice(id=4), CudaDevice(id=5), CudaDevice(id=6), CudaDevice(id=7)]
+ 2026-02-01 18:25:22,700 - root - INFO - Model config:
+ Config(name='Tiny_MoE',
+ dtype=<class 'jax.numpy.bfloat16'>,
+ vocab_size=50304,
+ block_size=2048,
+ n_layer=30,
+ n_embed=672,
+ n_glu_hidden=2048,
+ n_head=12,
+ n_kv_head=4,
+ n_experts=8,
+ init_stddev=0.02,
+ expert_load_factor=1.25,
+ aux_loss_coeff=0.01,
+ moe_bias=True,
+ mlp_bias=False,
+ attention_bias=False,
+ load_balance_loss_coeff=0.01,
+ z_loss_coeff=0.0005,
+ expert_top_k=2,
+ ln_epsilon=1e-05,
+ rope_theta=0.0001,
+ expert_partition_spec=PartitionSpec('devices',),
+ sdpa_implementation='cudnn',
+ value_residual_init=0.5,
+ logit_softcap=30.0)
+ 2026-02-01 18:26:45,603 - root - INFO - Parameter Count: 1,062,182,190
+ 2026-02-01 18:26:45,603 - root - INFO - Sharded / MoE Parameter Count: 992,210,160
+ 2026-02-01 18:26:45,603 - root - INFO - Replicated Parameter Count: 69,972,030
+ 2026-02-01 18:26:46,788 - root - INFO - Weight decay param count: 1,062,140,928
+ 2026-02-01 18:26:46,789 - root - INFO - Training config:
+ TrainerConfig(num_tokens=100000000000,
+ num_tokens_per_batch=262144,
+ mB=128,
+ T=2048,
+ max_steps=381469,
+ max_lr=0.006,
+ min_lr=0.0006000000000000001,
+ max_grad_norm=1.0,
+ weight_decay=0.1,
+ adam_b1=0.9,
+ adam_b2=0.95,
+ warmup_steps=3814,
+ print_interval=100,
+ val=True,
+ val_interval=5000,
+ val_batches=50,
+ checkpoint_model=False,
+ checkpoint_optimizer=False,
+ checkpoint_interval=10000)
+ 2026-02-01 18:26:46,789 - root - INFO - Effective batch size per device: 16
+ 2026-02-01 18:26:50,172 - root - INFO - ModdedNanoGPTDataLoader: 1030 shards (train)
+ 2026-02-01 18:26:50,255 - root - INFO - HuggingfaceDataLoader initialized:
+ ------------------------
+ label: train
+ shards: 1,030
+ shard size: 100,000,000
+ batch size: 128
+ block size: 2048
+ device rank: 1
+ start shard: 0
+ start pos: 0
+ ------------------------
+ 2026-02-01 18:26:50,255 - root - INFO - ModdedNanoGPTDataLoader: 1 shards (val)
+ 2026-02-01 18:26:50,342 - root - INFO - Starting from step: 0
+ 2026-02-01 18:27:58,183 - root - INFO - 0 | lr: 0.0000 | loss: 13.1395 | logits loss: 12.7500 | load balance loss: 30.1163 | z loss: 146.0000 | avg iter time: 0.00ms | avg tok/sec: 0.00 | tokens processed: 262,144
+ 2026-02-01 18:30:29,724 - root - INFO - 100 | lr: 0.0002 | loss: 6.9586 | logits loss: 6.6562 | load balance loss: 30.5754 | z loss: 17.6250 | avg iter time: 1507.90ms | avg tok/sec: 173,847.19 | tokens processed: 26,476,544
+ 2026-02-01 18:32:02,437 - root - INFO - 200 | lr: 0.0003 | loss: 6.0919 | logits loss: 5.7812 | load balance loss: 30.2222 | z loss: 11.2500 | avg iter time: 919.43ms | avg tok/sec: 285,116.48 | tokens processed: 52,690,944
+ 2026-02-01 18:33:35,073 - root - INFO - 300 | lr: 0.0005 | loss: 5.5857 | logits loss: 5.2812 | load balance loss: 30.2130 | z loss: 5.4062 | avg iter time: 918.76ms | avg tok/sec: 285,322.74 | tokens processed: 78,905,344
+ 2026-02-01 18:35:07,966 - root - INFO - 400 | lr: 0.0006 | loss: 5.3367 | logits loss: 5.0312 | load balance loss: 30.1679 | z loss: 4.0000 | avg iter time: 921.32ms | avg tok/sec: 284,529.41 | tokens processed: 105,119,744
+ 2026-02-01 18:36:40,724 - root - INFO - 500 | lr: 0.0008 | loss: 5.0200 | logits loss: 4.7188 | load balance loss: 30.1561 | z loss: 3.6094 | avg iter time: 919.95ms | avg tok/sec: 284,954.20 | tokens processed: 131,334,144
+ 2026-02-01 18:38:13,288 - root - INFO - 600 | lr: 0.0009 | loss: 4.7637 | logits loss: 4.4688 | load balance loss: 30.2110 | z loss: 3.5938 | avg iter time: 920.01ms | avg tok/sec: 284,936.48 | tokens processed: 157,548,544
+ 2026-02-01 18:39:45,832 - root - INFO - 700 | lr: 0.0011 | loss: 4.6500 | logits loss: 4.3438 | load balance loss: 30.1683 | z loss: 2.2344 | avg iter time: 917.79ms | avg tok/sec: 285,624.24 | tokens processed: 183,762,944
+ 2026-02-01 18:40:05,194 - root - WARNING - Received KeyboardInterrupt. Exiting...
+ 2026-02-01 18:40:05,450 - root - INFO - Training completed.
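The derived values in the log above can be cross-checked from the logged configuration. The sketch below is not taken from the training code: the formulas (floor division for `max_steps`, an even split of `mB` across devices, and a linear warmup of `max_lr * (step + 1) / warmup_steps`) are assumptions that happen to reproduce the logged numbers, and the helper name `warmup_lr` is hypothetical.

```python
# Values copied from the logged Config / TrainerConfig; formulas are assumptions.
num_tokens = 100_000_000_000
mB, T = 128, 2048          # sequences per batch, tokens per sequence
num_devices = 8
max_lr = 0.006
warmup_steps = 3814

tokens_per_batch = mB * T                      # logged as num_tokens_per_batch
max_steps = num_tokens // tokens_per_batch     # logged as max_steps
per_device_batch = mB // num_devices           # logged as "Effective batch size per device"

# Logged parameter counts: sharded (MoE) plus replicated should equal the total.
sharded_params = 992_210_160
replicated_params = 69_972_030
total_params = sharded_params + replicated_params


def warmup_lr(step: int) -> float:
    """Assumed linear warmup that increases by max_lr / warmup_steps each step."""
    return max_lr * min(step + 1, warmup_steps) / warmup_steps


print(tokens_per_batch, max_steps, per_device_batch, total_params)
for step in (0, 100, 700):
    print(step, warmup_lr(step))
```

Under these assumptions the printed learning rates agree with the `lr` column of the training CSV to its printed precision, which suggests the run was interrupted well inside the warmup phase (step 700 of 3814).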
logs/run_20260201_scabinus_rufflike_train.csv ADDED
@@ -0,0 +1,9 @@
+ step,lr,loss,load_balance_loss,z_loss,time,tokens_processed,tokens_per_sec
+ 0,1.5731515e-06,13.139485359191895,30.116331100463867,146.0,0,262144,0
+ 100,0.0001588883,6.958637237548828,30.575374603271484,17.625,1507.8989911079407,26476544,173847.1884031089
+ 200,0.00031620346,6.091878890991211,30.222169876098633,11.25,919.4277381896973,52690944,285116.4796443352
+ 300,0.00047351862,5.585726737976074,30.21299171447754,5.40625,918.7630677223206,78905344,285322.74446977256
+ 400,0.0006308338,5.336730480194092,30.167863845825195,4.0,921.3248038291931,105119744,284529.4069045813
+ 500,0.0007881489,5.019951820373535,30.156095504760742,3.609375,919.9513387680054,131334144,284954.20241581695
+ 600,0.00094546407,4.763744831085205,30.211027145385742,3.59375,920.0085520744324,157548544,284936.481741304
+ 700,0.0011027792,4.650022506713867,30.1683406829834,2.234375,917.7932596206665,183762944,285624.2375416298
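As a reading aid for the CSV above: the `time` column appears to be the average iteration time in milliseconds (it matches the `avg iter time` field of the log), so `tokens_per_sec` can be recomputed from it. The sketch below assumes that interpretation and a fixed 262,144 tokens per step; it embeds three rows copied from the CSV instead of reading the file, and the helper name `tokens_per_sec` is hypothetical.

```python
import csv
import io

# Three rows copied verbatim from the train CSV (header plus steps 100, 200, 700).
TRAIN_CSV = """step,lr,loss,load_balance_loss,z_loss,time,tokens_processed,tokens_per_sec
100,0.0001588883,6.958637237548828,30.575374603271484,17.625,1507.8989911079407,26476544,173847.1884031089
200,0.00031620346,6.091878890991211,30.222169876098633,11.25,919.4277381896973,52690944,285116.4796443352
700,0.0011027792,4.650022506713867,30.1683406829834,2.234375,917.7932596206665,183762944,285624.2375416298
"""

TOKENS_PER_STEP = 262_144  # num_tokens_per_batch from the training config


def tokens_per_sec(iter_time_ms: float) -> float:
    """Throughput implied by an average iteration time given in milliseconds."""
    return TOKENS_PER_STEP / (iter_time_ms / 1000.0)


rows = list(csv.DictReader(io.StringIO(TRAIN_CSV)))
recomputed = [tokens_per_sec(float(row["time"])) for row in rows]

for row, value in zip(rows, recomputed):
    # Compare the recomputed throughput with the logged column.
    print(row["step"], round(value, 2), row["tokens_per_sec"])
```

The first interval (steps 0 to 100) is noticeably slower than the later ones, which is consistent with one-time compilation cost being averaged into it; that explanation is an inference, not something the log states.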
logs/run_20260201_scabinus_rufflike_train.png ADDED
logs/run_20260201_scabinus_rufflike_val.csv ADDED
@@ -0,0 +1 @@
+ step,loss,logits_loss