vikramp committed on
Commit d6a5846 · verified · 1 Parent(s): ada01d1

Upload folder using huggingface_hub

logs/output_run_20260202_themata_traplike.log ADDED
@@ -0,0 +1,83 @@
+ 2026-02-02 09:19:24,528 - root - INFO - Run: run_20260202_themata_traplike
+ 2026-02-02 09:19:24,528 - root - INFO - Log directory: /root/tiny_moe/training_runs/Tiny_MoE/logs
+ 2026-02-02 09:19:24,528 - root - INFO - Output dir: /root/tiny_moe/training_runs
+ 2026-02-02 09:19:26,963 - jax._src.xla_bridge - INFO - Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
+ 2026-02-02 09:19:30,311 - root - INFO - Flax version: 0.11.1
+ 2026-02-02 09:19:30,311 - root - INFO - Optax version: 0.2.6
+ 2026-02-02 09:19:30,311 - root - INFO - Platform: gpu
+ 2026-02-02 09:19:30,311 - root - INFO - Num Devices: 8
+ 2026-02-02 09:19:30,311 - root - INFO - Devices: [CudaDevice(id=0), CudaDevice(id=1), CudaDevice(id=2), CudaDevice(id=3), CudaDevice(id=4), CudaDevice(id=5), CudaDevice(id=6), CudaDevice(id=7)]
+ 2026-02-02 09:19:31,206 - root - INFO - Model config:
+ Config(name='Tiny_MoE',
+ dtype=<class 'jax.numpy.bfloat16'>,
+ vocab_size=50304,
+ block_size=2048,
+ n_layer=30,
+ n_embed=672,
+ n_glu_hidden=2048,
+ n_head=12,
+ n_kv_head=4,
+ n_experts=8,
+ init_stddev=0.02,
+ expert_load_factor=1.25,
+ aux_loss_coeff=0.01,
+ moe_bias=True,
+ mlp_bias=False,
+ attention_bias=False,
+ load_balance_loss_coeff=0.01,
+ z_loss_coeff=0.0005,
+ expert_top_k=2,
+ ln_epsilon=1e-05,
+ rope_theta=0.0001,
+ expert_partition_spec=PartitionSpec('devices',),
+ sdpa_implementation='cudnn',
+ window_size=(-1, -1),
+ value_residual_init=0.5,
+ logit_softcap=30.0)
+ 2026-02-02 09:21:13,758 - root - INFO - Parameter Count: 1,062,185,550
+ 2026-02-02 09:21:13,758 - root - INFO - Sharded / MoE Parameter Count: 992,210,160
+ 2026-02-02 09:21:13,758 - root - INFO - Replicated Parameter Count: 69,975,390
+ 2026-02-02 09:21:15,215 - root - INFO - Weight decay param count: 1,062,140,928
+ 2026-02-02 09:21:15,215 - root - INFO - Training config:
+ TrainerConfig(num_tokens=100000000000,
+ num_tokens_per_batch=262144,
+ mB=128,
+ T=2048,
+ max_steps=381469,
+ max_lr=0.008,
+ min_lr=0.0008,
+ max_grad_norm=1.0,
+ weight_decay=0.1,
+ adam_b1=0.9,
+ adam_b2=0.95,
+ warmup_steps=3814,
+ print_interval=100,
+ val=True,
+ val_interval=5000,
+ val_batches=50,
+ checkpoint_model=False,
+ checkpoint_optimizer=False,
+ checkpoint_interval=10000)
+ 2026-02-02 09:21:15,215 - root - INFO - Effective batch size per device: 16
+ 2026-02-02 09:21:20,337 - root - INFO - HuggingfaceDataLoader: 1030 shards (train)
+ 2026-02-02 09:21:20,337 - root - INFO - Downloading fineweb_train_000001.bin from kjj0/fineweb100B-gpt2...
+ 2026-02-02 09:21:20,864 - httpx - INFO - HTTP Request: HEAD https://huggingface.co/datasets/kjj0/fineweb100B-gpt2/resolve/main/fineweb_train_000001.bin "HTTP/1.1 302 Found"
+ 2026-02-02 09:21:20,891 - httpx - INFO - HTTP Request: GET https://huggingface.co/api/datasets/kjj0/fineweb100B-gpt2/xet-read-token/50d1422b27e1a928440c26a8829f3f827f44ac56 "HTTP/1.1 200 OK"
+ 2026-02-02 09:21:22,549 - root - INFO - HuggingfaceDataLoader initialized:
+ ------------------------
+ label: train
+ shards: 1,030
+ shard size: 100,000,000
+ batch size: 128
+ block size: 2048
+ device rank: 1
+ start shard: 0
+ start pos: 0
+ ------------------------
+ 2026-02-02 09:21:22,550 - root - INFO - HuggingfaceDataLoader: 1 shards (val)
+ 2026-02-02 09:21:22,550 - root - INFO - Downloading fineweb_val_000000.bin from kjj0/fineweb100B-gpt2...
+ 2026-02-02 09:21:22,583 - httpx - INFO - HTTP Request: HEAD https://huggingface.co/datasets/kjj0/fineweb100B-gpt2/resolve/main/fineweb_val_000000.bin "HTTP/1.1 302 Found"
+ 2026-02-02 09:21:24,147 - root - INFO - Starting from step: 0
+ 2026-02-02 09:26:11,874 - root - INFO - 100 | lr: 0.0002 | loss: 7.0015 | logits loss: 6.6875 | load balance loss: 30.3341 | z loss: 18.5000 | avg iter time: 0.00ms | avg tok/sec: 0.00 | tokens processed: 26,214,400 | elapsed: 0h 4m 47s | ETA: calculating...
+ 2026-02-02 09:26:17,305 - root - WARNING - Received KeyboardInterrupt. Exiting...
+ 2026-02-02 09:26:17,609 - root - INFO - Training completed.
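As a quick sanity check, the TrainerConfig values in the log above are internally consistent; a minimal sketch (variable names are illustrative, values are copied from the logged config):

```python
# Values copied from the TrainerConfig printed in the log; names are illustrative.
num_tokens = 100_000_000_000      # num_tokens=100000000000
num_tokens_per_batch = 262_144    # num_tokens_per_batch=262144
mB, T, num_devices = 128, 2048, 8

# Micro-batch size times sequence length gives the tokens per optimizer step.
assert mB * T == num_tokens_per_batch

# Total training steps for the 100B-token budget.
max_steps = num_tokens // num_tokens_per_batch
print(max_steps)            # → 381469, matching max_steps in the log

# Per-device micro-batch across the 8 CudaDevices.
print(mB // num_devices)    # → 16, matching "Effective batch size per device: 16"
```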
logs/run_20260202_themata_traplike_train.csv ADDED
@@ -0,0 +1,2 @@
+ step,lr,loss,load_balance_loss,z_loss,time,tokens_processed,tokens_per_sec,elapsed_seconds
+ 100,0.00021185107,7.001525402069092,30.33412742614746,18.5,0,26214400,0,287.7257511615753
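The per-step metrics CSV above can be read back with the standard `csv` module; a minimal sketch, with the logged header and its single data row inlined for illustration:

```python
import csv
import io

# Inline copy of logs/run_20260202_themata_traplike_train.csv (header + one row),
# so this sketch runs without the file present.
sample = (
    "step,lr,loss,load_balance_loss,z_loss,time,"
    "tokens_processed,tokens_per_sec,elapsed_seconds\n"
    "100,0.00021185107,7.001525402069092,30.33412742614746,18.5,"
    "0,26214400,0,287.7257511615753\n"
)

# DictReader maps each row to its column names from the header line.
rows = list(csv.DictReader(io.StringIO(sample)))
step_100 = rows[0]
print(int(step_100["step"]), float(step_100["loss"]))  # → 100 7.001525402069092
```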
logs/run_20260202_themata_traplike_train.png ADDED
logs/run_20260202_themata_traplike_val.csv ADDED
@@ -0,0 +1 @@
+ step,loss,logits_loss