RonanMcGovern committed
Commit e7fcc10 (verified) · 1 Parent(s): 0e56eb3

Upload via push_to_hf.py

report/latest/chat-evaluation-mid-(r=16).md ADDED
@@ -0,0 +1,25 @@
+ ## Chat evaluation mid (r=16)
+ timestamp: 2025-12-17 12:06:20
+
+ - source: mid
+ - task_name: None
+ - dtype: bfloat16
+ - temperature: 0.0000
+ - max_new_tokens: 512
+ - num_samples: 1
+ - top_k: 50
+ - batch_size: 8
+ - model_tag: None
+ - step: None
+ - max_problems: None
+ - device_type:
+ - num_recur: 2,4,8,16
+ - num_recur: 16
+ - ARC-Easy: 0.4251
+ - ARC-Challenge: 0.3012
+ - MMLU: 0.3262
+ - GSM8K: 0.0387
+ - HumanEval: 0.0976
+ - SpellingBee: 0.9805
+ - ChatCORE metric: 0.2533
+
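Note on the `ChatCORE metric` line: the report does not say how it is derived. A minimal sketch follows, assuming it is the mean of per-task accuracies after centering each on a chance-level baseline (0.25 for the four-way multiple-choice tasks, 0.0 for the open-ended ones). Both the centering formula and the `BASELINES` values are assumptions, not something this diff confirms; the accuracies are copied from the r=16 report above.

```python
# Assumed chance-level baselines (not stated in the report):
# 0.25 for four-way multiple choice, 0.0 for open-ended generation tasks.
BASELINES = {
    "ARC-Easy": 0.25,
    "ARC-Challenge": 0.25,
    "MMLU": 0.25,
    "GSM8K": 0.0,
    "HumanEval": 0.0,
    "SpellingBee": 0.0,
}


def chatcore(accuracies):
    """Mean of baseline-centered accuracies (assumed definition)."""
    centered = [
        (accuracies[task] - base) / (1.0 - base)
        for task, base in BASELINES.items()
    ]
    return sum(centered) / len(centered)


# Accuracies from the mid (r=16) report.
mid_r16 = {
    "ARC-Easy": 0.4251,
    "ARC-Challenge": 0.3012,
    "MMLU": 0.3262,
    "GSM8K": 0.0387,
    "HumanEval": 0.0976,
    "SpellingBee": 0.9805,
}

score = chatcore(mid_r16)
# The report lists 0.2533; inputs here are already rounded to 4 decimals,
# so only agreement within rounding error is expected.
print(f"recomputed ChatCORE: {score:.4f} (reported: 0.2533)")
```

Under these assumptions the recomputed value agrees with the reported one to within rounding of the inputs, which suggests (but does not prove) that the metric is built this way.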
report/latest/chat-evaluation-mid-(r=2).md ADDED
@@ -0,0 +1,25 @@
+ ## Chat evaluation mid (r=2)
+ timestamp: 2025-12-17 11:20:38
+
+ - source: mid
+ - task_name: None
+ - dtype: bfloat16
+ - temperature: 0.0000
+ - max_new_tokens: 512
+ - num_samples: 1
+ - top_k: 50
+ - batch_size: 8
+ - model_tag: None
+ - step: None
+ - max_problems: None
+ - device_type:
+ - num_recur: 2,4,8,16
+ - num_recur: 2
+ - ARC-Easy: 0.3805
+ - ARC-Challenge: 0.2799
+ - MMLU: 0.3157
+ - GSM8K: 0.0273
+ - HumanEval: 0.0671
+ - SpellingBee: 0.9688
+ - ChatCORE metric: 0.2274
+
report/latest/chat-evaluation-mid-(r=4).md ADDED
@@ -0,0 +1,25 @@
+ ## Chat evaluation mid (r=4)
+ timestamp: 2025-12-17 11:28:21
+
+ - source: mid
+ - task_name: None
+ - dtype: bfloat16
+ - temperature: 0.0000
+ - max_new_tokens: 512
+ - num_samples: 1
+ - top_k: 50
+ - batch_size: 8
+ - model_tag: None
+ - step: None
+ - max_problems: None
+ - device_type:
+ - num_recur: 2,4,8,16
+ - num_recur: 4
+ - ARC-Easy: 0.4276
+ - ARC-Challenge: 0.3038
+ - MMLU: 0.3237
+ - GSM8K: 0.0447
+ - HumanEval: 0.0854
+ - SpellingBee: 0.9805
+ - ChatCORE metric: 0.2529
+
report/latest/chat-evaluation-mid-(r=8).md ADDED
@@ -0,0 +1,25 @@
+ ## Chat evaluation mid (r=8)
+ timestamp: 2025-12-17 11:41:15
+
+ - source: mid
+ - task_name: None
+ - dtype: bfloat16
+ - temperature: 0.0000
+ - max_new_tokens: 512
+ - num_samples: 1
+ - top_k: 50
+ - batch_size: 8
+ - model_tag: None
+ - step: None
+ - max_problems: None
+ - device_type:
+ - num_recur: 2,4,8,16
+ - num_recur: 8
+ - ARC-Easy: 0.4268
+ - ARC-Challenge: 0.3106
+ - MMLU: 0.3246
+ - GSM8K: 0.0417
+ - HumanEval: 0.0854
+ - SpellingBee: 0.9805
+ - ChatCORE metric: 0.2539
+
report/latest/chat-evaluation-sft-(r=16).md ADDED
@@ -0,0 +1,25 @@
+ ## Chat evaluation sft (r=16)
+ timestamp: 2025-12-17 12:56:07
+
+ - source: sft
+ - task_name: None
+ - dtype: bfloat16
+ - temperature: 0.0000
+ - max_new_tokens: 512
+ - num_samples: 1
+ - top_k: 50
+ - batch_size: 8
+ - model_tag: None
+ - step: None
+ - max_problems: None
+ - device_type:
+ - num_recur: 2,4,8,16
+ - num_recur: 16
+ - ARC-Easy: 0.4381
+ - ARC-Challenge: 0.3123
+ - MMLU: 0.3179
+ - GSM8K: 0.0644
+ - HumanEval: 0.0793
+ - SpellingBee: 0.9844
+ - ChatCORE metric: 0.2588
+
report/latest/chat-evaluation-sft-(r=2).md ADDED
@@ -0,0 +1,25 @@
+ ## Chat evaluation sft (r=2)
+ timestamp: 2025-12-17 12:13:50
+
+ - source: sft
+ - task_name: None
+ - dtype: bfloat16
+ - temperature: 0.0000
+ - max_new_tokens: 512
+ - num_samples: 1
+ - top_k: 50
+ - batch_size: 8
+ - model_tag: None
+ - step: None
+ - max_problems: None
+ - device_type:
+ - num_recur: 2,4,8,16
+ - num_recur: 2
+ - ARC-Easy: 0.4141
+ - ARC-Challenge: 0.3063
+ - MMLU: 0.3119
+ - GSM8K: 0.0356
+ - HumanEval: 0.0793
+ - SpellingBee: 0.9844
+ - ChatCORE metric: 0.2459
+
report/latest/chat-evaluation-sft-(r=4).md ADDED
@@ -0,0 +1,25 @@
+ ## Chat evaluation sft (r=4)
+ timestamp: 2025-12-17 12:20:55
+
+ - source: sft
+ - task_name: None
+ - dtype: bfloat16
+ - temperature: 0.0000
+ - max_new_tokens: 512
+ - num_samples: 1
+ - top_k: 50
+ - batch_size: 8
+ - model_tag: None
+ - step: None
+ - max_problems: None
+ - device_type:
+ - num_recur: 2,4,8,16
+ - num_recur: 4
+ - ARC-Easy: 0.4306
+ - ARC-Challenge: 0.3114
+ - MMLU: 0.3158
+ - GSM8K: 0.0614
+ - HumanEval: 0.0793
+ - SpellingBee: 0.9883
+ - ChatCORE metric: 0.2566
+
report/latest/chat-evaluation-sft-(r=8).md ADDED
@@ -0,0 +1,25 @@
+ ## Chat evaluation sft (r=8)
+ timestamp: 2025-12-17 12:33:26
+
+ - source: sft
+ - task_name: None
+ - dtype: bfloat16
+ - temperature: 0.0000
+ - max_new_tokens: 512
+ - num_samples: 1
+ - top_k: 50
+ - batch_size: 8
+ - model_tag: None
+ - step: None
+ - max_problems: None
+ - device_type:
+ - num_recur: 2,4,8,16
+ - num_recur: 8
+ - ARC-Easy: 0.4423
+ - ARC-Challenge: 0.3106
+ - MMLU: 0.3185
+ - GSM8K: 0.0599
+ - HumanEval: 0.0915
+ - SpellingBee: 0.9883
+ - ChatCORE metric: 0.2614
+
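Note on reading the eight evaluation reports side by side: each file repeats the `num_recur` key, first with the full sweep (`2,4,8,16`) and then with the value actually used. The sketch below is a hypothetical helper (`parse_report` is not part of the repository as far as this diff shows) that reads the bullet lines into a dict so that the later `num_recur` wins, followed by the ChatCORE values copied from the reports above.

```python
import re

# Matches report bullets such as "- ARC-Easy: 0.4423" or "- device_type:".
BULLET = re.compile(r"^- ([^:]+):\s*(.*)$")


def parse_report(text):
    """Parse '- key: value' bullets; a repeated key keeps its last value."""
    fields = {}
    for line in text.splitlines():
        match = BULLET.match(line.strip())
        if match:
            fields[match.group(1).strip()] = match.group(2).strip()
    return fields


sample = """## Chat evaluation sft (r=8)
timestamp: 2025-12-17 12:33:26

- source: sft
- device_type:
- num_recur: 2,4,8,16
- num_recur: 8
- ChatCORE metric: 0.2614
"""

fields = parse_report(sample)

# ChatCORE per stage and recurrence count, copied from the reports above.
chatcore_by_r = {
    "mid": {2: 0.2274, 4: 0.2529, 8: 0.2539, 16: 0.2533},
    "sft": {2: 0.2459, 4: 0.2566, 8: 0.2614, 16: 0.2588},
}
best_r = {stage: max(scores, key=scores.get)
          for stage, scores in chatcore_by_r.items()}
print(fields["num_recur"], best_r)
```

In both stages r=8 scores highest, but the spread between r=4, r=8 and r=16 is at most about 0.005, and each report uses `num_samples: 1` at temperature 0, so the ordering among those three should be read cautiously.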
report/latest/chat-sft.md ADDED
@@ -0,0 +1,25 @@
+ ## Chat SFT
+ timestamp: 2025-12-17 12:09:10
+
+ - run: recursive-d20
+ - source: mid
+ - device_type:
+ - dtype: bfloat16
+ - device_batch_size: 4
+ - num_epochs: 1
+ - num_iterations: -1
+ - target_examples_per_step: 32
+ - unembedding_lr: 0.0040
+ - embedding_lr: 0.2000
+ - matrix_lr: 0.0200
+ - weight_decay: 0.0000
+ - init_lr_frac: 0.0200
+ - eval_every: 100
+ - eval_steps: 100
+ - eval_metrics_every: 200
+ - eval_metrics_max_problems: 1024
+ - Training rows: 22,440
+ - Number of iterations: 701
+ - Training loss: 1.4988
+ - Validation loss: 1.0783
+
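Note on `num_iterations: -1` versus `Number of iterations: 701`: the report does not explain how the step count is derived. A small sketch of the arithmetic, assuming the count is the number of training rows floor-divided by `target_examples_per_step`, times `num_epochs`, and assuming the same world size of 8 that the midtraining report lists (the SFT report itself does not state it):

```python
# Values from the Chat SFT report.
training_rows = 22_440
target_examples_per_step = 32
num_epochs = 1
device_batch_size = 4

# Assumption: SFT ran on the same 8 ranks as midtraining.
world_size = 8

# Assumed derivation: whole optimizer steps that fit into the data.
num_iterations = (training_rows // target_examples_per_step) * num_epochs

# Examples consumed per forward/backward pass across all ranks.
examples_per_micro_step = device_batch_size * world_size
grad_accum_steps = target_examples_per_step // examples_per_micro_step

print(num_iterations, grad_accum_steps)  # 701 1
```

Under these assumptions the result matches the reported 701 iterations, with the trailing 8 rows (22,440 minus 701 × 32) left unused, and no gradient accumulation is needed.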
report/latest/header.md CHANGED
@@ -1,13 +1,13 @@
  # nanochat training report

- Generated: 2025-12-17 08:44:33
+ Generated: 2025-12-17 09:42:24

  ## Environment

  ### Git Information
  - Branch: recursive
- - Commit: f008f9b (dirty)
- - Message: feat: add Poisson sampling to mid/sft training and multi-recur chat eval
+ - Commit: 427e3f4 (dirty)
+ - Message: skip identity conv gen if present

  ### Hardware
  - Platform: Linux
@@ -24,13 +24,13 @@ Generated: 2025-12-17 08:44:33


  ### Bloat
- - Characters: 464,005
- - Lines: 11,225
+ - Characters: 464,857
+ - Lines: 11,239
  - Files: 55
- - Tokens (approx): 116,001
- - Dependencies (uv.lock lines): 2,252
+ - Tokens (approx): 116,214
+ - Dependencies (uv.lock lines): 2,254

- Run started: 2025-12-17 08:44:37
+ Run started: 2025-12-17 09:42:27

  ---

report/latest/midtraining.md ADDED
@@ -0,0 +1,22 @@
+ ## Midtraining
+ timestamp: 2025-12-17 11:15:09
+
+ - run: recursive-d20
+ - device_type:
+ - dtype: bfloat16
+ - num_iterations: -1
+ - max_seq_len: 2048
+ - device_batch_size: 4
+ - unembedding_lr: 0.0040
+ - embedding_lr: 0.2000
+ - matrix_lr: 0.0200
+ - init_lr_frac: 1.0000
+ - weight_decay: 0.0000
+ - eval_every: 150
+ - eval_tokens: 10,485,760
+ - total_batch_size: 524,288
+ - dry_run: 0
+ - Number of iterations: 810
+ - DDP world size: 8
+ - Minimum validation bpb: 0.4178
+
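Note on how the midtraining batch settings relate: a short worked calculation, assuming `total_batch_size` and `eval_tokens` are measured in tokens and that each rank processes `device_batch_size` sequences of `max_seq_len` tokens per forward/backward pass. Those interpretations are assumptions; the numbers themselves come from the report above.

```python
# Values from the Midtraining report.
device_batch_size = 4
max_seq_len = 2048
world_size = 8            # "DDP world size"
total_batch_size = 524_288
num_iterations = 810
eval_tokens = 10_485_760

# Tokens processed in one forward/backward pass across all ranks.
tokens_per_micro_step = device_batch_size * max_seq_len * world_size

# Micro steps accumulated before each optimizer step.
assert total_batch_size % tokens_per_micro_step == 0
grad_accum_steps = total_batch_size // tokens_per_micro_step

# Total tokens seen during midtraining.
total_tokens = num_iterations * total_batch_size

# Size of each evaluation pass, expressed in training batches.
eval_in_batches = eval_tokens // total_batch_size

print(tokens_per_micro_step, grad_accum_steps, total_tokens, eval_in_batches)
# 65536 8 424673280 20
```

Read this way, each optimizer step accumulates 8 micro steps, the run covers roughly 425M tokens, and every validation pass is about 20 training batches long.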
report/latest/report.md ADDED
@@ -0,0 +1,97 @@
+ # nanochat training report
+
+ Generated: 2025-12-17 09:42:24
+
+ ## Environment
+
+ ### Git Information
+ - Branch: recursive
+ - Commit: 427e3f4 (dirty)
+ - Message: skip identity conv gen if present
+
+ ### Hardware
+ - Platform: Linux
+ - CPUs: 128 cores (256 logical)
+ - Memory: 1511.5 GB
+ - GPUs: 8x NVIDIA H100 80GB HBM3
+ - GPU Memory: 632.8 GB total
+ - CUDA Version: 12.8
+ - Hourly Rate: $24.00/hour
+
+ ### Software
+ - Python: 3.10.12
+ - PyTorch: 2.8.0+cu128
+
+
+ ### Bloat
+ - Characters: 464,857
+ - Lines: 11,239
+ - Files: 55
+ - Tokens (approx): 116,214
+ - Dependencies (uv.lock lines): 2,254
+
+ Run started: 2025-12-17 09:42:27
+
+ ---
+
+ ## Midtraining
+ timestamp: 2025-12-17 11:15:09
+
+ - run: recursive-d20
+ - device_type:
+ - dtype: bfloat16
+ - num_iterations: -1
+ - max_seq_len: 2048
+ - device_batch_size: 4
+ - unembedding_lr: 0.0040
+ - embedding_lr: 0.2000
+ - matrix_lr: 0.0200
+ - init_lr_frac: 1.0000
+ - weight_decay: 0.0000
+ - eval_every: 150
+ - eval_tokens: 10,485,760
+ - total_batch_size: 524,288
+ - dry_run: 0
+ - Number of iterations: 810
+ - DDP world size: 8
+ - Minimum validation bpb: 0.4178
+
+
+ ## Chat SFT
+ timestamp: 2025-12-17 12:09:10
+
+ - run: recursive-d20
+ - source: mid
+ - device_type:
+ - dtype: bfloat16
+ - device_batch_size: 4
+ - num_epochs: 1
+ - num_iterations: -1
+ - target_examples_per_step: 32
+ - unembedding_lr: 0.0040
+ - embedding_lr: 0.2000
+ - matrix_lr: 0.0200
+ - weight_decay: 0.0000
+ - init_lr_frac: 0.0200
+ - eval_every: 100
+ - eval_steps: 100
+ - eval_metrics_every: 200
+ - eval_metrics_max_problems: 1024
+ - Training rows: 22,440
+ - Number of iterations: 701
+ - Training loss: 1.4988
+ - Validation loss: 1.0783
+
+
+ ## Summary
+
+ - Characters: 464,857
+ - Lines: 11,239
+ - Files: 55
+ - Tokens (approx): 116,214
+ - Dependencies (uv.lock lines): 2,254
+
+ | Metric | BASE | MID | SFT | RL |
+ |-----------------|----------|----------|----------|----------|
+
+ Total wall clock time: 2h26m
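Note on cost: the report lists an hourly rate of $24.00 and a total wall clock time of 2h26m but no total. A rough sketch of the estimate, assuming the whole wall-clock time was billed at the listed rate (the report does not say which stages the 2h26m covers). `parse_wall_clock` is a hypothetical helper written for this note.

```python
import re


def parse_wall_clock(text):
    """Convert a string such as '2h26m' into whole minutes."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?", text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognized wall clock string: {text!r}")
    hours = int(match.group(1) or 0)
    minutes = int(match.group(2) or 0)
    return hours * 60 + minutes


HOURLY_RATE_USD = 24.00  # "Hourly Rate" from the Hardware section

minutes = parse_wall_clock("2h26m")
cost_usd = minutes * HOURLY_RATE_USD / 60

print(f"{minutes} min -> ${cost_usd:.2f}")  # 146 min -> $58.40
```

On that assumption the reported portion of the run comes to roughly $58.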