Pacific-i64 committed · verified
Commit 9ca12ea · 1 parent: 6649c3d

Update README.md

Files changed (1): README.md (+58 −3)
---
language: en
license: cc-by-nc-4.0
tags:
- complexity-deep
- dense-baseline
- swiglu
- checkpoint
- resumable
- chinchilla
---

# Dense SwiGLU Baseline (384.5M) — Training Checkpoint (Step 15,259)

Resumable training checkpoint with full optimizer state, saved at the end of the 8B-token training run.

**Note**: This model was trained with a Chinchilla-like token budget (8B tokens for 384.5M parameters, ~21 tokens/param). The model may benefit from continued training beyond this point.
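The "~21 tokens/param" figure is simply the ratio of the token budget to the parameter count:

```python
# Chinchilla-style budget check: token budget divided by parameter count.
tokens = 8e9        # 8B training tokens
params = 384.5e6    # 384.5M parameters
print(f"{tokens / params:.1f} tokens/param")  # → 20.8 tokens/param
```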
## Contents

- `checkpoint.pt` - Model weights + training state
- `model.safetensors` - Model weights (safetensors format)
- `optimizer_rank0.pt` - AdamW optimizer state (GPU 0)
- `optimizer_rank1.pt` - AdamW optimizer state (GPU 1)
- `training_state.json` - Step counter, LR, etc.

## Model Config

- **Parameters**: 384.5M (all active per token)
- **Hidden size**: 1024, **Layers**: 20, **Heads**: 16, **KV heads**: 4
- **MLP**: Dense SwiGLU, intermediate size 4358
- **Training**: 8B tokens (15,259 steps), AdamW, lr=2.1e-4, cosine schedule with 5% warmup

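As a cross-check on the config, a back-of-envelope sketch of the non-embedding parameter count it implies (assuming bias-free projections and standard GQA shapes, i.e. q/o projections at full width and k/v at kv-head width; the vocab size is not stated in this card):

```python
# Estimate non-embedding parameters from the config above.
# Assumptions (not stated in this card): no biases, standard GQA shapes.
hidden, layers, heads, kv_heads = 1024, 20, 16, 4
intermediate = 4358

head_dim = hidden // heads      # 64
kv_dim = kv_heads * head_dim    # 256

attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # q,o + k,v projections
mlp = 3 * hidden * intermediate                   # gate, up, down (SwiGLU)
non_embedding = layers * (attn + mlp)

print(f"{non_embedding / 1e6:.1f}M non-embedding parameters")  # → 320.2M
```

The remaining ~64M of the 384.5M total would then be embeddings and norms, consistent with a vocab on the order of 60K if input/output embeddings are tied.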
## Resume Training

36
```python
import torch

# The model and optimizer must first be constructed with the same
# architecture and hyperparameters described in this card.
checkpoint = torch.load("checkpoint.pt", map_location="cpu")
model.load_state_dict(checkpoint["model"])

# Load the sharded AdamW optimizer state for your GPU rank (0 or 1)
rank = torch.distributed.get_rank()
optimizer_state = torch.load(f"optimizer_rank{rank}.pt", map_location="cpu")
optimizer.load_state_dict(optimizer_state)

# Training resumes from step 15,259
```
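The schema of `training_state.json` is not documented in this card, so the keys below (`"step"`, `"lr"`) are assumptions for illustration; a minimal sketch of restoring the step counter and learning rate on resume:

```python
import json

# Assumed schema for training_state.json (keys are not documented in
# this card); a saved state might look like:
sample = {"step": 15259, "lr": 2.1e-4}
with open("training_state.json", "w") as f:
    json.dump(sample, f)

# On resume, restore the counters before stepping the training loop:
with open("training_state.json") as f:
    state = json.load(f)

start_step = state["step"]   # continue from step 15,259
current_lr = state["lr"]
print(start_step, current_lr)
```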

## Pretrained Weights (inference)

For inference, use the safetensors checkpoint in `../final/` instead.

## License

CC-BY-NC-4.0

Complexity-ML -- 2026