Pacific-i64/Dense-final-15259

Pacific-i64 committed on Apr 3

Commit 9ca12ea (verified) · 1 parent: 6649c3d

Update README.md

Files changed (1)

README.md +58 -3


@@ -1,3 +1,58 @@
----
-license: cc-by-nc-4.0
----

+---
+language: en
+license: cc-by-nc-4.0
+tags:
+  - complexity-deep
+  - dense-baseline
+  - swiglu
+  - checkpoint
+  - resumable
+  - chinchilla
+---
+# Dense SwiGLU Baseline (384.5M) — Training Checkpoint (Step 15,259)
+Resumable training checkpoint with full optimizer state, saved at the end of an 8B-token training run.
+**Note**: This model was trained with a Chinchilla-like token budget (8B tokens for 384.5M parameters, ~21 tokens/param). The model may benefit from continued training beyond this point.
+## Contents
+- `checkpoint.pt` - Model weights + training state
+- `model.safetensors` - Model weights (safetensors format)
+- `optimizer_rank0.pt` - AdamW optimizer state (GPU 0)
+- `optimizer_rank1.pt` - AdamW optimizer state (GPU 1)
+- `training_state.json` - Step counter, LR, etc.
+## Model Config
+- **Parameters**: 384.5M (all active per token)
+- **Hidden**: 1024, Layers: 20, Heads: 16, KV Heads: 4
+- **MLP**: Dense SwiGLU, Intermediate: 4358
+- **Training**: 8B tokens (15,259 steps), AdamW, lr=2.1e-4, cosine schedule with 5% warmup
+## Resume Training
+```python
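+import torch
+import torch.distributed as dist
+# `model` and `optimizer` must already be built to match the config above,
+# and the process group must be initialized (e.g. by launching with torchrun).
+# On PyTorch >= 2.6, torch.load defaults to weights_only=True; pass
+# weights_only=False only if loading fails and you trust the file.
+checkpoint = torch.load("checkpoint.pt", map_location="cpu")
+model.load_state_dict(checkpoint["model"])
+# Load the optimizer state saved for this GPU rank (0 or 1)
+rank = dist.get_rank()
+optimizer_state = torch.load(f"optimizer_rank{rank}.pt", map_location="cpu")
+optimizer.load_state_dict(optimizer_state)
+# Resume the training loop from step 15,259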
+```
+## Pretrained Weights (inference)
+For inference, use the safetensors checkpoint in `../final/` instead.
+## License
+CC-BY-NC-4.0
+Complexity-ML -- 2026
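
The figures quoted in the README added by this commit can be cross-checked with a little arithmetic. The sketch below is not part of the commit. It assumes a bias-free, Llama-style block (grouped-query attention, a three-matrix SwiGLU MLP, two norm weight vectors per layer), which the README does not confirm, and every function and constant name is illustrative. The README does not state the vocabulary size, so embeddings are left out of the count.

```python
# Figures taken from the README diff above (the token count is rounded there).
TOTAL_TOKENS = 8e9
TOTAL_PARAMS = 384.5e6
STEPS = 15_259
HIDDEN, LAYERS, HEADS, KV_HEADS, INTERMEDIATE = 1024, 20, 16, 4, 4358


def tokens_per_param(tokens: float, params: float) -> float:
    """Training tokens seen per model parameter."""
    return tokens / params


def tokens_per_step(tokens: float, steps: int) -> float:
    """Average tokens consumed per optimizer step."""
    return tokens / steps


def block_params(hidden: int, heads: int, kv_heads: int, intermediate: int) -> int:
    """Weights in one transformer block.

    ASSUMPTION: no biases, grouped-query attention, SwiGLU MLP with
    gate/up/down projections, and two norm weight vectors of size `hidden`.
    """
    head_dim = hidden // heads
    kv_dim = kv_heads * head_dim
    attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # Q and O, then K and V
    mlp = 3 * hidden * intermediate                   # gate, up, down
    norms = 2 * hidden
    return attn + mlp + norms


non_embedding = LAYERS * block_params(HIDDEN, HEADS, KV_HEADS, INTERMEDIATE)
warmup_steps = round(0.05 * STEPS)  # 5% warmup, per the README

print(f"tokens/param:         {tokens_per_param(TOTAL_TOKENS, TOTAL_PARAMS):.1f}")
print(f"tokens/step:          {tokens_per_step(TOTAL_TOKENS, STEPS):,.0f}")
print(f"non-embedding params: {non_embedding:,}")
print(f"remainder (embeddings etc.): {TOTAL_PARAMS - non_embedding:,.0f}")
print(f"warmup steps:         {warmup_steps}")
```

Under these assumptions the 20 blocks account for about 320M of the 384.5M parameters, leaving roughly 64M for embeddings and any output head. The tokens-per-step figure lands within a few tokens of 2**19 = 524,288, which would be consistent with a power-of-two batch, though the README does not state the batch size.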