Upload CRATE checkpoint at step 20000
Browse files- README.md +45 -0
- config.json +9 -0
- meta.json +57 -0
- model.safetensors +3 -0
- token_bytes.pt +3 -0
- tokenizer.pkl +3 -0
README.md
ADDED
|
@@ -0,0 +1,45 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
tags:
|
| 3 |
+
- nanochat
|
| 4 |
+
- crate
|
| 5 |
+
license: mit
|
| 6 |
+
---
|
| 7 |
+
|
| 8 |
+
# crate-d12-base
|
| 9 |
+
|
| 10 |
+
A CRATE (Coding RAte reduction TransformEr) language model
|
| 11 |
+
trained with [nanochat](https://github.com/karpathy/nanochat).
|
| 12 |
+
|
| 13 |
+
## Model Details
|
| 14 |
+
|
| 15 |
+
| Parameter | Value |
|
| 16 |
+
|-----------|-------|
|
| 17 |
+
| Architecture | CRATE |
|
| 18 |
+
| Layers | 12 |
|
| 19 |
+
| Hidden dim | 768 |
|
| 20 |
+
| Attention heads | 6 |
|
| 21 |
+
| Vocab size | 50304 |
|
| 22 |
+
| Max sequence length | 1024 |
|
| 23 |
+
| Window pattern | SSSL |
|
| 24 |
+
| Training step | 20,000 |
|
| 25 |
+
| Validation BPB | 1.1131 |
|
| 26 |
+
| Smooth train loss | 3.7495 |
|
| 27 |
+
| Training time | 3.4 hours |
|
| 28 |
+
| Run name | 4090-crate-a |
|
| 29 |
+
| Batch size (tokens) | 65536 |
|
| 30 |
+
|
| 31 |
+
## Files
|
| 32 |
+
|
| 33 |
+
- `model.safetensors` -- model weights in safetensors format
|
| 34 |
+
- `config.json` -- model architecture config (reconstruct with `CRATEConfig(**config)`)
|
| 35 |
+
- `tokenizer.pkl` -- BPE tokenizer (pickle of tiktoken Encoding)
|
| 36 |
+
- `token_bytes.pt` -- token byte mappings
|
| 37 |
+
- `meta.json` -- full training metadata from the checkpoint
|
| 38 |
+
|
| 39 |
+
## Usage
|
| 40 |
+
|
| 41 |
+
```python
|
| 42 |
+
import torch

from nanochat.checkpoint_manager import build_model
|
| 43 |
+
|
| 44 |
+
model, tokenizer, meta = build_model("path/to/downloaded/dir", step=20000, device=torch.device("cuda"), phase="eval")
|
| 45 |
+
```
|
config.json
ADDED
|
@@ -0,0 +1,9 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"sequence_len": 1024,
|
| 3 |
+
"vocab_size": 50304,
|
| 4 |
+
"n_layer": 12,
|
| 5 |
+
"n_head": 6,
|
| 6 |
+
"n_kv_head": 6,
|
| 7 |
+
"n_embd": 768,
|
| 8 |
+
"window_pattern": "SSSL"
|
| 9 |
+
}
|
meta.json
ADDED
|
@@ -0,0 +1,57 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"step": 20000,
|
| 3 |
+
"val_bpb": 1.1130690245666328,
|
| 4 |
+
"model_config": {
|
| 5 |
+
"sequence_len": 1024,
|
| 6 |
+
"vocab_size": 50304,
|
| 7 |
+
"n_layer": 12,
|
| 8 |
+
"n_head": 6,
|
| 9 |
+
"n_kv_head": 6,
|
| 10 |
+
"n_embd": 768,
|
| 11 |
+
"window_pattern": "SSSL"
|
| 12 |
+
},
|
| 13 |
+
"user_config": {
|
| 14 |
+
"run": "4090-crate-a",
|
| 15 |
+
"device_type": "",
|
| 16 |
+
"depth": 12,
|
| 17 |
+
"aspect_ratio": 64,
|
| 18 |
+
"head_dim": 128,
|
| 19 |
+
"max_seq_len": 1024,
|
| 20 |
+
"window_pattern": "SSSL",
|
| 21 |
+
"num_iterations": 50000,
|
| 22 |
+
"target_flops": -1.0,
|
| 23 |
+
"target_param_data_ratio": 8,
|
| 24 |
+
"device_batch_size": 16,
|
| 25 |
+
"total_batch_size": 65536,
|
| 26 |
+
"embedding_lr": 0.3,
|
| 27 |
+
"unembedding_lr": 0.004,
|
| 28 |
+
"weight_decay": 0.2,
|
| 29 |
+
"matrix_lr": 0.02,
|
| 30 |
+
"scalar_lr": 0.5,
|
| 31 |
+
"adam_beta1": 0.8,
|
| 32 |
+
"adam_beta2": 0.95,
|
| 33 |
+
"warmup_ratio": 0.0,
|
| 34 |
+
"warmdown_ratio": 0.4,
|
| 35 |
+
"final_lr_frac": 0.0,
|
| 36 |
+
"resume_from_step": -1,
|
| 37 |
+
"eval_every": 20000,
|
| 38 |
+
"eval_tokens": 10485760,
|
| 39 |
+
"core_metric_every": 2000,
|
| 40 |
+
"core_metric_max_per_task": 500,
|
| 41 |
+
"sample_every": 2000,
|
| 42 |
+
"save_every": 5000,
|
| 43 |
+
"model_tag": null
|
| 44 |
+
},
|
| 45 |
+
"device_batch_size": 16,
|
| 46 |
+
"max_seq_len": 1024,
|
| 47 |
+
"dataloader_state_dict": {
|
| 48 |
+
"pq_idx": 8,
|
| 49 |
+
"rg_idx": 18,
|
| 50 |
+
"epoch": 2
|
| 51 |
+
},
|
| 52 |
+
"loop_state": {
|
| 53 |
+
"min_val_bpb": 1.1130690245666328,
|
| 54 |
+
"smooth_train_loss": 3.7495008182618137,
|
| 55 |
+
"total_training_time": 12204.943566560745
|
| 56 |
+
}
|
| 57 |
+
}
|
model.safetensors
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:72bf013a21a11c7bdf5eb92b983e75426f4ea31daf13fc11d412e9a1b0ad57aa
|
| 3 |
+
size 515070592
|
token_bytes.pt
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:d03fb2a61108a67da7bfc068acb7df60418a9810d2396cae9ba431edb48ebe2f
|
| 3 |
+
size 202793
|
tokenizer.pkl
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:6329ca24d1862360651c17b750fc49b01564c871f9047717dc63c7726891ac22
|
| 3 |
+
size 644366
|