# AGILLM-4.1: code + compatible checkpoints (self-contained)

This bucket holds the **AGILLM-4.1** model: the exact runtime **code** (`code/`) and the
**checkpoints** (`ckpts/`) that load under it, kept together so there is no version skew.

> This supersedes the legacy `OpenTransformer/AGILLM-4` model repo, which mixes older
> agillm4 code with older checkpoints that are **not guaranteed to be loadable** under the
> current 4.1 architecture. Use this bucket for 4.1; treat AGILLM-4 as archival.
## Architecture (from the live checkpoint cfg)

- layers: **28** (4 DiffusionBlocks x 7), d_model: **1280**, heads: **20**, rank: **160**
- MoE FFN: **2 experts, top-1, mult 4** · tied weights · sublinear attention
- vocab: **129280** · tokenizer: `deepseek-ai/DeepSeek-V4-Pro`
- objective: AR / SAT / NAT (stochastic) · DiffusionBlocks EDM block-wise training
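The numbers above imply a few derived sizes that are handy as sanity checks when inspecting a checkpoint. The sketch below is illustrative only: the class and field names are hypothetical (the real `cfg` keys are not listed here), and reading "mult 4" as the FFN hidden-size multiplier is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agillm41Shape:
    """Hypothetical container for the sizes listed in this README."""
    layers: int = 28
    blocks: int = 4          # DiffusionBlocks
    d_model: int = 1280
    heads: int = 20
    rank: int = 160          # kept as listed; its exact role is not described here
    experts: int = 2
    top_k: int = 1
    ffn_mult: int = 4
    vocab: int = 129280

    def derived(self) -> dict:
        # The listed sizes should divide evenly; fail loudly if they do not.
        assert self.layers % self.blocks == 0
        assert self.d_model % self.heads == 0
        return {
            "layers_per_block": self.layers // self.blocks,
            "head_dim": self.d_model // self.heads,
            # ASSUMPTION: "mult 4" means FFN hidden width = 4 * d_model.
            "ffn_hidden": self.ffn_mult * self.d_model,
            # One vocab x d_model matrix; with tied weights the output
            # projection would reuse it rather than add a second copy.
            "embedding_params": self.vocab * self.d_model,
        }


if __name__ == "__main__":
    for name, value in Agillm41Shape().derived().items():
        print(f"{name}: {value}")
```

Under these assumptions each DiffusionBlock spans 7 layers and each attention head is 64 wide.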
## Loading

The runtime is the single file `code/agillm41.py` (mainline @ `1e7f963`). Checkpoints in
`ckpts/pretrain_step*.pt` are plain `torch.save` dicts with keys `cfg`, `step`,
`tokenizer_id`, `tie_weights`, and the model tensors. Pull with the `hf` client:

```bash
pip install "huggingface_hub[hf_xet]>=1.18"
hf buckets cp hf://buckets/OpenTransformer/agillm41-checkpoints/ckpts/<file>.pt ./
hf buckets cp hf://buckets/OpenTransformer/agillm41-checkpoints/code/agillm41.py ./
```
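Once a checkpoint is on disk, it can be opened and inspected before anything in `agillm41.py` is touched. The following is a minimal sketch, not the runtime's own loader: `split_checkpoint` and `load_checkpoint` are hypothetical helpers, and it assumes the model tensors are simply the entries left over after the four documented metadata keys are removed.

```python
from typing import Any, Dict, Tuple

# Metadata keys named in this README; every other entry is ASSUMED to be model state.
META_KEYS = ("cfg", "step", "tokenizer_id", "tie_weights")


def split_checkpoint(ckpt: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate the documented metadata entries from the remaining entries."""
    missing = [k for k in META_KEYS if k not in ckpt]
    if missing:
        raise KeyError(f"checkpoint is missing expected keys: {missing}")
    meta = {k: ckpt[k] for k in META_KEYS}
    rest = {k: v for k, v in ckpt.items() if k not in META_KEYS}
    return meta, rest


def load_checkpoint(path: str) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Load a checkpoint on CPU and split it; torch is imported lazily."""
    import torch

    # weights_only=True refuses arbitrary pickled objects. If `cfg` holds a
    # custom class this may fail; only fall back to weights_only=False for
    # files you trust.
    ckpt = torch.load(path, map_location="cpu", weights_only=True)
    return split_checkpoint(ckpt)
```

Calling `meta, rest = load_checkpoint("./<file>.pt")` and printing `meta["step"]` and `meta["tokenizer_id"]` is a quick way to confirm that the file matches the architecture listed in this README before building the model.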
The bucket is backed by HF **Storage Buckets** (Xet dedup, mutable), so each backup uploads
only the chunks that changed. It is synced from the live trainer, and `latest.json` names
the newest checkpoint.
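The schema of `latest.json` is not described here, so the sketch below does not parse it. As a fallback for a local copy of `ckpts/`, it picks the checkpoint with the highest step from the filenames, assuming they follow `pretrain_step<digits>.pt`; the helper name `newest_checkpoint` is hypothetical.

```python
import re
from typing import Iterable, Optional

# ASSUMPTION: the part matched by `*` in `pretrain_step*.pt` is a decimal step count.
_STEP_RE = re.compile(r"pretrain_step(\d+)\.pt$")


def newest_checkpoint(names: Iterable[str]) -> Optional[str]:
    """Return the name with the largest step number, or None if none match."""
    best_name, best_step = None, -1
    for name in names:
        match = _STEP_RE.search(name)
        if match is None:
            continue  # skip files that are not pretrain checkpoints
        step = int(match.group(1))
        if step > best_step:
            best_name, best_step = name, step
    return best_name
```

Comparing steps as integers matters: a plain string sort would rank `pretrain_step900.pt` above `pretrain_step12000.pt`.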