# AGILLM-4.1 — code + compatible checkpoints (self-contained)
This bucket holds the **AGILLM-4.1** model: the exact runtime **code** (`code/`) and the
**checkpoints** (`ckpts/`) that load under it, kept together so there is no version skew.
> This supersedes the legacy `OpenTransformer/AGILLM-4` model repo, which mixes older
> agillm4 code with older checkpoints that are **not guaranteed to be loadable** under the
> current 4.1 architecture. Use this bucket for 4.1; treat AGILLM-4 as archival.
## Architecture (from the live checkpoint cfg)
- layers: **28** (4 DiffusionBlocks x 7), d_model: **1280**, heads: **20**, rank: **160**
- MoE FFN: **2 experts, top-1, mult 4** · tied weights · sublinear attention
- vocab: **129280** · tokenizer: `deepseek-ai/DeepSeek-V4-Pro`
- objective: AR / SAT / NAT (stochastic) · DiffusionBlocks EDM block-wise training
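The numbers in the list above can be cross-checked with a few lines of arithmetic. This is a minimal sketch, not the runtime's own config handling: the dictionary key names are hypothetical, the head size assumes `d_model` is split evenly across heads, and the FFN hidden size assumes "mult 4" means four times `d_model`.

```python
# Values copied from the architecture list; the key names are hypothetical
# and need not match the names used inside the checkpoint's `cfg`.
cfg = {
    "layers": 28,
    "diffusion_blocks": 4,
    "layers_per_block": 7,
    "d_model": 1280,
    "heads": 20,
    "ffn_mult": 4,
    "vocab": 129280,
}


def derived_dims(cfg):
    """Derive a few sizes implied by the listed hyperparameters."""
    # 4 DiffusionBlocks x 7 layers should account for all 28 layers.
    assert cfg["diffusion_blocks"] * cfg["layers_per_block"] == cfg["layers"]
    # Assumption: d_model is divided evenly across attention heads.
    assert cfg["d_model"] % cfg["heads"] == 0
    return {
        "head_dim": cfg["d_model"] // cfg["heads"],
        # Assumption: "mult 4" is the FFN hidden size as a multiple of d_model.
        "ffn_hidden": cfg["ffn_mult"] * cfg["d_model"],
        # Size of one vocab x d_model embedding matrix.
        "embedding_params": cfg["vocab"] * cfg["d_model"],
    }


dims = derived_dims(cfg)
print(dims)
```

Under these assumptions each head is 64 wide, each expert's hidden layer is 5120 wide, and the embedding matrix holds about 165M parameters.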
## Loading
The runtime is the single file `code/agillm41.py` (mainline @ `1e7f963`). Checkpoints in
`ckpts/pretrain_step*.pt` are plain `torch.save` dicts with keys `cfg`, `step`,
`tokenizer_id`, `tie_weights`, and the model tensors. Pull with the `hf` client:
```bash
pip install "huggingface_hub[hf_xet]>=1.18"
hf buckets cp hf://buckets/OpenTransformer/agillm41-checkpoints/ckpts/<file>.pt ./
hf buckets cp hf://buckets/OpenTransformer/agillm41-checkpoints/code/agillm41.py ./
```
This bucket is backed by HF **Storage Buckets** (Xet dedup, mutable), so each backup ships
only the chunks that changed. It is synced from the live trainer; `latest.json` names the
newest checkpoint.
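The checkpoint layout described under Loading can be inspected without the runtime. The helper below is a hypothetical sketch, not part of `code/agillm41.py`. It assumes the model tensors sit at the top level of the dict next to the four metadata keys; if they are nested under a single key instead, the second return value will hold just that entry.

```python
# Metadata keys named in the Loading section.
META_KEYS = ("cfg", "step", "tokenizer_id", "tie_weights")


def split_checkpoint(ckpt):
    """Split a loaded checkpoint dict into (metadata, everything else).

    Hypothetical helper: assumes the non-metadata entries are the model
    tensors stored at the top level of the dict.
    """
    missing = [k for k in META_KEYS if k not in ckpt]
    if missing:
        raise KeyError(f"checkpoint is missing metadata keys: {missing}")
    meta = {k: ckpt[k] for k in META_KEYS}
    tensors = {k: v for k, v in ckpt.items() if k not in META_KEYS}
    return meta, tensors


# Typical use (needs PyTorch, so it is left commented out here):
#   import torch
#   ckpt = torch.load(path, map_location="cpu")  # path to a downloaded .pt file
#   meta, tensors = split_checkpoint(ckpt)
#   print(meta["step"], meta["tokenizer_id"], len(tensors))
# Recent PyTorch versions default to weights_only=True; if `cfg` holds
# non-tensor objects, weights_only=False may be needed. Only use that
# with files you trust.
```

Checking `meta["cfg"]` against the architecture list before building the model is a cheap way to confirm that the code and the checkpoint match.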
