
	# AGILLM-4.1 — code + compatible checkpoints (self-contained)

	This bucket holds the AGILLM-4.1 model: the exact runtime code (`code/`) and the
	checkpoints (`ckpts/`) that load under it, kept together so there is no version skew.

	> This supersedes the legacy `OpenTransformer/AGILLM-4` model repo, which mixes older
	> agillm4 code with older checkpoints that are not guaranteed to be loadable under the
	> current 4.1 architecture. Use this bucket for 4.1; treat AGILLM-4 as archival.

	## Architecture (from the live checkpoint cfg)
	- layers: 28 (4 DiffusionBlocks x 7), d_model: 1280, heads: 20, rank: 160
	- MoE FFN: 2 experts, top-1, mult 4 · tied weights · sublinear attention
	- vocab: 129280 · tokenizer: `deepseek-ai/DeepSeek-V4-Pro`
	- objective: AR / SAT / NAT (autoregressive / semi-autoregressive / non-autoregressive; stochastic) · DiffusionBlocks EDM block-wise training
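
The listed values fit together as sketched below. This sketch only derives quantities implied by the list; it does not build the model. It rests on several assumptions:

- The field names are hypothetical. The real `cfg` keys in `agillm41.py` may differ.
- The FFN `mult` scales `d_model`, as in a standard transformer FFN.
- "tied weights" means the input embedding and the output head share one `vocab x d_model` matrix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agillm41Shape:
    """Hypothetical mirror of the architecture list above. Field names are
    illustrative, not the actual cfg keys used by code/agillm41.py."""
    n_blocks: int = 4            # DiffusionBlocks
    layers_per_block: int = 7
    d_model: int = 1280
    n_heads: int = 20
    rank: int = 160              # carried over as listed; not used below
    n_experts: int = 2
    top_k: int = 1               # experts active per token
    ffn_mult: int = 4
    vocab_size: int = 129280
    tie_weights: bool = True

    @property
    def n_layers(self) -> int:
        # 4 DiffusionBlocks x 7 layers each
        return self.n_blocks * self.layers_per_block

    @property
    def head_dim(self) -> int:
        if self.d_model % self.n_heads:
            raise ValueError("d_model must be divisible by n_heads")
        return self.d_model // self.n_heads

    @property
    def ffn_hidden(self) -> int:
        # Assumption: mult scales d_model, as in a standard transformer FFN
        return self.ffn_mult * self.d_model

    @property
    def embedding_params(self) -> int:
        # Assumption: tied weights = embedding and output head share one
        # vocab x d_model matrix; untied, the head adds a second one
        mats = 1 if self.tie_weights else 2
        return mats * self.vocab_size * self.d_model


shape = Agillm41Shape()
print(shape.n_layers, shape.head_dim, shape.ffn_hidden, shape.embedding_params)
# -> 28 64 5120 165478400
```

Under these assumptions, tying saves a second ~165M-parameter matrix. With top-1 routing over 2 experts, each token passes through only one expert's FFN.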

	## Loading
	The runtime is the single file `code/agillm41.py` (mainline @ `1e7f963`). Checkpoints in
	`ckpts/pretrain_step*.pt` are plain `torch.save` dicts with keys `cfg`, `step`,
	`tokenizer_id`, `tie_weights`, and the model tensors. Pull with the `hf` client:

	```bash
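	# huggingface_hub ships the `hf` CLI; the hf_xet extra adds Xet transfer support
	pip install "huggingface_hub[hf_xet]>=1.18"
	# one checkpoint (<file> is a placeholder for a name under ckpts/)
	hf buckets cp hf://buckets/OpenTransformer/agillm41-checkpoints/ckpts/<file>.pt ./
	# the matching runtime code
	hf buckets cp hf://buckets/OpenTransformer/agillm41-checkpoints/code/agillm41.py ./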
	```
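
A downloaded checkpoint can be opened as sketched below, which separates the four documented metadata keys from the tensors. The function names are hypothetical. The README does not say whether the tensors are top-level entries or nested under one key. So this sketch treats the remaining entries as the tensors, and unwraps them if they turn out to be a single nested dict.

```python
META_KEYS = ("cfg", "step", "tokenizer_id", "tie_weights")


def split_checkpoint(ckpt: dict) -> tuple[dict, dict]:
    """Split a checkpoint dict into (metadata, tensors).

    Assumption: the tensors are the non-metadata top-level entries; if those
    are a single nested dict, that dict is unwrapped and returned instead.
    """
    missing = [k for k in META_KEYS if k not in ckpt]
    if missing:
        raise KeyError(f"checkpoint is missing metadata keys: {missing}")
    meta = {k: ckpt[k] for k in META_KEYS}
    rest = {k: v for k, v in ckpt.items() if k not in META_KEYS}
    if len(rest) == 1:
        (only,) = rest.values()
        if isinstance(only, dict):
            rest = only
    return meta, rest


def load_checkpoint(path: str) -> tuple[dict, dict]:
    import torch  # imported lazily so split_checkpoint works without torch

    # map_location="cpu" avoids needing the device the trainer saved from.
    # weights_only=True refuses arbitrary pickled objects. If cfg holds a
    # non-allowlisted type this raises; use weights_only=False only for
    # files you trust.
    ckpt = torch.load(path, map_location="cpu", weights_only=True)
    return split_checkpoint(ckpt)
```

`meta["tokenizer_id"]` is presumably a Hub id that `transformers.AutoTokenizer.from_pretrained` accepts. Building the model from `meta["cfg"]` is defined by `code/agillm41.py` and is not shown here.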

	The bucket is backed by HF Storage Buckets (mutable, with Xet chunk-level deduplication), so each backup uploads only the chunks that changed.
	It is synced from the live trainer, and `latest.json` names the newest checkpoint.
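
`latest.json` is the pointer to use. Its format is not documented here, so the sketch below does not parse it. Instead it picks the newest file from a plain listing of `ckpts/`, for example a local mirror. This assumes the `*` in `pretrain_step*.pt` is a decimal step number, possibly zero-padded. `newest_checkpoint` is a hypothetical helper name.

```python
import re

# Assumption: the "*" in pretrain_step*.pt is a decimal step number.
# Comparing parsed integers avoids lexicographic order, under which
# "pretrain_step900.pt" would sort after "pretrain_step1000.pt".
_STEP_RE = re.compile(r"pretrain_step(\d+)\.pt$")


def newest_checkpoint(names):
    """Return the name with the highest step, or None if nothing matches."""
    best_step, best_name = -1, None
    for name in names:
        m = _STEP_RE.search(name)
        if m and int(m.group(1)) > best_step:
            best_step, best_name = int(m.group(1)), name
    return best_name
```

For example, `newest_checkpoint(os.listdir("ckpts"))` works on a local copy. Names that do not match the pattern, such as `latest.json`, are ignored.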
