---
library_name: pytorch
tags:
- transformer
- language-model
- long-context
- agillm
- experimental
---

# AGILLM-4

AGILLM-4 is the next training target after AGILLM-3. The current code is a
production-oriented starting point, copied from the proven single-file trainer
and extended with:

- ~1.5B parameter main preset (`agillm4_main`)
- 100 tokens per parameter target ratio (≈150B training tokens for the ~1.5B preset)
- longer block-size work on 24GB, B200, and B300 class GPUs
- AR+SAT every step with sequential backward to reduce peak VRAM (sketched below)
- SDPA and experimental sublinear local+landmark attention backends (sketched below)
- exact M-fold expansion attention harvested from n1.py, with a local verifier
- fused QKV projection harvested from n1.py, with legacy checkpoint loading (sketched below)
- profiling tools for memory, throughput, AR cost, SAT cost, and optimizer cost
- synthetic long-context curriculum generation for recall and multi-hop tests (sketched below)
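
Running the AR and SAT backward passes sequentially keeps only one objective's
activation graph alive at a time, instead of holding both for a combined
`(loss_ar + loss_sat).backward()`. A minimal sketch of the pattern, assuming a
model that takes an objective switch (the `objective` kwarg is illustrative,
not the repo's actual API):

```python
import torch

def train_step(model, batch, optimizer):
    optimizer.zero_grad(set_to_none=True)

    # Forward + backward the AR objective first; backward() frees its
    # activation graph before the SAT forward allocates a new one.
    loss_ar = model(batch, objective="ar")
    loss_ar.backward()

    # SAT gradients accumulate into the same .grad buffers, so this is
    # numerically the same update as backwarding the summed loss.
    loss_sat = model(batch, objective="sat")
    loss_sat.backward()

    optimizer.step()
    return loss_ar.item(), loss_sat.item()
```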
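
The sublinear backend pairs a sliding local window with coarse "landmark"
summaries of distant key/value blocks, so each query attends to `window` nearby
positions plus one landmark per fully-past block rather than the full prefix. A
slow per-token reference, useful as a correctness oracle (single-head shapes
and all names are assumptions; a real backend batches this):

```python
import torch

def local_landmark_attention(q, k, v, window=256, block=64):
    """Reference version for one head: q, k, v have shape (T, D)."""
    T, D = q.shape
    nb = T // block
    # One mean-pooled landmark key/value per block.
    k_land = k[: nb * block].view(nb, block, D).mean(dim=1)
    v_land = v[: nb * block].view(nb, block, D).mean(dim=1)
    out = torch.empty_like(q)
    for t in range(T):
        lo = max(0, t - window + 1)
        n_past = lo // block  # landmark blocks fully left of the local window
        k_ctx = torch.cat([k_land[:n_past], k[lo : t + 1]], dim=0)
        v_ctx = torch.cat([v_land[:n_past], v[lo : t + 1]], dim=0)
        att = torch.softmax(q[t] @ k_ctx.T / D ** 0.5, dim=-1)
        out[t] = att @ v_ctx
    return out
```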
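
The fused QKV projection collapses three matmuls into one wider one; loading a
legacy checkpoint then reduces to concatenating the old per-projection weights
along the output dimension, in the same order `forward` splits them. A sketch
with assumed parameter names (`q_proj`/`k_proj`/`v_proj` are illustrative, not
necessarily the legacy key names):

```python
import torch
import torch.nn as nn

class FusedQKV(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        # One launch produces Q, K, and V together.
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)

    def forward(self, x):
        return self.qkv(x).chunk(3, dim=-1)  # -> q, k, v

def load_legacy_qkv(module: FusedQKV, state_dict: dict, prefix: str):
    """Map separate legacy q/k/v weights onto the fused layer."""
    w = torch.cat(
        [state_dict[f"{prefix}.{p}.weight"] for p in ("q_proj", "k_proj", "v_proj")],
        dim=0,  # nn.Linear weight is (out_features, in_features)
    )
    with torch.no_grad():
        module.qkv.weight.copy_(w)
```

The concatenation order must match the order `forward` unstacks, which is why
both sides go q, then k, then v.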
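
For the recall half of the synthetic curriculum, an example can be as simple as
random filler with one key/value pair planted in the middle and the key
repeated at the end; the model is scored on producing the value at the final
position. A hypothetical generator (the token layout is an assumption, and
`key`/`value` should be reserved ids outside the filler range):

```python
import torch

def make_recall_example(seq_len: int, filler_vocab: int, key: int, value: int,
                        generator: torch.Generator | None = None):
    g = generator or torch.Generator().manual_seed(0)
    tokens = torch.randint(0, filler_vocab, (seq_len,), generator=g)
    # Plant the needle somewhere before the final query.
    pos = int(torch.randint(1, seq_len - 3, (1,), generator=g))
    tokens[pos], tokens[pos + 1] = key, value
    # Re-ask the key at the end; the target is `value` at position seq_len - 1.
    tokens[-2], tokens[-1] = key, value
    return tokens
```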

Start with [AGILLM-4.md](AGILLM-4.md) for the training plan and command
recipes. The current sublinear backend is intentionally experimental: profile it
against SDPA before using it for a real run.
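
One way to run that comparison is to time a fixed training step and read back
peak allocator stats under each backend; a minimal CUDA harness sketch (how the
backend is switched is repo-specific and not shown here):

```python
import time
import torch

def bench(step_fn, warmup: int = 3, iters: int = 10):
    """Return (seconds per step, peak GiB) for a zero-arg training step."""
    for _ in range(warmup):
        step_fn()
    torch.cuda.synchronize()
    torch.cuda.reset_peak_memory_stats()
    t0 = time.perf_counter()
    for _ in range(iters):
        step_fn()
    torch.cuda.synchronize()
    secs = (time.perf_counter() - t0) / iters
    peak_gib = torch.cuda.max_memory_allocated() / 2**30
    return secs, peak_gib
```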

Current harvest status from n1.py is tracked in [N1_HARVEST.md](N1_HARVEST.md).