	---
	license: mit
	tags:
	- nGPT
	- ascii-art
	- character-language-model
	- model-merging
	- evolutionary-merge
	library_name: pytorch
	pipeline_tag: text-generation
	---

	# darwASCIInGPT — nGPT ASCII-art artists + Darwin-bred children

	Char-level nGPT (Normalized GPT) checkpoints from the darwASCIInGPT
	experiments: small hypersphere transformers that draw ASCII art, plus the
	Darwin-style bred offspring produced by merging them with **no gradient
	training**. Companion knowledge base (observations, code, Spark setup):

	> GitHub: https://github.com/tinycrops/darwASCIInGPT-playbook

	All ASCII models are `dim 256 / depth 4` (~3.18M params) with a char vocab of
	~106–109 symbols, trained on the apehex hand-drawn ASCII corpus on a Quadro P4000.
	The enwik8 text models are `dim 256–512 / depth 8`, trained on a GTX 1060.

	## Special tokens (char-level)

	`SOL = \x02`, `SEP = \x03`, `EOA = \x04`. Two framings:

	| Framing | Prime with | Use |
	|---|---|---|
	| Conditional `<SOL> label <SEP> art <EOA>` | `<SOL>` + label + `<SEP>` | request a class (e.g. `Cats`, `Swords`) |
	| Unconditional `<SOL> art <EOA>` | `<SOL>` | free-form draw (no label channel) |
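
The two framings can be sketched as a small helper that builds the prime string and splits a finished sample back into label and art. The names `make_prime` and `split_sample` are hypothetical and are not taken from the repo's `code/sample.py`; only the three control characters come from the section above.

```python
SOL, SEP, EOA = "\x02", "\x03", "\x04"  # control chars listed under "Special tokens"


def make_prime(label=None):
    """Return the prime string for a conditional (label given) or unconditional sample."""
    if label is None:
        return SOL                      # unconditional: <SOL>
    return SOL + label + SEP            # conditional: <SOL> label <SEP>


def split_sample(text):
    """Split '<SOL> [label <SEP>] art <EOA>' into (label or None, art)."""
    body = text[1:] if text.startswith(SOL) else text
    body = body.split(EOA, 1)[0]        # drop everything from the first <EOA> on
    if SEP in body:
        label, art = body.split(SEP, 1)
        return label, art
    return None, body


if __name__ == "__main__":
    print(repr(make_prime("Cats")))     # '\x02Cats\x03'
    print(split_sample(make_prime("Cats") + "(o.o)" + EOA))
```

Cutting at the first `<EOA>` matters in practice: a sampler that runs for a fixed number of steps will keep emitting characters after the drawing has ended.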

	These models are trained to very low loss (near-memorization), so the sampling
	settings have predictable effects:

	- `T≈0.6, top_k≈20` → clean, complete drawings
	- `top_k=1` → one fixed canonical piece per prefix (greedy decoding)
	- higher `T` → more variety, with occasional whitespace drift
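
For readers unfamiliar with these knobs, here is a minimal pure-Python sketch of temperature plus top-k sampling over one step's logits. It illustrates the general technique only: the function name `sample_top_k` is hypothetical, and the repo's own sampler may differ in detail.

```python
import math
import random


def sample_top_k(logits, temperature=0.6, top_k=20, rng=random):
    """Draw one token index from `logits` using temperature and top-k filtering."""
    if top_k < 1:
        raise ValueError("top_k must be >= 1")
    # Keep the indices of the k largest logits.
    keep = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    if top_k == 1 or temperature <= 0:
        return keep[0]                  # greedy: always the argmax
    # Softmax over the kept logits at the given temperature (max-subtracted for stability).
    scaled = [logits[i] / temperature for i in keep]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(keep, weights=weights, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0, 3.0]
    print(sample_top_k(logits, top_k=1))                    # 3 (the argmax)
    print(sample_top_k(logits, temperature=1.0, top_k=2))   # 0 or 3
```

With `top_k=1` the draw is deterministic, which is why a given prefix always yields the same piece; raising the temperature flattens the weights among the kept candidates.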

	## Contents

	| Path | Type | Framing | Trained on / notes |
	|---|---|---|---|
	| `uncond/styleA` | artist | unconditional | apehex creatures & nature half. final stream_loss 0.031 (99.2% acc) |
	| `uncond/styleB` | artist | unconditional | apehex objects & tech half. final stream_loss 0.089 (97.5% acc) |
	| `apehex/styleA` | artist | conditional | GROUP_A subcategories (Cats, Dragons, Flowers, …) |
	| `apehex/styleB` | artist | conditional | GROUP_B subcategories (Swords, Cars, Robots, …) |
	| `apehex/breed/child_slerp` | bred | conditional | SLERP merge of styleA × styleB on the nGPT hypersphere |
	| `apehex/breed/child_slerp_frozenattn` | bred | conditional | attention frozen from one parent, FFN SLERP-blended |
	| `parents/domA`, `parents/domB` | artist | conditional | domain split: apehex art vs mrzjy sample |
	| `parents/breed/child_slerp`, `child_discrete`, `child_slerp_frozenattn` | bred | conditional | recombinations of domA × domB |
	| `smith-experiment/ngpt` | ablation | conditional | nGPT normalized-lerp residual. val_loss 1.0736 |
	| `smith-experiment/smith` | ablation | conditional | Möbius geodesic residual (matched init/data/schedule). val_loss 1.0788, ~1.57× slower |
	| `resonance/standard_g1`, `harmonic30_g1` | sweep | conditional | resonance-geometry sweep representatives |
	| `enwik8-darwin/offspring_forkL__x__forkR.pt` | bred | enwik8 text | the hybrid-vigor offspring: bpc 2.4636 vs best parent 2.5047 (+0.0412) |
	| `enwik8-darwin/darwin_log.json` | log | n/a | shared-ancestor breeding → vigor |
	| `enwik8-darwin/darwin_log_independent.json` | log | n/a | independent-init breeding → no vigor (control) |

	## Headline result: genealogy decides hybrid vigor

	Same SLERP breeder, different parent relationship (enwik8 bits per character; lower is better):

	| Parents | Origin | Gen-0 child | Champion | Best parent | Vigor? |
	|---|---|---|---|---|---|
	| independent inits | different basins | 3.26 | 2.3064 | 2.3063 | No |
	| shared ancestor, split data | same basin | 2.47 | 2.4633 | 2.5047 | Yes (+0.041) |
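
The "Vigor?" column can be reproduced from the Champion and Best parent columns. The sketch below assumes vigor means best-parent bpc minus champion bpc, which is a reading of the table rather than a definition taken from the repo; `vigor` is a hypothetical helper name.

```python
def vigor(best_parent_bpc, champion_bpc):
    """Positive when the bred champion beats the best parent (lower bpc is better)."""
    return round(best_parent_bpc - champion_bpc, 4)


# (best parent, champion) bpc values copied from the table above.
rows = {
    "independent inits": (2.3063, 2.3064),
    "shared ancestor, split data": (2.5047, 2.4633),
}

for name, (parent, champion) in rows.items():
    print(f"{name}: {vigor(parent, champion):+.4f}")
# independent inits: -0.0001
# shared ancestor, split data: +0.0414
```

Under that reading, the independent-init champion is marginally worse than its best parent, while the shared-ancestor champion improves on its best parent by about 0.041 bpc.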

	In these runs, crossbreeding helped only between mode-connected parents (shared
	ancestor, specialized differently); independently initialized parents produced no
	vigor. See `docs/darwin-breeding.md` in the GitHub repo.
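
As a rough illustration of what a SLERP breeder does, the sketch below interpolates along the great circle between two parents' unit-norm weight vectors, with an option to copy some tensors unchanged from one parent (as in the `child_slerp_frozenattn` variants). It is not the repo's breeder: `slerp`, `breed`, `frozen_prefixes`, and the parameter names in the usage example are hypothetical, and each weight is treated as a flat vector, whereas real nGPT weights are matrices normalized along one dimension.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def slerp(a, b, t=0.5, eps=1e-7):
    """Spherical interpolation between vectors a and b, returned at unit length.

    Antipodal inputs (angle close to pi) have no unique geodesic and are not handled.
    """
    a, b = normalize(a), normalize(b)
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    omega = math.acos(dot)              # angle between the two parents
    if omega < eps:                     # nearly identical: plain lerp is stable here
        return normalize([(1 - t) * x + t * y for x, y in zip(a, b)])
    wa = math.sin((1 - t) * omega) / math.sin(omega)
    wb = math.sin(t * omega) / math.sin(omega)
    return [wa * x + wb * y for x, y in zip(a, b)]


def breed(parent_a, parent_b, t=0.5, frozen_prefixes=()):
    """Merge two {name: vector} dicts; names with a frozen prefix are copied from parent_a."""
    child = {}
    for name, vec in parent_a.items():
        if name.startswith(tuple(frozen_prefixes)):
            child[name] = list(vec)     # copied as-is, not re-normalized
        else:
            child[name] = slerp(vec, parent_b[name], t)
    return child


if __name__ == "__main__":
    a = {"attn.w": [1.0, 0.0], "ff.w": [1.0, 0.0]}
    b = {"attn.w": [0.0, 1.0], "ff.w": [0.0, 1.0]}
    print(breed(a, b, t=0.5, frozen_prefixes=("attn",)))
```

Unlike a straight average, SLERP keeps the child on the unit sphere, so the merged weights keep the unit norm that nGPT enforces on its parameters.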

	## Loading

	```python
	import torch
	from nGPT_pytorch import nGPT
	import ngpt_patch # restore __hash__ on nGPT modules; import BEFORE constructing

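	# weights_only=False: the checkpoint is a dict holding config and vocab, not just tensors.
	# Only load checkpoints you trust, since this path unpickles arbitrary objects.
	ck = torch.load("uncond/styleA/model.pt", map_location="cuda", weights_only=False)
	model = nGPT(**ck["config"]).cuda()
	model.load_state_dict(ck["model"])
	model.eval()  # inference mode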
	stoi, itos = ck["stoi"], ck["itos"]
	# see code/sample.py in the GitHub repo for the full conditional/unconditional sampler
	```

	> Checkpoints with `variant == "smith"` need the `SmithResidual` swap before
	> construction (see `train_compare.make_model` in the source lab).
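
Once `stoi` and `itos` are loaded, text has to be mapped to ids and back. The sketch below assumes `stoi` maps each character to an integer id and `itos` is the inverse lookup (dict or list); that layout is inferred from the names, not confirmed by the repo. The toy vocabulary and the helper names `encode` and `decode` are hypothetical.

```python
EOA = "\x04"  # end-of-art control character


def encode(text, stoi):
    """Map each character to its id; raises KeyError on characters outside the vocab."""
    return [stoi[ch] for ch in text]


def decode(ids, itos, stop=EOA):
    """Map ids back to characters, stopping before the first end-of-art token."""
    out = []
    for i in ids:
        ch = itos[i]
        if ch == stop:
            break
        out.append(ch)
    return "".join(out)


# Toy vocabulary standing in for ck["stoi"] / ck["itos"].
chars = ["\x02", "\x03", "\x04", "\n", " ", "(", ")", "o"]
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

ids = encode("\x02(o o)\x04 ", stoi)
print(repr(decode(ids, itos)))
```

Labels used for conditional priming must consist only of characters present in the checkpoint's vocabulary, otherwise `encode` fails with a `KeyError`.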

	## Example — `uncond/styleA` (unconditional, T=0.6)

	```
	_
	( )
	\ ( ) )
	\ /\) (/\
	\ /` `
	| dlb
	```

	Per-checkpoint sample galleries are in the GitHub repo under `galleries/`.

	## Provenance & license

	Models are derived from the apehex / mrzjy ASCII-art corpora and enwik8. The model
	weights and code are released under the MIT license; the original ASCII art belongs
	to its respective artists (signatures like `dlb`, `jgs`, `sjw`, `ejm` are preserved
	in outputs).