---
license: mit
tags:
- nGPT
- ascii-art
- character-language-model
- model-merging
- evolutionary-merge
library_name: pytorch
pipeline_tag: text-generation
---
# darwASCIInGPT — nGPT ASCII-art artists + Darwin-bred children
Char-level **nGPT** (Normalized GPT) checkpoints from the *darwASCIInGPT*
experiments: small hypersphere transformers that draw ASCII art, plus the
**Darwin-style bred offspring** produced by merging them with **no gradient
training**. Companion knowledge base (observations, code, Spark setup):
> **GitHub:** https://github.com/tinycrops/darwASCIInGPT-playbook

All ASCII models are `dim 256 / depth 4` (~3.18M params) with a character vocabulary
of ~106–109 symbols, trained on the **apehex** hand-drawn ASCII corpus on a Quadro
P4000. The enwik8 text models are `dim 256–512 / depth 8`, trained on a GTX 1060.
## Special tokens (char-level)
`SOL = \x02`, `SEP = \x03`, `EOA = \x04`. Two framings:
| Framing | Prime with | Use |
|---|---|---|
| **Conditional** `<SOL> label <SEP> art <EOA>` | `<SOL>` + label + `<SEP>` | request a class (e.g. `Cats`, `Swords`) |
| **Unconditional** `<SOL> art <EOA>` | `<SOL>` | free-form draw (no label channel) |
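
As a rough illustration of the two framings in the table above, the priming string and the recovery of the drawing from a generated string could be sketched as follows. The helper names `build_prime` and `extract_art` are hypothetical and are not taken from the repository; only the three control characters come from this card.

```python
# Control characters as listed in the special-tokens section.
SOL, SEP, EOA = "\x02", "\x03", "\x04"

def build_prime(label=None):
    """Hypothetical helper: return the priming string for either framing.

    label=None selects the unconditional framing (<SOL> only).
    """
    if label is None:
        return SOL
    return SOL + label + SEP

def extract_art(generated):
    """Hypothetical helper: return the drawing from a full generated string.

    Keeps the text after <SEP> (or after <SOL> if there is no label) up to <EOA>.
    """
    body = generated.split(EOA, 1)[0]
    if SEP in body:
        body = body.split(SEP, 1)[1]
    return body.lstrip(SOL)

conditional_prime = build_prime("Cats")   # <SOL> + "Cats" + <SEP>
unconditional_prime = build_prime()       # <SOL>
```

Anything the model emits after the first `EOA` is discarded here, which is one simple way to handle a sampler that keeps generating past the end of a piece.
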

These models are trained to **very low loss (near-memorization)**, so:
`T≈0.6, top_k≈20` → clean, complete drawings; `top_k=1` → one fixed canonical piece
per prefix; higher `T` → more variety, with occasional whitespace drift.
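
To make the effect of `T` and `top_k` concrete, here is a minimal, generic sketch of temperature plus top-k sampling over a list of raw logits. It is written in plain Python for readability and is not the repository's sampler (that lives in `code/sample.py`); the function name `sample_top_k` and its signature are assumptions made for illustration.

```python
import math
import random

def sample_top_k(logits, temperature=0.6, top_k=20, rng=random):
    """Illustrative sampler: pick one token id using temperature and top-k.

    temperature must be > 0 whenever top_k > 1.
    """
    # Keep the indices of the top_k largest logits.
    k = max(1, min(top_k, len(logits)))
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    if k == 1:
        return top[0]  # greedy: always the same choice for a given prefix
    # Softmax over the kept logits, scaled by temperature
    # (the maximum is subtracted first for numerical stability).
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]
```

With `top_k=1` the function is deterministic, which matches the "one fixed canonical piece per prefix" behaviour described above; raising `temperature` flattens the weights and lets lower-ranked characters through more often.
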
## Contents
| Path | Type | Framing | Trained on / notes |
|---|---|---|---|
| `uncond/styleA` | artist | unconditional | apehex **creatures & nature** half. final stream_loss **0.031** (99.2% acc) |
| `uncond/styleB` | artist | unconditional | apehex **objects & tech** half. final stream_loss **0.089** (97.5% acc) |
| `apehex/styleA` | artist | conditional | GROUP_A subcategories (Cats, Dragons, Flowers, …) |
| `apehex/styleB` | artist | conditional | GROUP_B subcategories (Swords, Cars, Robots, …) |
| `apehex/breed/child_slerp` | **bred** | conditional | SLERP merge of styleA × styleB on the nGPT hypersphere |
| `apehex/breed/child_slerp_frozenattn` | **bred** | conditional | attention frozen from one parent, FFN SLERP-blended |
| `parents/domA`, `parents/domB` | artist | conditional | domain split: apehex art vs mrzjy sample |
| `parents/breed/child_slerp`, `child_discrete`, `child_slerp_frozenattn` | **bred** | conditional | recombinations of domA × domB |
| `smith-experiment/ngpt` | ablation | conditional | nGPT normalized-lerp residual. val_loss **1.0736** |
| `smith-experiment/smith` | ablation | conditional | Möbius geodesic residual (matched init/data/schedule). val_loss **1.0788**, ~1.57× slower |
| `resonance/standard_g1`, `harmonic30_g1` | sweep | conditional | resonance-geometry sweep representatives |
| `enwik8-darwin/offspring_forkL__x__forkR.pt` | **bred** | enwik8 text | the hybrid-vigor offspring: **bpc 2.4636 vs best parent 2.5047 (+0.0412)** |
| `enwik8-darwin/darwin_log.json` | log | — | shared-ancestor breeding → vigor |
| `enwik8-darwin/darwin_log_independent.json` | log | — | independent-init breeding → **no** vigor (control) |
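
Several entries above are SLERP merges. As a hedged sketch of the underlying operation only, spherical interpolation between two equal-length weight vectors can be written as below. This card does not say how the breeder applies it per tensor or per layer (for example in the frozen-attention variant), so the function `slerp` and its defaults are assumptions, and the plain-Python lists stand in for what would be tensors in a real checkpoint.

```python
import math

def slerp(a, b, t=0.5, eps=1e-8):
    """Illustrative spherical interpolation between two weight vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    # Angle between the two directions, with the cosine clamped to [-1, 1].
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b + eps)
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))
    sin_omega = math.sin(omega)
    if sin_omega < 1e-6:
        # Nearly parallel or antiparallel: fall back to linear interpolation.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    wa = math.sin((1 - t) * omega) / sin_omega
    wb = math.sin(t * omega) / sin_omega
    return [wa * x + wb * y for x, y in zip(a, b)]

# Two unit vectors at a right angle; the midpoint stays on the unit sphere.
child = slerp([1.0, 0.0], [0.0, 1.0], t=0.5)
```

Unlike a plain average, the interpolated vector keeps unit norm when both parents have unit norm, which is why SLERP is a natural fit for weights that live on a hypersphere.
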
## Headline result: genealogy decides hybrid vigor
Same SLERP breeder, different parent *relationship* (enwik8 bpc; lower is better):
| Parents | Origin | Gen-0 child | Champion | Best parent | Vigor? |
|---|---|---|---|---|---|
| independent inits | different basins | 3.26 | 2.3064 | 2.3063 | **No** |
| shared ancestor, split data | same basin | 2.47 | **2.4633** | 2.5047 | **Yes (+0.041)** |

In these runs, crossbreeding only helped between **mode-connected** parents (shared
ancestor, specialized differently). See `docs/darwin-breeding.md` in the GitHub repo.
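
The "Vigor?" column in the table above is simply the gap between the best parent and the champion. A small worked check using the table's own numbers (the helper name `vigor` is hypothetical):

```python
def vigor(best_parent_bpc, child_bpc):
    """Positive when the child beats its best parent (bpc: lower is better)."""
    return best_parent_bpc - child_bpc

shared_ancestor = vigor(2.5047, 2.4633)    # champion beats the best parent
independent_init = vigor(2.3063, 2.3064)   # champion is marginally worse
```

The shared-ancestor gap rounds to the +0.041 reported in the table, while the independent-init gap is slightly negative, hence "No".
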
## Loading
```python
import torch
from nGPT_pytorch import nGPT
import ngpt_patch  # restores __hash__ on nGPT modules; import BEFORE constructing the model

# weights_only=False runs the full unpickler, so load only checkpoints you trust.
ck = torch.load("uncond/styleA/model.pt", map_location="cuda", weights_only=False)
model = nGPT(**ck["config"]).cuda()
model.load_state_dict(ck["model"])
model.eval()
stoi, itos = ck["stoi"], ck["itos"]  # character <-> id vocabulary maps
# see code/sample.py in the GitHub repo for the full conditional/unconditional sampler
```
> Checkpoints with `variant == "smith"` need the `SmithResidual` swap before
> construction (see `train_compare.make_model` in the source lab).
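
Once `stoi` and `itos` are loaded, turning a prime into ids and generated ids back into text might look like the sketch below. It assumes `stoi` maps single characters to integer ids and `itos` maps ids back to characters (a dict or a list would both work); the card does not state their exact types, and the `encode` / `decode` helpers are hypothetical rather than part of the repository.

```python
EOA = "\x04"  # end-of-art marker from the special-tokens section

def encode(text, stoi):
    """Map each character to its id; raises KeyError for characters outside the vocab."""
    return [stoi[ch] for ch in text]

def decode(ids, itos, stop=EOA):
    """Map ids back to characters, stopping at the first end-of-art marker."""
    chars = []
    for i in ids:
        ch = itos[i]
        if ch == stop:
            break
        chars.append(ch)
    return "".join(chars)

# Toy vocabulary standing in for ck["stoi"] / ck["itos"] (assumed shapes).
toy_chars = ["\x02", "\x03", "\x04", "/", "\\", "_", " "]
toy_stoi = {ch: i for i, ch in enumerate(toy_chars)}
toy_itos = {i: ch for ch, i in toy_stoi.items()}
```

For real use, the toy maps would be replaced by the ones stored in the checkpoint, and the ids passed to `decode` would come from the sampler in `code/sample.py`.
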
## Example — `uncond/styleA` (unconditional, T=0.6)
```
_
( )
\ ( ) )
\ /\) (/\
\ /` `
| dlb
```
Per-checkpoint sample galleries are in the GitHub repo under `galleries/`.
## Provenance & license
Models are derived from the apehex / mrzjy ASCII-art corpora and enwik8. The model
weights and code are released under the MIT license; the original ASCII art belongs to
its respective artists (signatures like `dlb`, `jgs`, `sjw`, `ejm` are preserved in outputs).