| --- |
| license: mit |
| tags: |
| - nGPT |
| - ascii-art |
| - character-language-model |
| - model-merging |
| - evolutionary-merge |
| library_name: pytorch |
| pipeline_tag: text-generation |
| --- |
| |
| # darwASCIInGPT — nGPT ASCII-art artists + Darwin-bred children |
|
|
| Char-level **nGPT** (Normalized GPT) checkpoints from the *darwASCIInGPT* |
| experiments: small hypersphere transformers that draw ASCII art, plus the |
| **Darwin-style bred offspring** produced by merging them with **no gradient |
| training**. Companion knowledge base (observations, code, Spark setup): |
|
|
| > **GitHub:** https://github.com/tinycrops/darwASCIInGPT-playbook |
|
|
| All ASCII models are `dim 256 / depth 4` (~3.18M params), char vocab ~106–109, |
| trained on the **apehex** hand-drawn ASCII corpus on a Quadro P4000. The enwik8 |
| text models are `dim 256–512 / depth 8`, trained on a GTX 1060. |
|
|
| ## Special tokens (char-level) |
|
|
| `SOL = \x02`, `SEP = \x03`, `EOA = \x04`. Two framings: |
|
|
| | Framing | Prime with | Use | |
| |---|---|---| |
| | **Conditional** `<SOL> label <SEP> art <EOA>` | `<SOL>` + label + `<SEP>` | request a class (e.g. `Cats`, `Swords`) | |
| | **Unconditional** `<SOL> art <EOA>` | `<SOL>` | free-form draw (no label channel) | |
|
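A minimal sketch of how the two framings above could be assembled as strings. The helper names `build_prime` and `frame_example` are hypothetical, and the sketch assumes the spaces shown in the table are for readability only, so no literal spaces are placed between the control characters and the text; `code/sample.py` in the GitHub repo remains the reference.

```python
# Control characters taken from the special-token list above.
SOL, SEP, EOA = "\x02", "\x03", "\x04"


def build_prime(label=None):
    """Return the priming string (hypothetical helper).

    label=None   -> unconditional framing: <SOL>
    label="Cats" -> conditional framing:   <SOL> + label + <SEP>
    """
    if label is None:
        return SOL
    return SOL + label + SEP


def frame_example(art, label=None):
    """Return one full sequence in the chosen framing, terminated by <EOA>."""
    return build_prime(label) + art + EOA


print(repr(build_prime()))        # '\x02'
print(repr(build_prime("Cats")))  # '\x02Cats\x03'
```

An unconditional checkpoint would be primed with `build_prime()`, a conditional one with `build_prime("Cats")` or another class label it was trained on.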
|
| These models are trained to **very low loss (near-memorization)**, so sampling |
| settings have predictable effects: `T≈0.6` with `top_k≈20` gives clean, complete |
| drawings; `top_k=1` gives one fixed canonical piece per prefix; higher `T` gives |
| more variety with occasional whitespace drift. |
|
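To make the roles of `T` and `top_k` concrete, here is a framework-free sketch of temperature plus top-k sampling over one step's logits. It illustrates the general technique and is not the repo's sampler; the function name `sample_top_k` and the plain-list logits are assumptions made for the example.

```python
import math
import random


def sample_top_k(logits, temperature=0.6, top_k=20, rng=random):
    """Sample one token id from raw logits with temperature and top-k filtering."""
    if top_k < 1:
        raise ValueError("top_k must be >= 1")
    # Keep the k highest-scoring token ids; all others get zero probability.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    kept = ranked[:top_k]
    if top_k == 1 or temperature <= 0:
        return kept[0]  # greedy: always the single most likely token
    # Temperature-scaled softmax over the kept ids (max-subtracted for stability).
    scaled = [logits[i] / temperature for i in kept]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(kept, weights=weights, k=1)[0]


logits = [2.0, 0.5, -1.0, 3.0]
print(sample_top_k(logits, top_k=1))  # 3
```

With `top_k=1` the choice is deterministic, which matches the "one fixed canonical piece per prefix" behavior; raising `temperature` flattens the distribution over the kept ids, which is where the extra variety comes from.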
|
| ## Contents |
|
|
| | Path | Type | Framing | Trained on / notes | |
| |---|---|---|---| |
| | `uncond/styleA` | artist | unconditional | apehex **creatures & nature** half. final stream_loss **0.031** (99.2% acc) | |
| | `uncond/styleB` | artist | unconditional | apehex **objects & tech** half. final stream_loss **0.089** (97.5% acc) | |
| | `apehex/styleA` | artist | conditional | GROUP_A subcategories (Cats, Dragons, Flowers, …) | |
| | `apehex/styleB` | artist | conditional | GROUP_B subcategories (Swords, Cars, Robots, …) | |
| | `apehex/breed/child_slerp` | **bred** | conditional | SLERP merge of styleA × styleB on the nGPT hypersphere | |
| | `apehex/breed/child_slerp_frozenattn` | **bred** | conditional | attention frozen from one parent, FFN SLERP-blended | |
| | `parents/domA`, `parents/domB` | artist | conditional | domain split: apehex art vs mrzjy sample | |
| | `parents/breed/child_slerp`, `child_discrete`, `child_slerp_frozenattn` | **bred** | conditional | recombinations of domA × domB | |
| | `smith-experiment/ngpt` | ablation | conditional | nGPT normalized-lerp residual. val_loss **1.0736** | |
| | `smith-experiment/smith` | ablation | conditional | Möbius geodesic residual (matched init/data/schedule). val_loss **1.0788**, ~1.57× slower | |
| | `resonance/standard_g1`, `harmonic30_g1` | sweep | conditional | resonance-geometry sweep representatives | |
| | `enwik8-darwin/offspring_forkL__x__forkR.pt` | **bred** | enwik8 text | the hybrid-vigor offspring: **bpc 2.4636 vs best parent 2.5047 (+0.0412)** | |
| | `enwik8-darwin/darwin_log.json` | log | — | shared-ancestor breeding → vigor | |
| | `enwik8-darwin/darwin_log_independent.json` | log | — | independent-init breeding → **no** vigor (control) | |
|
|
| ## Headline result: genealogy decides hybrid vigor |
|
|
| The same SLERP breeder was applied to parents with different *relationships* (enwik8 bpc; lower is better): |
|
|
| | Parents | Origin | Gen-0 child | Champion | Best parent | Vigor? | |
| |---|---|---|---|---|---| |
| | independent inits | different basins | 3.26 | 2.3064 | 2.3063 | **No** | |
| | shared ancestor, split data | same basin | 2.47 | **2.4633** | 2.5047 | **Yes (+0.041)** | |
|
|
| In these runs, crossbreeding produced hybrid vigor only between **mode-connected** |
| parents (shared ancestor, specialized on different data). See |
| `docs/darwin-breeding.md` in the GitHub repo. |
|
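For readers unfamiliar with SLERP merging, the sketch below shows the basic operation on plain Python lists: interpolate along the great circle between two unit vectors, optionally copying some parameters unchanged from one parent (as in the frozen-attention variants). The `breed` helper, the `frozen_prefix` argument, and the parameter names `attn.q` / `ff.w` are hypothetical; the actual breeder lives in the GitHub repo and may normalize and group weights differently.

```python
import math


def slerp(a, b, t=0.5, eps=1e-8):
    """Spherical interpolation between two vectors, returned at unit norm."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    ua = [x / norm_a for x in a]
    ub = [x / norm_b for x in b]
    cos_omega = max(-1.0, min(1.0, sum(x * y for x, y in zip(ua, ub))))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < eps:
        # Parents coincide or are antipodal: no usable geodesic, keep parent a.
        return ua
    wa = math.sin((1 - t) * omega) / sin_omega
    wb = math.sin(t * omega) / sin_omega
    return [wa * x + wb * y for x, y in zip(ua, ub)]


def breed(parent_a, parent_b, t=0.5, frozen_prefix=None):
    """Merge two {name: vector} dicts (hypothetical layout).

    Names starting with frozen_prefix are copied from parent_a as-is;
    everything else is SLERP-blended.
    """
    child = {}
    for name, vec_a in parent_a.items():
        if frozen_prefix and name.startswith(frozen_prefix):
            child[name] = list(vec_a)
        else:
            child[name] = slerp(vec_a, parent_b[name], t)
    return child


a = {"attn.q": [1.0, 0.0], "ff.w": [1.0, 0.0]}
b = {"attn.q": [0.0, 1.0], "ff.w": [0.0, 1.0]}
child = breed(a, b, t=0.5, frozen_prefix="attn.")
print(child["attn.q"])  # [1.0, 0.0]
```

Unlike a linear average, the SLERP result stays on the unit sphere, which is why it is a natural fit for nGPT's normalized weights.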
|
| ## Loading |
|
|
| ```python |
| import torch |
| from nGPT_pytorch import nGPT |
| import ngpt_patch # restore __hash__ on nGPT modules; import BEFORE constructing |
| |
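| device = "cuda" if torch.cuda.is_available() else "cpu" |
| # weights_only=False unpickles arbitrary objects; only load checkpoints you trust |
| ck = torch.load("uncond/styleA/model.pt", map_location=device, weights_only=False) |
| model = nGPT(**ck["config"]).to(device) |
| model.load_state_dict(ck["model"]) |
| model.eval() |
| stoi, itos = ck["stoi"], ck["itos"]  # char-to-id and id-to-char lookups |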
| # see code/sample.py in the GitHub repo for the full conditional/unconditional sampler |
| ``` |
|
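Once `stoi` and `itos` are loaded, text has to be mapped to ids before sampling and back to characters afterwards. The sketch below uses a toy vocabulary so it runs without a checkpoint; the helper names are hypothetical, and it assumes `stoi` maps characters to integer ids and `itos` can be indexed by integer id.

```python
EOA = "\x04"  # end-of-art control character from the special-token list


def encode(text, stoi):
    """Map each character to its id; raises KeyError for out-of-vocab characters."""
    return [stoi[ch] for ch in text]


def decode_until_eoa(ids, itos):
    """Turn ids back into text, stopping at the first <EOA> character."""
    chars = []
    for i in ids:
        ch = itos[i]
        if ch == EOA:
            break
        chars.append(ch)
    return "".join(chars)


# Toy vocabulary standing in for ck["stoi"] / ck["itos"].
vocab = ["\x02", "\x03", "\x04", "/", "\\", "_"]
stoi = {ch: i for i, ch in enumerate(vocab)}
itos = {i: ch for ch, i in stoi.items()}

ids = encode("\x02/\\_\x04__", stoi)
print(decode_until_eoa(ids[1:], itos))  # /\_
```

Cutting at `<EOA>` drops anything the model emits after the drawing is finished.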
|
| > Checkpoints with `variant == "smith"` need the `SmithResidual` swap before |
| > construction (see `train_compare.make_model` in the source lab). |
|
|
| ## Example — `uncond/styleA` (unconditional, T=0.6) |
|
|
| ``` |
| _ |
| ( ) |
| \ ( ) ) |
| \ /\) (/\ |
| \ /` ` |
| | dlb |
| ``` |
|
|
| Per-checkpoint sample galleries are in the GitHub repo under `galleries/`. |
|
|
| ## Provenance & license |
|
|
| Models are derived from the apehex / mrzjy ASCII-art corpora and enwik8. The model |
| weights and code are released under the MIT license; the original ASCII art belongs |
| to its respective artists (signatures like `dlb`, `jgs`, `sjw`, `ejm` are preserved |
| in outputs). |
|
|