---
license: agpl-3.0
language:
- en
tags:
- text-generation
- smol
- permacomputer
- bandit-curriculum
pipeline_tag: text-generation
---

# ANDREA-12M

**A**utonomous **N**eural **D**ata **R**ecipe for **E**ducation and **A**gency

A 12.8M-parameter language model grown on a single RTX 4090 using a bandit-controlled curriculum. Part of the permacomputer project — open source, open data, open weights.

## Model Details

| Property | Value |
|----------|-------|
| Parameters | 12.8M |
| Architecture | Transformer decoder, 384d/12h/6L |
| Embedding dim | 384 |
| Heads | 12 |
| Layers | 6 |
| Context | 1024 tokens |
| Tokenizer | Harris morpheme (2048 segments, 2305 vocab) |
| Training steps | 43,587 |
| Final SMMA loss | 2.0 |
| Best single-step loss | 0.21 |
| Training time | ~72 hours |
| Hardware | Single NVIDIA RTX 4090 (24GB VRAM, 1.4GB used) |
| CUDA engine | microgpt_cuda.cu (custom, FP32) |
| Born | 2026-03-21 12:53 UTC / 08:53 EST |
| License | AGPL-3.0 |

## Files

| File | Step | Description |
|------|------|-------------|
| `ANDREA-12M.bin` | 43,587 | Final checkpoint (SMMA 2.0) |
| `ANDREA-12M-best.bin` | 42,300 | Best checkpoint (lowest loss during training) |
| `harris_segments.json` | — | Harris tokenizer segments (required for inference and fine-tuning) |

### Checkpoint format

Binary, little-endian:

`[int32 step][int32 n_params][n_params × float32 weights][n_params × float32 m][n_params × float32 v]`

- **Weights**: model parameters (12.8M floats, ~49MB)
- **m**: Adam first moment (same size)
- **v**: Adam second moment (same size)
- Total: ~147MB per checkpoint

Use either checkpoint to resume fine-tuning (weights + optimizer state preserved) or extract weights only for inference (first `n_params` floats after the 8-byte header).
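The layout above can be parsed with a few lines of standard-library Python. This is a minimal sketch, not released tooling: `read_checkpoint` is a hypothetical helper name, and it assumes a little-endian host (typical x86/ARM), since `array.fromfile` reads in machine byte order.

```python
import struct
from array import array

def read_checkpoint(path, weights_only=False):
    """Parse the ANDREA checkpoint layout:
    [int32 step][int32 n_params][weights][m][v], all little-endian FP32.

    Hypothetical helper for illustration; assumes a little-endian host.
    """
    with open(path, 'rb') as f:
        # 8-byte header: training step and parameter count
        step, n_params = struct.unpack('<ii', f.read(8))
        weights = array('f')
        weights.fromfile(f, n_params)
        if weights_only:
            # For inference, only the first n_params floats are needed
            return step, weights, None, None
        m = array('f')          # Adam first moment
        m.fromfile(f, n_params)
        v = array('f')          # Adam second moment
        v.fromfile(f, n_params)
        return step, weights, m, v
```

Passing `weights_only=True` skips the optimizer state, which is enough for inference; the full read is needed to resume fine-tuning.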
## Training Data

Trained on a curated mix of open conversational and educational data:

- **NousResearch/Hermes-3-Dataset** (general, creative, roleplay) — 590K conversations
- **Dictionary** — 88K word definitions distilled from Hermes 3 8B
- **Gutenberg** — public domain literature (Project Gutenberg)
- Additional: chat, smoltalk, oasst, dolly, IRC, repo-docs

The data mix is controlled by a UCB1 multi-armed bandit with dice-based phase control. The bandit dynamically adjusts source weights during training based on per-source loss trajectories. The full curriculum specification is in the white paper.

## Training Recipe

- Harris morpheme tokenizer (2048 segments)
- Cosine LR schedule with warm restart at step 25K (0.0004 peak)
- Phase-based bandit: 2 focus arms, 1d3 dice, source floors
- Checkpoints every 100 steps, SIGTERM-safe
- Per-source reward attribution, epoch penalty, coverage tracking

## Capabilities

ANDREA-12M learns patterns, not facts. At 12.8M parameters it produces:

- Correct Q&A turn structure (`> question / < answer`)
- Definition-style responses
- Multi-sentence outputs with plausible grammar
- Instruction-following scaffolding ("explain", "define", "describe")

It does NOT produce factually accurate content — it is a pattern machine. Factual accuracy requires scaling to ANDREA-120M (planned).

## Usage

```python
# Inference via microgpt
from microgpt import load_model, generate_fast

model = load_model('ANDREA-12M.json')
results = generate_fast(model['state_dict'], model['uchars'], model['bos'],
                        384, 12, 6, 1024,
                        prefix='> what is an apple? / <')
print(results[0][0])
```

## White Paper

[ANDREA-12M-WHITEPAPER.pdf](ANDREA-12M-WHITEPAPER.pdf) — full technical paper covering architecture, bandit curriculum, data sources, training recipe, and results.

Source: `whitepaper/ANDREA/WHITEPAPER.rst` in the [uncloseai-cli repository](https://git.unturf.com/engineering/unturf/uncloseai-cli).
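The UCB1 selection rule behind the curriculum can be sketched generically. This is an illustrative implementation of the standard UCB1 algorithm, not the project's actual curriculum code: the arm names are placeholders, the reward signal here is abstract (the recipe derives it from per-source loss trajectories), and the dice-based phase control and source floors described above are omitted.

```python
import math

class UCB1:
    """Pick the data source (arm) with the highest upper confidence bound:
    mean reward + sqrt(2 * ln(t) / n_i). Unplayed arms are tried first.

    Generic illustration; not the trained recipe's exact configuration.
    """
    def __init__(self, arms):
        self.arms = list(arms)
        self.counts = {a: 0 for a in self.arms}    # plays per arm
        self.totals = {a: 0.0 for a in self.arms}  # summed reward per arm
        self.t = 0                                 # total plays so far

    def select(self):
        # Play every arm once before the confidence bound applies
        for a in self.arms:
            if self.counts[a] == 0:
                return a
        return max(self.arms, key=lambda a:
                   self.totals[a] / self.counts[a]
                   + math.sqrt(2 * math.log(self.t) / self.counts[a]))

    def update(self, arm, reward):
        # Record the observed reward (e.g. loss improvement on that source)
        self.t += 1
        self.counts[arm] += 1
        self.totals[arm] += reward
```

Each training interval, `select()` names the source to draw the next batch from and `update()` feeds back how much that batch helped; the exploration term keeps low-count sources from being starved.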
## Citation

```
ANDREA: Autonomous Neural Data Recipe for Education and Agency
TimeHexOn, foxhop, russell@unturf
March 2026, permacomputer.com
```

## License

AGPL-3.0. Code outlasts authors. Infrastructure outlasts builders.

● ○