---
license: agpl-3.0
language:
- en
tags:
- text-generation
- smol
- permacomputer
- bandit-curriculum
pipeline_tag: text-generation
---

# ANDREA-12M
|
|
**A**utonomous **N**eural **D**ata **R**ecipe for **E**ducation and **A**gency
|
|
A 12.8M-parameter language model grown on a single RTX 4090 using a bandit-controlled curriculum.
Part of the permacomputer project – open source, open data, open weights.
|
|
## Model Details
|
|
| Property | Value |
|----------|-------|
| Parameters | 12.8M |
| Architecture | Transformer decoder, 384d / 12h / 6L |
| Embedding dim | 384 |
| Heads | 12 |
| Layers | 6 |
| Context | 1024 tokens |
| Tokenizer | Harris morpheme (2048 segments, 2305 vocab) |
| Training steps | 43,587 |
| Final SMMA loss | 2.0 |
| Best single-step loss | 0.21 |
| Training time | ~72 hours |
| Hardware | Single NVIDIA RTX 4090 (24 GB VRAM, 1.4 GB used) |
| CUDA engine | microgpt_cuda.cu (custom, FP32) |
| Born | 2026-03-21 12:53 UTC / 08:53 EDT |
| License | AGPL-3.0 |
| |
## Files

| File | Step | Description |
|------|------|-------------|
| `ANDREA-12M.bin` | 43,587 | Final checkpoint (SMMA 2.0) |
| `ANDREA-12M-best.bin` | 42,300 | Best checkpoint (lowest loss during training) |
| `harris_segments.json` | – | Harris tokenizer segments (required for inference and fine-tuning) |
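
The segment inventory in `harris_segments.json` is what the model's tokenizer runs over. As a rough sketch (the file's schema and the exact matching rule are assumptions here, not taken from the card), applying such an inventory with greedy longest-match looks like:

```python
import json

def load_segments(path='harris_segments.json'):
    # Assumed schema: a JSON list of segment strings. The real file may differ.
    with open(path) as f:
        return json.load(f)

def tokenize(text, segments):
    """Greedy longest-match over a fixed segment inventory (matching rule assumed)."""
    segs = sorted(segments, key=len, reverse=True)  # prefer longer segments
    out, i = [], 0
    while i < len(text):
        for s in segs:
            if text.startswith(s, i):
                out.append(s)
                i += len(s)
                break
        else:
            out.append(text[i])  # unknown character: fall back to a single char
            i += 1
    return out
```

Harris-style segmentation refers to deriving the inventory itself from successor-frequency statistics; the sketch above only covers applying a finished inventory.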
|
|
### Checkpoint format
|
|
Binary, little-endian: `[int32 step][int32 n_params][n_params × float32 weights][n_params × float32 m][n_params × float32 v]`
|
|
- **Weights**: model parameters (12.8M floats, ~49 MB)
- **m**: Adam first moment (same size)
- **v**: Adam second moment (same size)
- Total: ~147 MB per checkpoint
|
|
Use either checkpoint to resume fine-tuning (weights + optimizer state preserved)
or extract weights only for inference (the first `n_params` floats after the 8-byte header).
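
Following that layout, pulling the weights out for inference takes only a few lines. This is a sketch (`load_weights` is an illustrative helper, not a microgpt API):

```python
import struct

import numpy as np

def load_weights(path):
    """Read a checkpoint and return (step, weights), skipping the Adam state."""
    with open(path, 'rb') as f:
        step, n_params = struct.unpack('<ii', f.read(8))  # two little-endian int32s
        weights = np.fromfile(f, dtype='<f4', count=n_params)
        # To resume training instead, read two more n_params-float blocks (m, v).
    return step, weights
```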
|
|
## Training Data
|
|
Trained on a curated mix of open conversational and educational data:
|
|
- **NousResearch/Hermes-3-Dataset** (general, creative, roleplay) – 590K conversations
- **Dictionary** – 88K word definitions distilled from Hermes 3 8B
- **Gutenberg** – public-domain literature (Project Gutenberg)
- Additional: chat, smoltalk, oasst, dolly, IRC, repo-docs
|
|
Data mix controlled by a UCB1 multi-armed bandit with dice-based phase control.
The bandit dynamically adjusts source weights during training based on per-source
loss trajectories. Full curriculum specification in the white paper.
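
The UCB1 core is small; a textbook sketch (per-source reward attribution, source floors, and the dice-based phase logic are deliberately left out) might look like:

```python
import math

class UCB1:
    """Pick the arm whose mean reward plus exploration bonus is highest."""

    def __init__(self, arms):
        self.arms = list(arms)
        self.counts = {a: 0 for a in self.arms}
        self.values = {a: 0.0 for a in self.arms}  # running mean reward per arm
        self.t = 0

    def select(self):
        self.t += 1
        for a in self.arms:            # play every arm once before using the bonus
            if self.counts[a] == 0:
                return a
        return max(self.arms, key=lambda a: self.values[a]
                   + math.sqrt(2 * math.log(self.t) / self.counts[a]))

    def update(self, arm, reward):     # reward could be, e.g., loss improvement
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

In training terms each arm is a data source, and the reward would be derived from that source's loss trajectory.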
|
|
## Training Recipe
|
|
- Harris morpheme tokenizer (2048 segments)
- Cosine LR schedule with warm restart at step 25K (0.0004 peak)
- Phase-based bandit: 2 focus arms, 1d3 dice, source floors
- Checkpoints every 100 steps, SIGTERM-safe
- Per-source reward attribution, epoch penalty, coverage tracking
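
The schedule's shape can be sketched as follows; the 0.0004 peak and the 25K restart come from the list above, while the warmup length and exact cycle boundaries are assumptions:

```python
import math

PEAK = 4e-4       # peak learning rate (from the recipe)
RESTART = 25_000  # warm-restart step (from the recipe)
TOTAL = 43_587    # total training steps (from the model card)
WARMUP = 500      # warmup length: an assumption, not stated in the card

def lr_at(step):
    """Linear warmup, cosine decay, and one warm restart at RESTART."""
    if step >= RESTART:
        step, total = step - RESTART, TOTAL - RESTART  # second cycle
    else:
        total = RESTART                                # first cycle
    if step < WARMUP:
        return PEAK * step / WARMUP                    # linear warmup
    progress = (step - WARMUP) / max(1, total - WARMUP)
    return 0.5 * PEAK * (1 + math.cos(math.pi * min(1.0, progress)))
```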
|
|
## Capabilities
|
|
ANDREA-12M learns patterns, not facts. At 12.8M parameters it produces:

- Correct Q&A turn structure (`> question / < answer`)
- Definition-style responses
- Multi-sentence outputs with plausible grammar
- Instruction-following scaffolding ("explain", "define", "describe")
|
|
It does NOT produce factually accurate content – it's a pattern machine.
Factual accuracy requires scaling to ANDREA-120M (planned).
|
|
## Usage
|
|
```python
# Inference via microgpt
from microgpt import load_model, generate_fast

model = load_model('ANDREA-12M.json')
results = generate_fast(model['state_dict'], model['uchars'], model['bos'],
                        384, 12, 6, 1024, prefix='> what is an apple? / <')
print(results[0][0])
```
|
|
## White Paper
|
|
[ANDREA-12M-WHITEPAPER.pdf](ANDREA-12M-WHITEPAPER.pdf) – full technical paper covering architecture, bandit curriculum, data sources, training recipe, and results.
|
|
Source: `whitepaper/ANDREA/WHITEPAPER.rst` in the [uncloseai-cli repository](https://git.unturf.com/engineering/unturf/uncloseai-cli).
|
|
## Citation

```
ANDREA: Autonomous Neural Data Recipe for Education and Agency
TimeHexOn, foxhop, russell@unturf
March 2026, permacomputer.com
```
|
|
## License

AGPL-3.0. Code outlasts authors. Infrastructure outlasts builders.
|
|
|
|