ANDREA model family

Autonomous Neural Data Recipe for Education and Agency

A family of small language models grown on a single RTX 4090 using a bandit-controlled curriculum over open data. Part of the permacomputer project: open source, open data, open weights.

| Model | Params | Architecture | Status |
| --- | --- | --- | --- |
| ANDREA-12M | 12.8M | 384d / 12h / 6L / 1024ctx | Shipped 2026-03-21 |
| ANDREA-120M | 98.7M | 768d / 12h / 12L / 1024ctx | Shipped 2026-05-27 |

ANDREA-12M

A 12.8M parameter language model grown on a single RTX 4090 using a bandit-controlled curriculum.

Model Details

| Property | Value |
| --- | --- |
| Parameters | 12.8M |
| Architecture | Transformer decoder, 384d/12h/6L |
| Embedding dim | 384 |
| Heads | 12 |
| Layers | 6 |
| Context | 1024 tokens |
| Tokenizer | Harris morpheme (2048 segments, 2305 vocab) |
| Training steps | 43,587 |
| Final SMMA loss | 2.0 |
| Best single-step loss | 0.21 |
| Training time | ~72 hours |
| Hardware | Single NVIDIA RTX 4090 (24GB VRAM, 1.4GB used) |
| CUDA engine | microgpt_cuda.cu (custom, FP32) |
| Born | 2026-03-21 12:53 UTC / 08:53 EDT |
| License | AGPL-3.0 |

Files

| File | Step | Description |
| --- | --- | --- |
| ANDREA-12M.bin | 43,587 | Final checkpoint (SMMA 2.0) |
| ANDREA-12M-best.bin | 42,300 | Best checkpoint (lowest loss during training) |
| ANDREA-12M.json | 43,587 | Portable Python-engine model |
| ANDREA-12M-TRAIN.json | - | Training config |
| harris_segments.json | - | Harris tokenizer segments (2048), required for inference |

Checkpoint format

Binary, little-endian: [int32 step][int32 n_params][n_params × float32 weights][n_params × float32 m][n_params × float32 v]

  • Weights: model parameters (12.8M floats, ~49 MiB)
  • m: Adam first moment (same size)
  • v: Adam second moment (same size)
  • Total: ~146 MiB per checkpoint (153,363,464 bytes)

Use either checkpoint to resume fine-tuning (weights + optimizer state preserved) or extract weights only for inference (first n_params floats after the 8-byte header).
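The layout above can be parsed with the Python standard library alone. This is a minimal sketch that assumes a file matching the stated layout exactly. `read_checkpoint` and its `weights_only` flag are hypothetical names for illustration, not part of microgpt.

```python
import struct
import sys
from array import array


def read_checkpoint(path, weights_only=True):
    """Parse [int32 step][int32 n_params][weights][m][v], little-endian.

    Hypothetical helper for illustration; not part of microgpt.
    """
    with open(path, "rb") as f:
        # 8-byte header: training step and parameter count.
        step, n_params = struct.unpack("<ii", f.read(8))

        def read_floats():
            buf = array("f")  # 4-byte floats on common platforms
            buf.frombytes(f.read(4 * n_params))
            if len(buf) != n_params:
                raise ValueError("file is shorter than the header promises")
            if sys.byteorder == "big":
                buf.byteswap()  # the file itself is little-endian
            return buf

        weights = read_floats()
        if weights_only:
            return step, weights, None, None
        m = read_floats()  # Adam first moment
        v = read_floats()  # Adam second moment
    return step, weights, m, v
```

For inference, `read_checkpoint("ANDREA-12M.bin")` would return the step and the weight vector and skip the optimizer state. How the flat weight vector maps onto individual layers is not specified here, so that mapping is left out.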

Training Data

A curated mix of open conversational and educational data:

  • NousResearch/Hermes-3-Dataset (general, creative, roleplay): 590K conversations
  • Dictionary: 88K word definitions distilled from Hermes 3 8B
  • Gutenberg: public domain literature (Project Gutenberg)
  • Additional: chat, smoltalk, oasst, dolly, IRC, repo-docs

The data mix is controlled by a UCB1 multi-armed bandit with dice-based phase control. The bandit adjusts source weights during training based on per-source loss trajectories. The full curriculum specification is in the white paper.
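For readers unfamiliar with UCB1, the textbook selection rule is sketched below. It scores each arm (here, a data source) by its mean reward plus an exploration bonus that shrinks as the arm is pulled more often. How ANDREA turns loss trajectories into rewards is not described in this README, so `mean_reward` is treated as a given input, and `ucb1_pick` is a hypothetical name.

```python
import math


def ucb1_pick(mean_reward, pulls, c=math.sqrt(2)):
    """Return the arm with the highest standard UCB1 score.

    mean_reward: dict of arm name -> average reward observed so far
    pulls:       dict of arm name -> number of times the arm was chosen
    Textbook UCB1 only; the reward definition is an assumption left to the caller.
    """
    # Every arm is tried once before any scores are compared.
    for arm, n in pulls.items():
        if n == 0:
            return arm
    total = sum(pulls.values())

    def score(arm):
        bonus = c * math.sqrt(math.log(total) / pulls[arm])
        return mean_reward[arm] + bonus

    return max(pulls, key=score)
```

With equal pull counts the arm with the higher mean wins. A rarely pulled arm can still win on its exploration bonus, which is what keeps every source in rotation.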

Training Recipe

  • Harris morpheme tokenizer (2048 segments)
  • Cosine LR schedule with warm restart at step 25K (0.0004 peak)
  • Phase-based bandit: 2 focus arms, 1d3 dice, source floors
  • Checkpoints every 100 steps, SIGTERM-safe
  • Per-source reward attribution, epoch penalty, coverage tracking
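The cosine schedule with a warm restart listed in the recipe above can be sketched as follows. The peak (0.0004), the restart step (25K), and the final step (43,587) come from this README. The zero floor, the absence of warmup, the same peak after the restart, and the cycle boundaries are assumptions for illustration, and `cosine_lr` is a hypothetical name.

```python
import math


def cosine_lr(step, peak_lr=0.0004, restart_step=25_000,
              total_steps=43_587, min_lr=0.0):
    """Cosine decay from peak_lr, with the cycle restarting at restart_step.

    Assumes two cycles: [0, restart_step) and [restart_step, total_steps].
    """
    if step < restart_step:
        start, end = 0, restart_step
    else:
        start, end = restart_step, total_steps
    # Fraction of the current cycle that has elapsed, clamped to 1.0.
    progress = min(1.0, (step - start) / (end - start))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions the rate starts at the peak, decays toward zero just before step 25K, jumps back to the peak at the restart, and decays again until the final step.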

Capabilities

ANDREA-12M learns patterns, not facts. At 12.8M parameters it produces:

  • Correct Q&A turn structure (> question / < answer)
  • Definition-style responses
  • Multi-sentence outputs with plausible grammar
  • Instruction-following scaffolding ("explain", "define", "describe")

It does NOT produce factually accurate content; it is a pattern machine. Factual coherence first appears at the ANDREA-120M scale, and even there accuracy is limited.

Usage

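from microgpt import load_model, generate_fast

# Load the portable Python-engine export.
model = load_model('ANDREA-12M.json')
# 384, 12, 6, 1024 mirror the Model Details values: embedding dim, heads, layers, context.
results = generate_fast(model['state_dict'], model['uchars'], model['bos'],
                        384, 12, 6, 1024, prefix='> what is an apple? / <')
print(results[0][0])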

ANDREA-120M

A 98.7M parameter language model: same bandit-controlled curriculum, same permacomputer recipe, scaled up ~8x from ANDREA-12M. First member of the ANDREA family to produce factual coherence in addition to pattern coherence.

Model Details

| Property | Value |
| --- | --- |
| Parameters | 98,698,752 (~98.7M, labeled "120M") |
| Architecture | Transformer decoder, 768d/12h/12L |
| Embedding dim | 768 |
| Heads | 12 |
| Layers | 12 |
| Context | 1024 tokens |
| Tokenizer | Harris morpheme (8192 segments, 8449 vocab) |
| Training steps | 149,700 |
| Latest EMA loss | ~1.38 (last 2K steps) |
| Hardware | Single NVIDIA RTX 4090 (24GB VRAM, ~22GB used during training) |
| CUDA engine | microgpt_cuda.cu (custom, FP16 cuBLAS, sm_89) |
| LR | 0.0003 (cosine schedule, post-polish-pivot) |
| Born | 2026-05-27 20:50 UTC / 16:50 EDT |
| License | AGPL-3.0 |

Files

| File | Step | Description |
| --- | --- | --- |
| ANDREA-120M.bin | 149,700 | Latest checkpoint |
| ANDREA-120M-best.bin | 145,500 | Best checkpoint (bandit-selected, lowest SMMA loss) |
| ANDREA-120M.json | 149,700 | Portable Python-engine model (~2GB) |
| ANDREA-120M-TRAIN.json | - | Training config (polish-pivot variant) |
| ANDREA-120M-harris-segments.json | - | Harris tokenizer segments (8192), required for inference |

Checkpoint format

Same binary format as ANDREA-12M: [int32 step][int32 n_params][n_params × float32 weights][n_params × float32 m][n_params × float32 v]

  • Weights: model parameters (98.7M floats, ~376 MiB)
  • m: Adam first moment (same size)
  • v: Adam second moment (same size)
  • Total: ~1.10 GiB per checkpoint (1,184,385,032 bytes)
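The checkpoint size follows directly from the layout: an 8-byte header plus three float32 arrays of `n_params` entries each. The short sketch below works the arithmetic in both directions and can serve as a sanity check on a download; the function names are hypothetical.

```python
HEADER_BYTES = 8      # int32 step + int32 n_params
BYTES_PER_PARAM = 12  # float32 weight + float32 m + float32 v


def checkpoint_bytes(n_params):
    """Expected file size in bytes for a checkpoint with n_params parameters."""
    return HEADER_BYTES + BYTES_PER_PARAM * n_params


def params_from_size(size_bytes):
    """Recover n_params from a file size, or fail if the size cannot match."""
    payload = size_bytes - HEADER_BYTES
    if payload < 0 or payload % BYTES_PER_PARAM:
        raise ValueError("size does not fit the checkpoint layout")
    return payload // BYTES_PER_PARAM
```

`checkpoint_bytes(98_698_752)` gives 1,184,385,032, matching the byte size listed for ANDREA-120M.bin under File integrity. Running the inverse on the 12M file size, `params_from_size(153_363_464)`, gives 12,780,288, which agrees with the rounded 12.8M figure.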

Training Data (megachat-v8 composite)

| Source family | Sources |
| --- | --- |
| Chat | chat, smoltalk, oasst, dolly, unfirehose-chat, synthetic-chat |
| Knowledge | gutenberg, dictionary |
| Hermes | hermes3-general, hermes3-creative, hermes3-roleplay |
| Social | irc-qa-strict, unweapon |
| Meta | repo-commits |

Sources excluded from the chatty-track curriculum: real-tool-calls, synthetic-bash, tool-calls, hermes3-code, hermes3-math, repo-docs, repo-docstrings (tool-caller and code-doc material, reserved for a separate model family).

Per-source caps and floors are documented in ANDREA-120M-TRAIN.json.

Training Recipe

  • Harris morpheme tokenizer (8192 segments, vocab_size=8449)
  • Cosine LR schedule, lr=0.0003 peak (lr=0.001 caused gradient explosion on the v8 corpus)
  • Adam betas 0.9 / 0.999, eps 1e-8
  • Block size 1024, batch size 8 (FP16 cuBLAS on sm_89)
  • Phase-based bandit: dice-controlled UCB1 focus over 16 sources
  • Per-source reward attribution (EMA per source, alpha=0.1)
  • Indexed random-access sampling (.tok.idx byte offsets): O(K) sampling per round
  • Checkpoints every 100 steps, sample every 200, SIGUSR1 checkpoint signal
  • Polish pivot at step 112K: removed repo-docs/docstrings, tightened knowledge caps, raised chat floors. Resumed from step_112600.bin.
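The per-source EMA listed in the recipe above (alpha=0.1) can be sketched as a small tracker that keeps one running average per source. Seeding each average with the first observation, and feeding it raw loss values, are assumptions for illustration. `SourceEMA` is a hypothetical name.

```python
class SourceEMA:
    """One exponential moving average per data source."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha
        self.value = {}  # source name -> current EMA

    def update(self, source, observation):
        """Fold a new observation into the source's EMA and return it."""
        prev = self.value.get(source)
        if prev is None:
            # Assumption: the first observation seeds the average.
            new = observation
        else:
            new = (1.0 - self.alpha) * prev + self.alpha * observation
        self.value[source] = new
        return new
```

With alpha=0.1 each new observation contributes 10% of the updated value, so a single noisy step moves a source's average only slightly.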

Curriculum (Firehose Bandit v5)

Each step picks one source by weighted random choice, fills 1024 tokens with consecutive documents (BOS-separated), and trains one step. Phases last 7-42 steps. At phase start, a 1d4 dice roll (numbered 0-3 below) selects how many of the 3 focus arms come from random picks versus UCB1:

| Dice | Random arms | Bandit arms |
| --- | --- | --- |
| 0 | 3 | 0 |
| 1 | 2 | 1 |
| 2 | 1 | 2 |
| 3 | 0 | 3 |

Focus arms get 2.0x weight, non-focus arms 0.5x. Random picks always go first, so the bandit cannot lock onto easy sources.
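The phase logic described above can be sketched in a few lines. This is an illustration under assumptions, not the training code: every source starts from a base weight of 1.0, per-source caps and floors are ignored, and `ucb_ranked` is assumed to be the source list already sorted best-first by UCB1 score. All function names are hypothetical.

```python
import random


def pick_focus_arms(sources, ucb_ranked, n_focus=3, rng=random):
    """Choose focus arms for one phase from a dice roll in 0..n_focus."""
    n_bandit = rng.randrange(n_focus + 1)  # the "Dice" column: 0, 1, 2 or 3
    n_random = n_focus - n_bandit
    # Random picks go first, so UCB1 only fills the remaining slots.
    focus = rng.sample(sources, n_random)
    for arm in ucb_ranked:
        if len(focus) == n_focus:
            break
        if arm not in focus:
            focus.append(arm)
    return focus


def source_weights(sources, focus):
    """2.0x for focus arms, 0.5x for the rest (base weight 1.0 assumed)."""
    return {s: (2.0 if s in focus else 0.5) for s in sources}


def pick_source(weights, rng=random):
    """Weighted random choice of the source for a single training step."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
```

Because the random slots are filled before the bandit slots, a source that UCB1 would rank first can lose its slot to a random pick. That is one way to read the statement that the bandit cannot lock onto easy sources.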

Capabilities

At 98.7M parameters, ANDREA-120M produces:

  • Multi-paragraph coherent English prose
  • Chat turn structure (> ... / < ...) with on-topic responses
  • Definitions, short factual answers (low accuracy but plausible)
  • Haiku and short verse (training data side-effect from gutenberg)
  • IRC-style chat exchanges

It is still a smol model. Factual accuracy is limited. Use it for permacomputer-scale chat: a coherent companion, not a reference oracle.

Usage

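from microgpt import load_model, generate_fast

# Load the portable Python-engine export (~2GB).
model = load_model('ANDREA-120M.json')
# 768, 12, 12, 1024 mirror the Model Details values: embedding dim, heads, layers, context.
results = generate_fast(model['state_dict'], model['uchars'], model['bos'],
                        768, 12, 12, 1024, prefix='> what is an apple? / <')
print(results[0][0])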

Resume fine-tuning

# Place ANDREA-120M.bin into a checkpoint dir as step_149700.bin
# Use scripts/train-via-proxy.py with a proxy on RTX 4090
curl -d @ANDREA-120M-TRAIN.json https://training.ai.unturf.com/train

White Paper

ANDREA-12M-WHITEPAPER.pdf: full technical paper covering architecture, bandit curriculum, data sources, training recipe, and results.

Source: whitepaper/ANDREA/ANDREA-WHITEPAPER.rst in the uncloseai-cli repository.

Citation

ANDREA: Autonomous Neural Data Recipe for Education and Agency
TimeHexOn, foxhop, russell@unturf
March 2026 (12M), May 2026 (120M), permacomputer.com

License

AGPL-3.0. Code outlasts authors. Infrastructure outlasts builders.


File integrity

Verify any downloaded file with md5sum <file> or sha256sum <file>. For LFS-stored files, the SHA-256 hash here also equals the HuggingFace LFS object ID, so HF's own UI shows the same value.

README.md is excluded: it cannot meaningfully checksum itself.
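Where md5sum and sha256sum are not available, the same check can be done from Python with the standard hashlib module. The sketch below hashes a file in chunks so that the multi-gigabyte files are never loaded into memory at once; `sha256_of` is a hypothetical helper name.

```python
import hashlib


def sha256_of(path, chunk_bytes=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: f.read(chunk_bytes), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Compare the returned string with the sha256 column for the file in question; any difference means the download is incomplete or altered.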

ANDREA-12M

| File | Size (bytes) | md5 | sha256 |
| --- | --- | --- | --- |
| ANDREA-12M.bin | 153,363,464 | f8db228b75d5cc532a6f8d5ec13895ba | 927baf98b44cdba986f69079a259cc3b8019eb4fab210bd5bcaad703a6d50626 |
| ANDREA-12M-best.bin | 153,363,464 | 33ed7f9b79872e2922579a70eb837a40 | f67dc5f259b51e9cade5db845b71cdbbe15313cafb42914a5ba630278adc2f39 |
| ANDREA-12M.json | 277,281,158 | f64bb0529fc3adce687e5653ece712eb | fd3761a713c3c75750ba1944f7040ed208a9dbeab17df38177b8e40dfb1763c7 |
| ANDREA-12M-TRAIN.json | 1,281 | c5a5ec2893ef14bf9397be43a9338d38 | c559e57a7fc7424a0f64be8f52ff860b48e2be03be6cbb8b8585b76f41c1dca9 |
| ANDREA-12M-WHITEPAPER.pdf | 1,881,508 | 5ec11ab6dd63437a410d4cfcd1280d2d | c76ad5f7b34d4baa6e6f2f8a37d5dc4cfb5d185ef4490750b41dc55cf0f01d88 |
| harris_segments.json | 18,058 | 9c12bbda14c087dd8eff2fd7b0df3f8f | 989ff6405af744e19d17a73d5b33ab3d6169adeeb4f58c6c3fcde889081f0be3 |

ANDREA-120M

| File | Size (bytes) | md5 | sha256 |
| --- | --- | --- | --- |
| ANDREA-120M.bin | 1,184,385,032 | 321f77ebc85a2cceb589bdc63cbb843a | 94dc089719cc0f1d4d50ae4f0d6fe6a4a433f9a82fb9c5be32e1049870de37a9 |
| ANDREA-120M-best.bin | 1,184,385,032 | bd33d49765c617ad5c7d81e842b98c52 | 8a5332cc3655f59b67899b0127f506e2be6b474017618d666c0c095c0fc652fb |
| ANDREA-120M.json | 2,141,351,592 | 755b975f3f9e0427294c5a16ee5ee41e | d842f919372b74784a2dd3738fb0629ae50b98b3539c70389715a1037c6e29d9 |
| ANDREA-120M-TRAIN.json | 3,230 | 3d378c2ac4c51ddd6eaf3aea145191a7 | 6059ab2097b84b5d14c6b66b04888c89ac1963d7c24ffc940b47ee1828003eca |
| ANDREA-120M-harris-segments.json | 79,201 | e381bcb4326e111299b4b70b35a788de | 8c188a76ce346f641608316cdc7a6cb25e44509ba88471ea9de22bea8452101f |
| ANDREA-120M-state.json | 68,710 | 6221d6209e7cc495cfc6fa8915de1549 | c04394bbb44b8d2724f169ea7285ff6bdf089f08fd53185ee00d332d815a18a9 |
| ANDREA-120M-loss.json | 659,999 | 89cc76b59af7634aba23a8d06fc4c32a | a702da0911c692854a998f975f61dee8108d9d36375d35a4e64693a54c1ed522 |

Checksums computed 2026-05-27 from canonical source files (server + HF for the already-shipped 12M release). Any future re-mirror should regenerate this section.
