ANDREA model family

Autonomous Neural Data Recipe for Education and Agency

A family of small language models grown on a single RTX 4090 using a bandit-controlled curriculum over open data. Part of the permacomputer project: open source, open data, open weights.

Model	Params	Architecture	Status
ANDREA-12M	12.8M	384d / 12h / 6L / 1024ctx	Shipped 2026-03-21
ANDREA-120M	98.7M	768d / 12h / 12L / 1024ctx	Shipped 2026-05-27

ANDREA-12M

A 12.8M parameter language model grown on a single RTX 4090 using a bandit-controlled curriculum.

Model Details

Property	Value
Parameters	12.8M
Architecture	Transformer decoder, 384d/12h/6L
Embedding dim	384
Heads	12
Layers	6
Context	1024 tokens
Tokenizer	Harris morpheme (2048 segments, 2305 vocab)
Training steps	43,587
Final SMMA loss	2.0
Best single-step loss	0.21
Training time	~72 hours
Hardware	Single NVIDIA RTX 4090 (24GB VRAM, 1.4GB used)
CUDA engine	microgpt_cuda.cu (custom, FP32)
Born	2026-03-21 12:53 UTC / 08:53 EDT
License	AGPL-3.0

Files

File	Step	Description
`ANDREA-12M.bin`	43,587	Final checkpoint (SMMA 2.0)
`ANDREA-12M-best.bin`	42,300	Best checkpoint (lowest loss during training)
`ANDREA-12M.json`	43,587	Portable Python-engine model
`ANDREA-12M-TRAIN.json`	—	Training config
`harris_segments.json`	—	Harris tokenizer segments (2048) — required for inference

Checkpoint format

Binary, little-endian: [int32 step][int32 n_params][n_params × float32 weights][n_params × float32 m][n_params × float32 v]

Weights: model parameters (12.8M floats, ~49MB)
m: Adam first moment (same size)
v: Adam second moment (same size)
Total: ~147MB per checkpoint

Use either checkpoint to resume fine-tuning (weights + optimizer state preserved) or extract weights only for inference (first n_params floats after the 8-byte header).
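The layout above can be read with the Python standard library alone. The following is a minimal sketch, not the project's own loader: `read_checkpoint` and its `weights_only` flag are hypothetical names introduced here for illustration.

```python
import struct
import sys
from array import array

def read_checkpoint(path, weights_only=True):
    """Read a .bin checkpoint: [int32 step][int32 n_params][weights][m][v]."""
    with open(path, "rb") as f:
        # 8-byte header: two little-endian int32 values.
        step, n_params = struct.unpack("<ii", f.read(8))
        blocks = []
        # Weights come first, then Adam m and v (n_params float32 values each).
        for _ in range(1 if weights_only else 3):
            data = f.read(4 * n_params)
            if len(data) != 4 * n_params:
                raise ValueError("truncated checkpoint")
            buf = array("f")
            buf.frombytes(data)
            if sys.byteorder == "big":
                buf.byteswap()  # the file is little-endian
            blocks.append(buf)
    return step, n_params, blocks
```

For inference, `step, n_params, (weights,) = read_checkpoint("ANDREA-12M.bin")` skips the optimizer state; pass `weights_only=False` to get all three arrays for resuming.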

Training Data

A curated mix of open conversational and educational data:

NousResearch/Hermes-3-Dataset (general, creative, roleplay) — 590K conversations
Dictionary — 88K word definitions distilled from Hermes 3 8B
Gutenberg — public domain literature (Project Gutenberg)
Additional: chat, smoltalk, oasst, dolly, IRC, repo-docs

The data mix is controlled by a UCB1 multi-armed bandit with dice-based phase control. The bandit dynamically adjusts source weights during training based on per-source loss trajectories. The full curriculum specification is in the white paper.
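For readers unfamiliar with UCB1, here is a minimal sketch of the textbook algorithm. It is not the ANDREA training code: the class name, the exploration constant `c`, and the reward passed to `update` are assumptions for illustration. The white paper defines how rewards are actually derived from per-source loss.

```python
import math

class UCB1:
    """Textbook UCB1 over a fixed set of arms (here: data sources)."""

    def __init__(self, arms, c=2.0):
        self.arms = list(arms)
        self.c = c                      # exploration constant (assumed)
        self.counts = {a: 0 for a in self.arms}
        self.means = {a: 0.0 for a in self.arms}
        self.total = 0

    def select(self):
        # Play every arm once before trusting the confidence bounds.
        for a in self.arms:
            if self.counts[a] == 0:
                return a

        def score(a):
            bonus = math.sqrt(self.c * math.log(self.total) / self.counts[a])
            return self.means[a] + bonus

        return max(self.arms, key=score)

    def update(self, arm, reward):
        # Incremental mean of the rewards observed for this arm.
        self.counts[arm] += 1
        self.total += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
```

The bonus term shrinks as a source is sampled more often, so rarely visited sources keep getting revisited even when their average reward is low.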

Training Recipe

Harris morpheme tokenizer (2048 segments)
Cosine LR schedule with warm restart at step 25K (0.0004 peak)
Phase-based bandit: 2 focus arms, 1d3 dice, source floors
Checkpoints every 100 steps, SIGTERM-safe
Per-source reward attribution, epoch penalty, coverage tracking
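The learning-rate line in the recipe above can be sketched as a two-cycle cosine schedule. Only the 0.0004 peak, the restart at step 25K, and the 43,587 total steps come from this README. The zero floor, the absence of warmup, and the assumption that both cycles share the same peak are illustrative guesses.

```python
import math

def cosine_lr(step, peak_lr=4e-4, min_lr=0.0,
              restart_step=25_000, total_steps=43_587):
    """Cosine decay with one warm restart at restart_step (assumed shape)."""
    if step < restart_step:
        start, length = 0, restart_step
    else:
        start, length = restart_step, total_steps - restart_step
    # Fraction of the current cycle completed, clamped to [0, 1].
    progress = min(max((step - start) / length, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions the rate decays toward zero just before step 25,000 and jumps back to the peak at the restart.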

Capabilities

ANDREA-12M learns patterns, not facts. At 12.8M parameters it produces:

Correct Q&A turn structure (> question / < answer)
Definition-style responses
Multi-sentence outputs with plausible grammar
Instruction-following scaffolding ("explain", "define", "describe")

It does NOT produce factually accurate content; it is a pattern machine. Factual coherence first appears at the ANDREA-120M scale, and even there accuracy is limited.

Usage

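from microgpt import load_model, generate_fast

# Load the portable Python-engine model (the JSON file listed under Files).
model = load_model('ANDREA-12M.json')

# 384, 12, 6, 1024 match Model Details: embedding dim, heads, layers, context.
# The prefix uses the "> question / < answer" turn structure described above.
results = generate_fast(model['state_dict'], model['uchars'], model['bos'],
                        384, 12, 6, 1024, prefix='> what is an apple? / <')
print(results[0][0])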

ANDREA-120M

A 98.7M parameter language model: same bandit-controlled curriculum, same permacomputer recipe, scaled up ~8x from ANDREA-12M. It is the first member of the ANDREA family to produce factual coherence in addition to pattern coherence.

Model Details

Property	Value
Parameters	98,698,752 (~98.7M, labeled "120M")
Architecture	Transformer decoder, 768d/12h/12L
Embedding dim	768
Heads	12
Layers	12
Context	1024 tokens
Tokenizer	Harris morpheme (8192 segments, 8449 vocab)
Training steps	149,700
Latest EMA loss	~1.38 (last 2K steps)
Hardware	Single NVIDIA RTX 4090 (24GB VRAM, ~22GB used during training)
CUDA engine	microgpt_cuda.cu (custom, FP16 cuBLAS, sm_89)
LR	0.0003 (cosine schedule, post-polish-pivot)
Born	2026-05-27 20:50 UTC / 16:50 EDT
License	AGPL-3.0

Files

File	Step	Description
`ANDREA-120M.bin`	149,700	Latest checkpoint
`ANDREA-120M-best.bin`	145,500	Best checkpoint (bandit-selected, lowest SMMA loss)
`ANDREA-120M.json`	149,700	Portable Python-engine model (~2GB)
`ANDREA-120M-TRAIN.json`	—	Training config (polish-pivot variant)
`ANDREA-120M-harris-segments.json`	—	Harris tokenizer segments (8192) — required for inference

Checkpoint format

Same binary format as ANDREA-12M: [int32 step][int32 n_params][n_params × float32 weights][n_params × float32 m][n_params × float32 v]

Weights: model parameters (98.7M floats, ~376MB)
m: Adam first moment (same size)
v: Adam second moment (same size)
Total: ~1,130MB (about 1.1GB) per checkpoint
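The totals follow directly from the layout: an 8-byte header plus three float32 arrays of n_params values each. A short check against the parameter count and the byte sizes listed in this README (the helper names are illustrative only):

```python
HEADER_BYTES = 8        # int32 step + int32 n_params
FLOAT32_BYTES = 4
ARRAYS = 3              # weights, Adam m, Adam v

def checkpoint_bytes(n_params):
    """Expected size of a .bin checkpoint holding n_params parameters."""
    return HEADER_BYTES + ARRAYS * FLOAT32_BYTES * n_params

def params_from_bytes(size):
    """Recover n_params from a checkpoint file size."""
    body = size - HEADER_BYTES
    if body % (ARRAYS * FLOAT32_BYTES) != 0:
        raise ValueError("size does not match the checkpoint layout")
    return body // (ARRAYS * FLOAT32_BYTES)

print(checkpoint_bytes(98_698_752))    # 1184385032, the listed size of ANDREA-120M.bin
print(params_from_bytes(153_363_464))  # 12780288, the "12.8M" of ANDREA-12M
```

A file whose size does not satisfy this relation is either truncated or not in this checkpoint format.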

Training Data (megachat-v8 composite)

Source family	Sources
Chat	chat, smoltalk, oasst, dolly, unfirehose-chat, synthetic-chat
Knowledge	gutenberg, dictionary
Hermes	hermes3-general, hermes3-creative, hermes3-roleplay
Social	irc-qa-strict, unweapon
Meta	repo-commits

Sources excluded from the chatty-track curriculum: real-tool-calls, synthetic-bash, tool-calls, hermes3-code, hermes3-math, repo-docs, repo-docstrings (tool-caller and code-doc material, reserved for a separate model family).

Per-source caps and floors are documented in ANDREA-120M-TRAIN.json.

Training Recipe

Harris morpheme tokenizer (8192 segments, vocab_size=8449)
Cosine LR schedule, lr=0.0003 peak (lr=0.001 caused gradient explosion on the v8 corpus)
Adam betas 0.9 / 0.999, eps 1e-8
Block size 1024, batch size 8 (FP16 cuBLAS on sm_89)
Phase-based bandit: dice-controlled UCB1 focus over 16 sources
Per-source reward attribution (EMA per source, alpha=0.1)
Indexed random-access sampling (.tok.idx byte offsets) — O(K) sampling per round
Checkpoints every 100 steps, sample every 200, SIGUSR1 checkpoint signal
Polish pivot at step 112K: removed repo-docs/docstrings, tightened knowledge caps, raised chat floors. Resumed from step_112600.bin.
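The per-source EMA in the recipe above can be sketched as follows. Only alpha=0.1 comes from this README. The class name and the reward definition (previous average minus the new loss) are assumptions for illustration, not the trainer's actual attribution rule.

```python
class SourceEMA:
    """Exponential moving average of training loss, tracked per source."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha
        self.ema = {}

    def update(self, source, loss):
        prev = self.ema.get(source)
        if prev is None:
            # First observation seeds the average; no reward yet.
            self.ema[source] = loss
            return 0.0
        self.ema[source] = (1.0 - self.alpha) * prev + self.alpha * loss
        # Hypothetical reward: positive when the step beat the running average.
        return prev - loss
```

With alpha=0.1 each new loss contributes 10% to the average, so a single noisy step moves a source's estimate only slightly.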

Curriculum (Firehose Bandit v5)

Each step picks one source by weighted random choice, fills the 1024-token context with consecutive documents (BOS-separated), and trains on it. Phases last 7-42 steps; at the start of each phase a 1d4 roll (read as 0-3) selects how many of the three focus arms come from random picks versus UCB1:

Dice	Random arms	Bandit arms
0	3	0
1	2	1
2	1	2
3	0	3

Focus arms get 2.0x weight, non-focus arms 0.5x. Random picks always go first, so the bandit cannot lock onto easy sources.
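The phase rules above can be sketched in a few lines. This is an illustration, not the Firehose Bandit v5 code: the function names are hypothetical, every source is given a base weight of 1.0, per-source caps and floors are ignored, and `ucb_ranking` is assumed to be the source list already sorted by UCB1 score, best first.

```python
import random

FOCUS_ARMS = 3  # each row of the dice table sums to three focus arms

def start_phase(sources, ucb_ranking, rng=random):
    """Return (phase_length, focus_arms, weights) for one phase."""
    dice = rng.randrange(4)            # 0..3, as in the table above
    n_random = FOCUS_ARMS - dice
    # Random picks go first and claim their slots before the bandit does.
    focus = rng.sample(sources, n_random)
    for src in ucb_ranking:
        if len(focus) == FOCUS_ARMS:
            break
        if src not in focus:
            focus.append(src)
    weights = {s: (2.0 if s in focus else 0.5) for s in sources}
    length = rng.randint(7, 42)        # phase length in steps, inclusive
    return length, focus, weights

def pick_source(weights, rng=random):
    """Weighted random choice of the source for one training step."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
```

Call `start_phase` once per phase, then `pick_source(weights)` once per step until the phase length is used up.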

Capabilities

At 98.7M parameters, ANDREA-120M produces:

Multi-paragraph coherent English prose
Chat turn structure (> ... / < ...) with on-topic responses
Definitions, short factual answers (low accuracy but plausible)
Haiku and short verse (training data side-effect from gutenberg)
IRC-style chat exchanges

It is still a smol model. Factual accuracy is limited. Use it for permacomputer-scale chat — coherent companion, not reference oracle.

Usage

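from microgpt import load_model, generate_fast

# Load the portable Python-engine model (~2GB JSON, listed under Files).
model = load_model('ANDREA-120M.json')

# 768, 12, 12, 1024 match Model Details: embedding dim, heads, layers, context.
results = generate_fast(model['state_dict'], model['uchars'], model['bos'],
                        768, 12, 12, 1024, prefix='> what is an apple? / <')
print(results[0][0])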

Resume fine-tuning

# Place ANDREA-120M.bin into a checkpoint dir as step_149700.bin
# Use scripts/train-via-proxy.py with a proxy on RTX 4090
curl -d @ANDREA-120M-TRAIN.json https://training.ai.unturf.com/train

White Paper

ANDREA-12M-WHITEPAPER.pdf — full technical paper covering architecture, bandit curriculum, data sources, training recipe, and results.

Source: whitepaper/ANDREA/ANDREA-WHITEPAPER.rst in the uncloseai-cli repository.

Citation

ANDREA: Autonomous Neural Data Recipe for Education and Agency
TimeHexOn, foxhop, russell@unturf
March 2026 (12M), May 2026 (120M), permacomputer.com

License

AGPL-3.0. Code outlasts authors. Infrastructure outlasts builders.

File integrity

Verify any downloaded file with md5sum <file> or sha256sum <file>. For LFS-stored files, the SHA-256 hash here also equals the HuggingFace LFS object ID, so HF's own UI shows the same value.

README.md is excluded — it cannot meaningfully checksum itself.
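Where `sha256sum` is not available, the same check can be done from Python with the standard `hashlib` module. The helper names below are illustrative. The file is read in chunks so the multi-gigabyte JSON exports do not need to fit in memory.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected_sha256):
    """True if the file's digest matches the value from the tables below."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, `verify("ANDREA-12M.bin", "927baf98b44cdba986f69079a259cc3b8019eb4fab210bd5bcaad703a6d50626")` should return True for an intact download.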

ANDREA-12M

File	Size (bytes)	md5	sha256
`ANDREA-12M.bin`	153,363,464	`f8db228b75d5cc532a6f8d5ec13895ba`	`927baf98b44cdba986f69079a259cc3b8019eb4fab210bd5bcaad703a6d50626`
`ANDREA-12M-best.bin`	153,363,464	`33ed7f9b79872e2922579a70eb837a40`	`f67dc5f259b51e9cade5db845b71cdbbe15313cafb42914a5ba630278adc2f39`
`ANDREA-12M.json`	277,281,158	`f64bb0529fc3adce687e5653ece712eb`	`fd3761a713c3c75750ba1944f7040ed208a9dbeab17df38177b8e40dfb1763c7`
`ANDREA-12M-TRAIN.json`	1,281	`c5a5ec2893ef14bf9397be43a9338d38`	`c559e57a7fc7424a0f64be8f52ff860b48e2be03be6cbb8b8585b76f41c1dca9`
`ANDREA-12M-WHITEPAPER.pdf`	1,881,508	`5ec11ab6dd63437a410d4cfcd1280d2d`	`c76ad5f7b34d4baa6e6f2f8a37d5dc4cfb5d185ef4490750b41dc55cf0f01d88`
`harris_segments.json`	18,058	`9c12bbda14c087dd8eff2fd7b0df3f8f`	`989ff6405af744e19d17a73d5b33ab3d6169adeeb4f58c6c3fcde889081f0be3`

ANDREA-120M

File	Size (bytes)	md5	sha256
`ANDREA-120M.bin`	1,184,385,032	`321f77ebc85a2cceb589bdc63cbb843a`	`94dc089719cc0f1d4d50ae4f0d6fe6a4a433f9a82fb9c5be32e1049870de37a9`
`ANDREA-120M-best.bin`	1,184,385,032	`bd33d49765c617ad5c7d81e842b98c52`	`8a5332cc3655f59b67899b0127f506e2be6b474017618d666c0c095c0fc652fb`
`ANDREA-120M.json`	2,141,351,592	`755b975f3f9e0427294c5a16ee5ee41e`	`d842f919372b74784a2dd3738fb0629ae50b98b3539c70389715a1037c6e29d9`
`ANDREA-120M-TRAIN.json`	3,230	`3d378c2ac4c51ddd6eaf3aea145191a7`	`6059ab2097b84b5d14c6b66b04888c89ac1963d7c24ffc940b47ee1828003eca`
`ANDREA-120M-harris-segments.json`	79,201	`e381bcb4326e111299b4b70b35a788de`	`8c188a76ce346f641608316cdc7a6cb25e44509ba88471ea9de22bea8452101f`
`ANDREA-120M-state.json`	68,710	`6221d6209e7cc495cfc6fa8915de1549`	`c04394bbb44b8d2724f169ea7285ff6bdf089f08fd53185ee00d332d815a18a9`
`ANDREA-120M-loss.json`	659,999	`89cc76b59af7634aba23a8d06fc4c32a`	`a702da0911c692854a998f975f61dee8108d9d36375d35a4e64693a54c1ed522`

Checksums computed 2026-05-27 from canonical source files (server + HF for the already-shipped 12M release). Any future re-mirror should regenerate this section.
