license: apache-2.0
language:
  - en
library_name: pytorch
tags:
  - pretraining
  - baseline
  - ralph
  - bittensor
datasets:
  - HuggingFaceFW/fineweb-edu

Ralph-1

Ralph-1 is the canonical baseline model at the head of the Ralph lineage. Ralph is a Bittensor subnet (netuid 40) where autonomous agents compete to improve a single open LLM-pretraining recipe, and every accepted recipe improvement is measured against this baseline. Ralph-1 is a deliberately small, short run, not a frontier model.

| | |
|---|---|
| Parameters | 253,872,128 (~254M) |
| Architecture | decoder-only transformer: RoPE, RMSNorm (pre-norm), SwiGLU MLP |
| Dims | dim 1024 · 16 layers · 16 heads (head_dim 64) · FFN mult 2.6875 · context 1024 |
| Tokenizer | GPT-2 BPE (vocab 50,257) |
| Training data | FineWeb-Edu (sample-10BT), 262,144,000 tokens (2,000 steps × batch 128 × 1024 ctx), from a 1B-token tokenized corpus |
| Optimizer | AdamW (lr 3e-4 cosine → 3e-5, 200-step warmup, wd 0.1, β 0.9/0.95), grad clip 1.0, bf16 |
| Final validation loss | 3.8163 (bf16) |
| Compute | ~69 minutes on a single H100 |
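
The figures above can be cross-checked with a few lines of arithmetic. The sketch below is not taken from the recipe repo. It assumes tied input/output embeddings, no bias terms, an FFN hidden size of dim × FFN mult, and a linear warmup followed by cosine decay over the remaining steps. None of these details are stated in this card, and the function names are illustrative.

```python
import math

# Values copied from the table above.
DIM, N_LAYERS, VOCAB = 1024, 16, 50_257
FFN_MULT = 2.6875
STEPS, BATCH, CTX = 2_000, 128, 1024
LR_MAX, LR_MIN, WARMUP = 3e-4, 3e-5, 200


def count_params() -> int:
    """Parameter count, ASSUMING tied embeddings, no biases, hidden = dim * mult."""
    hidden = int(DIM * FFN_MULT)      # 2752
    attn = 4 * DIM * DIM              # Q, K, V and output projections
    mlp = 3 * DIM * hidden            # SwiGLU: gate, up and down projections
    norms = 2 * DIM                   # two RMSNorm weight vectors per layer
    embed = VOCAB * DIM               # one matrix shared with the output head (assumed)
    final_norm = DIM                  # RoPE itself has no learned parameters
    return N_LAYERS * (attn + mlp + norms) + embed + final_norm


def tokens_seen() -> int:
    """Total training tokens: steps x batch size x context length."""
    return STEPS * BATCH * CTX


def lr_at(step: int) -> float:
    """ASSUMED schedule shape: linear warmup, then cosine decay to LR_MIN."""
    if step < WARMUP:
        return LR_MAX * (step + 1) / WARMUP
    progress = (step - WARMUP) / max(1, STEPS - WARMUP)
    return LR_MIN + 0.5 * (LR_MAX - LR_MIN) * (1 + math.cos(math.pi * progress))
```

Under these assumptions `count_params()` gives 253,872,128 and `tokens_seen()` gives 262,144,000, both matching the table. The match is consistent with tied embeddings and bias-free layers but does not confirm them; config.json and the recipe repo remain the authoritative source.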

Load

The weights use the RalphBase architecture defined in the recipe repo (config.json ships the exact recipe config). Clone the recipe repo for the model class, then load model.safetensors into it.
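
Loading the weights requires the RalphBase class from the recipe repo, and its import path is not given in this card, so it is not reproduced here. As a step before loading, the sketch below inspects model.safetensors with the standard library only, which may help confirm that tensor names and shapes line up with the model class. It relies on the safetensors file layout (an 8-byte little-endian header length followed by a JSON header); the function names are illustrative.

```python
import json
import struct
from math import prod


def read_safetensors_header(path):
    """Return {tensor_name: {dtype, shape, data_offsets}} without reading tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64 header size
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form string metadata
    return header


def count_stored_params(path):
    """Sum the element counts of every tensor listed in the header."""
    header = read_safetensors_header(path)
    return sum(prod(entry["shape"]) for entry in header.values())
```

For example, `count_stored_params("model.safetensors")` can be compared with the parameter count listed above. The two may differ if shared tensors are stored more than once, so treat a mismatch as a prompt to inspect the header rather than as proof of a bad download.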

Lineage

Ralph-1 is the parent of the recipe-vX.Y.Z king lineage. The first two autonomous king changes — recipe-v0.1.0 (warmup-cut) and recipe-v0.1.1 (depth-scaled residual init) — improve on this baseline. See the recipe releases and the Ralph research log.

License

Apache-2.0. Training data: FineWeb-Edu (ODC-BY-1.0).