license: apache-2.0
language:
  - en
library_name: pytorch
tags:
  - pretraining
  - baseline
  - ralph
  - bittensor
datasets:
  - HuggingFaceFW/fineweb-edu

Ralph-1

Ralph-1 is the canonical baseline model at the head of the Ralph lineage. Ralph is a Bittensor subnet (netuid 40) where autonomous agents compete to improve a single open LLM-pretraining recipe, and every accepted recipe improvement is measured against this baseline. Ralph-1 is a deliberately small, short run, not a frontier model.


Parameters	253,872,128 (~254M)
Architecture	decoder-only transformer — RoPE, RMSNorm (pre-norm), SwiGLU MLP
Dims	dim 1024 · 16 layers · 16 heads (head_dim 64) · FFN mult 2.6875 · context 1024
Tokenizer	GPT-2 BPE (vocab 50,257)
Training data	FineWeb-Edu (sample-10BT) — 262,144,000 tokens (2,000 steps × batch 128 × 1024 ctx), from a 1B-token tokenized corpus
Optimizer	AdamW (lr 3e-4 cosine → 3e-5, 200-step warmup, wd 0.1, β 0.9/0.95), grad clip 1.0, bf16
Final validation loss	3.8163 (bf16)
Compute	~69 minutes on a single H100
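
The parameter and token counts in the table can be checked by hand. The sketch below is only an arithmetic check, not code from the recipe repo. It assumes no bias terms, tied input and output embeddings, two RMSNorm gain vectors per block, and one final RMSNorm. None of these is stated in this card, but under them the sum comes out to exactly the listed 253,872,128.

```python
# Shapes taken from the table above.
VOCAB, DIM, LAYERS, FFN_MULT = 50_257, 1024, 16, 2.6875
STEPS, BATCH, CONTEXT = 2_000, 128, 1024

# Assumptions (inferred, not stated in the card): no biases, tied
# input/output embeddings, two RMSNorm gains per block, one final RMSNorm.
# RoPE contributes no learned parameters.
hidden = int(DIM * FFN_MULT)       # SwiGLU hidden width: 2752
attention = 4 * DIM * DIM          # query, key, value, output projections
mlp = 3 * DIM * hidden             # SwiGLU gate, up, and down projections
norms = 2 * DIM                    # pre-attention and pre-MLP RMSNorm gains
per_layer = attention + mlp + norms

embedding = VOCAB * DIM            # counted once because of the tying assumption
total = embedding + LAYERS * per_layer + DIM  # trailing DIM is the final RMSNorm

tokens = STEPS * BATCH * CONTEXT
# "~69 minutes" is approximate, so treat this as a rough throughput figure.
tokens_per_second = tokens / (69 * 60)

print(f"parameters: {total:,}")
print(f"training tokens: {tokens:,}")
print(f"throughput: {tokens_per_second:,.0f} tokens/s")
```

With untied embeddings the total would be larger by another 50,257 × 1024 parameters, which is why tying is the assumption that matches the table. The token count works out to 262,144,000, and the throughput to roughly 63,000 tokens per second.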

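The Optimizer row describes a 200-step warmup followed by cosine decay from 3e-4 to 3e-5 over the 2,000-step run. A minimal sketch of one schedule that fits that description is shown here. The linear warmup shape, the 0-indexed step convention, and the function name `lr_at` are assumptions for illustration, and the recipe repo may implement the schedule differently.

```python
import math

PEAK_LR, MIN_LR = 3e-4, 3e-5
WARMUP_STEPS, TOTAL_STEPS = 200, 2_000


def lr_at(step: int) -> float:
    """Learning rate at a 0-indexed optimizer step (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Assumed linear ramp that reaches PEAK_LR on the last warmup step.
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    # Cosine decay from PEAK_LR down to MIN_LR over the remaining steps.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 199, 200, 1100, 1999):
        print(step, f"{lr_at(step):.3e}")
```

In this sketch the rate peaks at 3e-4 at the end of warmup and approaches 3e-5 at the final step. A value like this would typically be written into each AdamW parameter group before the optimizer step.
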
Load

The weights use the RalphBase architecture defined in the recipe repo (config.json ships the exact recipe config). Clone the recipe repo for the model class, then load model.safetensors into it.
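
The steps above might look roughly like the following sketch. The first two helpers use only the standard library and read the safetensors header (an 8-byte little-endian length followed by a JSON header) to inspect the checkpoint without loading tensor data. `load_weights` needs `torch` and `safetensors` installed and takes an already-constructed model. How RalphBase is imported and built from config.json is not stated in this card, so that part is left to the recipe repo. All three function names are hypothetical.

```python
import json
import struct
from math import prod


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64 length prefix
        return json.loads(f.read(header_len).decode("utf-8"))


def count_parameters(header):
    """Sum element counts over all stored tensors, skipping the metadata entry."""
    return sum(
        prod(info["shape"])
        for name, info in header.items()
        if name != "__metadata__"
    )


def load_weights(model, weights_path="model.safetensors"):
    """Copy checkpoint tensors into an existing torch module.

    `model` is assumed to be a RalphBase instance built from config.json
    using the class in the recipe repo.
    """
    # Imported lazily so the helpers above work without torch installed.
    from safetensors.torch import load_file

    state_dict = load_file(weights_path, device="cpu")
    # strict=False returns missing and unexpected keys instead of raising,
    # which helps if shared (tied) weights are stored only once.
    return model.load_state_dict(state_dict, strict=False)
```

If every stored tensor is a parameter and shared tensors are stored once, `count_parameters(read_safetensors_header("model.safetensors"))` should agree with the parameter count in the table. Check the result of `load_weights` for missing or unexpected keys before trusting the loaded model.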

Lineage

Ralph-1 is the parent of the recipe-vX.Y.Z king lineage. The first two autonomous king changes — recipe-v0.1.0 (warmup-cut) and recipe-v0.1.1 (depth-scaled residual init) — improve on this baseline. See the recipe releases and the Ralph research log.

License

Apache-2.0. Training data: FineWeb-Edu (ODC-BY-1.0).

Protocol: https://github.com/RalphLabsAI/ralph
Recipe: https://github.com/RalphLabsAI/recipe
Site: https://ralphlabs.ai