Space LLM - 10M Parameter Astronomy Language Model

A custom decoder-only transformer trained from scratch on space/astronomy text.

Model Details

Parameters: 10.4 million
Architecture: Decoder-only transformer
- 6 layers, 256 hidden dim, 8 attention heads
- RoPE (Rotary Position Embedding)
- SwiGLU activation
- RMSNorm
- Tied input/output embeddings
Context length: 256 tokens
Vocab size: 16,000 (SentencePiece BPE)
Framework: JAX/Flax
Training: TPU v3-8
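
The 10.4 million figure can be roughly reproduced from the architecture listed above. The sketch below is a back-of-the-envelope estimate, not the repository's own code: the feed-forward hidden size (`d_ff=1024`), the absence of bias terms, and the placement of the RMSNorm layers are assumptions, since the card does not state them. The function name `estimate_params` is hypothetical.

```python
# Back-of-the-envelope parameter count for the architecture listed above.
# ASSUMPTIONS (not stated in the model card): feed-forward hidden size of
# 1024, no bias terms, two RMSNorm scale vectors per layer plus a final one.
# RoPE adds no learned parameters, and the head count does not change the total.

def estimate_params(vocab=16_000, d_model=256, n_layers=6, d_ff=1024):
    embed = vocab * d_model          # shared with the output head (tied)
    attn = 4 * d_model * d_model     # Q, K, V and output projections
    swiglu = 3 * d_model * d_ff      # gate, up and down projections
    norms = 2 * d_model              # two RMSNorm scale vectors per layer
    per_layer = attn + swiglu + norms
    return embed + n_layers * per_layer + d_model  # + final RMSNorm

total = estimate_params()
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the tied embedding table accounts for about 4.1M of the total, which is why tying input and output embeddings matters at this scale.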

Training Data

~34 million tokens collected from:

Wikipedia (space/astronomy articles via API)
arXiv (astro-ph, gr-qc, hep-ph abstracts)
NASA APIs (APOD, Image Library, NSSDCA planetary fact sheets)
Comprehensive space knowledge base

Performance

Best validation loss: 5.24
Perplexity: 188
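
The two numbers above are related: perplexity is the exponential of the loss. The short check below assumes the reported loss is the mean per-token cross-entropy in nats, which the card does not state explicitly. It also computes the loss of a uniform guess over the vocabulary as a reference point.

```python
import math

val_loss = 5.24      # best validation loss reported above
vocab_size = 16_000  # from Model Details

# Assumption: the loss is mean per-token cross-entropy in nats.
perplexity = math.exp(val_loss)

# Reference point: a model that guesses uniformly over the vocabulary
# has loss ln(vocab_size) and perplexity equal to vocab_size.
uniform_loss = math.log(vocab_size)

print(f"perplexity   ~ {perplexity:.1f}")
print(f"uniform loss ~ {uniform_loss:.2f} nats")
```

The computed value lands just above the reported 188, which is consistent with the loss having been rounded to two decimals.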

Files

best.pkl - Best model checkpoint (Flax params, pickle format)
final.pkl - Final model checkpoint
model_config.json - Model architecture config
space_tokenizer.model - SentencePiece tokenizer
step_*.pkl - Intermediate checkpoints

Usage

import pickle, json
import jax, jax.numpy as jnp
from flax import linen as nn
import sentencepiece as spm

# Load config and weights
with open("model_config.json") as f:
    cfg = json.load(f)
with open("best.pkl", "rb") as f:
    params = pickle.load(f)["params"]

# Load tokenizer
sp = spm.SentencePieceProcessor(model_file="space_tokenizer.model")

# The model class itself is not defined here; running inference needs the
# full model code from https://github.com/korfalor-cloud/space-llm
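
Two small helpers can be useful once the objects above are loaded. This is a minimal sketch, not part of the repository: `count_params` and `clip_to_context` are hypothetical names, and the code assumes the checkpoint is a nested mapping whose leaves are arrays exposing `.size`.

```python
from collections.abc import Mapping

def count_params(tree):
    """Sum the element counts of every array in a nested mapping of arrays."""
    if isinstance(tree, Mapping):
        return sum(count_params(child) for child in tree.values())
    return int(tree.size)  # NumPy and JAX arrays both expose .size

def clip_to_context(token_ids, context_len=256):
    """Keep only the most recent tokens that fit in the context window."""
    return list(token_ids)[-context_len:]

# Example use with the objects loaded above (not executed here):
#   n = count_params(params)   # should be close to 10.4 million
#   ids = clip_to_context(sp.encode("The Crab Nebula is"))
```

Clipping from the left keeps the end of the prompt, which is the part a decoder-only model continues from.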

Source Code

Full training code: https://github.com/korfalor-cloud/space-llm

Limitations

This is a small model (about 10M parameters) trained on limited data (about 34M tokens). It learns space vocabulary and topic associations but does not produce fully coherent prose. It is intended as an educational demonstration of training an LLM from scratch.
