Space LLM - 10M Parameter Astronomy Language Model

A custom decoder-only transformer trained from scratch on space/astronomy text.

Model Details

  • Parameters: 10.4 million
  • Architecture: Decoder-only transformer
    • 6 layers, 256 hidden dim, 8 attention heads
    • RoPE (Rotary Position Embedding)
    • SwiGLU activation
    • RMSNorm
    • Tied input/output embeddings
  • Context length: 256 tokens
  • Vocab size: 16,000 (SentencePiece BPE)
  • Framework: JAX/Flax
  • Training hardware: TPU v3-8
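
As a sanity check on the 10.4 million figure, the parameter count can be roughly reconstructed from the architecture bullets above. This is a sketch under assumptions, not the repository's own accounting: the feed-forward hidden size (d_ff = 1024) is not stated in this card and is a guess, as are bias-free linear layers and two RMSNorm scale vectors per layer plus one final norm. RoPE adds no learned parameters, and the tied embedding matrix is counted once.

```python
def count_params(vocab=16_000, d_model=256, n_layers=6, d_ff=1024):
    """Rough parameter count for a decoder-only transformer.

    d_ff=1024 is an ASSUMPTION (not given in the model card), as is the
    absence of bias terms in the linear layers.
    """
    embed = vocab * d_model          # tied input/output embeddings, counted once
    attn = 4 * d_model * d_model     # Q, K, V and output projections
    mlp = 3 * d_model * d_ff         # SwiGLU: gate, up and down projections
    norms = 2 * d_model              # two RMSNorm scale vectors per layer
    per_layer = attn + mlp + norms
    final_norm = d_model             # one RMSNorm before the output head
    return embed + n_layers * per_layer + final_norm


total = count_params()
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the embedding table alone accounts for about 4.1M of the total, so a sizeable share of the model's capacity sits in the vocabulary rather than in the transformer layers.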

Training Data

~34 million tokens collected from:

  • Wikipedia (space/astronomy articles via API)
  • arXiv (astro-ph, gr-qc, hep-ph abstracts)
  • NASA APIs (APOD, Image Library, NSSDCA planetary fact sheets)
  • Comprehensive space knowledge base

Performance

  • Best validation loss: 5.24
  • Perplexity: 188
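
The two numbers above are related: perplexity is the exponential of the loss, assuming the reported validation loss is the mean per-token cross-entropy in nats (the card does not say so explicitly). The short sketch below reproduces that relationship and, for context, computes the loss a uniform guess over the 16,000-token vocabulary would score.

```python
import math


def perplexity(loss_nats: float) -> float:
    """Perplexity for a mean per-token cross-entropy given in nats."""
    return math.exp(loss_nats)


best_val_loss = 5.24                 # from the Performance section
ppl = perplexity(best_val_loss)

# Loss of a model that assigns equal probability to every vocabulary entry.
uniform_loss = math.log(16_000)

print(f"perplexity at loss {best_val_loss}: {ppl:.1f}")
print(f"uniform-guess loss over 16,000 tokens: {uniform_loss:.2f}")
```

The small gap between this result and the reported 188 is consistent with the loss having been rounded to two decimals.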

Files

  • best.pkl - Best model checkpoint (Flax params, pickle format)
  • final.pkl - Final model checkpoint
  • model_config.json - Model architecture config
  • space_tokenizer.model - SentencePiece tokenizer
  • step_*.pkl - Intermediate checkpoints

Usage

import pickle, json
import jax, jax.numpy as jnp
from flax import linen as nn
import sentencepiece as spm

# Load config and weights
with open("model_config.json") as f:
    cfg = json.load(f)
with open("best.pkl", "rb") as f:
    params = pickle.load(f)["params"]

# Load tokenizer
sp = spm.SentencePieceProcessor(model_file="space_tokenizer.model")

# The model definition is not part of this snippet; see
# https://github.com/korfalor-cloud/space-llm for the full model code

Source Code

Full training code: https://github.com/korfalor-cloud/space-llm

Limitations

This is a small model (10.4M parameters) trained on limited data (~34M tokens). It learns space vocabulary and topic associations but does not produce fully coherent prose. It is intended as an educational demonstration of training an LLM from scratch.
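
To put "limited data" in numbers, the sketch below works out the tokens-per-parameter ratio and the number of full-length training sequences from the figures in this card. The comparison point in the comment (roughly 20 tokens per parameter) is an often-quoted rule of thumb from compute-optimal scaling studies of much larger models, included only as rough context and not as a claim about this training run.

```python
train_tokens = 34_000_000    # "~34 million tokens" from Training Data
n_params = 10_400_000        # "10.4 million" from Model Details
context_len = 256            # tokens per sequence

tokens_per_param = train_tokens / n_params

# Non-overlapping full-length sequences in one pass over the data,
# assuming simple fixed-size chunking (the actual batching is not stated).
sequences_per_epoch = train_tokens // context_len

# Rule-of-thumb context only: ~20 tokens per parameter is often quoted.
print(f"tokens per parameter: {tokens_per_param:.2f}")
print(f"full-length sequences per epoch: {sequences_per_epoch:,}")
```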
