I Built a Tiny AI That Explains the Universe
"A towel is about the most massively useful thing an interstellar hitchhiker can have." — Douglas Adams
Douglas Adams was right about towels. He was also, accidentally, right about knowledge.
The best tool you can carry isn't a search engine. It's the ability to understand anything, quickly, without drowning in jargon or waiting for a slow web page to load. That's what I wanted to build. Something small. Something fast. Something that actually makes things click.
I called it Pocket Atlas.
The idea
Most language models are trained to do everything. They write code, draft emails, roleplay as pirates, and occasionally explain things. That last part — the explaining — is what I wanted to isolate and amplify.
Pocket Atlas does one thing:
Take any concept. Make it click.
It answers in a fixed 5-part structure:
1. What it is — the honest one-sentence definition
2. Why it matters — why any human being should care
3. How it works — the mechanism, without the jargon
4. Simple example — something you can picture
5. Key takeaway — the thing worth remembering
The format is opinionated. That's the point. Good explanations have a shape, and training a model on thousands of well-shaped explanations teaches it that shape.
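That shape is also checkable. During evaluation I can test whether an output actually follows the format; a minimal sketch (the section names mirror the five parts above, the function name is my own):

```python
import re

# The five section headers Pocket Atlas is trained to emit, in order.
SECTIONS = ["What It Is", "Why It Matters", "How It Works",
            "Simple Example", "Key Takeaway"]

def follows_atlas_format(text: str) -> bool:
    """Return True if all five sections appear, in order."""
    pos = -1
    for name in SECTIONS:
        match = re.search(re.escape(name) + r"\s*:", text, re.IGNORECASE)
        if match is None or match.start() <= pos:
            return False
        pos = match.start()
    return True
```

A check like this is cheap enough to run over every generated example before it enters the training set.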
The base model: Qwen3.5-0.8B
I built on Qwen3.5-0.8B — a 0.8-billion-parameter model from Alibaba that punches well above its weight class. It's small enough to run on a MacBook or a Raspberry Pi via Ollama, or on an iPhone via MLX. It's also genuinely capable — a product of the recent wave of small models that have quietly become very good.
I disabled thinking mode (enable_thinking=False). Pocket Atlas gives direct answers. No internal chain-of-thought leakage. Just clean explanations.
The dataset: Atlas Pages
The dataset is called Atlas Pages. It's synthetic — generated by Claude — and lives at cetusian/atlas-pages on HuggingFace.
~18,000 examples across three complementary sources:
| Source | Count | What it teaches |
|---|---|---|
| Atlas Pages (5-part explanations) | ~6,400 | The format: structured, warm, precise |
| arXiv abstracts | ~8,000 | Technical compression — dense → plain |
| XSum news summaries | ~3,000 | Radical brevity — one sentence, complete |
Each source teaches a different skill. The synthetic data teaches the house format — the 5-part structure I want the model to adopt. arXiv teaches the model to compress dense technical content without losing the core idea. XSum teaches radical economy of words.
Together, they produce a model that can do all three: structure, compress, simplify.
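Assembling the mix is mostly bookkeeping: pool the three sources, tag each record with where it came from, and shuffle. A minimal sketch with placeholder records (the real sources live on HuggingFace):

```python
import random

def build_mixture(atlas, arxiv, xsum, seed=42):
    """Combine the three sources into one shuffled training pool.

    Each record keeps a 'source' tag so per-source quality
    can still be inspected after training.
    """
    pool = (
        [{"text": t, "source": "atlas"} for t in atlas]
        + [{"text": t, "source": "arxiv"} for t in arxiv]
        + [{"text": t, "source": "xsum"} for t in xsum]
    )
    random.Random(seed).shuffle(pool)  # deterministic shuffle
    return pool
```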
Generating Atlas Pages
I used the Anthropic Batch API to generate the core dataset. Batch requests run asynchronously at a 50% discount over standard API calls, with generous throughput limits. I submitted ~6,600 topics at once and collected the results about an hour later.
Topics spanned 13 categories: Science & Math, Physics & Cosmology, Biology, Psychology, Philosophy, Economics, Computing & AI, Everyday Concepts, History, Medicine, Law, Engineering, Culture & Society.
Total generation cost for the dataset: ~$7.
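The batch submission itself is mostly request construction: one request object per topic, each with a stable custom_id so results can be matched back to their topic when the batch completes. A sketch (the model name and prompt wording here are illustrative, not the exact ones I used; the commented submit call needs the anthropic SDK and an API key):

```python
def make_batch_requests(topics, model="claude-sonnet-4-5", max_tokens=1024):
    """Build one Batch API request dict per topic."""
    requests = []
    for i, topic in enumerate(topics):
        requests.append({
            # Stable ID for matching results back to topics later.
            "custom_id": f"atlas-{i:05d}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{
                    "role": "user",
                    "content": f"Write a 5-part Atlas Page explaining: {topic}",
                }],
            },
        })
    return requests

# Submitting (requires the anthropic SDK and ANTHROPIC_API_KEY):
# import anthropic
# client = anthropic.Anthropic()
# batch = client.messages.batches.create(requests=make_batch_requests(topics))
```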
Training: LoRA on 2× A100s
Training used Unsloth — a LoRA fine-tuning library that's genuinely fast — on Modal's cloud infrastructure.
- Method: LoRA (r=16, alpha=16)
- Base model: unsloth/Qwen3.5-0.8B
- Dataset: ~17,000 train / 900 val
- Batch size: 64 effective (2× A100 80GB, torchrun DDP)
- Epochs: 1
- Steps: 269 (with sequence packing)
- Learning rate: 2e-4, cosine decay
- Optimizer: AdamW 8-bit
- Precision: bf16
- Final train loss: 2.147
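Wiring those hyperparameters together looks roughly like this. This is a sketch from memory of the Unsloth and TRL APIs, not my exact training script; dataset preparation (`train_ds`) and the torchrun launch are omitted:

```python
from unsloth import FastLanguageModel
from trl import SFTConfig, SFTTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3.5-0.8B",
    max_seq_length=2048,
    dtype=None,  # auto-selects bf16 on A100
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_ds,  # ~17k formatted examples (not shown)
    args=SFTConfig(
        output_dir="outputs",
        per_device_train_batch_size=32,  # x2 GPUs = 64 effective
        num_train_epochs=1,
        learning_rate=2e-4,
        lr_scheduler_type="cosine",
        optim="adamw_8bit",
        bf16=True,
        packing=True,  # the big win, see below
    ),
)
trainer.train()
```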
The big optimization was sequence packing (packing=True). Instead of padding each sequence to the maximum length, packing fills each batch slot with multiple shorter sequences laid end-to-end. This eliminated padding waste and cut the step count from ~2,145 to 269 — an 8× reduction.
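The effect is easy to reproduce with a toy greedy packer. This is my own illustration of the idea, not Unsloth's actual implementation:

```python
def pack_greedy(lengths, max_len=2048):
    """Pack token sequences end-to-end into slots of max_len tokens.

    Returns the number of slots used; without packing, every
    sequence would occupy one padded slot of its own.
    """
    slots, used = 0, 0
    for n in sorted(lengths, reverse=True):  # longest-first greedy fill
        if used == 0 or used + n > max_len:
            slots += 1   # open a new slot
            used = n
        else:
            used += n    # append to the current slot
    return slots

# 1,000 sequences of ~256 tokens fit 8 per 2,048-token slot:
# pack_greedy([256] * 1000) -> 125 slots instead of 1,000 padded ones.
```

That 8× collapse in slot count is exactly the kind of step reduction the training run saw.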
The second optimization was moving from a notebook to a Modal Function with torchrun DDP across both A100s. Effective batch size went from 2 to 64. Training time: ~68 minutes.
Total compute cost: ~$2.50.
The results
Running on Apple Silicon (MLX):
- Generation speed: ~52 tokens/second
- Peak memory: 1.6 GB
Ask it anything:
Explain the Fermi paradox.
What It Is: The Fermi paradox is the puzzling observation that, despite the high probability of extraterrestrial civilizations existing, we have found no evidence of them.
Why It Matters: It forces us to confront deep questions: Is life rare? Do civilizations destroy themselves? Are we simply not listening hard enough?
How It Works: Physicist Enrico Fermi reasoned: the universe is 13.8 billion years old, there are hundreds of billions of stars like our Sun, many with Earth-like planets — so where is everyone? The math says they should be everywhere. The silence says otherwise.
Simple Example: Imagine a huge party where statistically thousands of people should be in the kitchen, but you walk in and it's completely empty. You start to wonder: did everyone leave early? Are you the only one who showed up?
Key Takeaway: The Fermi paradox isn't a proof that we're alone — it's a proof that something in our assumptions is wrong. Which assumption? That's the question that keeps astronomers up at night.
Total cost
| Item | Cost |
|---|---|
| Dataset generation (Batch API) | ~$6.86 |
| arXiv + XSum (HuggingFace) | $0 |
| Modal training (2× A100, 68 min) | ~$2.50 |
| Total | ~$9.36 |
A specialized edge AI explainer, fine-tuned end to end, for less than the price of a lunch.
Running it yourself
MLX (Apple Silicon):
```python
from mlx_lm import load, generate

model, tokenizer = load("cetusian/pocket-atlas-0.8b")

messages = [
    {"role": "system", "content": "You explain ideas clearly, warmly, and precisely. Use this format: What it is, Why it matters, How it works, Simple example, Key takeaway."},
    {"role": "user", "content": "Explain entropy."},
]

prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
response = generate(model, tokenizer, prompt=prompt, max_tokens=600, verbose=True)
```
Transformers:
```python
from transformers import pipeline

pipe = pipeline("text-generation", model="cetusian/pocket-atlas-0.8b")
```
GGUF (Ollama / llama.cpp):
Download pocket-atlas-q4_k_m.gguf from the model page.
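Once downloaded, the GGUF can be registered with a minimal Modelfile. The system prompt below mirrors the one used in training; treat the exact commands as a sketch:

```shell
# Write a minimal Modelfile pointing at the downloaded GGUF
cat > Modelfile <<'EOF'
FROM ./pocket-atlas-q4_k_m.gguf
SYSTEM """You explain ideas clearly, warmly, and precisely. Use this format: What it is, Why it matters, How it works, Simple example, Key takeaway."""
EOF

# Register and run the model locally
ollama create pocket-atlas -f Modelfile
ollama run pocket-atlas "Explain the Fermi paradox."
```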
What's next
The current model was trained on 18k examples. The next version will add:
- Angle variations — ELI5, analogy-only, misconceptions, applications (~4,000 examples)
- Multi-turn dialogues — follow-up questions, going deeper (~3,000 examples)
- More topic categories — Neuroscience, Linguistics, Game Theory, Cryptography, and more (~2,700 topics)
Total dataset target: ~28,000 examples. Training the 2B variant is also on the roadmap.
Links
- Model: cetusian/pocket-atlas-0.8b
- Dataset: cetusian/atlas-pages
- Try it: HuggingFace Space
Don't panic.