Install

Tilelli runs on CPU. You don't need a GPU. The whole install is ~120 MB (torch + the bundled 39 MB checkpoint).
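The 39 MB figure is what you would expect if the checkpoint stores the 10.18 M parameters (the count printed by reproduce/01_benchmark.py) as plain fp32, at 4 bytes each. Here is a back-of-the-envelope sketch under that assumption. It ignores pickle overhead and any non-weight state the file might hold.

```python
def fp32_checkpoint_mib(n_params: int) -> float:
    """Size in MiB of n_params fp32 values (4 bytes each), ignoring file overhead."""
    return n_params * 4 / 2**20


# 10.18 M parameters, as printed by reproduce/01_benchmark.py (a rounded figure)
print(f"{fp32_checkpoint_mib(10_180_000):.1f} MiB")  # → 38.8 MiB
```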

CPU-only — recommended for everyone

On Linux, a plain pip install torch pulls the CUDA build (2+ GB, plus the matching nvidia-* runtime wheels). The default macOS and Windows wheels already skip CUDA; the Linux one does not. Save yourself the bandwidth:

# 1. Get CPU torch first (works on Linux, macOS, Windows)
pip install --index-url https://download.pytorch.org/whl/cpu torch

# 2. Then install Tilelli
git clone https://github.com/TilelliLab/Tilelli-llm
cd Tilelli-llm
pip install -e .

# 3. Talk to it
python chat.py "Hello, who are you?"
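You can confirm that step 1 gave you a CPU-only build. Check torch.version.cuda, which is None on builds without CUDA, and the local version tag in torch.__version__. Linux wheels from the CPU index, for example, carry a +cpu tag. describe_torch_build is a hypothetical helper written for this page and is not part of Tilelli. It takes the two values as arguments so you can try it without torch installed.

```python
from typing import Optional


def describe_torch_build(version: str, cuda_version: Optional[str]) -> str:
    """Summarize a torch install from torch.__version__ and torch.version.cuda.

    Caveat: ROCm builds also report cuda_version=None, so this sketch would
    mislabel them as CPU-only.
    """
    # "2.3.1+cpu" -> "cpu"; an untagged "2.3.1" -> "none"
    tag = version.partition("+")[2] or "none"
    kind = "CPU-only" if cuda_version is None else f"CUDA {cuda_version}"
    return f"{kind} build (local version tag: {tag})"


# With torch installed:
#   import torch
#   print(describe_torch_build(torch.__version__, torch.version.cuda))
```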

GPU (optional)

If you actually have a GPU and want to run faster:

# CUDA 12.x build (Linux):
pip install --index-url https://download.pytorch.org/whl/cu121 torch
# MPS (macOS): no special index needed; the default macOS wheel already includes MPS.
pip install -e .

Inference works fine on CPU: the bundled v4 checkpoint has about 10 M parameters, and the generation loop is single-threaded and NumPy-friendly. A GPU buys you roughly 5–10× faster generation, not a different model.
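CPU and GPU run the same model, so choosing a device comes down to a preference order. The minimal sketch below assumes you want CUDA first, then Apple's MPS backend, then CPU. pick_device is a hypothetical helper, not a Tilelli function. scripts/train.py takes a --device flag, but check its --help before assuming it accepts cuda or mps.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple MPS, then fall back to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


# With torch installed:
#   import torch
#   device = pick_device(torch.cuda.is_available(), torch.backends.mps.is_available())
```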

Verifying the install

pip install -e ".[test]"
pytest -q tests/

You should see three smoke tests pass (model loads, tokenizer round-trips, one generation step runs).
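A tokenizer round-trip test asserts that decoding the encoding of a string returns the original string. The 256-byte context window mentioned under Troubleshooting suggests a byte-level tokenizer. Assuming that, such a test could look like the pytest-style sketch below. encode and decode are stand-ins written for illustration and are not Tilelli's tokenizer API.

```python
def encode(text: str) -> list[int]:
    """Stand-in byte-level tokenizer: one token id (0-255) per UTF-8 byte."""
    return list(text.encode("utf-8"))


def decode(ids: list[int]) -> str:
    """Inverse of encode: rebuild the bytes and decode them as UTF-8."""
    return bytes(ids).decode("utf-8")


def test_round_trip():
    # Mix ASCII with multi-byte characters to exercise UTF-8 handling.
    for text in ["Hello, who are you?", "naïve café", "日本語"]:
        ids = encode(text)
        assert all(0 <= i < 256 for i in ids)
        assert decode(ids) == text
```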

Training your own (out of the box)

The kit ships a ~700 KB TinyStories slice at data/tinystories_demo/ so training works without any download:

# 50 steps on CPU, takes a couple of minutes:
python scripts/train.py --model tilelli-lite-fp32    --data-dir data/tinystories_demo --steps 50 --batch-size 4 --seq-len 64 --device cpu
python scripts/train.py --model tilelli-lite-ternary --data-dir data/tinystories_demo --steps 50 --batch-size 4 --seq-len 64 --device cpu
python scripts/train.py --model vanilla-fp32         --data-dir data/tinystories_demo --steps 50 --batch-size 4 --seq-len 64 --device cpu
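The three commands differ only in --model, so a small driver can run them back to back. This sketch is meant to be run from the repository root and passes only the flags shown above. By default it prints the commands; it executes them when given --run, which is a switch of this sketch and not of train.py.

```python
import subprocess
import sys

MODELS = ["tilelli-lite-fp32", "tilelli-lite-ternary", "vanilla-fp32"]


def train_cmd(model: str, steps: int = 50, device: str = "cpu") -> list[str]:
    """Build the scripts/train.py command line used above for one model."""
    return [
        sys.executable, "scripts/train.py",
        "--model", model,
        "--data-dir", "data/tinystories_demo",
        "--steps", str(steps),
        "--batch-size", "4",
        "--seq-len", "64",
        "--device", device,
    ]


def run_all(dry_run: bool = True) -> None:
    """Print each demo training command and, unless dry_run, run it."""
    for model in MODELS:
        cmd = train_cmd(model)
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first run that exits non-zero.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run="--run" not in sys.argv)
```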

Each run writes checkpoints + a per-step JSONL log to runs/<model>_<timestamp>/. The README lists the 5 supported --model configs.
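To inspect a run afterwards, find the newest runs/<model>_<timestamp>/ directory and parse its JSONL log one line at a time. This sketch rests on two assumptions. First, it reads every *.jsonl file in the run directory, because the log's exact filename isn't documented here. Second, the keys inside each record (step, loss, ...) are whatever train.py writes, so look at one line before relying on a field name. latest_run and read_log are hypothetical helpers.

```python
import json
from pathlib import Path


def latest_run(runs_dir: str, model: str) -> Path:
    """Return the most recently modified runs/<model>_<timestamp>/ directory."""
    candidates = [p for p in Path(runs_dir).glob(f"{model}_*") if p.is_dir()]
    if not candidates:
        raise FileNotFoundError(f"no runs for {model!r} under {runs_dir}")
    return max(candidates, key=lambda p: p.stat().st_mtime)


def read_log(run_dir: Path) -> list[dict]:
    """Parse every *.jsonl file in a run directory, one JSON object per line."""
    records = []
    for log_file in sorted(run_dir.glob("*.jsonl")):
        with log_file.open(encoding="utf-8") as f:
            records.extend(json.loads(line) for line in f if line.strip())
    return records


# Usage:
#   records = read_log(latest_run("runs", "tilelli-lite-fp32"))
#   print(records[-1])  # last logged step; its keys depend on train.py
```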

Reproducing the claims

The four reproduce/0N_*.py scripts are described in the README. The three below each exit non-zero if the bundled v4 checkpoint fails to reproduce the documented number within ±5%.

python reproduce/03_abstain_held_out.py     # held-out IDK gate
python reproduce/04_neo_false_inability.py  # false-inability probe
python reproduce/02_metacog_probe.py        # cross-regime AUROC
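The pass criterion is a tolerance comparison against the documented number. This minimal sketch assumes "±5%" means 5% of the documented value. A script could instead define it in absolute points, for example for an AUROC. within_tolerance and exit_code are illustrative names, not functions from reproduce/.

```python
def within_tolerance(measured: float, documented: float, rel_tol: float = 0.05) -> bool:
    """True if measured lies within ±rel_tol of documented, relative to documented."""
    return abs(measured - documented) <= rel_tol * abs(documented)


def exit_code(measured: float, documented: float) -> int:
    """Map the check to a process exit status: 0 = PASS, 1 = FAIL."""
    return 0 if within_tolerance(measured, documented) else 1
```

math.isclose(measured, documented, rel_tol=0.05) is close to this check, but it scales the tolerance by the larger of the two values rather than by the documented one.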

A fourth script (01_benchmark.py) is an architecture-only check: it loads the bundled v4 checkpoint, prints the 10.18 M parameter count, and exits PASS. It runs in ~2 s on CPU. The full val-bpc-vs-vanilla re-run requires the FineWeb-Edu training pipeline, which is NOT bundled; the documented number lives in results/claim_01_benchmark.md.
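Counting parameters means summing element counts over the weight tensors. With PyTorch the usual idiom is sum(p.numel() for p in model.parameters()). The torch-free sketch below does the same over a {name: shape} mapping, which you could build from a loaded state dict as {k: tuple(v.shape) for k, v in state_dict.items()}. A state dict can also hold buffers, so that route may count slightly more than model.parameters(). The shapes in the example are made up and are not Tilelli's layout.

```python
from math import prod


def count_params(shapes: dict[str, tuple[int, ...]]) -> int:
    """Total number of scalar entries across all tensors (a 0-d tensor counts as 1)."""
    return sum(prod(shape) for shape in shapes.values())


def fmt_millions(n: int) -> str:
    """Format a count the way the benchmark reports it, e.g. '10.18 M'."""
    return f"{n / 1e6:.2f} M"


# Made-up shapes for illustration only; not Tilelli's real layer layout.
example = {
    "embed.weight": (256, 384),
    "block0.attn.qkv.weight": (1152, 384),
    "block0.attn.out.weight": (384, 384),
    "head.weight": (256, 384),
}
print(fmt_millions(count_params(example)))  # → 0.79 M
```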

Troubleshooting

"sequence length N > max_seq_len 256": the bundled ckpt has a context window of 256 bytes. If chat.py hits this, your prompt is too long; trim it.
"weights_only=True" load error: since PyTorch 2.6, torch.load defaults to weights_only=True. Tilelli's loader passes weights_only=False explicitly because we authored the checkpoint, so this error most likely means you are loading the file with your own torch.load call; pass weights_only=False there too. Do that only for the bundled artifact; for any third-party ckpt, verify the SHA first (the SHA for v4 is in the README).
macOS Apple Silicon: PyTorch ≥2.1 ships native arm64 wheels; no Rosetta needed.
Windows: the runtime helpers in src/tilelli/utils/runtime.py access /sys/class/thermal/, which exists only on Linux; on other platforms the resulting exceptions are caught and ignored. No action needed.
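Two of these fixes are easy to script. The sketch below trims a prompt to the 256-byte window without splitting a UTF-8 character, and it streams a file through SHA-256 so you can compare the digest with the README. It makes three assumptions. The README's SHA is taken to be a SHA-256 hex digest. Keeping the end of the prompt (the most recent text) is this sketch's choice, not something chat.py is known to do. And if chat.py also counts generated bytes against the window, pass a smaller max_bytes to leave room. trim_prompt and sha256_of are hypothetical helpers.

```python
import hashlib
from pathlib import Path

MAX_SEQ_LEN = 256  # the bundled ckpt's context window, in bytes


def trim_prompt(prompt: str, max_bytes: int = MAX_SEQ_LEN, keep: str = "end") -> str:
    """Cut a prompt to at most max_bytes of UTF-8 without splitting a character."""
    data = prompt.encode("utf-8")
    if len(data) <= max_bytes:
        return prompt
    cut = data[-max_bytes:] if keep == "end" else data[:max_bytes]
    # errors="ignore" drops a multi-byte character sliced in half at the boundary.
    return cut.decode("utf-8", errors="ignore")


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in chunks and return the hex digest."""
    h = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

To verify a checkpoint, compare sha256_of(<path to the checkpoint>) with the value listed in the README.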

License

Apache 2.0. See LICENSE. The bundled weights ship under the same license. The name "Tilelli" is not licensed by this file — fork freely, rename if you ship a derivative.