| # Install |
|
|
| Tilelli runs on CPU. You don't need a GPU. The whole install is ~120 MB |
| (torch + the bundled 39 MB checkpoint). |
|
|
| ## CPU-only — recommended for everyone |
|
|
| The default `pip install torch` on Linux pulls the **CUDA** build (2+ GB, |
| plus matching nvidia-* runtime wheels). On macOS and Windows the default |
| wheel is already CPU; on Linux it is not. Save yourself the bandwidth: |
|
|
| ```bash |
| # 1. Get CPU torch first (works on Linux, macOS, Windows) |
| pip install --index-url https://download.pytorch.org/whl/cpu torch |
| |
| # 2. Then install Tilelli |
| git clone https://github.com/TilelliLab/Tilelli-llm |
| cd Tilelli-llm |
| pip install -e . |
| |
| # 3. Talk to it |
| python chat.py "Hello, who are you?" |
| ``` |
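To confirm that step 1 installed the CPU wheel rather than a CUDA build, a quick check like the one below can help. It is a minimal sketch, not part of the kit: `describe_build` is a hypothetical helper, and it relies only on `torch.__version__` and `torch.version.cuda`, which is `None` on a CPU-only wheel.

```python
import importlib.util


def describe_build(version, cuda_version):
    """Summarise a torch build from its version string and CUDA version.

    cuda_version mirrors torch.version.cuda, which is None on CPU-only wheels.
    """
    kind = "CPU-only" if cuda_version is None else f"CUDA {cuda_version}"
    return f"torch {version} ({kind})"


# Only import torch if it is actually installed in this environment.
if importlib.util.find_spec("torch") is None:
    print("torch is not installed in this environment")
else:
    import torch

    print(describe_build(torch.__version__, torch.version.cuda))
```

If the output mentions a CUDA version on a machine where you wanted the small install, uninstall torch and repeat step 1.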
|
|
| ## GPU (optional) |
|
|
| If you actually have a GPU and want to run faster: |
|
|
| ```bash |
| # CUDA 12.x build (Linux): |
| pip install --index-url https://download.pytorch.org/whl/cu121 torch |
| # macOS (MPS): the default macOS wheel already includes MPS, so no extra torch install is needed. |
| # Then install Tilelli as before: |
| pip install -e . |
| ``` |
|
|
| Inference works fine on CPU: the bundled v4 checkpoint has about 10 M |
| parameters, and the generation loop is single-threaded and NumPy-friendly. |
| A GPU buys you ~5–10× faster generation, not a different model. |
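If you are unsure which backend your torch build can use, the preference order described above (CUDA, then MPS, then CPU) can be sketched as follows. `pick_device` and `detect_device` are hypothetical helpers, not part of Tilelli; the sketch uses only `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
import importlib.util


def pick_device(cuda_ok, mps_ok):
    """Prefer CUDA, then Apple MPS, then fall back to CPU."""
    if cuda_ok:
        return "cuda"
    if mps_ok:
        return "mps"
    return "cpu"


def detect_device():
    """Ask torch which backends are usable; report "cpu" if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    mps = getattr(torch.backends, "mps", None)  # missing on very old torch builds
    mps_ok = mps is not None and mps.is_available()
    return pick_device(torch.cuda.is_available(), mps_ok)


print(detect_device())
```

The result is a string of the kind the training script's `--device` flag takes (for example `cpu`). Whether `chat.py` accepts a device option is not stated here.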
|
|
| ## Verifying the install |
|
|
| ```bash |
| pip install -e ".[test]" |
| pytest -q tests/ |
| ``` |
|
|
| You should see three smoke tests pass (model loads, tokenizer round-trips, |
| one generation step runs). |
|
|
| ## Training your own (out of the box) |
|
|
| The kit ships a ~700 KB TinyStories slice at `data/tinystories_demo/` so |
| training works without any download: |
|
|
| ```bash |
| # 50 steps on CPU, takes a couple of minutes: |
| python scripts/train.py --model tilelli-lite-fp32 --data-dir data/tinystories_demo --steps 50 --batch-size 4 --seq-len 64 --device cpu |
| python scripts/train.py --model tilelli-lite-ternary --data-dir data/tinystories_demo --steps 50 --batch-size 4 --seq-len 64 --device cpu |
| python scripts/train.py --model vanilla-fp32 --data-dir data/tinystories_demo --steps 50 --batch-size 4 --seq-len 64 --device cpu |
| ``` |
|
|
| Each run writes checkpoints and a per-step JSONL log to `runs/<model>_<timestamp>/`. |
| The README lists the five supported `--model` configs. |
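A per-step JSONL log has one JSON object per line, so it can be inspected with the standard library alone. The sketch below assumes nothing about the log's file name or field names, since neither is documented here. The `step` and `loss` fields in the demo are made up for illustration.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path):
    """Return one dict per non-empty line of a JSON Lines file."""
    records = []
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def summarize(records):
    """Report how many records there are and which keys the last one has."""
    if not records:
        return {"steps": 0, "last_keys": []}
    return {"steps": len(records), "last_keys": sorted(records[-1])}


# Demo on a throwaway file with hypothetical fields.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "log.jsonl"
    demo.write_text('{"step": 1, "loss": 2.5}\n{"step": 2, "loss": 2.25}\n', encoding="utf-8")
    print(summarize(read_jsonl(demo)))
```

Point `read_jsonl` at the log inside your own `runs/<model>_<timestamp>/` directory to see which fields the training script actually writes.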
|
|
| ## Reproducing the claims |
|
|
| The four `reproduce/0N_*.py` scripts are described in the README. Each |
| exits non-zero if the bundled v4 checkpoint fails to produce the |
| documented number within ±5 %. |
|
|
| ```bash |
| python reproduce/03_abstain_held_out.py # held-out IDK gate |
| python reproduce/04_neo_false_inability.py # false-inability probe |
| python reproduce/02_metacog_probe.py # cross-regime AUROC |
| ``` |
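The ±5 % pass/fail rule can be sketched as below. This is an illustration under two assumptions: that the tolerance is relative to the documented number, and that a failure maps to exit code 1. The actual scripts may implement the check differently, and the numbers in the demo are hypothetical.

```python
def within_tolerance(measured, documented, rel_tol=0.05):
    """True if measured lies within rel_tol (default 5 %) of the documented value."""
    return abs(measured - documented) <= rel_tol * abs(documented)


def exit_code(measured, documented, rel_tol=0.05):
    """0 on pass, 1 on fail, mirroring the non-zero-on-failure convention."""
    return 0 if within_tolerance(measured, documented, rel_tol) else 1


# Hypothetical values, for illustration only.
for measured in (96.0, 106.0):
    code = exit_code(measured, documented=100.0)
    print(measured, "PASS" if code == 0 else "FAIL")
```

Because the scripts signal failure through the exit code, they can be chained in a shell with `&&` or run from CI without parsing their output.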
|
|
| A fourth script (`01_benchmark.py`) is an architecture-only check: it |
| loads the bundled v4 checkpoint, prints the 10.18 M parameter count, |
| and exits PASS. It runs in ~2 s on CPU. The full val-bpc-vs-vanilla |
| re-run requires the FineWeb-Edu training pipeline, which is NOT bundled; |
| the documented number lives in `results/claim_01_benchmark.md`. |
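As a sanity check, the 10.18 M parameter count is consistent with the 39 MB checkpoint size, assuming the weights are stored as 32-bit floats (4 bytes each). That storage format is an assumption and is not stated in this document.

```python
PARAMS = 10.18e6        # parameter count printed by 01_benchmark.py
BYTES_PER_PARAM = 4     # assumption: fp32 storage

size_bytes = PARAMS * BYTES_PER_PARAM
size_mb = size_bytes / 1e6       # decimal megabytes
size_mib = size_bytes / 2**20    # binary mebibytes
print(f"{size_mb:.1f} MB ({size_mib:.1f} MiB)")
```

Under that assumption the raw weights come to roughly 39 MiB, which matches the bundled checkpoint size if "MB" is read as binary megabytes.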
|
|
| ## Troubleshooting |
|
|
| - **"sequence length N > max_seq_len 256"**: the bundled ckpt has a |
| context window of 256 bytes. If `chat.py` hits this, your prompt is |
| too long; trim it. |
| - **"weights_only=True" load error**: the loader passes |
| `weights_only=False` to `torch.load` because the checkpoint was authored |
| by us, so the bundled artifact can be trusted. That setting unpickles |
| arbitrary objects, so for any third-party ckpt, verify the SHA first |
| (the SHA for v4 is in the README). |
| - **macOS Apple Silicon**: PyTorch ≥2.1 ships native arm64 wheels; no |
| Rosetta needed. |
| - **Windows**: the runtime helpers in `src/tilelli/utils/runtime.py` |
| access `/sys/class/thermal/`, which exists on Linux only; on other |
| platforms the calls fail and the exception is swallowed. No action needed. |
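For the `max_seq_len` error above, one way to trim a prompt is to cut it at the byte limit without splitting a multi-byte UTF-8 character. This is a minimal sketch: `trim_prompt` is a hypothetical helper, not something `chat.py` provides. Whether generated output also counts against the 256-byte window is not stated here, so you may want to leave headroom.

```python
MAX_SEQ_LEN = 256  # context window of the bundled checkpoint, in bytes


def trim_prompt(prompt, max_bytes=MAX_SEQ_LEN):
    """Cut prompt to at most max_bytes of UTF-8 without splitting a character."""
    data = prompt.encode("utf-8")
    if len(data) <= max_bytes:
        return prompt
    # errors="ignore" drops a trailing partial multi-byte sequence, if any.
    return data[:max_bytes].decode("utf-8", errors="ignore")


long_prompt = "Tell me a story. " * 40
short = trim_prompt(long_prompt)
print(len(long_prompt.encode("utf-8")), "->", len(short.encode("utf-8")))
```

Truncating from the end keeps the start of the prompt. If the most recent text matters more, slice the last `max_bytes` bytes instead and decode the same way.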
| |
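Verifying a checkpoint's hash before loading it with `weights_only=False` can be done with the standard library. The sketch below assumes the published SHA is a SHA-256 hex digest, which this document does not specify. `sha256_of` and `verify` are hypothetical helpers, and the demo hashes a throwaway file rather than a real checkpoint.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large checkpoints need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Compare the file's digest with the published one, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()


# Demo on a temporary file; replace with your checkpoint path and the README's SHA.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.bin"
    demo.write_bytes(b"hello")
    print(verify(demo, hashlib.sha256(b"hello").hexdigest()))
```

Only call the loader on a third-party file after `verify` returns `True` for a digest you obtained from a source you trust.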
| ## License |
| |
| Apache 2.0. See `LICENSE`. The bundled weights ship under the same |
| license. The name "Tilelli" is not licensed by this file — fork freely, |
| rename if you ship a derivative. |
| |