# Install
Tilelli runs on CPU. You don't need a GPU. The whole install is ~120 MB
(torch + the bundled 39 MB checkpoint).
## CPU-only — recommended for everyone
The default `pip install torch` on Linux pulls the **CUDA** build (2+ GB,
plus matching nvidia-* runtime wheels). On macOS and Windows the default
wheel is already CPU; on Linux it is not. Save yourself the bandwidth:
```bash
# 1. Get CPU torch first (works on Linux, macOS, Windows)
pip install --index-url https://download.pytorch.org/whl/cpu torch
# 2. Then install Tilelli
git clone https://github.com/TilelliLab/Tilelli-llm
cd Tilelli-llm
pip install -e .
# 3. Talk to it
python chat.py "Hello, who are you?"
```
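To confirm that step 1 pulled the CPU wheel rather than the CUDA one, a check along these lines may help. It is a sketch and not part of the kit: `describe_torch_build` is a hypothetical helper name, and it relies only on `torch.__version__` and `torch.version.cuda`, which is `None` on builds without CUDA.

```python
import importlib.util


def describe_torch_build() -> str:
    """Report whether the installed torch wheel was built with CUDA."""
    # Avoid an ImportError if step 1 has not been run yet.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch

    # torch.version.cuda is None for CPU wheels (and other non-CUDA builds)
    # and a version string such as "12.1" for CUDA wheels.
    cuda = torch.version.cuda
    kind = "CPU, no CUDA" if cuda is None else f"CUDA {cuda}"
    return f"torch {torch.__version__} ({kind})"


if __name__ == "__main__":
    print(describe_torch_build())
```

If this reports a CUDA build on a machine without a GPU, uninstalling torch and repeating step 1 should recover the disk space.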
## GPU (optional)
If you actually have a GPU and want to run faster:
```bash
# CUDA 12.x build (Linux):
pip install --index-url https://download.pytorch.org/whl/cu121 torch
# or MPS (macOS): the default macOS wheel already includes MPS.
pip install -e .
```
Inference works fine on CPU: the bundled v4 checkpoint has about 10 M
parameters, and the generation loop is single-threaded and NumPy-friendly.
A GPU buys you ~5–10× faster generation, not a different model.
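If you are unsure which backend your torch build can use, a small probe like the following may help. It is a minimal sketch: `pick_device` is a hypothetical helper, and whether the kit's scripts accept `cuda` or `mps` as a `--device` value is an assumption, since only `--device cpu` appears in this guide.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda", "mps", or "cpu", preferring an accelerator if usable."""
    # Fall back to CPU when torch is missing altogether.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps exists in PyTorch >= 1.12; guard for older builds.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(pick_device())
```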
## Verifying the install
```bash
pip install -e ".[test]"
pytest -q tests/
```
You should see three smoke tests pass (model loads, tokenizer round-trips,
one generation step runs).
## Training your own (out of the box)
The kit ships a ~700 KB TinyStories slice at `data/tinystories_demo/` so
training works without any download:
```bash
# 50 steps on CPU, takes a couple of minutes:
python scripts/train.py --model tilelli-lite-fp32 --data-dir data/tinystories_demo --steps 50 --batch-size 4 --seq-len 64 --device cpu
python scripts/train.py --model tilelli-lite-ternary --data-dir data/tinystories_demo --steps 50 --batch-size 4 --seq-len 64 --device cpu
python scripts/train.py --model vanilla-fp32 --data-dir data/tinystories_demo --steps 50 --batch-size 4 --seq-len 64 --device cpu
```
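The three commands above differ only in `--model`, so they can be generated from a list. The sketch below uses only the flags shown above; `build_train_cmd` and `MODELS` are hypothetical names, not part of the kit.

```python
import shlex
import sys

# The three configs used in the commands above.
MODELS = ["tilelli-lite-fp32", "tilelli-lite-ternary", "vanilla-fp32"]


def build_train_cmd(model: str, steps: int = 50) -> list[str]:
    """Build the argument list for one short CPU training run."""
    return [
        sys.executable, "scripts/train.py",
        "--model", model,
        "--data-dir", "data/tinystories_demo",
        "--steps", str(steps),
        "--batch-size", "4",
        "--seq-len", "64",
        "--device", "cpu",
    ]


if __name__ == "__main__":
    for model in MODELS:
        # Print each command; pass the list to subprocess.run(cmd, check=True)
        # from the repository root to execute it instead.
        print(shlex.join(build_train_cmd(model)))
```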
Each run writes checkpoints + a per-step JSONL log to `runs/<model>_<timestamp>/`.
The README lists the 5 supported `--model` configs.
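The per-step JSONL log can be inspected with a few lines of standard-library Python. This is a sketch under assumptions: the log's file name and its keys are not documented here, so `read_jsonl` and `last_value` are hypothetical helpers and any key you pass (for example `"loss"`) is a guess to check against a real log line.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Parse a JSONL file into a list of dicts, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def last_value(records, key):
    """Return the most recent value logged under key, or None if absent."""
    for rec in reversed(records):
        if key in rec:
            return rec[key]
    return None
```

Point `read_jsonl` at the JSONL file inside a `runs/<model>_<timestamp>/` directory, print one record to see which keys `train.py` writes, then use `last_value` to pull out the final value of the one you care about.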
## Reproducing the claims
The four `reproduce/0N_*.py` scripts are described in the README. Each
exits non-zero if the bundled v4 checkpoint fails to produce the
documented number within ±5 %.
```bash
python reproduce/03_abstain_held_out.py # held-out IDK gate
python reproduce/04_neo_false_inability.py # false-inability probe
python reproduce/02_metacog_probe.py # cross-regime AUROC
```
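The ±5 % pass/fail rule can be illustrated as follows. This is a sketch that assumes the tolerance is relative to the documented number; the actual scripts may compute it differently, and `within_tolerance` and `exit_code` are hypothetical names.

```python
def within_tolerance(measured: float, documented: float,
                     rel_tol: float = 0.05) -> bool:
    """True if measured lies within rel_tol (relative) of documented."""
    return abs(measured - documented) <= rel_tol * abs(documented)


def exit_code(measured: float, documented: float) -> int:
    """Map the check to a process exit code: 0 on pass, 1 on fail."""
    return 0 if within_tolerance(measured, documented) else 1
```

Under this reading, a documented value of 1.00 passes for a measurement of 0.96 and fails for 0.94.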
A fourth script (`01_benchmark.py`) is an architecture-only check: it
loads the bundled v4 checkpoint, prints the 10.18 M parameter count,
and exits PASS. It runs in ~2 s on CPU. The full val-bpc-vs-vanilla
re-run requires the FineWeb-Edu training pipeline, which is NOT bundled;
the documented number lives in `results/claim_01_benchmark.md`.
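The 10.18 M parameter count is consistent with the 39 MB checkpoint size quoted at the top of this guide, assuming the weights are stored as 4-byte fp32 values and the file contains little else. That storage format is an assumption, not something this guide states.

```python
PARAMS = 10.18e6        # parameter count quoted above
BYTES_PER_PARAM = 4     # assumption: fp32 storage

size_bytes = PARAMS * BYTES_PER_PARAM
size_mb = size_bytes / 1e6      # decimal megabytes
size_mib = size_bytes / 2**20   # binary mebibytes

print(f"{size_mb:.1f} MB, {size_mib:.1f} MiB")  # prints "40.7 MB, 38.8 MiB"
```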
## Troubleshooting
- **"sequence length N > max_seq_len 256"**: the bundled ckpt has a
context window of 256 bytes. If `chat.py` hits this, your prompt is
too long; trim it.
- **"weights_only=True" load error**: the loader passes
`weights_only=False` because the bundled checkpoint was authored by us
and can be trusted. That setting lets `torch.load` unpickle arbitrary
objects, so for any third-party ckpt, verify the SHA first (the SHA for
v4 is in the README).
- **macOS Apple Silicon**: PyTorch ≥2.1 ships native arm64 wheels; no
Rosetta needed.
- **Windows**: the runtime helpers in `src/tilelli/utils/runtime.py`
read `/sys/class/thermal/`, which exists on Linux only; on other
platforms the resulting exceptions are swallowed. No action needed.
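The first two items above can be sketched in plain Python. Both helpers are hypothetical and rest on assumptions: `trim_to_bytes` assumes the prompt is counted as UTF-8 bytes against the 256-byte window, and `sha256_of_file` assumes the SHA published in the README is a SHA-256 digest.

```python
import hashlib

MAX_SEQ_LEN = 256  # context window of the bundled checkpoint, in bytes


def trim_to_bytes(prompt: str, limit: int = MAX_SEQ_LEN) -> str:
    """Cut prompt so that its UTF-8 encoding fits in limit bytes."""
    data = prompt.encode("utf-8")
    if len(data) <= limit:
        return prompt
    # errors="ignore" drops a multi-byte character split by the cut.
    return data[:limit].decode("utf-8", errors="ignore")


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Passing a `limit` smaller than 256 leaves room in the window for generated bytes. Compare the digest against the published value before loading any checkpoint you did not get from this repository.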
## License
Apache 2.0. See `LICENSE`. The bundled weights ship under the same
license. The name "Tilelli" is not licensed by this file — fork freely,
rename if you ship a derivative.