	# Install

	Tilelli runs on CPU. You don't need a GPU. The whole install is ~120 MB
	(torch + the bundled 39 MB checkpoint).

	## CPU-only — recommended for everyone

	The default `pip install torch` on Linux pulls the CUDA build (2+ GB,
	plus matching nvidia-* runtime wheels). On macOS and Windows the default
	wheel is already CPU; on Linux it is not. Save yourself the bandwidth:

	```bash
	# 1. Get CPU torch first (works on Linux, macOS, Windows)
	pip install --index-url https://download.pytorch.org/whl/cpu torch

	# 2. Then install Tilelli
	git clone https://github.com/TilelliLab/Tilelli-llm
	cd Tilelli-llm
	pip install -e .

	# 3. Talk to it
	python chat.py "Hello, who are you?"
	```
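
	To confirm that step 1 really installed the CPU build, you can inspect the
	version string. The sketch below assumes that wheels from the PyTorch index
	carry a local-version suffix such as `+cpu` or `+cu121` (the default macOS
	wheel has none). `torch_build_kind` is a hypothetical helper, not part of
	Tilelli.

	```bash
	# Hypothetical helper: classify a torch version string by its suffix.
	torch_build_kind() {
	  case "$1" in
	    *+cpu)   echo "cpu" ;;
	    *+cu*)   echo "cuda" ;;
	    *+rocm*) echo "rocm" ;;
	    *)       echo "no-suffix" ;;  # e.g. the default macOS wheel
	  esac
	}

	# Only query torch if it is importable, so the snippet is safe to paste anywhere.
	if python -c "import torch" >/dev/null 2>&1; then
	  torch_build_kind "$(python -c 'import torch; print(torch.__version__)')"
	fi
	```

	On Linux, a result of `cuda` after step 1 usually means pip kept a
	previously installed CUDA wheel; uninstalling torch and repeating step 1
	should fix it.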

	## GPU (optional)

	If you actually have a GPU and want to run faster:

	```bash
	# CUDA 12.x build (Linux):
	pip install --index-url https://download.pytorch.org/whl/cu121 torch
	# MPS (macOS): nothing extra to install; the default macOS wheel already includes MPS.

	# Then install Tilelli as before:
	pip install -e .
	```

	Inference works fine on CPU: the bundled v4 checkpoint has about 10 M
	parameters, and the generation loop is single-threaded and NumPy-friendly.
	A GPU buys you ~5–10× faster generation, not a different model.
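
	A script that should run on whatever hardware is present can pick the
	device at launch. This is a minimal sketch: `pick_device` is a hypothetical
	helper, and whether Tilelli's scripts accept `cuda` or `mps` as a `--device`
	value is an assumption (only `--device cpu` appears in this guide).

	```bash
	# Hypothetical helper: map two availability flags ("True"/"False") to a device name.
	pick_device() {
	  if [ "$1" = "True" ]; then
	    echo "cuda"
	  elif [ "$2" = "True" ]; then
	    echo "mps"
	  else
	    echo "cpu"
	  fi
	}

	# Ask torch for the flags when it is importable; otherwise fall back to CPU.
	if python -c "import torch" >/dev/null 2>&1; then
	  flags=$(python -c 'import torch; print(torch.cuda.is_available(), torch.backends.mps.is_available())')
	  # $flags is left unquoted on purpose so it splits into two words, e.g. "False True".
	  DEVICE=$(pick_device $flags)
	else
	  DEVICE=cpu
	fi
	echo "device: $DEVICE"
	```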

	## Verifying the install

	```bash
	pip install -e ".[test]"
	pytest -q tests/
	```

	You should see three smoke tests pass (model loads, tokenizer round-trips,
	one generation step runs).

	## Training your own (out of the box)

	The kit ships a ~700 KB TinyStories slice at `data/tinystories_demo/` so
	training works without any download:

	```bash
	# 50 steps on CPU, takes a couple of minutes:
	python scripts/train.py --model tilelli-lite-fp32 --data-dir data/tinystories_demo --steps 50 --batch-size 4 --seq-len 64 --device cpu
	python scripts/train.py --model tilelli-lite-ternary --data-dir data/tinystories_demo --steps 50 --batch-size 4 --seq-len 64 --device cpu
	python scripts/train.py --model vanilla-fp32 --data-dir data/tinystories_demo --steps 50 --batch-size 4 --seq-len 64 --device cpu
	```

	Each run writes checkpoints + a per-step JSONL log to `runs/<model>_<timestamp>/`.
	The README lists the 5 supported `--model` configs.
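
	The three commands above differ only in `--model`, so a loop keeps the
	shared flags in one place. A minimal sketch: `train_cmd` is a hypothetical
	helper that prints the command rather than running it, so you can review
	the output first.

	```bash
	# Hypothetical helper: print the demo training command for one --model value.
	train_cmd() {
	  printf 'python scripts/train.py --model %s --data-dir data/tinystories_demo --steps 50 --batch-size 4 --seq-len 64 --device cpu\n' "$1"
	}

	# Dry run: print one command per model config used in this guide.
	for model in tilelli-lite-fp32 tilelli-lite-ternary vanilla-fp32; do
	  train_cmd "$model"
	done
	```

	To execute the runs in sequence, end the loop with `done | sh` and run it
	from the repository root.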

	## Reproducing the claims

	The four `reproduce/0N_*.py` scripts are described in the README. Three
	of them check a documented number: each exits non-zero if the bundled v4
	checkpoint fails to reproduce that number within ±5 %.

	```bash
	python reproduce/03_abstain_held_out.py # held-out IDK gate
	python reproduce/04_neo_false_inability.py # false-inability probe
	python reproduce/02_metacog_probe.py # cross-regime AUROC
	```

	The fourth script (`01_benchmark.py`) is an architecture-only check: it
	loads the bundled v4 checkpoint, prints the 10.18 M parameter count,
	and exits PASS. It runs in ~2 s on CPU. The full val-bpc-vs-vanilla
	re-run requires the FineWeb-Edu training pipeline, which is NOT bundled;
	the documented number lives in `results/claim_01_benchmark.md`.
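
	Because the scripts report failure through their exit code, a small wrapper
	can run several of them and summarize the result. A minimal sketch:
	`run_checks` is a hypothetical helper, not part of the repository, and the
	guard assumes you run it from the repository root.

	```bash
	# Hypothetical helper: run each argument as a command line and count non-zero exits.
	# Each command's own output is discarded; drop the redirection to see it.
	run_checks() {
	  failures=0
	  for cmd in "$@"; do
	    if sh -c "$cmd" >/dev/null 2>&1; then
	      echo "PASS  $cmd"
	    else
	      echo "FAIL  $cmd"
	      failures=$((failures + 1))
	    fi
	  done
	  [ "$failures" -eq 0 ]  # succeed only if every command succeeded
	}

	# Only run when the reproduce/ directory is present.
	if [ -d reproduce ]; then
	  run_checks \
	    "python reproduce/01_benchmark.py" \
	    "python reproduce/02_metacog_probe.py" \
	    "python reproduce/03_abstain_held_out.py" \
	    "python reproduce/04_neo_false_inability.py" \
	    || echo "at least one script exited non-zero"
	fi
	```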

	## Troubleshooting

	- "sequence length N > max_seq_len 256": the bundled ckpt has a
	context window of 256 bytes. If `chat.py` hits this, your prompt is
	too long; trim it.
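	- "weights_only=True" load error: PyTorch 2.6 and later default
	`torch.load` to `weights_only=True`, which refuses checkpoints that
	contain arbitrary pickled objects. The bundled loader passes
	`weights_only=False` because we authored the checkpoint, so the bundled
	artifact can be trusted. For any third-party ckpt, verify the SHA first
	(the SHA for v4 is in the README).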
	- macOS Apple Silicon: PyTorch ≥2.1 ships native arm64 wheels; no
	Rosetta needed.
	- Windows: the runtime helpers in `src/tilelli/utils/runtime.py`
	read `/sys/class/thermal/` on Linux only; on other platforms the calls
	swallow the resulting exceptions. No action needed.
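
	The first two items can be checked from the shell before you launch
	anything. A minimal sketch with two hypothetical helpers: `prompt_bytes`
	counts the bytes in a prompt, and `verify_sha256` compares a file against an
	expected digest. That the SHA in the README is a SHA-256 hex digest is an
	assumption; swap in the matching tool if it is not.

	```bash
	# Hypothetical helper: number of bytes in a prompt (the context window is counted in bytes).
	prompt_bytes() {
	  # Arithmetic expansion strips the padding some wc implementations add.
	  echo $(( $(printf '%s' "$1" | wc -c) ))
	}

	# Hypothetical helper: succeed only if FILE ($1) hashes to EXPECTED ($2, lowercase hex).
	verify_sha256() {
	  if command -v sha256sum >/dev/null 2>&1; then
	    actual=$(sha256sum "$1" | cut -d' ' -f1)      # GNU coreutils
	  else
	    actual=$(shasum -a 256 "$1" | cut -d' ' -f1)  # macOS
	  fi
	  [ "$actual" = "$2" ]
	}

	[ "$(prompt_bytes "Hello, who are you?")" -le 256 ] && echo "prompt fits"
	```

	The guide does not say how much of the 256-byte window the generated reply
	needs, so treat a prompt close to the limit as too long. Call
	`verify_sha256` with the checkpoint path and the digest copied from the
	README.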

	## License

	Apache 2.0. See `LICENSE`. The bundled weights ship under the same
	license. The name "Tilelli" is not licensed by this file — fork freely,
	rename if you ship a derivative.