---
license: mit
language:
- en
tags:
- rust
- burn
- gpt
- tinystories
- from-scratch
- decoder-transformer
pipeline_tag: text-generation
inference: false
datasets:
- roneneldan/TinyStories
---

# RitsuGPT

A small, from-scratch GPT in **pure Rust**. It trains on a single consumer GPU (an NVIDIA GeForce RTX 5060, 8 GB) and runs on your own computer. *nanoGPT, in Rust.*

Trainer & source code: **[github.com/NeonixLabs/RitsuGPT](https://github.com/NeonixLabs/RitsuGPT)** · Part of [Neonix Labs](https://labs.neonix.ai).

> **What it is, honestly:** a ~16.9M-parameter small language model in the spirit of [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) (Eldan & Li, 2023). It learns to write simple, coherent short English stories. It is **not** a production assistant: no world knowledge, no reasoning, no instruction following. Its value is a clean, hackable, from-scratch stack you can train and verify yourself.

## Files

| File | What |
|---|---|
| `ritsu-step25000.mpk` | Weights at 25,000 steps (recommended), `burn` CompactRecorder format |
| `ritsu-step12000.mpk` | Weights at 12,000 steps (earlier checkpoint) |
| `tokenizer.json` | Byte-level BPE tokenizer (vocab 8192), HuggingFace `tokenizers` format |

## Results

Evaluation reports **bits-per-byte (BPB)** on the TinyStories validation set. BPB is tokenizer-invariant; lower is better.

| Checkpoint | Steps | BPB |
|---|---|---|
| `ritsu-step12000.mpk` | 12,000 | 0.695 |
| `ritsu-step25000.mpk` | 25,000 | **0.6843** |
| byte-level baseline | n/a | 0.805 |

## How to run

This is a Rust / `burn` model, not a `transformers` model, so there is no hosted inference widget.
Run it locally with the trainer:

```bash
git clone https://github.com/NeonixLabs/RitsuGPT
cd RitsuGPT
# put ritsu-step25000.mpk and tokenizer.json in this folder (download them from this repo)
cargo run --release --bin neonix-train -- sample ./ritsu-step25000 ./tokenizer.json "Once upon a time" 200 0.8 40
```

Pass the checkpoint path **without** the `.mpk` suffix; the loader appends it. Inference runs on CPU.

## Architecture

A standard decoder-only Transformer, optimized in Rust.

## License

[MIT](https://github.com/NeonixLabs/RitsuGPT). Trained on the public [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) dataset.
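
## Note on bits-per-byte

The BPB figures under Results divide the model's total negative log-likelihood, converted to bits, by the number of UTF-8 bytes in the evaluated text. Because the denominator counts bytes rather than tokens, the number does not depend on the tokenizer. The sketch below shows only that conversion. It is not taken from the RitsuGPT source: the function names `bits_per_byte` and `bits_per_byte_from_mean` and the numbers in `main` are hypothetical, and it assumes the loss is a cross-entropy measured in nats.

```rust
use std::f64::consts::LN_2;

/// Convert a summed negative log-likelihood (in nats) over some text
/// into bits per byte of that text.
/// Hypothetical helper for illustration; not part of the RitsuGPT trainer.
fn bits_per_byte(total_nll_nats: f64, n_bytes: usize) -> f64 {
    assert!(n_bytes > 0, "need at least one byte of text");
    // nats -> bits is a division by ln(2); then normalise by byte count.
    total_nll_nats / LN_2 / n_bytes as f64
}

/// Same metric, starting from the mean per-token loss that a training
/// loop usually logs (assumed here to be in nats).
fn bits_per_byte_from_mean(mean_loss_nats: f64, n_tokens: usize, n_bytes: usize) -> f64 {
    bits_per_byte(mean_loss_nats * n_tokens as f64, n_bytes)
}

fn main() {
    // Made-up numbers, not measurements from this model.
    let text = "Once upon a time";
    let n_bytes = text.len(); // UTF-8 byte count
    let n_tokens = 4; // assumed token count for this text
    let mean_loss = 2.0; // assumed mean cross-entropy per token, in nats

    let bpb = bits_per_byte_from_mean(mean_loss, n_tokens, n_bytes);
    println!("bits per byte: {bpb:.4}");
}
```

As a sanity check on the scale, a model that assigns equal probability to all 256 byte values pays ln(256) nats per byte, which this conversion maps to 8 bits per byte.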