---
license: mit
language:
- en
tags:
- rust
- burn
- gpt
- tinystories
- from-scratch
- decoder-transformer
pipeline_tag: text-generation
inference: false
datasets:
- roneneldan/TinyStories
---

# RitsuGPT

A small, from-scratch GPT in **pure Rust** — it trains on a single consumer GPU (an NVIDIA GeForce RTX 5060, 8 GB) and runs on your own computer. *nanoGPT, in Rust.*

Trainer & source code: **[github.com/NeonixLabs/RitsuGPT](https://github.com/NeonixLabs/RitsuGPT)** · Part of [Neonix Labs](https://labs.neonix.ai).

> **What it is, honestly:** a ~16.9M-parameter small language model in the spirit of [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) (Eldan & Li, 2023). It learns to write simple, coherent short English stories. It is **not** a production assistant — no world knowledge, no reasoning, no instruction following. Its value is a clean, hackable, from-scratch stack you can train and verify yourself.

## Files

| File | What |
|---|---|
| `ritsu-step25000.mpk` | Weights at 25,000 steps (recommended) — `burn` CompactRecorder format |
| `ritsu-step12000.mpk` | Weights at 12,000 steps (earlier checkpoint) |
| `tokenizer.json` | Byte-level BPE tokenizer (vocab 8192), HuggingFace `tokenizers` format |

## Results

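Evaluation reports **bits-per-byte (BPB)** on the TinyStories validation set. BPB divides the total cross-entropy, expressed in bits, by the number of bytes in the evaluated text rather than by the number of tokens, so scores are comparable across tokenizers. Lower is better.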

| Checkpoint | Steps | BPB |
|---|---|---|
| `ritsu-step12000.mpk` | 12,000 | 0.695 |
| `ritsu-step25000.mpk` | 25,000 | **0.6843** |
| byte-level baseline | — | 0.805 |
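
For readers who want to reproduce a BPB number from their own evaluation loop, the conversion from cross-entropy can be sketched as follows. This is a minimal illustration of the standard formula, not code taken from the RitsuGPT trainer; the function names and the numbers in `main` are hypothetical, and it assumes the loss is measured in nats (natural log), as is usual for cross-entropy.

```rust
use std::f64::consts::LN_2;

/// Convert a cross-entropy loss summed over all evaluated tokens (in nats)
/// into bits per byte. `total_bytes` is the byte length of the evaluated text.
fn bits_per_byte(total_loss_nats: f64, total_bytes: usize) -> f64 {
    // nats / ln(2) = bits; dividing by the byte count normalizes per byte.
    total_loss_nats / (LN_2 * total_bytes as f64)
}

/// Same conversion when only the mean per-token loss and token count are known.
fn bits_per_byte_from_mean(mean_loss_nats: f64, num_tokens: usize, total_bytes: usize) -> f64 {
    bits_per_byte(mean_loss_nats * num_tokens as f64, total_bytes)
}

fn main() {
    // Hypothetical values for illustration only, not measurements from this model.
    let mean_loss = 1.9; // mean cross-entropy per token, in nats
    let tokens = 1_000; // tokens evaluated
    let bytes = 4_000; // bytes of text those tokens cover
    let bpb = bits_per_byte_from_mean(mean_loss, tokens, bytes);
    println!("bits per byte: {bpb:.4}");
}
```

Because the denominator counts bytes, a tokenizer that packs more bytes into each token does not lower the score by itself.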

## How to run

This is a Rust / `burn` model — not a `transformers` model — so there is no hosted inference widget. Run it locally with the trainer:

```bash
git clone https://github.com/NeonixLabs/RitsuGPT
cd RitsuGPT
# put ritsu-step25000.mpk and tokenizer.json in this folder (download them from this repo)
cargo run --release --bin neonix-train -- sample ./ritsu-step25000 ./tokenizer.json "Once upon a time" 200 0.8 40
```

Pass the checkpoint path **without** the `.mpk` suffix — the loader appends it. Inference runs on CPU.
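
The trailing `200 0.8 40` in the command above read like a token budget, a sampling temperature, and a top-k cutoff. That reading is an assumption; this card does not name the arguments, so check the trainer source for their actual meaning. Under that assumption, temperature plus top-k sampling can be sketched like this. The functions `top_k_probs` and `pick` are hypothetical helpers written for this example and are not the repository's implementation.

```rust
/// Keep the `k` highest logits, scale them by `temperature` (must be > 0),
/// and renormalize with a softmax. Returns (token_id, probability) pairs,
/// most likely token first.
fn top_k_probs(logits: &[f32], temperature: f32, k: usize) -> Vec<(usize, f32)> {
    if logits.is_empty() {
        return Vec::new();
    }
    let mut indexed: Vec<(usize, f32)> = logits.iter().copied().enumerate().collect();
    // Sort by logit, highest first, then drop everything past the top k.
    indexed.sort_by(|a, b| b.1.total_cmp(&a.1));
    indexed.truncate(k.max(1));
    // Subtract the maximum logit before exponentiating for numerical stability.
    let max = indexed[0].1;
    let weights: Vec<f32> = indexed
        .iter()
        .map(|&(_, logit)| ((logit - max) / temperature).exp())
        .collect();
    let sum: f32 = weights.iter().sum();
    indexed
        .iter()
        .zip(weights)
        .map(|(&(id, _), w)| (id, w / sum))
        .collect()
}

/// Pick a token given a uniform random draw `u` in [0, 1).
/// The caller supplies `u`, so this sketch needs no RNG crate.
fn pick(probs: &[(usize, f32)], u: f32) -> Option<usize> {
    let mut acc = 0.0;
    for &(id, p) in probs {
        acc += p;
        if u < acc {
            return Some(id);
        }
    }
    // Fall back to the last candidate if rounding left `acc` just below `u`.
    probs.last().map(|&(id, _)| id)
}

fn main() {
    // Hypothetical logits over a 4-token vocabulary.
    let logits = [1.0_f32, 3.0, 2.0, 0.0];
    let probs = top_k_probs(&logits, 0.8, 2);
    // A fixed draw keeps the example deterministic; real code would use an RNG.
    let token = pick(&probs, 0.5);
    println!("candidates: {probs:?}, picked: {token:?}");
}
```

A lower temperature sharpens the distribution toward the most likely token, and a smaller `k` removes unlikely tokens from consideration entirely.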

## Architecture

A standard decoder-only Transformer, implemented and trained in Rust with `burn`.

## License

[MIT](https://github.com/NeonixLabs/RitsuGPT). Trained on the public [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) dataset.