---
license: mit
language:
- en
tags:
- rust
- burn
- gpt
- tinystories
- from-scratch
- decoder-transformer
pipeline_tag: text-generation
inference: false
datasets:
- roneneldan/TinyStories
---
# RitsuGPT
A small, from-scratch GPT in **pure Rust**. It trains on a single consumer GPU (an NVIDIA GeForce RTX 5060, 8 GB) and runs on your own computer. *nanoGPT, in Rust.*
Trainer & source code: **[github.com/NeonixLabs/RitsuGPT](https://github.com/NeonixLabs/RitsuGPT)** · Part of [Neonix Labs](https://labs.neonix.ai).
> **What it is, honestly:** a ~16.9M-parameter small language model in the spirit of [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) (Eldan & Li, 2023). It learns to write simple, coherent short English stories. It is **not** a production assistant: no world knowledge, no reasoning, no instruction following. Its value is a clean, hackable, from-scratch stack you can train and verify yourself.
## Files
| File | What |
|---|---|
| `ritsu-step25000.mpk` | Weights at 25,000 steps (recommended), `burn` `CompactRecorder` format |
| `ritsu-step12000.mpk` | Weights at 12,000 steps (earlier checkpoint) |
| `tokenizer.json` | Byte-level BPE tokenizer (vocab 8192), HuggingFace `tokenizers` format |
## Results
Evaluation reports **bits-per-byte (BPB)** on the TinyStories validation set. BPB normalizes by bytes of text rather than by tokens, so it is tokenizer-invariant; lower is better.
| Checkpoint | Steps | BPB |
|---|---|---|
| `ritsu-step12000.mpk` | 12,000 | 0.695 |
| `ritsu-step25000.mpk` | 25,000 | **0.6843** |
| byte-level baseline | n/a | 0.805 |
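
For readers who want to reproduce the metric, here is a minimal sketch of the standard BPB conversion. It assumes you already have the summed token-level cross-entropy in nats over the validation text; the function name `bits_per_byte` is hypothetical and is not taken from the RitsuGPT source.

```rust
/// Convert a summed token-level cross-entropy (in nats) into bits per byte.
///
/// `total_nll_nats` is the sum of negative log-likelihoods over every
/// predicted token; `total_bytes` is the UTF-8 length of the same text.
/// Returns `None` for empty text, where the ratio is undefined.
/// Hypothetical helper for illustration, not part of the RitsuGPT code.
fn bits_per_byte(total_nll_nats: f64, total_bytes: usize) -> Option<f64> {
    if total_bytes == 0 {
        return None;
    }
    // nats -> bits: divide by ln 2.
    let total_bits = total_nll_nats / std::f64::consts::LN_2;
    // Normalize by bytes, not tokens, so tokenizers with different
    // vocabularies can be compared on the same text.
    Some(total_bits / total_bytes as f64)
}
```

For a string `text`, `text.len()` gives the UTF-8 byte count, so `bits_per_byte(summed_loss, text.len())` is the whole computation. Dividing by bytes is what makes the figure comparable across tokenizers.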
## How to run
This is a Rust / `burn` model, not a `transformers` model, so there is no hosted inference widget. Run it locally with the trainer:
```bash
git clone https://github.com/NeonixLabs/RitsuGPT
cd RitsuGPT
# put ritsu-step25000.mpk and tokenizer.json in this folder (download them from this repo)
cargo run --release --bin neonix-train -- sample ./ritsu-step25000 ./tokenizer.json "Once upon a time" 200 0.8 40
```
Pass the checkpoint path **without** the `.mpk` suffix; the loader appends it. Inference runs on CPU.
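
This README does not label the three trailing arguments of the `sample` command (`200 0.8 40`). Assuming they follow the common convention of maximum new tokens, temperature, and top-k (an assumption, not confirmed here), the sketch below shows what temperature plus top-k sampling typically does to one step's logits. The function names are hypothetical, and this is not the repository's implementation.

```rust
/// Scale logits by `temperature`, keep the `k` largest, and renormalize
/// with a softmax. Returns (token_id, probability) pairs, most likely first.
/// Assumes `temperature > 0`; `k` is clamped to at least 1.
/// Hypothetical helper for illustration, not RitsuGPT's actual sampler.
fn top_k_distribution(logits: &[f32], temperature: f32, k: usize) -> Vec<(usize, f32)> {
    if logits.is_empty() {
        return Vec::new();
    }
    let mut scored: Vec<(usize, f32)> = logits
        .iter()
        .enumerate()
        .map(|(id, &logit)| (id, logit / temperature))
        .collect();
    // Sort by scaled logit, descending; total_cmp gives a total order on f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k.max(1));
    // Numerically stable softmax: subtract the maximum before exponentiating.
    let max = scored[0].1;
    let sum: f32 = scored.iter().map(|&(_, s)| (s - max).exp()).sum();
    scored
        .into_iter()
        .map(|(id, s)| (id, (s - max).exp() / sum))
        .collect()
}

/// Pick a token id by inverse-CDF sampling, given `u` drawn uniformly
/// from [0, 1) by whatever random number generator the caller uses.
fn pick_token(dist: &[(usize, f32)], u: f32) -> Option<usize> {
    let mut cumulative = 0.0;
    for &(id, p) in dist {
        cumulative += p;
        if u < cumulative {
            return Some(id);
        }
    }
    // Fall back to the last candidate if rounding leaves the sum below `u`.
    dist.last().map(|&(id, _)| id)
}
```

Under that reading, a temperature below 1 such as `0.8` sharpens the distribution toward the most likely tokens, and a top-k of `40` discards everything outside the 40 best candidates before sampling.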
## Architecture
A standard decoder-only Transformer, optimized in Rust.
## License
[MIT](https://github.com/NeonixLabs/RitsuGPT). Trained on the public [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) dataset.