---
license: mit
language:
  - en
tags:
  - rust
  - burn
  - gpt
  - tinystories
  - from-scratch
  - decoder-transformer
pipeline_tag: text-generation
inference: false
datasets:
  - roneneldan/TinyStories
---

RitsuGPT

A small, from-scratch GPT in pure Rust — it trains on a single consumer GPU (an NVIDIA GeForce RTX 5060, 8 GB) and runs on your own computer. nanoGPT, in Rust.

Trainer & source code: github.com/NeonixLabs/RitsuGPT · Part of Neonix Labs.

What it is, honestly: a ~16.9M-parameter small language model in the spirit of TinyStories (Eldan & Li, 2023). It learns to write simple, coherent short English stories. It is not a production assistant — no world knowledge, no reasoning, no instruction following. Its value is a clean, hackable, from-scratch stack you can train and verify yourself.

Files

| File | What |
| --- | --- |
| `ritsu-step25000.mpk` | Weights at 25,000 steps (recommended), in `burn` CompactRecorder format |
| `ritsu-step12000.mpk` | Weights at 12,000 steps (earlier checkpoint) |
| `tokenizer.json` | Byte-level BPE tokenizer (vocab 8192), HuggingFace `tokenizers` format |
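
For readers new to byte-level BPE, here is a toy sketch of a single merge step using only the standard library. It illustrates the general technique and is not how `tokenizer.json` was trained: the real `tokenizers` library adds pre-tokenization and other details, and the function names and sample text below are made up for illustration. Under this scheme the 256 byte values are the base tokens, so a vocabulary of 8192 leaves room for at most 8192 - 256 learned merges (fewer if special tokens are reserved).

```rust
use std::collections::HashMap;

/// Find the most frequent adjacent pair of token ids.
/// Ties are broken toward the smaller pair so the result is deterministic.
fn most_frequent_pair(ids: &[u32]) -> Option<(u32, u32)> {
    let mut counts: HashMap<(u32, u32), usize> = HashMap::new();
    for w in ids.windows(2) {
        *counts.entry((w[0], w[1])).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .max_by(|a, b| a.1.cmp(&b.1).then_with(|| b.0.cmp(&a.0)))
        .map(|(pair, _)| pair)
}

/// Replace every non-overlapping occurrence of `pair` with `new_id`,
/// scanning left to right.
fn merge(ids: &[u32], pair: (u32, u32), new_id: u32) -> Vec<u32> {
    let mut out = Vec::with_capacity(ids.len());
    let mut i = 0;
    while i < ids.len() {
        if i + 1 < ids.len() && (ids[i], ids[i + 1]) == pair {
            out.push(new_id);
            i += 2;
        } else {
            out.push(ids[i]);
            i += 1;
        }
    }
    out
}

fn main() {
    // Byte-level: the initial tokens are the raw UTF-8 bytes (ids 0..=255).
    let text = "aaabdaaabac"; // illustrative sample text
    let mut ids: Vec<u32> = text.bytes().map(u32::from).collect();

    // One training step: merge the most frequent pair into the first free id.
    if let Some(pair) = most_frequent_pair(&ids) {
        ids = merge(&ids, pair, 256);
    }
    println!("{ids:?}");
}
```

Repeating this step until the vocabulary reaches the target size yields the merge table that a BPE tokenizer applies at encode time.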

Results

Evaluation reports bits-per-byte (BPB) on the TinyStories validation set — tokenizer-invariant, lower is better.

| Checkpoint | Steps | BPB |
| --- | --- | --- |
| `ritsu-step12000.mpk` | 12,000 | 0.695 |
| `ritsu-step25000.mpk` | 25,000 | 0.6843 |
| byte-level baseline | n/a | 0.805 |
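
BPB divides the total loss by the number of raw bytes rather than the number of tokens, which is why it does not depend on the tokenizer. The sketch below shows the standard conversion from a summed cross-entropy (in nats) to bits per byte, plus the relative gap to a baseline. It is not taken from the RitsuGPT evaluation code; the function names and the 480-nat example are hypothetical.

```rust
/// Convert a summed negative log-likelihood (in nats) into bits per byte.
///
/// `total_nll_nats`: sum of per-token cross-entropy losses over the eval text.
/// `total_bytes`: UTF-8 byte length of that same text (not the token count).
fn bits_per_byte(total_nll_nats: f64, total_bytes: usize) -> f64 {
    total_nll_nats / (std::f64::consts::LN_2 * total_bytes as f64)
}

/// Relative reduction of `bpb` versus a `baseline` BPB, as a fraction.
fn relative_reduction(bpb: f64, baseline: f64) -> f64 {
    (baseline - bpb) / baseline
}

fn main() {
    // Hypothetical numbers for illustration: 1,000 bytes of text whose
    // tokens sum to 480 nats of loss.
    let bpb = bits_per_byte(480.0, 1_000);
    println!("example BPB: {bpb:.4}");

    // BPB figures reported in the results table.
    let gain = relative_reduction(0.6843, 0.805);
    println!("step 25,000 vs byte-level baseline: {:.1}% lower", gain * 100.0);
}
```

With the reported figures, the 25,000-step checkpoint comes out roughly 15% below the byte-level baseline.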

How to run

This is a Rust / burn model — not a transformers model — so there is no hosted inference widget. Run it locally with the trainer:

```bash
git clone https://github.com/NeonixLabs/RitsuGPT
cd RitsuGPT
# put ritsu-step25000.mpk and tokenizer.json in this folder (download them from this repo)
cargo run --release --bin neonix-train -- sample ./ritsu-step25000 ./tokenizer.json "Once upon a time" 200 0.8 40
```

Pass the checkpoint path without the `.mpk` suffix; the loader appends it. Inference runs on CPU.
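
If you script around the CLI, a small helper can normalize whatever path the user supplies before handing it to `sample`. This is a hypothetical convenience function, not part of RitsuGPT or `burn`:

```rust
/// Hypothetical helper (not part of RitsuGPT): strip a trailing `.mpk`
/// so the value can be passed to `sample`, which expects the path
/// without the suffix.
fn checkpoint_arg(path: &str) -> &str {
    path.strip_suffix(".mpk").unwrap_or(path)
}

fn main() {
    // Both spellings end up as the suffix-free form.
    for p in ["./ritsu-step25000.mpk", "./ritsu-step25000"] {
        println!("{p} -> {}", checkpoint_arg(p));
    }
}
```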

Architecture

A standard decoder-only Transformer, optimized in Rust.
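
What makes a Transformer "decoder-only" is causal self-attention: each position may attend only to itself and earlier positions. The sketch below shows that masking step on a plain score matrix using only the standard library. It is an illustration of the general mechanism, not the repository's `burn` implementation, and it assumes a square matrix of scores (one row and one column per token).

```rust
/// Row-wise softmax over attention scores with a causal mask:
/// position `i` may only attend to positions `j <= i`.
/// Assumes `scores` is square (sequence length x sequence length).
fn causal_attention_weights(scores: &[Vec<f32>]) -> Vec<Vec<f32>> {
    scores
        .iter()
        .enumerate()
        .map(|(i, row)| {
            // Only the first i + 1 entries are visible to position i.
            let visible = &row[..=i];
            // Subtract the row maximum for numerical stability.
            let max = visible.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
            let exps: Vec<f32> = visible.iter().map(|s| (s - max).exp()).collect();
            let sum: f32 = exps.iter().sum();
            // Masked (future) positions keep weight 0.
            let mut out = vec![0.0; row.len()];
            for (j, e) in exps.iter().enumerate() {
                out[j] = e / sum;
            }
            out
        })
        .collect()
}

fn main() {
    // Uniform scores for a 3-token sequence (illustrative values).
    let scores = vec![vec![0.0_f32; 3]; 3];
    for row in causal_attention_weights(&scores) {
        println!("{row:?}");
    }
}
```

With uniform scores, the first token attends only to itself, the second splits its weight across two positions, and the third across three.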

License

MIT. Trained on the public TinyStories dataset.