---
language: en
license: mit
tags:
- text-generation
- causal-lm
- randygpt
- rust
---

# randyGPT — model-s2

A GPT-style language model trained from scratch in Rust on Project Gutenberg.

## Model Details

| Property | Value |
|---|---|
| Architecture | Transformer (causal LM) |
| Parameters | 1.99M |
| Layers | 8 |
| Heads | 4 |
| Embedding dim | 128 |
| Context window | 256 tokens |
| Vocab size | 1500 (BPE) |
| Training iters | 2925 |
| Best val loss | 4.4183 |

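The 1.99M parameter figure can be roughly reproduced from the table. The sketch below assumes a standard GPT block with no linear-layer biases, a 4x MLP expansion, learned positional embeddings, and an untied output head; these architectural details are assumptions, not stated on this card.

```python
# Rough parameter count from the table above.
# Assumptions (not stated in the card): no linear-layer biases,
# 4x MLP expansion, learned positional embeddings, untied LM head.
vocab, ctx, d, layers = 1500, 256, 128, 8

tok_emb = vocab * d                # token embedding table
pos_emb = ctx * d                  # positional embedding table
attn = d * (3 * d) + d * d        # fused QKV projection + output projection
mlp = d * (4 * d) + (4 * d) * d   # up- and down-projection
norms = 2 * (2 * d)               # two LayerNorms (scale + shift) per block
block = attn + mlp + norms
final_norm = 2 * d
lm_head = d * vocab               # untied output projection

total = tok_emb + pos_emb + layers * block + final_norm + lm_head
print(f"{total / 1e6:.2f}M parameters")  # lands close to the reported 1.99M
```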
|
## Training

Trained on ~103MB of cleaned Project Gutenberg text (114 public domain books)
with BPE-1500 tokenization, the AdamW optimizer, cosine LR decay,
and ReduceLROnPlateau. Training ran on the Metal GPU via Candle on Apple Silicon.

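As a rough illustration of the cosine LR decay mentioned above, here is a minimal schedule sketch; the warmup length, base LR, and floor LR are hypothetical values, not taken from this training run.

```python
import math

def cosine_lr(step, max_steps, base_lr=3e-4, min_lr=3e-5, warmup=100):
    """Linear warmup, then cosine decay to min_lr. Hyperparameters here
    are hypothetical illustrations, not the model's actual settings."""
    if step < warmup:
        return base_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, max_steps - warmup)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

# LR climbs during warmup, then decays smoothly toward min_lr by max_steps
lrs = [cosine_lr(s, max_steps=2925) for s in range(2925)]
```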
|
## Usage

```python
from modeling_randygpt import RandyGPTConfig, RandyGPTForCausalLM
from tokenizer_randygpt import RandyGPTTokenizer
from safetensors.torch import load_file
import torch

# Load config and weights
cfg = RandyGPTConfig.from_pretrained("MonumentalSystems/randygpt-s2")
model = RandyGPTForCausalLM(cfg)
state = load_file("model.safetensors")
model.load_state_dict(state, strict=True)
model.eval()

tok = RandyGPTTokenizer.from_file("tokenizer.json")

# Generate
prompt = "Once upon a time"
ids = torch.tensor([tok.encode(prompt)], dtype=torch.long)
out_ids = model.generate_text(ids, max_new_tokens=200, temperature=0.8)
print(tok.decode(out_ids[0].tolist()))
```

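`generate_text` above presumably samples autoregressively with temperature scaling. A minimal stdlib sketch of what temperature does to a single logit vector (the logits and the function itself are illustrative, not part of the model's API):

```python
import math
import random

def sample_with_temperature(logits, temperature=0.8, rng=None):
    """Scale logits by 1/temperature, softmax, then sample one token id."""
    rng = rng or random.Random()
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

# Lower temperature concentrates probability on the highest logit
token = sample_with_temperature([2.0, 1.0, 0.1], temperature=0.8)
```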
|
## Source

Trained with [randyGPT](https://github.com/MonumentalSystems/RandyGPT) —
a GPT implementation in Rust with Metal GPU acceleration.