---
language: en
license: mit
tags:
- text-generation
- causal-lm
- randygpt
- rust
---

# randyGPT — model-ds2

A GPT-style language model trained from scratch in Rust on Project Gutenberg.

## Model Details

| | |
|---|---|
| Architecture | Transformer (causal LM) |
| Parameters | 2.90M |
| Layers | 12 |
| Heads | 4 |
| Embedding dim | 128 |
| Context window | 256 tokens |
| Vocab size | 2000 (BPE) |
| Training iters | 14375 |
| Best val loss | 3.8242 |
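
The parameter count in the table can be roughly reproduced from the config. A minimal sketch, assuming GPT-2-style blocks (fused QKV projection, 4x MLP expansion, learned positional embeddings, untied LM head) — these architectural details are assumptions, not stated in this card, and the exact total depends on bias and weight-tying choices:

```python
# Hypothetical parameter-count breakdown for the config above.
n_vocab, n_ctx, d, n_layers = 2000, 256, 128, 12

tok_emb = n_vocab * d                    # token embedding
pos_emb = n_ctx * d                      # learned positional embedding
per_layer = (
    2 * (2 * d)                          # two LayerNorms (scale + shift)
    + (d * 3 * d + 3 * d)                # fused QKV projection + bias
    + (d * d + d)                        # attention output projection
    + (d * 4 * d + 4 * d)                # MLP up-projection + bias
    + (4 * d * d + d)                    # MLP down-projection + bias
)
final_ln = 2 * d
lm_head = n_vocab * d                    # untied output head, no bias

total = tok_emb + pos_emb + n_layers * per_layer + final_ln + lm_head
print(f"{total / 1e6:.2f}M")             # ≈ 2.92M under these assumptions
```

This lands close to the reported 2.90M; the small gap would come from details such as bias terms or weight tying.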
|
| | ## Training |
| |
|
| | Trained on ~98MB of cleaned Project Gutenberg text (112 public domain books, |
| | v3 cleaning with Unicode normalization) with BPE-2000 tokenization, |
| | AdamW optimizer, cosine LR decay, ReduceLROnPlateau, dropout=0.1, and |
| | Metal GPU via Candle on Apple Silicon. |
| |
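
The cosine LR decay used here follows a standard shape. A minimal sketch — the max/min rates and warmup length below are illustrative placeholders, not the values used for this model:

```python
import math

def cosine_lr(step, total_steps, max_lr=3e-4, min_lr=3e-5, warmup=100):
    """Linear warmup followed by cosine decay from max_lr to min_lr."""
    if step < warmup:
        return max_lr * (step + 1) / warmup          # linear warmup
    t = (step - warmup) / max(1, total_steps - warmup)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * t))
```

At the end of warmup the rate is at `max_lr`, and it decays smoothly to `min_lr` by the final step.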
|
| | ## Usage |
| |
|
| | ```python |
| | from modeling_randygpt import RandyGPTConfig, RandyGPTForCausalLM |
| | from tokenizer_randygpt import RandyGPTTokenizer |
| | from safetensors.torch import load_file |
| | import torch |
| | |
| | # Load |
| | cfg = RandyGPTConfig.from_pretrained("MonumentalSystems/randygpt-ds2") |
| | model = RandyGPTForCausalLM(cfg) |
| | state = load_file("model.safetensors") |
| | model.load_state_dict(state, strict=True) |
| | model.eval() |
| | |
| | tok = RandyGPTTokenizer.from_file("tokenizer.json") |
| | |
| | # Generate |
| | prompt = "Once upon a time" |
| | ids = torch.tensor([tok.encode(prompt)], dtype=torch.long) |
| | out_ids = model.generate_text(ids, max_new_tokens=200, temperature=0.8) |
| | print(tok.decode(out_ids[0].tolist())) |
| | ``` |
| |
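
`generate_text` is provided by the repo's modeling code and isn't shown here; a plausible temperature-sampling loop of the kind it would implement looks like this (the function name and signature below are hypothetical, for illustration only):

```python
import torch

@torch.no_grad()
def sample(model, ids, max_new_tokens, temperature=0.8, ctx_len=256):
    """Autoregressive temperature sampling: forward, scale, sample, append."""
    for _ in range(max_new_tokens):
        idx = ids[:, -ctx_len:]                  # crop to the context window
        logits = model(idx)[:, -1, :]            # logits for the last position
        probs = torch.softmax(logits / temperature, dim=-1)
        nxt = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, nxt], dim=1)       # append sampled token
    return ids
```

Lower temperatures sharpen the distribution toward greedy decoding; higher ones flatten it toward uniform sampling.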
|
| | ## Source |
| |
|
| | Trained with [randyGPT](https://github.com/MonumentalSystems/RandyGPT) — |
| | a GPT implementation in Rust with Metal GPU acceleration. |
| |
|