randyGPT: model-ds2

A GPT-style language model trained from scratch in Rust on Project Gutenberg.

Model Details

Architecture: Transformer (causal LM)
Parameters: 2.90M
Layers: 12
Heads: 4
Embedding dim: 128
Context window: 256 tokens
Vocab size: 2000 (BPE)
Training iters: 14375
Best val loss: 3.8242
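
The reported parameter count can be sanity-checked against the other figures listed here. The sketch below assumes a conventional GPT-2-style layout: four d x d attention projections and a 4x-wide MLP per layer, learned positional embeddings, an untied output head, and no bias terms. None of these details are stated in this card, and normalization parameters are ignored, so treat it as a rough estimate. It also converts the best validation loss to perplexity, assuming the loss is mean cross-entropy in nats per token.

```python
import math

# Values taken from the Model Details section.
N_LAYERS, D_MODEL, VOCAB_SIZE, CONTEXT = 12, 128, 2000, 256
BEST_VAL_LOSS = 3.8242


def estimate_params(n_layers, d_model, vocab_size, context):
    """Rough weight count under an assumed GPT-2-style layout (not confirmed by the card)."""
    attention = 4 * d_model * d_model      # Q, K, V and output projections
    mlp = 2 * d_model * (4 * d_model)      # up- and down-projection, 4x hidden width
    blocks = n_layers * (attention + mlp)
    token_emb = vocab_size * d_model
    pos_emb = context * d_model            # assumes learned positional embeddings
    lm_head = vocab_size * d_model         # assumes an untied output projection
    return blocks + token_emb + pos_emb + lm_head


total = estimate_params(N_LAYERS, D_MODEL, VOCAB_SIZE, CONTEXT)
perplexity = math.exp(BEST_VAL_LOSS)       # assumes loss is in nats per token

print(f"estimated parameters: {total:,} (~{total / 1e6:.2f}M)")
print(f"validation perplexity: {perplexity:.1f}")
```

Under these assumptions the estimate comes to 2,904,064 weights, in line with the reported 2.90M, and the validation loss corresponds to a per-token perplexity of about 45.8.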

Training

Trained on ~98 MB of cleaned Project Gutenberg text (112 public-domain books; v3 cleaning with Unicode normalization), tokenized with a 2000-token BPE vocabulary. Optimization used AdamW with cosine LR decay and ReduceLROnPlateau, plus dropout of 0.1. Training ran on the Metal GPU backend via Candle on Apple Silicon.
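
For readers unfamiliar with the schedule, cosine LR decay can be sketched as follows. The peak and minimum learning rates are hypothetical placeholders, since this card does not state them; only the 14375 iteration count comes from the Model Details section. This illustrates the general technique, not the randyGPT training code, and it leaves out the ReduceLROnPlateau component.

```python
import math


def cosine_lr(step, total_steps, max_lr, min_lr):
    """Cosine decay from max_lr at step 0 down to min_lr at total_steps."""
    if step >= total_steps:
        return min_lr
    progress = step / total_steps                        # fraction of training done, in [0, 1)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes from 1 down to 0
    return min_lr + (max_lr - min_lr) * cosine


TOTAL_ITERS = 14375          # from the Model Details section
MAX_LR, MIN_LR = 3e-4, 3e-5  # hypothetical values, not from the card

for step in (0, TOTAL_ITERS // 2, TOTAL_ITERS):
    print(step, f"{cosine_lr(step, TOTAL_ITERS, MAX_LR, MIN_LR):.2e}")
```

The rate starts at the peak, passes roughly the midpoint of the two values halfway through training, and ends at the minimum.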

Usage

from modeling_randygpt import RandyGPTConfig, RandyGPTForCausalLM
from tokenizer_randygpt import RandyGPTTokenizer
from safetensors.torch import load_file
import torch

# Load the config by repo id and the weights from a local safetensors file
cfg   = RandyGPTConfig.from_pretrained("MonumentalSystems/randygpt-ds2")
model = RandyGPTForCausalLM(cfg)
state = load_file("model.safetensors")
model.load_state_dict(state, strict=True)
model.eval()  # disable dropout for inference

tok = RandyGPTTokenizer.from_file("tokenizer.json")

# Generate
prompt  = "Once upon a time"
ids     = torch.tensor([tok.encode(prompt)], dtype=torch.long)
with torch.no_grad():  # gradients are not needed for sampling
    out_ids = model.generate_text(ids, max_new_tokens=200, temperature=0.8)
print(tok.decode(out_ids[0].tolist()))
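
The temperature argument presumably rescales the logits before sampling, as in most GPT implementations. The internals of generate_text are not shown in this card, so the following is a generic sketch of temperature sampling in plain Python rather than the model's actual code; the toy logits are made up for illustration.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=0.8, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


toy_logits = [2.0, 1.0, 0.1]                    # hypothetical logits for a 3-token vocabulary
sharp = softmax_with_temperature(toy_logits, 0.8)
flat = softmax_with_temperature(toy_logits, 1.5)
print("T=0.8:", [round(p, 3) for p in sharp])
print("T=1.5:", [round(p, 3) for p in flat])
print("sampled index:", sample_token(toy_logits))
```

Temperatures below 1 concentrate probability on the highest-scoring token, giving more conservative text; temperatures above 1 flatten the distribution and increase variety.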

Source

Trained with randyGPT, a GPT implementation in Rust with Metal GPU acceleration.

Weights: Safetensors format, 2.9M params, F32 tensor type.