# rudygpt

A 124M-parameter causal language model trained from scratch using rudyon/pipeline on the HuggingFaceFW/fineweb-edu dataset.
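The 124M figure is consistent with a GPT-2-small-style decoder. Only `depth=12` and `vocab_size=50304` appear in the usage snippet below; the hidden size (768), context length (1024), and tied input/output embeddings in this sketch are assumptions matching the usual 124M configuration, not values confirmed by the model card:

```python
# Rough parameter count for a GPT-2-small-style decoder.
# Only n_layer=12 and vocab=50304 come from the model card;
# n_embd, block_size, and tied embeddings are assumptions.
n_layer, n_embd, vocab, block = 12, 768, 50304, 1024

embed = vocab * n_embd + block * n_embd          # token + position embeddings
per_block = (
    3 * n_embd * n_embd + 3 * n_embd             # fused QKV projection (+bias)
    + n_embd * n_embd + n_embd                   # attention output projection
    + n_embd * 4 * n_embd + 4 * n_embd           # MLP up-projection
    + 4 * n_embd * n_embd + n_embd               # MLP down-projection
    + 4 * n_embd                                 # two LayerNorms (weight + bias)
)
final_ln = 2 * n_embd
total = embed + n_layer * per_block + final_ln   # lm_head tied with wte adds nothing
print(f"{total / 1e6:.1f}M parameters")          # ~124.5M
```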

Training was done on a vast.ai instance with 2x RTX 4090 GPUs. Total training cost was about $10.

## Usage

```python
import torch
from transformers import AutoTokenizer
# download model.py and pytorch_model.bin manually or via hf_hub_download
from model import GPT, GPTConfig

device = 'cuda' if torch.cuda.is_available() else 'cpu'

tokenizer = AutoTokenizer.from_pretrained("rudyon/rudygpt")
model = GPT(GPTConfig(depth=12, vocab_size=50304))
state_dict = torch.load("pytorch_model.bin", map_location='cpu')
model.load_state_dict(state_dict)
model.to(device)
model.eval()

with torch.no_grad():
    print(model.generate("Hello!"))
```
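The exact signature of `model.generate` is defined in `model.py`. For readers curious what such a method typically does, here is a self-contained sketch of the top-k sampling loop commonly found in nanoGPT-style code; every name in it (`sample_top_k`, `next_logits_fn`, the stub logits) is illustrative, not taken from this repository:

```python
import math
import random

def softmax(logits):
    # numerically stable softmax over a plain list of floats
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sample_top_k(logits, k, rng):
    # keep only the k highest-scoring tokens, renormalize, then sample
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    probs = softmax([logits[i] for i in top])
    r = rng.random()
    acc = 0.0
    for idx, p in zip(top, probs):
        acc += p
        if r <= acc:
            return idx
    return top[-1]

def generate(next_logits_fn, prompt_ids, max_new_tokens=8, k=5, rng=None):
    # autoregressive loop: feed the growing sequence back in,
    # sample one token at a time from the model's next-token logits
    rng = rng or random.Random(0)
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        ids.append(sample_top_k(next_logits_fn(ids), k, rng))
    return ids
```

In the real model, `next_logits_fn` would be a forward pass returning the logits for the last position, and the prompt would first be encoded with the tokenizer loaded above.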
