rudygpt
A 124M-parameter causal language model trained from scratch with rudyon/pipeline on the HuggingFaceFW/fineweb-edu dataset.
Training ran on a vast.ai instance with 2x 4090S Ti and cost about $10.
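The 124M figure is consistent with a GPT-2-small-shaped network. A back-of-the-envelope count, where only depth=12 and vocab_size=50304 are confirmed by this card; the remaining widths (d_model=768, 4x MLP expansion, 1024-token context) are the standard GPT-2-small values and are assumptions here:

```python
# Rough parameter count for a GPT-2-small-shaped model.
# Only n_layer=12 and vocab=50304 come from this model card;
# the other sizes are assumed GPT-2-small defaults.
n_layer, d_model, vocab, ctx = 12, 768, 50304, 1024

embeddings = vocab * d_model + ctx * d_model  # token + learned position embeddings
attn = 4 * d_model * d_model + 4 * d_model    # qkv + output projection (with biases)
mlp = 8 * d_model * d_model + 5 * d_model     # 4x expansion and contraction (with biases)
ln = 2 * 2 * d_model                          # two LayerNorms (weight + bias) per block
per_block = attn + mlp + ln

total = embeddings + n_layer * per_block + 2 * d_model  # + final LayerNorm
print(f"{total / 1e6:.1f}M parameters")  # -> 124.5M (with a tied output head)
```

With the output head tied to the token embedding, this lands at roughly 124.5M, matching the stated size.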
Usage
import torch
from transformers import AutoTokenizer

# download model.py and pytorch_model.bin manually or via hf_hub_download
from model import GPT, GPTConfig

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# GPT-2 BPE tokenizer hosted alongside the weights
tokenizer = AutoTokenizer.from_pretrained("rudyon/rudygpt")

# build the model and load the trained weights
model = GPT(GPTConfig(depth=12, vocab_size=50304))
state_dict = torch.load("pytorch_model.bin", map_location='cpu')
model.load_state_dict(state_dict)
model.to(device)
model.eval()

print(model.generate("Hello!"))
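The two local files the snippet expects (model.py and pytorch_model.bin) can also be fetched programmatically, as the comment above suggests. A minimal sketch using huggingface_hub's hf_hub_download, assuming both files sit at the root of the rudyon/rudygpt repo:

```python
from huggingface_hub import hf_hub_download

# Download (or reuse from the local cache) the model definition and weights.
# Assumes both files live at the root of the rudyon/rudygpt repo.
model_path = hf_hub_download(repo_id="rudyon/rudygpt", filename="model.py")
weights_path = hf_hub_download(repo_id="rudyon/rudygpt", filename="pytorch_model.bin")

print(model_path)
print(weights_path)
```

hf_hub_download returns the local cache path of each file, so model.py would still need to be copied (or added to sys.path) before `from model import GPT, GPTConfig` works.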