rudygpt
A 124M-parameter causal language model trained from scratch with rudyon/pipeline on the HuggingFaceFW/fineweb-edu dataset.
Training ran on a vast.ai instance with 2x 4090S Ti and cost about $10.
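The 124M figure is consistent with a GPT-2-small-shaped network. A back-of-the-envelope count, where only depth=12 and vocab_size=50304 are confirmed by this card; the remaining widths (d_model=768, 4x MLP expansion, 1024-token context) are the standard GPT-2-small values and are assumptions here:

```python
# Rough parameter count for a GPT-2-small-shaped model.
# Only n_layer=12 and vocab=50304 come from this model card;
# the other sizes are assumed GPT-2-small defaults.
n_layer, d_model, vocab, ctx = 12, 768, 50304, 1024

embeddings = vocab * d_model + ctx * d_model  # token + learned position embeddings
attn = 4 * d_model * d_model + 4 * d_model    # qkv + output projection (with biases)
mlp = 8 * d_model * d_model + 5 * d_model     # 4x expansion and contraction (with biases)
ln = 2 * 2 * d_model                          # two LayerNorms (weight + bias) per block
per_block = attn + mlp + ln

total = embeddings + n_layer * per_block + 2 * d_model  # + final LayerNorm
print(f"{total / 1e6:.1f}M parameters")  # -> 124.5M (with a tied output head)
```

With the output head tied to the token embedding, this lands at roughly 124.5M, matching the stated size.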
Usage
import torch
from transformers import AutoTokenizer

# download model.py and pytorch_model.bin manually or via hf_hub_download
from model import GPT, GPTConfig

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# GPT-2 BPE tokenizer hosted alongside the weights
tokenizer = AutoTokenizer.from_pretrained("rudyon/rudygpt")

# build the model and load the trained weights
model = GPT(GPTConfig(depth=12, vocab_size=50304))
state_dict = torch.load("pytorch_model.bin", map_location='cpu')
model.load_state_dict(state_dict)
model.to(device)
model.eval()

print(model.generate("Hello!"))
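The two local files the snippet expects (model.py and pytorch_model.bin) can also be fetched programmatically, as the comment above suggests. A minimal sketch using huggingface_hub's hf_hub_download, assuming both files sit at the root of the rudyon/rudygpt repo:

```python
from huggingface_hub import hf_hub_download

# Download (or reuse from the local cache) the model definition and weights.
# Assumes both files live at the root of the rudyon/rudygpt repo.
model_path = hf_hub_download(repo_id="rudyon/rudygpt", filename="model.py")
weights_path = hf_hub_download(repo_id="rudyon/rudygpt", filename="pytorch_model.bin")

print(model_path)
print(weights_path)
```

hf_hub_download returns the local cache path of each file, so model.py would still need to be copied (or added to sys.path) before `from model import GPT, GPTConfig` works.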