HuggingFaceFW/fineweb-edu
The second model in the "piko" family, my take on training smaller GPT-2-like models. It is not fine-tuned; this is just a base model.
Trained on a single RTX 3090 for ~30k steps using Karpathy's train_gpt2.py script from the llm.c repo. The dataset is edu_fineweb10B, also from that repo. The model reached a validation loss of ~3.57.
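To put the validation loss in more familiar terms, it can be converted to perplexity by exponentiating it. This is a minimal sketch that assumes the reported loss is the mean per-token cross-entropy in nats, which is what PyTorch's `F.cross_entropy` (used in train_gpt2.py) returns by default:

```python
import math

# Approximate validation loss reported above.
# Assumption: mean per-token cross-entropy in nats (F.cross_entropy's default).
val_loss = 3.57

def perplexity(mean_ce_nats: float) -> float:
    """Perplexity is the exponential of the mean per-token cross-entropy."""
    return math.exp(mean_ce_nats)

ppl = perplexity(val_loss)
print(f"val loss {val_loss:.2f} -> perplexity ~{ppl:.1f}")
```

Under that assumption, a loss of ~3.57 corresponds to a perplexity of roughly 35.5.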
Compared to the pathfinder 16M model variant, this one is:
This repo contains a .pt checkpoint file with the following structure:
```python
{
    'step': step,
    'config': asdict(model.config),
    'model_state_dict': model.state_dict(),
}
```
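Storing `config` as `asdict(model.config)` keeps the checkpoint made up of plain Python types and tensors, which is what `torch.load(..., weights_only=True)` accepts. The config can then be rebuilt by keyword-unpacking the dict. The sketch below shows that round trip with a hypothetical stand-in for `GPTConfig`. Its field names mirror llm.c's train_gpt2.py, but the defaults are GPT-2 small's and the example values are illustrative, not this model's:

```python
from dataclasses import dataclass, asdict

# Hypothetical stand-in for GPTConfig in llm.c's train_gpt2.py.
# Field names mirror that script; defaults are GPT-2 small's, NOT this model's.
@dataclass
class GPTConfig:
    block_size: int = 1024
    vocab_size: int = 50257
    n_layer: int = 12
    n_head: int = 12
    n_embd: int = 768

# Saving side: flatten the dataclass into a plain dict of ints.
config = GPTConfig(n_layer=6, n_head=6, n_embd=384)  # illustrative values only
saved = asdict(config)
print(saved)

# Loading side: keyword-unpacking the dict rebuilds an equal config.
restored = GPTConfig(**saved)
print(restored == config)
```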
To load the model, you can use the following code (not very pretty, I know):
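```python
import torch
from train_gpt2 import GPT, GPTConfig  # classes from llm.c's train_gpt2.py (must be importable)

path = "path/to/checkpoint.pt"  # placeholder: point this at the downloaded .pt file
# map_location="cpu" lets a checkpoint saved on GPU load on any machine
checkpoint = torch.load(path, map_location="cpu", weights_only=True)
config = GPTConfig(**checkpoint['config'])
model = GPT(config)

model_state_dict = checkpoint['model_state_dict']
any_key = next(iter(model_state_dict.keys()))
if any_key.startswith("_orig_mod."):
    # strip "_orig_mod." (10 characters) if the model was compiled
    model_state_dict = {k[10:]: v for k, v in model_state_dict.items()}
model.load_state_dict(model_state_dict)
model.eval()  # switch to evaluation mode for inference
```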
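The `_orig_mod.` prefix comes from `torch.compile`, which wraps the model and stores the original module under the attribute `_orig_mod`, so every saved key gets that prefix. The loader above inspects one key and slices 10 characters off all of them. A slightly more defensive variant strips the prefix from each key independently. This is a sketch, not the author's code: `strip_compile_prefix` is a hypothetical helper, and the integer values stand in for tensors:

```python
from typing import Dict, TypeVar

V = TypeVar("V")

# Prefix torch.compile's wrapper adds to every state_dict key.
PREFIX = "_orig_mod."

def strip_compile_prefix(state_dict: Dict[str, V]) -> Dict[str, V]:
    """Return a copy with a leading "_orig_mod." removed from each key.

    Keys without the prefix are left untouched (str.removeprefix, Python 3.9+).
    """
    return {k.removeprefix(PREFIX): v for k, v in state_dict.items()}

# Illustrative keys; integer values stand in for tensors.
compiled = {"_orig_mod.transformer.wte.weight": 1, "_orig_mod.lm_head.weight": 2}
print(strip_compile_prefix(compiled))
```

With this helper, the `if`/`else` in the loader collapses to `model.load_state_dict(strip_compile_prefix(checkpoint['model_state_dict']))`.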