Elriggs/gpt2-debug-baseline

# Modded NanoGPT Model

This is a GPT-2-style model trained with modifications from modded-nanogpt.

## Model Config

- Layers: 2
- Heads: 4
- Embedding dimension: 64
- Vocab size: 50304
- Squared MLP: False
- Bilinear: False
- Gated: False
- Expansion factor: 4
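
Two sizes follow directly from the numbers above. Each head has 64 / 4 = 16 dimensions, and the MLP hidden layer has 4 × 64 = 256 units. A rough parameter count is sketched below under assumptions this card does not state: bias-free linear layers, a plain two-matrix MLP (Squared, Bilinear and Gated are all False), no learned positional embeddings, and norm parameters ignored. The card also does not say whether the output head shares weights with the token embedding, so the sketch reports both totals.

```python
# Rough size arithmetic for the config listed above.
# ASSUMPTIONS (not stated in this card): bias-free linear layers, a plain
# two-matrix MLP, no learned positional embeddings, norm parameters ignored.

n_layer = 2
n_head = 4
n_embd = 64
vocab_size = 50304
expansion = 4

head_dim = n_embd // n_head          # 16
mlp_hidden = expansion * n_embd      # 256

# 50304 is a multiple of 64. The GPT-2 BPE tokenizer has 50,257 tokens,
# so the extra rows are presumably padding (an assumption).
assert vocab_size % 64 == 0

# Per block: Q, K, V and output projections, each n_embd x n_embd.
attn_params = 4 * n_embd * n_embd              # 16,384
# Per block: MLP up-projection and down-projection.
mlp_params = 2 * n_embd * mlp_hidden           # 32,768
block_params = attn_params + mlp_params        # 49,152

embedding_params = vocab_size * n_embd         # 3,219,456

# Tied: the output head reuses the token embedding matrix.
total_tied = n_layer * block_params + embedding_params
# Untied: a separate vocab_size x n_embd output head.
total_untied = total_tied + vocab_size * n_embd

print(f"head_dim={head_dim}, mlp_hidden={mlp_hidden}")
print(f"{total_tied:,} params (tied) / {total_untied:,} params (untied)")
```

Under these assumptions the token embedding dominates. The two transformer blocks hold fewer than 100k of the roughly 3.3M (tied) parameters.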

## Training

- Training step: 500

## Usage

```python
import json

import torch
from huggingface_hub import hf_hub_download

# train_gpt2.py (the training script that defines GPT and GPTConfig)
# must be importable, e.g. placed on your Python path.
from train_gpt2 import GPT, GPTConfig

repo_id = "Elriggs/gpt2-debug-baseline"

# Download the config
config_path = hf_hub_download(repo_id=repo_id, filename="config.json")
with open(config_path) as f:
    config_dict = json.load(f)

# Remove fields that GPTConfig does not accept (the saved training step)
config_dict.pop('step', None)

# Build the model from the config
config = GPTConfig(**config_dict)
model = GPT(config)

# Download the weights and load them on CPU
weights_path = hf_hub_download(repo_id=repo_id, filename="pytorch_model.bin")
state_dict = torch.load(weights_path, map_location='cpu', weights_only=True)
model.load_state_dict(state_dict)

# Put the model in evaluation mode
model.eval()
```
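
The usage snippet removes only `step` before calling `GPTConfig(**config_dict)`. If `config.json` ever carries other bookkeeping keys, that call raises a `TypeError`. The sketch below is a more general filter, assuming `GPTConfig` is a dataclass (as it is in nanoGPT-style training scripts, though this card does not confirm it). The `GPTConfig` here is a hypothetical stand-in with assumed nanoGPT-style field names, so the example runs without `train_gpt2.py`.

```python
import dataclasses
from dataclasses import dataclass


# HYPOTHETICAL stand-in for GPTConfig from train_gpt2.py.
# The real field names may differ; these are assumptions for illustration.
@dataclass
class GPTConfig:
    vocab_size: int = 50304
    n_layer: int = 2
    n_head: int = 4
    n_embd: int = 64


def filter_config_fields(cls, config_dict):
    """Split config_dict into keys that are fields of dataclass `cls` and the rest.

    Returns (kept, dropped) so the discarded keys stay visible.
    """
    field_names = {f.name for f in dataclasses.fields(cls)}
    kept = {k: v for k, v in config_dict.items() if k in field_names}
    dropped = sorted(k for k in config_dict if k not in field_names)
    return kept, dropped


# Example: a config dict that also stores training metadata such as 'step'.
raw = {"vocab_size": 50304, "n_layer": 2, "n_head": 4, "n_embd": 64, "step": 500}
kept, dropped = filter_config_fields(GPTConfig, raw)
config = GPTConfig(**kept)
print(dropped)  # ['step']
```

Returning the dropped keys is a deliberate choice. A silent filter would also discard misspelled field names, and printing them makes that easy to spot.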