---
language: en
license: mit
tags:
- text-generation
- causal-lm
- randygpt
- rust
---

# randyGPT — model-s2

A GPT-style language model trained from scratch in Rust on Project Gutenberg.

## Model Details

| Property | Value |
|---|---|
| Architecture | Transformer (causal LM) |
| Parameters | 1.99M |
| Layers | 8 |
| Heads | 4 |
| Embedding dim | 128 |
| Context window | 256 tokens |
| Vocab size | 1500 (BPE) |
| Training iters | 2925 |
| Best val loss | 4.4183 |

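The 1.99M parameter figure can be roughly reproduced from the table. The sketch below assumes a standard GPT block with no linear-layer biases, a 4x MLP expansion, learned positional embeddings, and an untied output head; these architectural details are assumptions, not stated on this card.

```python
# Rough parameter count from the table above.
# Assumptions (not stated in the card): no linear-layer biases,
# 4x MLP expansion, learned positional embeddings, untied LM head.
vocab, ctx, d, layers = 1500, 256, 128, 8

tok_emb = vocab * d                # token embedding table
pos_emb = ctx * d                  # positional embedding table
attn = d * (3 * d) + d * d        # fused QKV projection + output projection
mlp = d * (4 * d) + (4 * d) * d   # up- and down-projection
norms = 2 * (2 * d)               # two LayerNorms (scale + shift) per block
block = attn + mlp + norms
final_norm = 2 * d
lm_head = d * vocab               # untied output projection

total = tok_emb + pos_emb + layers * block + final_norm + lm_head
print(f"{total / 1e6:.2f}M parameters")  # lands close to the reported 1.99M
```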
|
## Training

Trained on ~103MB of cleaned Project Gutenberg text (114 public domain books)
with BPE-1500 tokenization, the AdamW optimizer, cosine LR decay,
and ReduceLROnPlateau. Training ran on the Metal GPU via Candle on Apple Silicon.

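As a rough illustration of the cosine LR decay mentioned above, here is a minimal schedule sketch; the warmup length, base LR, and floor LR are hypothetical values, not taken from this training run.

```python
import math

def cosine_lr(step, max_steps, base_lr=3e-4, min_lr=3e-5, warmup=100):
    """Linear warmup, then cosine decay to min_lr. Hyperparameters here
    are hypothetical illustrations, not the model's actual settings."""
    if step < warmup:
        return base_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, max_steps - warmup)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

# LR climbs during warmup, then decays smoothly toward min_lr by max_steps
lrs = [cosine_lr(s, max_steps=2925) for s in range(2925)]
```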
|
## Usage

```python
from modeling_randygpt import RandyGPTConfig, RandyGPTForCausalLM
from tokenizer_randygpt import RandyGPTTokenizer
from safetensors.torch import load_file
import torch

# Load config and weights
cfg = RandyGPTConfig.from_pretrained("MonumentalSystems/randygpt-s2")
model = RandyGPTForCausalLM(cfg)
state = load_file("model.safetensors")
model.load_state_dict(state, strict=True)
model.eval()

tok = RandyGPTTokenizer.from_file("tokenizer.json")

# Generate
prompt = "Once upon a time"
ids = torch.tensor([tok.encode(prompt)], dtype=torch.long)
out_ids = model.generate_text(ids, max_new_tokens=200, temperature=0.8)
print(tok.decode(out_ids[0].tolist()))
```

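`generate_text` above presumably samples autoregressively with temperature scaling. A minimal stdlib sketch of what temperature does to a single logit vector (the logits and the function itself are illustrative, not part of the model's API):

```python
import math
import random

def sample_with_temperature(logits, temperature=0.8, rng=None):
    """Scale logits by 1/temperature, softmax, then sample one token id."""
    rng = rng or random.Random()
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

# Lower temperature concentrates probability on the highest logit
token = sample_with_temperature([2.0, 1.0, 0.1], temperature=0.8)
```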
|
## Source

Trained with [randyGPT](https://github.com/MonumentalSystems/RandyGPT) —
a GPT implementation in Rust with Metal GPU acceleration.