---
language: en
license: mit
tags:
- text-generation
- causal-lm
- randygpt
- rust
---

# randyGPT — model-ds2

A GPT-style language model trained from scratch in Rust on Project Gutenberg.

## Model Details

| | |
|---|---|
| Architecture | Transformer (causal LM) |
| Parameters | 2.90M |
| Layers | 12 |
| Heads | 4 |
| Embedding dim | 128 |
| Context window | 256 tokens |
| Vocab size | 2000 (BPE) |
| Training iters | 14375 |
| Best val loss | 3.8242 |
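
The parameter count in the table can be roughly reproduced from the config. A minimal sketch, assuming GPT-2-style blocks (fused QKV projection, 4x MLP expansion, learned positional embeddings, untied LM head) — these architectural details are assumptions, not stated in this card, and the exact total depends on bias and weight-tying choices:

```python
# Hypothetical parameter-count breakdown for the config above.
n_vocab, n_ctx, d, n_layers = 2000, 256, 128, 12

tok_emb = n_vocab * d                    # token embedding
pos_emb = n_ctx * d                      # learned positional embedding
per_layer = (
    2 * (2 * d)                          # two LayerNorms (scale + shift)
    + (d * 3 * d + 3 * d)                # fused QKV projection + bias
    + (d * d + d)                        # attention output projection
    + (d * 4 * d + 4 * d)                # MLP up-projection + bias
    + (4 * d * d + d)                    # MLP down-projection + bias
)
final_ln = 2 * d
lm_head = n_vocab * d                    # untied output head, no bias

total = tok_emb + pos_emb + n_layers * per_layer + final_ln + lm_head
print(f"{total / 1e6:.2f}M")             # ≈ 2.92M under these assumptions
```

This lands close to the reported 2.90M; the small gap would come from details such as bias terms or weight tying.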
|
| | ## Training |
| |
|
| | Trained on ~98MB of cleaned Project Gutenberg text (112 public domain books, |
| | v3 cleaning with Unicode normalization) with BPE-2000 tokenization, |
| | AdamW optimizer, cosine LR decay, ReduceLROnPlateau, dropout=0.1, and |
| | Metal GPU via Candle on Apple Silicon. |
| |
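
The cosine LR decay used here follows a standard shape. A minimal sketch — the max/min rates and warmup length below are illustrative placeholders, not the values used for this model:

```python
import math

def cosine_lr(step, total_steps, max_lr=3e-4, min_lr=3e-5, warmup=100):
    """Linear warmup followed by cosine decay from max_lr to min_lr."""
    if step < warmup:
        return max_lr * (step + 1) / warmup          # linear warmup
    t = (step - warmup) / max(1, total_steps - warmup)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * t))
```

At the end of warmup the rate is at `max_lr`, and it decays smoothly to `min_lr` by the final step.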
|
| | ## Usage |
| |
|
| | ```python |
| | from modeling_randygpt import RandyGPTConfig, RandyGPTForCausalLM |
| | from tokenizer_randygpt import RandyGPTTokenizer |
| | from safetensors.torch import load_file |
| | import torch |
| | |
| | # Load |
| | cfg = RandyGPTConfig.from_pretrained("MonumentalSystems/randygpt-ds2") |
| | model = RandyGPTForCausalLM(cfg) |
| | state = load_file("model.safetensors") |
| | model.load_state_dict(state, strict=True) |
| | model.eval() |
| | |
| | tok = RandyGPTTokenizer.from_file("tokenizer.json") |
| | |
| | # Generate |
| | prompt = "Once upon a time" |
| | ids = torch.tensor([tok.encode(prompt)], dtype=torch.long) |
| | out_ids = model.generate_text(ids, max_new_tokens=200, temperature=0.8) |
| | print(tok.decode(out_ids[0].tolist())) |
| | ``` |
| |
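
`generate_text` is provided by the repo's modeling code and isn't shown here; a plausible temperature-sampling loop of the kind it would implement looks like this (the function name and signature below are hypothetical, for illustration only):

```python
import torch

@torch.no_grad()
def sample(model, ids, max_new_tokens, temperature=0.8, ctx_len=256):
    """Autoregressive temperature sampling: forward, scale, sample, append."""
    for _ in range(max_new_tokens):
        idx = ids[:, -ctx_len:]                  # crop to the context window
        logits = model(idx)[:, -1, :]            # logits for the last position
        probs = torch.softmax(logits / temperature, dim=-1)
        nxt = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, nxt], dim=1)       # append sampled token
    return ids
```

Lower temperatures sharpen the distribution toward greedy decoding; higher ones flatten it toward uniform sampling.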
|
| | ## Source |
| |
|
| | Trained with [randyGPT](https://github.com/MonumentalSystems/RandyGPT) — |
| | a GPT implementation in Rust with Metal GPU acceleration. |
| |
|