Gentraxyz/RecursiveComplete

	---
	license: apache-2.0
	language:
	- en
	tags:
	- text-generation
	- gpt
	- from-scratch
	- tinystories
	pipeline_tag: text-generation
	---

	# mini-llm-tinystories

	An 18.3M-parameter GPT-2-style language model trained completely from scratch on CPU
	(no pretrained weights). It generates short, coherent children's-story-style English.

	## Details
	- Architecture: GPT-2 style decoder (pre-norm, GELU, weight-tied head)
	- Params: ~18.3M (448-dim embeddings, 7 heads, 6 layers, 256-token context)
	- Tokenizer: byte-level BPE, 8192 vocab
	- Training data: TinyStories (~90M tokens) + a small amount of Alpaca Q&A
	- Training: ~7.6 hours on 2 CPU cores, final train loss ~1.86
	- Type: base completion model (continues text; not instruction-tuned)
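
As a sanity check on the figures above, the parameter count can be roughly reproduced from the listed hyperparameters. This is a minimal sketch, not code from this repo: it assumes the standard GPT-2 block layout (fused QKV projection, 4x MLP expansion, biases on all linear layers, learned position embeddings), none of which is confirmed by `gpt2.py`. The function name and `mlp_ratio` parameter are hypothetical. The perplexity line additionally assumes the reported loss is a mean per-token cross-entropy in nats.

```python
import math


def gpt2_param_count(vocab=8192, dim=448, layers=6, context=256, mlp_ratio=4):
    """Estimate parameters for a GPT-2-style decoder with a weight-tied head."""
    tok_emb = vocab * dim              # token embeddings, shared with the output head
    pos_emb = context * dim            # learned position embeddings (assumption)

    attn = 3 * dim * dim + 3 * dim     # fused Q, K, V projection plus bias
    attn += dim * dim + dim            # attention output projection plus bias

    hidden = mlp_ratio * dim           # 4x expansion is an assumption
    mlp = dim * hidden + hidden        # up-projection plus bias
    mlp += hidden * dim + dim          # down-projection plus bias

    norms = 2 * (2 * dim)              # two LayerNorms per block (scale and shift)
    block = attn + mlp + norms
    final_norm = 2 * dim

    return tok_emb + pos_emb + layers * block + final_norm


total = gpt2_param_count()
head_dim = 448 // 7                    # per-head width implied by 448 dim, 7 heads
perplexity = math.exp(1.86)            # assumes loss is in nats per token

print(f"params: {total / 1e6:.2f}M, head dim: {head_dim}, train perplexity: {perplexity:.2f}")
```

Under these assumptions the estimate lands at about 18.27M, consistent with the ~18.3M stated above.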

	## Usage
	```python
	import json

	import torch
	from safetensors.torch import load_file
	from tokenizers import ByteLevelBPETokenizer

	from gpt2 import GPT2  # model definition; copy gpt2.py from this repo

	# Build the model from the repo's config and load the trained weights.
	with open("config.json") as f:
	    cfg = json.load(f)
	model = GPT2(cfg)
	model.load_state_dict(load_file("model.safetensors"))
	model.eval()

	# Encode a prompt, generate a continuation, and decode it back to text.
	tk = ByteLevelBPETokenizer("tokenizer_bpe/vocab.json", "tokenizer_bpe/merges.txt")
	ids = tk.encode("Once upon a time").ids
	with torch.no_grad():  # inference only, so skip gradient tracking
	    out = model.generate(torch.tensor([ids]), max_new_tokens=150)
	print(tk.decode(out[0].tolist()))
	```
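
Because the context window is 256 tokens, the prompt and the generated tokens share that budget. Whether `model.generate` crops or slides the window internally is not documented here, so the helper below is a conservative sketch that keeps the total within the window. `clamp_new_tokens` and `CONTEXT` are hypothetical names, not part of this repo.

```python
CONTEXT = 256  # context length listed under Details


def clamp_new_tokens(prompt_len, requested, context=CONTEXT):
    """Return how many new tokens fit alongside a prompt of prompt_len tokens."""
    if prompt_len >= context:
        raise ValueError(
            f"prompt has {prompt_len} tokens, which leaves no room in a "
            f"{context}-token context"
        )
    # Never ask for more tokens than the remaining window can hold.
    return min(requested, context - prompt_len)


# A short prompt leaves the requested budget untouched.
print(clamp_new_tokens(prompt_len=4, requested=150))
# A long prompt reduces it to whatever space remains.
print(clamp_new_tokens(prompt_len=200, requested=150))
```

In the usage snippet above, this would mean passing `max_new_tokens=clamp_new_tokens(len(ids), 150)`.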

	## Limitations
	Tiny model with no real-world knowledge. Best at short narrative completions in a
	TinyStories style. Will produce fluent but factually wrong text if asked questions.