	---
	license: apache-2.0
	language:
	- en
	pipeline_tag: text-generation
	tags:
	- gpt2
	- from-scratch
	- tinystories
	- text-generation
	- causal-lm
	---

	# RecursiveComplete

	A small GPT-2-style language model (~18.3M parameters) trained from scratch, end to end, by an AI agent. The architecture, training code, tokenizer, data prep, and training run were all written and executed by the agent, with no pre-existing weights and no fine-tuning from another model.

	This is a text-completion model, not an instruction-tuned chatbot. It's good at continuing short prose and simple stories. It is not good at answering questions, following instructions, or factual recall.

	> Note: this is a custom-format model, not a `transformers` model. Load it with the included scripts (`gpt2.py` + `chat.py`), not `AutoModelForCausalLM`.
	> The file names mention GPT-2 only because the model follows the same architecture style.

	## Model details

	| Property | Value |
	|---|---|
	| Type | Decoder-only transformer (GPT-2 style) |
	| Parameters | ~18.3M |
	| Embedding dim (`n_embd`) | 448 |
	| Heads (`n_head`) | 7 |
	| Layers (`n_layer`) | 6 |
	| Context length (`block_size`) | 256 |
	| Vocab size | 8192 |
	| Tokenizer | Byte-level BPE (`<eot>` id = 0) |
	| Dropout | 0.1 |
	| Final train loss | ~1.86 |
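
As a rough sanity check on the figures above, the parameter count can be reproduced from the listed hyperparameters. This is a minimal sketch that assumes a standard GPT-2 layout (learned position embeddings, biases on all linear layers, a 4x MLP expansion, and an output head tied to the token embedding); none of these details are confirmed by `gpt2.py`, and `gpt2_param_count` is a hypothetical helper, not part of the repo.

```python
def gpt2_param_count(n_embd, n_layer, vocab_size, block_size, tie_lm_head=True):
    """Count parameters for an assumed standard GPT-2 layout (not read from gpt2.py)."""
    tok_emb = vocab_size * n_embd          # token embedding table
    pos_emb = block_size * n_embd          # learned position embeddings
    # Attention: fused q/k/v projection plus output projection, both with biases.
    attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # MLP: expand to 4 * n_embd, then project back, both with biases.
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
    layer_norms = 2 * (2 * n_embd)         # two LayerNorms per block (weight + bias)
    per_layer = attn + mlp + layer_norms
    final_ln = 2 * n_embd                  # final LayerNorm before the output head
    lm_head = 0 if tie_lm_head else vocab_size * n_embd
    return tok_emb + pos_emb + n_layer * per_layer + final_ln + lm_head


total = gpt2_param_count(n_embd=448, n_layer=6, vocab_size=8192, block_size=256)
head_dim = 448 // 7  # n_embd / n_head
print(f"{total:,} parameters, head dim {head_dim}")  # 18,271,232 parameters, head dim 64
```

Under these assumptions the total is about 18.27M, which is consistent with the ~18.3M figure in the table. With an untied output head the same layout would come to about 21.9M, so the tied variant is the better match, though that is an inference from the arithmetic and not a statement about the actual code.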

	## Training data

	Trained primarily on TinyStories (~90M tokens) with a small amount of Alpaca-style data. The model learned general English sentence structure and simple narrative flow, not world knowledge.

	## Files in this repo

	| File | What it is |
	|---|---|
	| `model.safetensors` | The model weights |
	| `config.json` | Architecture config (custom format) |
	| `gpt2.py` | Model definition (the GPT-2-style architecture) |
	| `chat.py` | Run / generate from the model |
	| `tokenizer_bpe/vocab.json`, `tokenizer_bpe/merges.txt` | Byte-level BPE tokenizer |
	| `big.pt` | Full training checkpoint (model + optimizer), for resuming training only |
	| `train_big.py`, `prep_bpe.py` | Training and data-prep scripts |
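
For readers unfamiliar with the tokenizer files listed above, the sketch below illustrates the general idea behind byte-level BPE: text is split into UTF-8 bytes, and an ordered list of merge rules is applied, highest priority first. It is a toy illustration only. It does not read `merges.txt`, it skips the pre-tokenization and byte-to-character mapping that GPT-2-style tokenizers usually apply, and `bpe_encode` and `toy_merges` are hypothetical names invented here.

```python
from typing import Dict, List, Tuple

Pair = Tuple[bytes, bytes]


def bpe_encode(text: str, merges: List[Pair]) -> List[bytes]:
    """Split text into UTF-8 bytes, then apply merges in priority order."""
    ranks: Dict[Pair, int] = {pair: rank for rank, pair in enumerate(merges)}
    symbols = [bytes([b]) for b in text.encode("utf-8")]
    while len(symbols) > 1:
        # Rank every adjacent pair; pairs with no merge rule get len(merges) as a sentinel.
        candidates = [
            (ranks.get(pair, len(merges)), i)
            for i, pair in enumerate(zip(symbols, symbols[1:]))
        ]
        best_rank, i = min(candidates)  # lowest rank wins, leftmost on ties
        if best_rank == len(merges):
            break  # no applicable merge rule is left
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


# Hypothetical merge list: earlier entries have higher priority.
toy_merges = [(b"t", b"h"), (b"th", b"e")]
print(bpe_encode("the then", toy_merges))  # [b'the', b' ', b'the', b'n']
```

In a real tokenizer each resulting symbol would then be looked up in the vocabulary to get an integer id; with a vocab size of 8192, every id fits in the range 0 to 8191.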

	## Intended use

	- Story / prose continuation
	- Experimentation and education (a clean, fully-from-scratch small LM)

	## How to use

	This model uses its own minimal code, not the `transformers` library.

	```bash
	# 1. Install deps
	pip install torch tokenizers safetensors numpy

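	# 2. Download this repo (gives you the scripts + weights + tokenizer)
	#    The `hf` command ships with recent huggingface_hub releases;
	#    older releases use `huggingface-cli download` instead.
	pip install -U huggingface_hub
	hf download Gentraxyz/RecursiveComplete --local-dir RecursiveComplete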
	cd RecursiveComplete

	# 3. Generate
	python chat.py
	```

	`chat.py` loads `gpt2.py` (the architecture), the weights from `model.safetensors`, and the BPE tokenizer in `tokenizer_bpe/`, then lets you prompt the model for completions.
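
The completion step described above can be pictured as a plain autoregressive sampling loop. The following is a minimal sketch under stated assumptions, not the contents of `chat.py`: the temperature sampling, the stop-at-`<eot>` rule, and the names `generate`, `sample_next`, and `toy_model` are all hypothetical. Only the 256-token context length and the `<eot>` id of 0 come from the model card. A toy stand-in replaces the real network so the example runs without the weights.

```python
import math
import random
from typing import Callable, List, Sequence

BLOCK_SIZE = 256  # context length from the model card
EOT_ID = 0        # `<eot>` token id from the model card


def sample_next(logits: Sequence[float], temperature: float, rng: random.Random) -> int:
    """Sample one token id from a temperature-scaled softmax over the logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def generate(
    next_logits: Callable[[List[int]], Sequence[float]],  # hypothetical model wrapper
    prompt_ids: Sequence[int],
    max_new_tokens: int = 50,
    temperature: float = 0.8,
    seed: int = 0,
) -> List[int]:
    """Append sampled tokens to the prompt until `<eot>` or the token budget is hit."""
    rng = random.Random(seed)
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        context = ids[-BLOCK_SIZE:]  # keep only the most recent 256 tokens
        token = sample_next(next_logits(context), temperature, rng)
        if token == EOT_ID:  # assumed stopping rule
            break
        ids.append(token)
    return ids


def toy_model(context: List[int]) -> List[float]:
    """Stand-in for the real network: strongly prefers (last token + 1) mod 5."""
    logits = [0.0] * 5
    logits[(context[-1] + 1) % 5] = 1000.0
    return logits


print(generate(toy_model, prompt_ids=[1]))  # [1, 2, 3, 4]
```

The slice `ids[-BLOCK_SIZE:]` is the practical consequence of the small context: once a prompt plus its continuation exceeds 256 tokens, the earliest tokens no longer influence the next prediction.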

	Tip: it's a completion model — give it the start of something ("Once upon a time there was a small robot who") rather than a question.

	## Limitations

	- Completion only — will not reliably answer questions or follow instructions.
	- No factual reliability; it will confidently make things up.
	- Small context (256 tokens) and small vocab (8192).
	- English only.

	## License

	Apache 2.0.

	## Note

	This model was trained entirely by an AI — including writing the model code, the tokenizer, the data pipeline, and running the training. It is shared as a small from-scratch experiment.