license: apache-2.0
language:
  - en
pipeline_tag: text-generation
tags:
  - gpt2
  - from-scratch
  - tinystories
  - text-generation
  - causal-lm

RecursiveComplete

A small GPT-2-style language model (~18.3M parameters) trained completely from scratch by an AI, end to end — the architecture, training code, tokenizer, data prep, and training run were all written and executed by an AI agent with no pre-existing weights or fine-tuning from another model.

This is a text-completion model, not an instruction-tuned chatbot. It's good at continuing short prose and simple stories. It is not good at answering questions, following instructions, or factual recall.

Note: this is a custom-format model, not a transformers model. Load it with the included scripts (gpt2.py + chat.py), not AutoModelForCausalLM. The file names mention GPT-2 only because the model follows the same architecture style.

Model details


| Property | Value |
|---|---|
| Type | Decoder-only transformer (GPT-2 style) |
| Parameters | ~18.3M |
| Embedding dim (`n_embd`) | 448 |
| Heads (`n_head`) | 7 |
| Layers (`n_layer`) | 6 |
| Context length (`block_size`) | 256 |
| Vocab size | 8192 |
| Tokenizer | Byte-level BPE (`<eot>` id = 0) |
| Dropout | 0.1 |
| Final train loss | ~1.86 |
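
As a sanity check on the ~18.3M figure, the parameter count can be estimated from the values in the model details. This is only a sketch under assumptions this README does not confirm: a 4x MLP expansion, biases on every linear layer, LayerNorm with weight and bias, learned position embeddings, and an output head tied to the token embedding. The function name `gpt2_param_count` is hypothetical and is not taken from gpt2.py.

```python
def gpt2_param_count(n_embd, n_layer, vocab_size, block_size):
    """Estimate parameters of a GPT-2-style decoder (assumed layout, see above)."""
    tok_emb = vocab_size * n_embd          # token embedding, assumed tied with the output head
    pos_emb = block_size * n_embd          # learned position embedding
    layer_norm = 2 * n_embd                # weight + bias
    # Attention: fused QKV projection plus output projection, both with bias.
    attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # MLP: assumed 4x expansion, both linear layers with bias.
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
    block = 2 * layer_norm + attn + mlp
    # Final LayerNorm after the last block.
    return tok_emb + pos_emb + n_layer * block + layer_norm


total = gpt2_param_count(n_embd=448, n_layer=6, vocab_size=8192, block_size=256)
head_dim = 448 // 7  # n_embd / n_head
print(f"{total:,} parameters (~{total / 1e6:.1f}M), head dim {head_dim}")
# prints: 18,271,232 parameters (~18.3M), head dim 64
```

Under these assumptions the estimate lands on the stated size, which suggests (but does not prove) that the output head shares weights with the token embedding.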

Training data

Trained primarily on TinyStories (~90M tokens) with a small amount of Alpaca-style data. The model learned general English sentence structure and simple narrative flow, not world knowledge.

Files in this repo

| File | What it is |
|---|---|
| `model.safetensors` | The model weights |
| `config.json` | Architecture config (custom format) |
| `gpt2.py` | Model definition (the GPT-2-style architecture) |
| `chat.py` | Run / generate from the model |
| `tokenizer_bpe/vocab.json`, `tokenizer_bpe/merges.txt` | Byte-level BPE tokenizer |
| `big.pt` | Full training checkpoint (model + optimizer), for resuming training only |
| `train_big.py`, `prep_bpe.py` | Training and data-prep scripts |

Intended use

Story / prose continuation
Experimentation and education (a clean, fully-from-scratch small LM)

How to use

This model uses its own minimal code, not the transformers library.

```bash
# 1. Install deps
pip install torch tokenizers safetensors numpy

# 2. Download this repo (gives you the scripts + weights + tokenizer)
pip install huggingface_hub
hf download Gentraxyz/RecursiveComplete --local-dir RecursiveComplete
cd RecursiveComplete

# 3. Generate
python chat.py
```

chat.py loads gpt2.py (the architecture), the weights from model.safetensors, and the BPE tokenizer in tokenizer_bpe/, then lets you prompt the model for completions.
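
To see what is inside model.safetensors without going through chat.py, the file header can be read with the standard library alone. The sketch below relies only on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header); the helper names `read_safetensors_header` and `count_parameters` are hypothetical and are not part of this repo's scripts.

```python
import json
import os
import struct
from math import prod


def read_safetensors_header(path):
    """Return {tensor_name: {"dtype", "shape", "data_offsets"}} for a .safetensors file."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # Optional free-form metadata entry; it does not describe a tensor.
    header.pop("__metadata__", None)
    return header


def count_parameters(header):
    """Sum the element counts of all tensors listed in the header."""
    return sum(prod(info["shape"]) for info in header.values())


if os.path.exists("model.safetensors"):
    header = read_safetensors_header("model.safetensors")
    for name, info in header.items():
        print(name, info["dtype"], info["shape"])
    print("total parameters:", count_parameters(header))
```

Run from inside the downloaded repo directory, this lists each tensor's name, dtype, and shape, which is a quick way to check the file against the architecture values in the model details.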

Tip: it's a completion model — give it the start of something ("Once upon a time there was a small robot who") rather than a question.

Limitations

Completion only — will not reliably answer questions or follow instructions.
No factual reliability; it will confidently make things up.
Small context (256 tokens) and small vocab (8192).
English only.
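
The 256-token context limit means a generation loop can only show the model the most recent 256 tokens. A minimal sketch of that cropping follows, assuming a hypothetical `next_token_fn` that stands in for the model plus sampling step; whether chat.py crops the context or stops on `<eot>` in this way is an assumption, not something this README states.

```python
BLOCK_SIZE = 256  # context length (`block_size`) from the model details
EOT_ID = 0        # `<eot>` token id from the model details


def crop_context(token_ids, block_size=BLOCK_SIZE):
    """Keep only the most recent `block_size` tokens."""
    return token_ids[-block_size:]


def generate(next_token_fn, prompt_ids, max_new_tokens,
             block_size=BLOCK_SIZE, eot_id=EOT_ID):
    """Append up to `max_new_tokens` tokens, feeding the model a cropped window.

    `next_token_fn` is a hypothetical callable: it takes a list of token ids
    (at most `block_size` long) and returns the next token id.
    """
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        nxt = next_token_fn(crop_context(ids, block_size))
        if nxt == eot_id:  # assumed stop condition: end-of-text token
            break
        ids.append(nxt)
    return ids
```

In practice this means anything earlier than the last 256 tokens is invisible to the model, so long stories can drift away from how they started.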

License

Apache 2.0.

Note

This model was trained entirely by an AI — including writing the model code, the tokenizer, the data pipeline, and running the training. It is shared as a small from-scratch experiment.