| --- |
| license: apache-2.0 |
| language: |
| - en |
| pipeline_tag: text-generation |
| tags: |
| - gpt2 |
| - from-scratch |
| - tinystories |
| - text-generation |
| - causal-lm |
| --- |
| |
| # RecursiveComplete |
|
|
| A small GPT-2-style language model (~18.3M parameters) trained **completely from scratch by an AI**, end to end. An AI agent wrote and executed the architecture, training code, tokenizer, data prep, and training run, starting from no pre-existing weights and with no fine-tuning from another model. |
|
|
| This is a **text-completion model**, not an instruction-tuned chatbot. It's good at continuing short prose and simple stories. It is **not** good at answering questions, following instructions, or factual recall. |
|
|
| > **Note:** this is a custom-format model, not a `transformers` model. You load it with the included scripts (`gpt2.py` + `chat.py`), not `AutoModelForCausalLM`. |
| > The file names mention GPT-2 only because the model uses a GPT-2-style architecture. |
|
|
| ## Model details |
|
|
| | Property | Value | |
| |---|---| |
| | Type | Decoder-only transformer (GPT-2 style) | |
| | Parameters | ~18.3M | |
| | Embedding dim (`n_embd`) | 448 | |
| | Heads (`n_head`) | 7 | |
| | Layers (`n_layer`) | 6 | |
| | Context length (`block_size`) | 256 | |
| | Vocab size | 8192 | |
| | Tokenizer | Byte-level BPE (`<eot>` id = 0) | |
| | Dropout | 0.1 | |
| | Final train loss | ~1.86 | |
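
As a sanity check on the ~18.3M figure, the sketch below counts parameters from the values in the table. It assumes a standard GPT-2 layout that this card does not spell out: tied input/output embeddings, learned positional embeddings, biases on every linear layer, and an MLP hidden width of 4 × `n_embd`. If `gpt2.py` differs on any of these, the exact total will differ too.

```python
# Rough parameter count for a standard GPT-2-style decoder.
# Assumptions (not confirmed by the repo): tied input/output embeddings,
# learned positional embeddings, biases on all linear layers, 4x MLP width.

n_embd, n_head, n_layer = 448, 7, 6
block_size, vocab_size = 256, 8192

head_dim = n_embd // n_head  # width of each attention head

tok_emb = vocab_size * n_embd  # token embedding (shared with the output head)
pos_emb = block_size * n_embd  # learned positional embedding

# Attention: fused QKV projection plus output projection, each with a bias.
attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
# MLP: expand to 4 * n_embd, then project back, each with a bias.
mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
# Two LayerNorms per block, each with a weight and a bias vector.
layer_norms = 2 * (2 * n_embd)
per_layer = attn + mlp + layer_norms

final_ln = 2 * n_embd
total = tok_emb + pos_emb + n_layer * per_layer + final_ln

print(f"head_dim = {head_dim}")
print(f"total    = {total:,} (~{total / 1e6:.1f}M)")
```

Under these assumptions the count comes to 18,271,232, which rounds to the ~18.3M stated above, with 64 dimensions per attention head.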
|
|
| ## Training data |
|
|
| Trained primarily on **TinyStories** (~90M tokens) with a small amount of Alpaca-style data. The model learned general English sentence structure and simple narrative flow, not world knowledge. |
|
|
| ## Files in this repo |
|
|
| | File | What it is | |
| |---|---| |
| | `model.safetensors` | The model weights | |
| | `config.json` | Architecture config (custom format) | |
| | `gpt2.py` | Model definition (the GPT-2-style architecture) | |
| | `chat.py` | Loads the model and generates completions from your prompts | |
| | `tokenizer_bpe/vocab.json`, `tokenizer_bpe/merges.txt` | Byte-level BPE tokenizer | |
| | `big.pt` | Full training checkpoint (model + optimizer), for **resuming training** only | |
| | `train_big.py`, `prep_bpe.py` | Training and data-prep scripts | |
|
|
| ## Intended use |
|
|
| - Story / prose continuation |
| - Experimentation and education (a clean, fully-from-scratch small LM) |
|
|
| ## How to use |
|
|
| This model uses its own minimal code, not the `transformers` library. |
|
|
| ```bash |
| # 1. Install deps |
| pip install torch tokenizers safetensors numpy |
| |
| # 2. Download this repo (gives you the scripts + weights + tokenizer) |
| pip install huggingface_hub |
| hf download Gentraxyz/RecursiveComplete --local-dir RecursiveComplete |
| cd RecursiveComplete |
| |
| # 3. Generate |
| python chat.py |
| ``` |
|
|
| `chat.py` loads `gpt2.py` (the architecture), the weights from `model.safetensors`, and the BPE tokenizer in `tokenizer_bpe/`, then lets you prompt the model for completions. |
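
If you want to look at the weights without going through `chat.py`, a sketch like the following lists tensor names and shapes from `model.safetensors`. The helper names `summarize` and `inspect_checkpoint` are made up for this example, and it assumes the file holds ordinary PyTorch tensors; `load_file` is the standard loader from `safetensors.torch`.

```python
import math


def summarize(tensors):
    """Return ({name: shape}, total_parameter_count).

    `tensors` is any mapping of name -> object with a `.shape` attribute,
    which covers torch tensors and NumPy arrays alike.
    """
    shapes = {name: tuple(t.shape) for name, t in tensors.items()}
    total = sum(math.prod(shape) for shape in shapes.values())
    return shapes, total


def inspect_checkpoint(path="model.safetensors"):
    """Print every tensor name and shape in a safetensors file (hypothetical helper)."""
    # Imported lazily so that `summarize` can be used without torch installed.
    from safetensors.torch import load_file

    shapes, total = summarize(load_file(path))
    for name, shape in shapes.items():
        print(f"{name:48s} {shape}")
    print(f"total parameters stored: {total:,}")
```

Call `inspect_checkpoint()` from inside the downloaded `RecursiveComplete` directory. The stored total may not match the headline parameter count exactly, for example if tied weights or non-trainable buffers are saved differently than expected.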
|
|
| Tip: it's a completion model — give it the *start* of something ("Once upon a time there was a small robot who") rather than a question. |
|
|
| ## Limitations |
|
|
| - Completion only — will not reliably answer questions or follow instructions. |
| - No factual reliability; it will confidently make things up. |
| - Small context (256 tokens) and small vocab (8192). |
| - English only. |
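
The 256-token context limit means a generation loop can only show the model the most recent 256 tokens. The sketch below illustrates the usual sliding-window approach, stopping at the `<eot>` token (id 0, per the model details). Here `next_token_fn` is a hypothetical stand-in for "run the model on this context and sample one token id"; how `chat.py` actually handles long prompts is not documented in this card and may differ.

```python
BLOCK_SIZE = 256  # context length from the model details table
EOT_ID = 0        # `<eot>` token id from the model details table


def crop_context(token_ids, block_size=BLOCK_SIZE):
    """Keep only the most recent `block_size` token ids."""
    return token_ids[-block_size:]


def generate(next_token_fn, prompt_ids, max_new_tokens, block_size=BLOCK_SIZE):
    """Autoregressive loop with a sliding context window.

    `next_token_fn(context_ids) -> int` is a placeholder for the model call
    plus sampling; it is not part of this repo's API.
    """
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        context = crop_context(ids, block_size)
        next_id = next_token_fn(context)
        if next_id == EOT_ID:
            break  # end-of-text: stop without appending the marker
        ids.append(next_id)
    return ids


# Usage with a dummy "model" that returns the context length as the next id.
print(generate(lambda ctx: len(ctx), [1, 2, 3], max_new_tokens=2))
```

In practice this means that in a long session the earliest tokens silently fall out of view, so the model can lose track of how a story began.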
|
|
| ## License |
|
|
| Apache 2.0. |
|
|
| ## Note |
|
|
| This model was trained entirely by an AI — including writing the model code, the tokenizer, the data pipeline, and running the training. It is shared as a small from-scratch experiment. |