---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- gpt2
- from-scratch
- tinystories
- text-generation
- causal-lm
---
# RecursiveComplete
A small GPT-2-style language model (~18.3M parameters) trained **completely from scratch by an AI**, end to end — the architecture, training code, tokenizer, data prep, and training run were all written and executed by an AI agent with no pre-existing weights or fine-tuning from another model.
This is a **text-completion model**, not an instruction-tuned chatbot. It's good at continuing short prose and simple stories. It is **not** good at answering questions, following instructions, or factual recall.
> **Note:** this is a custom-format model, not a `transformers` model. You load it with the included scripts (`gpt2.py` + `chat.py`), not `AutoModelForCausalLM`.
>
> The file names mention GPT-2 only because the model uses the same architecture style; it was not initialized from any existing GPT-2 weights.
## Model details
| Property | Value |
|---|---|
| Type | Decoder-only transformer (GPT-2 style) |
| Parameters | ~18.3M |
| Embedding dim (`n_embd`) | 448 |
| Heads (`n_head`) | 7 |
| Layers (`n_layer`) | 6 |
| Context length (`block_size`) | 256 |
| Vocab size | 8192 |
| Tokenizer | Byte-level BPE (`<eot>` id = 0) |
| Dropout | 0.1 |
| Final train loss | ~1.86 |
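
As a sanity check on the table, the ~18.3M figure can be reproduced by hand. This sketch assumes a standard GPT-2 block layout (biases on every Linear and LayerNorm, a fused Q/K/V projection, 4× MLP expansion, learned position embeddings, and an output head tied to the token embedding). Those details are not stated in this card, so treat them as assumptions about `gpt2.py`; `gpt2_param_count` is an illustrative helper, not part of this repo.

```python
def gpt2_param_count(n_embd: int, n_head: int, n_layer: int,
                     block_size: int, vocab_size: int,
                     tied_lm_head: bool = True) -> int:
    """Count parameters of an assumed standard GPT-2-style decoder."""
    assert n_embd % n_head == 0, "n_embd must split evenly across heads"
    tok_emb = vocab_size * n_embd                 # token embedding table
    pos_emb = block_size * n_embd                 # learned position embeddings
    ln = 2 * n_embd                               # one LayerNorm: weight + bias
    attn_qkv = n_embd * 3 * n_embd + 3 * n_embd   # fused Q, K, V projection
    attn_proj = n_embd * n_embd + n_embd          # attention output projection
    mlp_fc = n_embd * 4 * n_embd + 4 * n_embd     # MLP up-projection (4x)
    mlp_proj = 4 * n_embd * n_embd + n_embd       # MLP down-projection
    per_layer = 2 * ln + attn_qkv + attn_proj + mlp_fc + mlp_proj
    total = tok_emb + pos_emb + n_layer * per_layer + ln  # + final LayerNorm
    if not tied_lm_head:
        total += vocab_size * n_embd              # separate output head, no bias
    return total


n = gpt2_param_count(n_embd=448, n_head=7, n_layer=6, block_size=256, vocab_size=8192)
print(f"{n:,} parameters, head_dim = {448 // 7}")
```

Under these assumptions it prints 18,271,232 parameters and a head size of 64 (448 / 7). Passing `tied_lm_head=False` gives 21,941,248 (about 21.9M), so the stated ~18.3M is consistent with, though not proof of, tied input/output embeddings.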
## Training data
Trained primarily on **TinyStories** (~90M tokens) with a small amount of Alpaca-style data. The model learned general English sentence structure and simple narrative flow, not world knowledge.
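
One common way to turn tokenized stories into training examples for a 256-token causal LM is to join them with the `<eot>` token (id 0) and cut the resulting stream into fixed windows whose targets are the inputs shifted by one position. This is a hypothetical sketch of that idea: whether `prep_bpe.py` and `train_big.py` pack data this way, or instead sample random offsets, is not documented here, and `pack_documents` is an illustrative name, not a function from this repo.

```python
from typing import List, Tuple

EOT_ID = 0        # `<eot>` token id, from the model card
BLOCK_SIZE = 256  # context length, from the model card


def pack_documents(docs: List[List[int]], block_size: int = BLOCK_SIZE,
                   eot_id: int = EOT_ID) -> List[Tuple[List[int], List[int]]]:
    """Join tokenized documents into one stream separated by <eot>, then cut it
    into non-overlapping (input, target) pairs; target = input shifted by one."""
    stream: List[int] = []
    for doc in docs:
        stream.extend(doc)
        stream.append(eot_id)  # mark the end of each story
    pairs = []
    # Each window needs block_size + 1 tokens: block_size inputs plus one
    # extra token so the last input position still has a target.
    for start in range(0, len(stream) - block_size, block_size):
        window = stream[start:start + block_size + 1]
        pairs.append((window[:-1], window[1:]))
    return pairs
```

For example, `pack_documents([[5, 6, 7], [8, 9]], block_size=3)` returns `[([5, 6, 7], [6, 7, 0]), ([0, 8, 9], [8, 9, 0])]`. With non-overlapping windows, ~90M tokens works out to roughly 350K windows of 256 tokens per pass over the data.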
## Files in this repo
| File | What it is |
|---|---|
| `model.safetensors` | The model weights |
| `config.json` | Architecture config (custom format) |
| `gpt2.py` | Model definition (the GPT-2-style architecture) |
| `chat.py` | Run / generate from the model |
| `tokenizer_bpe/vocab.json`, `tokenizer_bpe/merges.txt` | Byte-level BPE tokenizer |
| `big.pt` | Full training checkpoint (model + optimizer), for **resuming training** only |
| `train_big.py`, `prep_bpe.py` | Training and data-prep scripts |
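
`model.safetensors` uses the standard safetensors layout: an 8-byte little-endian header length, a JSON header describing each tensor's dtype, shape, and byte offsets, then the raw tensor data. The stdlib-only sketch below reads just that header, so you can list tensor names and shapes without loading PyTorch. The tensor names in this particular file are not documented here, and if the output head is tied to the token embedding the matrix may be stored once or twice, so the summed count can differ somewhat from ~18.3M. The helper names are illustrative.

```python
import json
import struct
from math import prod
from typing import Dict


def parse_safetensors_header(data: bytes) -> Dict[str, dict]:
    """Parse the header at the start of a .safetensors blob.

    Layout: 8-byte little-endian unsigned header length N, then N bytes of
    UTF-8 JSON mapping tensor names to {"dtype", "shape", "data_offsets"}.
    """
    (n,) = struct.unpack("<Q", data[:8])
    header = json.loads(data[8:8 + n].decode("utf-8"))
    header.pop("__metadata__", None)  # optional string metadata, not a tensor
    return header


def read_safetensors_header(path: str) -> Dict[str, dict]:
    """Read only the header from disk; the tensor data is never loaded."""
    with open(path, "rb") as f:
        prefix = f.read(8)
        (n,) = struct.unpack("<Q", prefix)
        return parse_safetensors_header(prefix + f.read(n))


def summarize(header: Dict[str, dict]) -> int:
    """Print each tensor's name, dtype, and shape; return total element count."""
    total = 0
    for name, info in sorted(header.items()):
        total += prod(info["shape"])  # prod([]) == 1 for scalar tensors
        print(f"{name:40s} {info['dtype']:>5s} {info['shape']}")
    return total
```

From inside the downloaded repo, `summarize(read_safetensors_header("model.safetensors"))` prints one line per tensor and returns the total parameter count.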
## Intended use
- Story / prose continuation
- Experimentation and education (a clean, fully-from-scratch small LM)
## How to use
This model uses its own minimal code, not the `transformers` library.
```bash
# 1. Install deps
pip install torch tokenizers safetensors numpy
# 2. Download this repo (gives you the scripts + weights + tokenizer)
pip install huggingface_hub
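hf download Gentraxyz/RecursiveComplete --local-dir RecursiveComplete
# (older huggingface_hub releases without the `hf` command: use `huggingface-cli download` with the same arguments)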
cd RecursiveComplete
# 3. Generate
python chat.py
```
`chat.py` loads `gpt2.py` (the architecture), the weights from `model.safetensors`, and the BPE tokenizer in `tokenizer_bpe/`, then lets you prompt the model for completions.
Tip: it's a completion model — give it the *start* of something ("Once upon a time there was a small robot who") rather than a question.
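
Completion models generate one token at a time: the model scores every token in the 8192-entry vocabulary, one token is sampled and appended, and the loop repeats. The sketch below shows that loop with temperature and top-k sampling, cropping the context to the last 256 tokens and stopping at `<eot>` (id 0). It is model-agnostic: `logits_fn` is a hypothetical hook where you would call the network from `gpt2.py`, and the default temperature, top-k, and length are illustrative; `chat.py`'s actual sampling settings are not documented here.

```python
import math
import random
from typing import Callable, List, Optional, Sequence


def sample_completion(
    logits_fn: Callable[[List[int]], Sequence[float]],
    prompt_ids: List[int],
    max_new_tokens: int = 100,
    temperature: float = 0.8,
    top_k: Optional[int] = 40,
    block_size: int = 256,
    eot_id: int = 0,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Autoregressive temperature + top-k sampling; returns only the new ids.

    `logits_fn(context)` (hypothetical) must return one next-token logit per
    vocabulary entry for the given context.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        context = ids[-block_size:]  # the model only sees the last block_size tokens
        logits = [x / temperature for x in logits_fn(context)]
        # Rank token ids by score and keep the top_k best candidates.
        candidates = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        if top_k is not None:
            candidates = candidates[:top_k]
        m = max(logits[i] for i in candidates)  # subtract max for numerical stability
        weights = [math.exp(logits[i] - m) for i in candidates]  # unnormalized softmax
        next_id = rng.choices(candidates, weights=weights, k=1)[0]
        if next_id == eot_id:  # <eot> ends the completion
            break
        ids.append(next_id)
    return ids[len(prompt_ids):]
```

Lower `temperature` (or `top_k=1`, which is greedy decoding) makes completions more repetitive but more coherent; higher values make them more varied.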
## Limitations
- Completion only — will not reliably answer questions or follow instructions.
- Not factually reliable; it will confidently make things up.
- Small context (256 tokens) and small vocab (8192).
- English only.
## License
Apache 2.0.
## Note
This model was trained entirely by an AI — including writing the model code, the tokenizer, the data pipeline, and running the training. It is shared as a small from-scratch experiment.