# kjv-model: A Tiny GPT-2 Trained on the King James Bible

This is a small causal language model trained from scratch on the full text of the King James Version (KJV) of the Bible. It was built as a learning exercise to understand how to train language models using Hugging Face Transformers and the 🤗 `Trainer` API.

Despite its small size (roughly 7.2 million parameters), it captures patterns in biblical language and can generate scripture-like text. This model is suited to educational use, experimentation with small-scale language modeling, and exploring how neural networks model religious texts.

## Model Details

- **Architecture:** Custom GPT-2 (initialized from scratch)
- **Layers:** 4
- **Attention heads:** 4
- **Embedding dimension:** 128
- **Maximum sequence length:** 64 tokens
- **Vocabulary size:** matches the GPT-2 tokenizer (~50k)
- **Total parameters:** 7.23 million
- **Training data:** full KJV Bible text (one verse per line)
- **Training objective:** causal language modeling (next-token prediction)

## Training Configuration

- **Epochs:** 5
- **Per-device batch size:** 8
- **Learning rate:** 1e-3
- **Weight decay:** 0.01
- **Warmup steps:** 50
- **Optimizer:** AdamW (the `Trainer` default)
- **Tokenizer:** GPT-2 tokenizer (with `pad_token` set to `eos_token`)
- **Data collator:** `DataCollatorForLanguageModeling` with `mlm=False`

## How to Use

You can load and run inference with this model using the Hugging Face Transformers library:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("cpine505/kjv-model")
model = AutoModelForCausalLM.from_pretrained("cpine505/kjv-model")

# Encode a prompt, truncating to the model's 64-token context window
input_text = "In the beginning"
inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=64)

# Sample a continuation of up to 100 tokens
outputs = model.generate(
    inputs["input_ids"],
    max_length=100,
    do_sample=True,
    top_k=50,
    temperature=0.9,
    pad_token_id=tokenizer.eos_token_id,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
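For reference, the architecture listed under Model Details can be reconstructed as a randomly initialized GPT-2 configuration. This is a sketch, not the original training script; it only assumes the hyperparameters stated above (4 layers, 4 heads, 128-dim embeddings, 64-token context, standard GPT-2 vocabulary):

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Rebuild the architecture described in the model card.
config = GPT2Config(
    n_layer=4,
    n_head=4,
    n_embd=128,
    n_positions=64,
    vocab_size=50257,  # standard GPT-2 tokenizer vocabulary
)
model = GPT2LMHeadModel(config)  # weights are randomly initialized

# At this scale the token embedding table (50257 x 128 ~ 6.4M) dominates
# the total of ~7.2M parameters.
print(f"{model.num_parameters() / 1e6:.2f}M parameters")
```

Training then used the `Trainer` API with the configuration above (5 epochs, per-device batch size 8, learning rate 1e-3) and `DataCollatorForLanguageModeling(tokenizer, mlm=False)` to form next-token-prediction batches.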

## License

Apache 2.0

## Acknowledgements

- Built with Hugging Face Transformers
- Training data: King James Version of the Bible (public domain)
- Inspired by Hugging Face documentation and community tutorials
