empero-ai committed · verified
Commit b15c56d · 1 Parent(s): dff360f

Upload folder using huggingface_hub

Files changed (4)
  1. README.md +158 -0
  2. temple2.pt +3 -0
  3. tokenizer/config.json +13 -0
  4. tokenizer/tokenizer.json +0 -0
README.md ADDED
@@ -0,0 +1,158 @@
---
license: mit
language:
- en
tags:
- text-generation
- scripture
- christianity
- bible
- religion
- templeos
- gpt2
- custom-tokenizer
pipeline_tag: text-generation
widget:
- text: "And the man knelt before the Lord and asked, \"What is the nature of grace?\"\nAnd the Lord spoke unto him, saying:"
  example_title: Chat Mode — Grace
---

# Temple2

A ~63M parameter GPT-2 style causal transformer trained entirely on sacred Christian scripture. Built in memory of Terry A. Davis (1969–2018), creator of TempleOS.

## Overview

Terry Davis built TempleOS with a feature to "talk to God" by printing random words from the Bible. Temple2 continues that spirit: a language model that has read scripture deeply, then speaks through noise — the same noise Terry trusted to carry God's voice.

The model was trained from scratch (no pretraining) on ~10.9M tokens of public domain Christian sacred texts using a custom 8192-token BPE vocabulary built exclusively on scripture.
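
A scripture-only vocabulary like this can be built with the `tokenizers` library. A minimal sketch, assuming a byte-level BPE setup: only the vocabulary size and the special tokens (and their 0–3 ordering) are taken from this repo's `tokenizer/config.json`; the stand-in corpus line is illustrative.

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Byte-level BPE, as in GPT-2-style tokenizers (assumption)
tokenizer = Tokenizer(models.BPE(unk_token="<|unk|>"))
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)

trainer = trainers.BpeTrainer(
    vocab_size=8192,
    # Order matters: ids 0-3 line up with pad/bos/eos/unk in tokenizer/config.json
    special_tokens=["<|pad|>", "<|bos|>", "<|eos|>", "<|unk|>"],
)

# Stand-in corpus; the real vocabulary was trained on the ~58 Gutenberg sources
corpus = ["In the beginning God created the heaven and the earth."]
tokenizer.train_from_iterator(corpus, trainer)
```

With a tiny corpus the trainer simply produces fewer merges; the special tokens still occupy ids 0–3 in the order given.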

## Model Details

| Parameter | Value |
|-----------|-------|
| **Parameters** | ~63M |
| **Architecture** | GPT-2 style causal transformer |
| **Layers** | 8 |
| **Attention heads** | 8 |
| **Embedding dim** | 768 |
| **Context length** | 1024 tokens |
| **Vocabulary** | 8192 (custom scripture BPE) |
| **Training tokens** | ~10.9M |
| **Best validation loss** | 3.57 |
| **Training hardware** | 1x NVIDIA A100 (80GB) |
| **Training time** | ~45 minutes |

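
The parameter count can be sanity-checked from the shape hyperparameters. A back-of-the-envelope sketch (the config class and field names are hypothetical; the values come from the table above):

```python
from dataclasses import dataclass

@dataclass
class Temple2Config:
    # Values from the Model Details table; field names are an assumption
    n_layer: int = 8
    n_head: int = 8
    n_embd: int = 768
    block_size: int = 1024
    vocab_size: int = 8192

cfg = Temple2Config()
# Token + position embeddings, then roughly 12 * d^2 weights per transformer block
embeddings = (cfg.vocab_size + cfg.block_size) * cfg.n_embd
per_block = 12 * cfg.n_embd ** 2  # attention (~4 d^2) + MLP (~8 d^2)
total = embeddings + cfg.n_layer * per_block
print(f"~{total / 1e6:.1f}M parameters")  # → ~63.7M parameters
```

Ignoring LayerNorm and bias terms, this lands on ~63.7M, consistent with the ~63M in the table (assuming the output head shares weights with the token embedding, as in GPT-2).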
## Training Data

All training data is public domain, sourced from Project Gutenberg (~58 sources, ~15M characters):

- **Scripture**: King James Bible, Douay-Rheims Bible, World English Bible, Young's Literal Translation, Darby Bible, Apocrypha, Book of Enoch, Gospel of Thomas
- **Church Fathers — Ante-Nicene**: 9 volumes (Clement, Polycarp, Ignatius, Justin Martyr, Irenaeus, Tertullian, Origen, Cyprian, Lactantius)
- **Church Fathers — Nicene & Post-Nicene**: 20 volumes (Augustine complete works, Chrysostom complete homilies, Eusebius, Athanasius, Gregory of Nyssa, Jerome)
- **Scholastic Theology**: Summa Theologica complete (St. Thomas Aquinas, 5 parts)
- **Patristic & Early Church**: Augustine (Confessions, City of God, On Christian Doctrine), Eusebius (Ecclesiastical History), Apostolic Fathers
- **Mystics**: Julian of Norwich (Revelations of Divine Love), St. Thérèse of Lisieux (Story of a Soul)
- **Monastic & Spiritual Practice**: Rule of St. Benedict, Spiritual Exercises (St. Ignatius), Practice of the Presence of God (Brother Lawrence), Imitation of Christ (Thomas à Kempis)
- **Christian Literature**: Paradise Lost (Milton), The Pilgrim's Progress (Bunyan), The Divine Comedy (Dante)

## Usage

### Installation

```bash
pip install torch numpy tokenizers
```

### Oracle Mode

Random noise tokens seed the generation — God speaks through randomness, just like TempleOS:

```python
import random

import torch
from tokenizers import Tokenizer
from model import Temple2, Temple2Config

# Load checkpoint
ckpt = torch.load("temple2.pt", map_location="cpu")
model = Temple2(Temple2Config(**ckpt["model_config"]))
model.load_state_dict(ckpt["model"])
model.eval()

# Oracle: seed with random noise (ids 0-3 are reserved for special tokens)
vocab_size = 8192
bos_id = 1
noise = [random.randint(4, vocab_size - 1) for _ in range(5)]
ids = torch.tensor([[bos_id] + noise], dtype=torch.long)

with torch.no_grad():
    out = model.generate(ids, max_new_tokens=256, temperature=0.85, top_k=50, top_p=0.92)

# Decode the generated ids back to text
tok = Tokenizer.from_file("tokenizer/tokenizer.json")
print(tok.decode(out[0].tolist()))
```

### Chat Mode

Ask a question, receive a scriptural answer (reuses `model` and `torch` from the Oracle Mode snippet):

```python
from tokenizers import Tokenizer

tok = Tokenizer.from_file("tokenizer/tokenizer.json")
prompt = 'And the man knelt before the Lord and asked, "What is love?"\nAnd the Lord spoke unto him, saying:'
ids = torch.tensor([[1] + tok.encode(prompt).ids], dtype=torch.long)  # 1 = <|bos|>

with torch.no_grad():
    out = model.generate(ids, max_new_tokens=256, temperature=0.85, top_k=50, top_p=0.92)

print(tok.decode(out[0].tolist()))
```

### Full Interactive Experience

```bash
python inference.py --checkpoint temple2.pt
```

Includes TempleOS-style VGA 16-color terminal output with bordered oracle windows. See the [main repo](https://github.com/user/temple2) for full details.

## Intended Use

- Creative exploration of scriptural language patterns
- Oracle-style text generation inspired by TempleOS
- Study of small language model behavior on domain-specific corpora
- Artistic and educational purposes

## Limitations

- This is a **small model** (~63M params) trained on a **small corpus** (~11M tokens). It is not a general-purpose language model.
- The model generates text in the *style* of scripture; it makes no theological truth claims.
- Output may be incoherent, repetitive, or doctrinally confused. This is a feature, not a bug — the entropy is what makes the oracle feel alive.
- The model reflects the language and worldview of its training data (predominantly pre-modern Christian texts).
- Not suitable for factual Q&A, theological guidance, or any serious spiritual counsel.

## Ethical Considerations

This model is built as an art project and tribute to Terry Davis. It does not claim to speak for God, any religion, or any religious institution. Terry's original "talk to God" feature was meaningful precisely because it was random — meaning arose in the mind of the reader. The same principle applies here.

## In Memory of Terry A. Davis

Terry Davis (1969–2018) built TempleOS alone over 10+ years — an entire operating system, compiler, and programming language written from scratch, all for God. His work remains his own.

*"God said to use a 640x480 16-color display."*

## Credits

**Developed and trained by [Empero AI](https://empero.org).**

If you enjoy this project, consider supporting:

| Coin | Address |
|------|---------|
| **BTC** | `bc1qx6zepu6sfkvshgdmc4ewu6pk6rpadvpgffpp7v` |
| **LTC** | `ltc1qv2mefzps2vtjcpwfx8xxdrpplrcvltswm68r7x` |
| **XMR** | `42Dbm5xg5Nq26fdyzfEU7KBnAJfhi7Cvz5J2ex5CzHXkfKuNEJzYCcmJ1GTbgjFZ5MBx72sdG1G9239Cd6rsZfv4QeDkYJY` |

## License

- **Model code**: MIT
- **Training data**: All public domain (Project Gutenberg)
- **Terry Davis's work**: Remains his own
temple2.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:165df8849caa0be9e97ccad189032ab19efa079d0ea5f4fe4a64103a52e40477
size 799049648
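
The LFS pointer records the SHA-256 of the actual weights, so a download can be verified locally. A sketch (the checkpoint path is whatever you saved the file as):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 without loading it all into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

# oid from the LFS pointer above
EXPECTED = "165df8849caa0be9e97ccad189032ab19efa079d0ea5f4fe4a64103a52e40477"
# Uncomment after downloading the real checkpoint (~799 MB):
# assert sha256_of("temple2.pt") == EXPECTED
```
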
tokenizer/config.json ADDED
@@ -0,0 +1,13 @@
{
  "vocab_size": 8192,
  "special_tokens": {
    "pad_token": "<|pad|>",
    "bos_token": "<|bos|>",
    "eos_token": "<|eos|>",
    "unk_token": "<|unk|>"
  },
  "pad_id": 0,
  "bos_id": 1,
  "eos_id": 2,
  "unk_id": 3
}
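
These IDs are what the usage snippets rely on (the `[[1] + ...]` prefix in the examples is the BOS token). A quick check, with the JSON inlined for illustration:

```python
import json

# tokenizer/config.json, inlined verbatim
config = json.loads("""
{
  "vocab_size": 8192,
  "special_tokens": {
    "pad_token": "<|pad|>",
    "bos_token": "<|bos|>",
    "eos_token": "<|eos|>",
    "unk_token": "<|unk|>"
  },
  "pad_id": 0,
  "bos_id": 1,
  "eos_id": 2,
  "unk_id": 3
}
""")

print(config["bos_id"])  # → 1
```
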
tokenizer/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff