Codfskitraceon committed on
Commit 55c6db3 · verified · 1 Parent(s): 7c48757

Update README.md

  1. README.md +302 -290
README.md CHANGED
@@ -1,290 +1,302 @@
1
+ ---
2
+ license: mit
3
+ language:
4
+ - en
5
+ tags:
6
+ - zero-shot
7
+ - natural-language-inference
8
+ - self-reflection
9
+ - logic
10
+ - reasoning
11
+ - evaluation
12
+ ---
13
+ <div align="center">
14
+
15
+ # 🧲 TRIGNUM-300M
16
+
17
+ ### The Pre-Flight Check for Autonomous AI
18
+
19
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
20
+ [![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
21
+ [![Benchmarked](https://img.shields.io/badge/HaluEval-58%2C293_samples-green.svg)](#-benchmark-results)
22
+ [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.18672142.svg)](https://doi.org/10.5281/zenodo.18672142)
23
+
24
+ > **"You wouldn't let a plane take off without a pre-flight check.**
25
+ > **Why are we letting AI agents act without one?"**
26
+
27
+ <img src="assets/roadmap_architecture.jpg" width="800" alt="TRIGNUM-300M Architecture Flowchart" />
28
+ </div>
29
+
30
+ ---
31
+
32
+ <div align="center">
33
+ <!--
34
+ TODO: Add your demo GIF here!
35
+ 1. Record demo/index.html with ScreenToGif
36
+ 2. Save as assets/trignum_demo.gif
37
+ 3. Uncomment line below:
38
+ -->
39
+ <!-- <img src="assets/trignum_demo.gif" width="800" alt="TRIGNUM-300M Demo" /> -->
40
+ </div>
41
+
42
+ ## What Is This?
43
+
44
+ TRIGNUM-300M is a **zero-model reasoning integrity validator** for LLM outputs. It catches structural logic failures (contradictions, circular reasoning, non-sequiturs) before an AI agent acts on them.
45
+
46
+ ```python
47
+ from trignum_core.subtractive_filter import SubtractiveFilter
48
+
49
+ sf = SubtractiveFilter()
50
+ result = sf.apply(agent_output)  # agent_output: placeholder for the text produced by your agent
51
+
52
+ if result.illogics_found:  # `agent` is a placeholder for your own agent object
53
+     agent.halt(reason=result.illogics_found)
54
+     # T-CHIP glows RED 🔴 → Human review required
55
+ else:
56
+     agent.execute()
57
+     # T-CHIP glows BLUE 🔵 → Cleared for takeoff
58
+ ```
59
+
60
+ **No LLM. No API. No training data. ~300 lines of Python. <1ms.**
61
+
62
+ ---
63
+
64
+ ## 🔬 Benchmark Results
65
+
66
+ We expanded our evaluation to **58,000+ real LLM outputs** including a new **517-sample curated dataset** for structural reasoning. Honest results:
67
+
68
+ | Benchmark | Samples | Precision | Recall | F1 | Speed |
69
+ | ---------------------------- | ------- | --------- | ------ | --------- | ----- |
70
+ | **Structural illogic (curated)** | **517** | **100%** | **98.9%** | **99.5%** | **<1ms** |
71
+ | HaluEval (full dataset) | 58,293 | 60% | 2.1% | 4.0% | 706ms |
72
+
73
+ ### What this means:
74
+
75
+ - **99.5% F1 on structural reasoning failures**: contradictions, circular logic, unsupported conclusions
76
+ - **4.0% F1 on factual hallucinations**: we don't catch wrong facts
77
+
78
+ **That's the point.** There are 100 tools for fact-checking. There are **zero tools for reasoning-checking.** Until now.
79
+
80
+ ### Per-Task Breakdown (HaluEval)
81
+
82
+ | Task | n | Precision | Recall | F1 |
83
+ | ------------- | ------ | --------- | ------ | ----- |
84
+ | QA | 18,316 | 83.3% | 0.25% | 0.50% |
85
+ | Dialogue | 19,977 | 60.1% | 4.38% | 8.16% |
86
+ | Summarization | 20,000 | 57.4% | 1.60% | 3.11% |
87
+
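As a sanity check, the F1 column can be re-derived from the precision and recall columns with the standard harmonic-mean formula. The sketch below is not part of the repository: the helper name `f1_score` is an assumption, and the inputs are the rounded percentages from the per-task table, so the last digit could differ from values computed on raw counts.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded precision/recall percentages copied from the per-task HaluEval table.
per_task = {
    "QA": (83.3, 0.25),
    "Dialogue": (60.1, 4.38),
    "Summarization": (57.4, 1.60),
}

for task, (p_pct, r_pct) in per_task.items():
    f1_pct = 100 * f1_score(p_pct / 100, r_pct / 100)
    print(f"{task}: F1 = {f1_pct:.2f}%")
```

With the table's rounded inputs this prints 0.50%, 8.16%, and 3.11%, which matches the F1 column.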
88
+ **Throughput: 146,866 samples/second**, orders of magnitude faster than LLM-based validation.
89
+
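The throughput figure converts to an average per-sample latency by simple arithmetic. The snippet is only an illustration: the constant names are assumptions, and the source does not say how the 706ms in the Speed column was measured, so that value is not modeled here.

```python
SAMPLES_PER_SECOND = 146_866   # throughput quoted in the README
HALUEVAL_SAMPLES = 58_293      # full HaluEval sample count from the benchmark table

# Average time spent on one sample, in microseconds.
latency_us = 1_000_000 / SAMPLES_PER_SECOND

# Time to stream the whole HaluEval set at that rate, in seconds.
full_run_s = HALUEVAL_SAMPLES / SAMPLES_PER_SECOND

print(f"per-sample latency: {latency_us:.1f} microseconds")
print(f"full HaluEval pass: {full_run_s:.2f} s")
```

At the quoted rate this works out to roughly 6.8 microseconds per sample.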
90
+ ---
91
+
92
+ ## ✈️ The Pre-Flight Check Analogy
93
+
94
+ A pre-flight checklist doesn't verify that London exists. It verifies that:
95
+
96
+ - ✅ Instruments don't **contradict** each other
97
+ - ✅ There are no **circular faults** (sensor A confirms B confirms A)
98
+ - ✅ The flight computer draws **conclusions from actual data**
99
+ - ✅ Systems are **logically consistent**
100
+
101
+ The Subtractive Filter does the same for AI reasoning:
102
+
103
+ ```
104
+ LLM Output → Subtractive Filter → [PASS] 🔵 → Agent Executes
105
+                                 → [FAIL] 🔴 → Agent Halts → Human Review
106
+ ```
107
+
108
+ ---
109
+
110
+ ## 🤖 The Missing "Agentic Validator"
111
+
112
+ In the context of the recent shift towards **Agentic Reasoning**, autonomous LLMs are moving from static prompts to dynamic _thought-action_ loops involving planning, tool-use, and multi-agent collaboration.
113
+
114
+ Current systems rely heavily on probabilistic models to act as the "Critic/Evaluator" or use "Validator-Driven Feedback" via unit tests for code or simulators for robotics. **But there has been no validator for pure logic.** If an agent hallucinates a non-sequitur or circular justification during its internal planning phase, the error cascades.
115
+
116
+ TRIGNUM-300M fills this exact gap. It acts as a deterministic, <1ms **Validator-Driven Feedback** gate. It halts execution if the agent's internal thought (`zt`) contains a structural illogic, providing an immediate failure signal (`rt = 0`) _before_ the agent commits to an irreversible external action (`at`).
117
+
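One way to picture this gate is a thought-action loop in which every thought is validated before its action is released. This is a minimal sketch under stated assumptions, not the repository's API: `Step`, `gated_run`, and the toy `no_bare_therefore` validator are hypothetical names, and in practice the validator would wrap the Subtractive Filter.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Step:
    thought: str                 # internal reasoning (`zt` in the text)
    action: Callable[[], str]    # external action (`at`), deferred until cleared


def gated_run(
    steps: Iterable[Step],
    validate: Callable[[str], List[str]],
) -> Tuple[List[str], int]:
    """Run actions in order; stop at the first thought that fails validation.

    `validate` returns a list of detected problems (empty means pass).
    Returns the results of the executed actions and a reward signal (`rt`):
    1 if every step was cleared, 0 if the run was halted.
    """
    results: List[str] = []
    for step in steps:
        problems = validate(step.thought)
        if problems:
            return results, 0    # halt before the action is committed
        results.append(step.action())
    return results, 1


def no_bare_therefore(thought: str) -> List[str]:
    """Toy stand-in validator: flag a thought that opens with a conclusion."""
    if thought.strip().lower().startswith("therefore"):
        return ["conclusion stated without premises"]
    return []


steps = [
    Step("The file exists, so it can be read.", lambda: "read file"),
    Step("Therefore the database must be deleted.", lambda: "drop database"),
]
results, reward = gated_run(steps, no_bare_therefore)
print(results, reward)
```

The second action is never called: the loop returns as soon as its thought fails validation, so the failure signal arrives before anything irreversible happens.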
118
+ ---
119
+
120
+ ## 🔺 Core Architecture
121
+
122
+ ### The Trignum Pyramid
123
+
124
+ Three faces acting as magnetic poles for data separation:
125
+
126
+ | Face | Role | What It Does |
127
+ | --------------- | --------------- | ----------------------------------------------------- |
128
+ | **α (Logic)** | Truth detection | Identifies structurally sound reasoning |
129
+ | **β (Illogic)** | Error detection | Catches contradictions, circular logic, non-sequiturs |
130
+ | **γ (Context)** | Human grounding | Anchors output to human intent |
131
+
132
+ ### T-CHIP: The Tensor Character
133
+
134
+ ```
135
+ ╔═══════════════════════════════════════════════════════╗
136
+ ║                    T-CHIP [v.300M]                    ║
137
+ ║                                                       ║
138
+ ║  🔵 Blue = Logic Stable (Cleared for Takeoff)         ║
139
+ ║  🔴 Red  = Illogic Detected (THE FREEZE)              ║
140
+ ║  🟡 Gold = Human Pulse Locked (Sovereign Override)    ║
141
+ ║                                                       ║
142
+ ║  Response time: <1ms | False alarms: 0% (structural)  ║
143
+ ╚═══════════════════════════════════════════════════════╝
144
+ ```
145
+
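The three glow states map naturally onto an enumeration. The sketch below is illustrative only: `Glow` and `glow_for` are hypothetical names that may not match what `tchip.py` defines, and letting a human override take precedence over a detected illogic is an assumption drawn from the "Sovereign Override" label.

```python
from enum import Enum
from typing import Sequence


class Glow(Enum):
    BLUE = "Logic Stable (Cleared for Takeoff)"
    RED = "Illogic Detected (THE FREEZE)"
    GOLD = "Human Pulse Locked (Sovereign Override)"


def glow_for(illogics: Sequence[str], human_override: bool = False) -> Glow:
    """Pick a glow state: a human override wins, then any illogic means RED."""
    if human_override:
        return Glow.GOLD   # assumed precedence, see the note above
    return Glow.RED if illogics else Glow.BLUE


print(glow_for([]).name)
print(glow_for(["contradiction"]).name)
print(glow_for(["contradiction"], human_override=True).name)
```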
146
+ ### The Subtractive Filter
147
+
148
+ Four detection layers, all pattern-based:
149
+
150
+ | Layer | Catches | Method |
151
+ | ------------------ | ------------------------------------ | -------------------------------- |
152
+ | **Contradiction** | "X is always true. X is never true." | Antonym pairs, negation patterns |
153
+ | **Circular Logic** | A proves B proves A | Reference chain analysis |
154
+ | **Non-Sequitur** | "Therefore X" without premises | Causal connective analysis |
155
+ | **Depth Check** | Claims without any reasoning | Assertion density scoring |
156
+
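To make the table concrete, here is a toy version of two of the layers: a contradiction check built on a single antonym pair ("always"/"never") and a circular-logic check that looks for a cycle in a chain of "A relies on B" references. Both functions are assumptions written for illustration; the real `subtractive_filter.py` may use different patterns and data structures.

```python
import re
from typing import Dict, List, Set, Tuple

# Matches sentences of the form "<subject> is always|never <rest>".
_CLAIM = re.compile(
    r"^(?P<subject>.+?)\s+is\s+(?P<quantifier>always|never)\s+(?P<rest>.+)$",
    re.IGNORECASE,
)


def find_contradictions(text: str) -> List[str]:
    """Return subjects asserted as both 'always' and 'never' the same thing."""
    seen: Dict[Tuple[str, str], Set[str]] = {}
    for sentence in re.split(r"[.!?]\s*", text):
        match = _CLAIM.match(sentence.strip())
        if match:
            key = (match["subject"].lower(), match["rest"].lower())
            seen.setdefault(key, set()).add(match["quantifier"].lower())
    return [subject for (subject, _), quants in seen.items() if len(quants) == 2]


def has_cycle(relies_on: Dict[str, List[str]]) -> bool:
    """Return True if the 'claim -> claims it relies on' graph contains a cycle."""
    visiting: Set[str] = set()
    done: Set[str] = set()

    def visit(node: str) -> bool:
        if node in done:
            return False
        if node in visiting:
            return True          # back edge: the chain loops onto itself
        visiting.add(node)
        looped = any(visit(nxt) for nxt in relies_on.get(node, []))
        visiting.discard(node)
        done.add(node)
        return looped

    return any(visit(node) for node in list(relies_on))


print(find_contradictions("X is always true. X is never true."))
print(has_cycle({"A": ["B"], "B": ["A"]}))
```

A depth-first search is used for the cycle check because it needs no external dependencies, in keeping with the zero-model, pattern-based design.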
157
+ ---
158
+
159
+ ## 📦 Repository Structure
160
+
161
+ ```
162
+ TRIGNUM-300M-TCHIP/
163
+ ├── src/
164
+ │   └── trignum_core/               # Core Python library
165
+ │       ├── pyramid.py              # Trignum Pyramid (3 magnetic faces)
166
+ │       ├── tchip.py                # T-CHIP (glow states)
167
+ │       ├── subtractive_filter.py   # ★ The Subtractive Filter
168
+ │       ├── human_pulse.py          # Human sovereignty layer
169
+ │       └── magnetic_trillage.py    # Data separation
170
+ ├── tests/                          # 34 unit tests (all passing)
171
+ ├── benchmarks/
172
+ │   ├── hallucination_benchmark.py  # Curated structural test
173
+ │   ├── full_halueval_benchmark.py  # Full 58K HaluEval test
174
+ │   ├── results.json                # Structural benchmark results
175
+ │   └── full_halueval_results.json  # Full HaluEval results
176
+ ├── demo/
177
+ │   └── index.html                  # Three.js 3D interactive demo
178
+ ├── paper/
179
+ │   └── TRIGNUM_300M_Position_Paper.md  # Position paper
180
+ ├── docs/
181
+ │   └── theory/                     # 6 foundational theory documents
182
+ ├── T-CHIP CLEARED FOR TAKEOFF.md   # The pitch
183
+ └── ROADMAP.md                      # 2-quarter development plan
184
+ ```
185
+
186
+ ---
187
+
188
+ ## 🚀 Quick Start
189
+
190
+ ```bash
191
+ # Clone
192
+ git clone https://github.com/trace-on-lab/trignum-300m.git
193
+ cd trignum-300m
194
+
195
+ # Install
196
+ pip install -r requirements.txt
197
+ pip install -e .
198
+
199
+ # Run the structural benchmark
200
+ python benchmarks/hallucination_benchmark.py
201
+
202
+ # Run the full HaluEval benchmark (downloads ~13MB of data)
203
+ python benchmarks/full_halueval_benchmark.py
204
+
205
+ # Run tests
206
+ pytest tests/ -v
207
+ ```
208
+
209
+ ---
210
+
211
+ ## 🌐 Prior Art: Nobody Is Doing This
212
+
213
+ We searched arXiv, ResearchGate, ACL Anthology, and Semantic Scholar. Every reasoning validation system we found requires model inference:
214
+
215
+ | System | Requires Model | Validates Reasoning |
216
+ | ---------------------------- | :-------------: | :-----------------: |
217
+ | VerifyLLM (2025) | ✅ Yes | Partially |
218
+ | ContraGen | ✅ Yes | Partially |
219
+ | Process Supervision (OpenAI) | ✅ Yes | Yes |
220
+ | Guardrails AI | ✅ Configurable | No (content) |
221
+ | **Subtractive Filter** | **❌ No** | **✅ Yes** |
222
+
223
+ > **Existing work uses LLMs to check LLMs. TRIGNUM uses logic to check LLMs.**
224
+
225
+ Read the full analysis in our [position paper](paper/TRIGNUM_300M_Position_Paper.md).
226
+
227
+ ---
228
+
229
+ ## ⚛️ Quantum Integration: TQPE
230
+
231
+ [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.18751914.svg)](https://doi.org/10.5281/zenodo.18751914)
232
+
233
+ TRIGNUM-300M serves as Phase 1 ("Technical A Priori Validation") for **Trignumental Quantum Phase Estimation (TQPE)**.
234
+
235
+ In our groundbreaking case study estimating the ground state energy of the **H₂ molecule**, TRIGNUM successfully validated the physical consistency and structural logic of the quantum circuit _before execution_. By acting as the preliminary gatekeeper, TRIGNUM ensured that no quantum resources were wasted on structurally ill-formed configurations, enabling an epistemic confidence score of **82.8%** on the final estimate (-1.1384 Ha).
236
+
237
+ Read the full `BUILDING THE BRIDGE` paper on Trignumentality and TQPE in the foundational [Trignumentality](https://github.com/Codfski/trignumentality) repository.
238
+
239
+ ---
240
+
241
+ ## 📚 Documentation
242
+
243
+ | Document | Description |
244
+ | ---------------------------------------------------------------- | ----------------------------------- |
245
+ | [Core Postulate](docs/theory/01_core_postulate.md) | The fundamental axioms of Trignum |
246
+ | [Three Faces](docs/theory/02_three_faces.md) | α (Logic), β (Illogic), γ (Context) |
247
+ | [Magnetic Trillage](docs/theory/03_magnetic_trillage.md) | Data separation mechanism |
248
+ | [T-CHIP Spec](docs/theory/04_tchip_spec.md) | The Tensor Character in detail |
249
+ | [Cold State Hardware](docs/theory/05_cold_state_hardware.md) | Hardware implications |
250
+ | [Hallucination Paradox](docs/theory/06_hallucination_paradox.md) | Reframing the "Big Monster" |
251
+ | [Position Paper](paper/TRIGNUM_300M_Position_Paper.md) | Full academic paper with benchmarks |
252
+ | [Roadmap](ROADMAP.md) | 2-quarter development plan |
253
+
254
+ ---
255
+
256
+ ## 💎 The Golden Gems
257
+
258
+ | Gem | Wisdom |
259
+ | ----- | --------------------------------------- |
260
+ | GEM 1 | "The Human Pulse is the Master Clock" |
261
+ | GEM 2 | "The Illogic is the Compass" |
262
+ | GEM 3 | "Magnetic Trillage Over Brute Force" |
263
+ | GEM 4 | "The Hallucination is the Raw Material" |
264
+ | GEM 5 | "T-CHIP is the Mirror" |
265
+
266
+ ---
267
+
268
+ ## 🤝 Contributing
269
+
270
+ See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
271
+
272
+ ---
273
+
274
+ ## 📄 License
275
+
276
+ MIT License. See [LICENSE](LICENSE).
277
+
278
+ ---
279
+
280
+ ## 📞 Contact
281
+
282
+ **TRACE ON LAB**
283
+ 📧 traceonlab@proton.me
284
+
285
+ ---
286
+
287
+ ## 🛡️ The Call
288
+
289
+ > _"The most dangerous AI failure is not a wrong fact. It is reasoning that sounds right but isn't."_
290
+
291
+ ```
292
+ ╔═══════════════════════════════════════════════════════╗
293
+ ║  🧲 TRACE ON LAB - TRIGNUM-300M - v.300M              ║
294
+ ║                                                       ║
295
+ ║  The Pre-Flight Check for Autonomous AI.              ║
296
+ ║  Zero models. Zero API calls. 146,866 samples/second. ║
297
+ ║                                                       ║
298
+ ║  🔵 T-CHIP: CLEARED FOR TAKEOFF.                      ║
299
+ ╚═══════════════════════════════════════════════════════╝
300
+ ```
301
+
302
+ ⭐ **Star this repo if you believe AI should check its logic before it acts.**