---
license: mit
language:
- en
tags:
- zero-shot
- natural-language-inference
- self-reflection
- logic
- reasoning
- evaluation
- trignum
- trignumentality
---
<div align="center">

# 🧲 TRIGNUM-300M

### The Pre-Flight Check for Autonomous AI

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![Benchmarked](https://img.shields.io/badge/HaluEval-58%2C293_samples-green.svg)](#-benchmark-results)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.18672142.svg)](https://doi.org/10.5281/zenodo.18672142)

> **"You wouldn't let a plane take off without a pre-flight check.**  
> **Why are we letting AI agents act without one?"**

<img src="assets/roadmap_architecture.jpg" width="800" alt="TRIGNUM-300M Architecture Flowchart" />
</div>

---

<div align="center">
  <!-- 
    TODO: Add your demo GIF here! 
    1. Record demo/index.html with ScreenToGif
    2. Save as assets/trignum_demo.gif
    3. Uncomment line below:
  -->
  <!-- <img src="assets/trignum_demo.gif" width="800" alt="TRIGNUM-300M Demo" /> -->
</div>

## What Is This?

TRIGNUM-300M is a **zero-model reasoning integrity validator** for LLM outputs. It catches structural logic failures (contradictions, circular reasoning, non-sequiturs) before an AI agent acts on them.

```python
from trignum_core.subtractive_filter import SubtractiveFilter

# `agent_output` (the LLM text to check) and `agent` (your agent object)
# are placeholders supplied by the calling application.
sf = SubtractiveFilter()
result = sf.apply(agent_output)

if result.illogics_found:
    # T-CHIP glows RED 🔴 → Human review required
    agent.halt(reason=result.illogics_found)
else:
    # T-CHIP glows BLUE 🔵 → Cleared for takeoff
    agent.execute()
```

**No LLM. No API. No training data. ~300 lines of Python. <1ms.**

---

## 🔬 Benchmark Results

We expanded our evaluation to **58,000+ real LLM outputs** including a new **517-sample curated dataset** for structural reasoning. Honest results:

| Benchmark                    | Samples | Precision | Recall | F1        | Speed |
| ---------------------------- | ------- | --------- | ------ | --------- | ----- |
| **Structural illogic (curated)** | **517**      | **100%**  | **98.9%**    | **99.5%** | **<1ms**  |
| HaluEval (full dataset)      | 58,293  | 60%       | 2.1%   | 4.0%      | 706ms |

### What this means:

- **99.5% F1 on structural reasoning failures**: contradictions, circular logic, unsupported conclusions
- **4.0% F1 on factual hallucinations**: we don't catch wrong facts

**That's the point.** There are 100 tools for fact-checking. There are **zero tools for reasoning-checking.** Until now.

### Per-Task Breakdown (HaluEval)

| Task          | n      | Precision | Recall | F1    |
| ------------- | ------ | --------- | ------ | ----- |
| QA            | 18,316 | 83.3%     | 0.25%  | 0.50% |
| Dialogue      | 19,977 | 60.1%     | 4.38%  | 8.16% |
| Summarization | 20,000 | 57.4%     | 1.60%  | 3.11% |

**Throughput: 146,866 samples/second**, orders of magnitude faster than LLM-based validation.
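
The F1 columns in the tables above are the harmonic mean of precision and recall. As a quick sanity check, the sketch below recomputes F1 from the rounded precision and recall figures reported here. The helper name `f1_score` is ours for illustration and is not part of `trignum_core`. Because the inputs are already rounded, the results should match the reported values only to within rounding.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the tables above, as fractions.
reported = {
    "Structural illogic (curated)": (1.00, 0.989),
    "HaluEval (full dataset)": (0.60, 0.021),
    "QA": (0.833, 0.0025),
    "Dialogue": (0.601, 0.0438),
    "Summarization": (0.574, 0.0160),
}

for name, (p, r) in reported.items():
    print(f"{name}: F1 = {100 * f1_score(p, r):.2f}%")
```

The low HaluEval F1 follows directly from the very low recall: with recall near 2%, even high precision cannot lift the harmonic mean.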

---

## ✈️ The Pre-Flight Check Analogy

A pre-flight checklist doesn't verify that London exists. It verifies that:

- ✅ Instruments don't **contradict** each other
- ✅ There are no **circular faults** (sensor A confirms B confirms A)
- ✅ The flight computer draws **conclusions from actual data**
- ✅ Systems are **logically consistent**

The Subtractive Filter does the same for AI reasoning:

```
LLM Output → Subtractive Filter → [PASS] 🔵 → Agent Executes
                                → [FAIL] 🔴 → Agent Halts → Human Review
```

---

## 🤖 The Missing "Agentic Validator"

In the context of the recent shift towards **Agentic Reasoning**, autonomous LLMs are moving from static prompts to dynamic _thought-action_ loops involving planning, tool-use, and multi-agent collaboration.

Current systems rely heavily on probabilistic models to act as the "Critic/Evaluator" or use "Validator-Driven Feedback" via unit tests for code or simulators for robotics. **But there has been no validator for pure logic.** If an agent hallucinates a non-sequitur or circular justification during its internal planning phase, the error cascades.

TRIGNUM-300M fills this exact gap. It acts as a deterministic, <1ms **Validator-Driven Feedback** gate. It halts execution if the agent's internal thought (`zt`) contains a structural illogic, providing an immediate failure signal (`rt = 0`) _before_ the agent commits to an irreversible external action (`at`).
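
A minimal sketch of such a gate inside a thought-action loop is shown below. Everything in it is illustrative: `gated_step`, `GateResult`, and `toy_validator` are hypothetical names, and the toy validator is only a stand-in for whatever check you plug in (for example a wrapper around `SubtractiveFilter.apply`). The only behavior taken from the text above is that a structural illogic in the thought yields a failure signal (`rt = 0`) and the action is never run.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class GateResult:
    executed: bool          # True if the external action was run
    reward: Optional[int]   # 0 when the gate halts; None otherwise (left to the environment)
    issues: List[str]       # structural issues reported by the validator


def gated_step(
    thought: str,
    action: Callable[[], None],
    find_illogics: Callable[[str], List[str]],
) -> GateResult:
    """Validate the agent's internal thought before committing to the action."""
    issues = find_illogics(thought)
    if issues:
        # Halt before the (possibly irreversible) action is executed.
        return GateResult(executed=False, reward=0, issues=issues)
    action()
    return GateResult(executed=True, reward=None, issues=[])


def toy_validator(thought: str) -> List[str]:
    """Hypothetical stand-in validator: flags 'always' and 'never' in one thought."""
    lowered = thought.lower()
    if "always" in lowered and "never" in lowered:
        return ["possible contradiction: 'always' and 'never' both asserted"]
    return []


log: List[str] = []
ok = gated_step("The cache is stale, so refresh it.",
                lambda: log.append("refreshed"), toy_validator)
bad = gated_step("The API always responds. The API never responds.",
                 lambda: log.append("called"), toy_validator)
print(ok.executed, bad.executed, bad.reward, log)
```

Passing the validator in as a callable keeps the gate independent of any particular detection method, so the same loop works with a rule-based check or a model-based critic.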

---

## 🔺 Core Architecture

### The Trignum Pyramid

Three faces acting as magnetic poles for data separation:

| Face            | Role            | What It Does                                          |
| --------------- | --------------- | ----------------------------------------------------- |
| **α (Logic)**   | Truth detection | Identifies structurally sound reasoning               |
| **β (Illogic)** | Error detection | Catches contradictions, circular logic, non-sequiturs |
| **γ (Context)** | Human grounding | Anchors output to human intent                        |

### T-CHIP: The Tensor Character

```
╔═══════════════════════════════════════════════════════╗
║  T-CHIP [v.300M]                                      ║
║                                                       ║
║  🔵 Blue  = Logic Stable (Cleared for Takeoff)        ║
║  🔴 Red   = Illogic Detected (THE FREEZE)             ║
║  🟡 Gold  = Human Pulse Locked (Sovereign Override)   ║
║                                                       ║
║  Response time: <1ms | False alarms: 0% (structural)  ║
╚═══════════════════════════════════════════════════════╝
```
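
The three glow states can be modeled as a small enumeration. The sketch below is an assumption about how one might map a filter result to a state, not the contents of `tchip.py`: the names `Glow` and `glow_for` are hypothetical, and giving the human override precedence over the other two states is our reading of "Sovereign Override".

```python
from enum import Enum
from typing import Sequence


class Glow(Enum):
    BLUE = "Logic Stable (Cleared for Takeoff)"
    RED = "Illogic Detected (THE FREEZE)"
    GOLD = "Human Pulse Locked (Sovereign Override)"


def glow_for(illogics_found: Sequence[str], human_override: bool = False) -> Glow:
    """Pick a glow state; the human override is assumed to take precedence."""
    if human_override:
        return Glow.GOLD
    return Glow.RED if illogics_found else Glow.BLUE


print(glow_for([]).name)
print(glow_for(["circular reference"]).name)
print(glow_for(["circular reference"], human_override=True).name)
```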

### The Subtractive Filter

Four detection layers, all pattern-based:

| Layer              | Catches                              | Method                           |
| ------------------ | ------------------------------------ | -------------------------------- |
| **Contradiction**  | "X is always true. X is never true." | Antonym pairs, negation patterns |
| **Circular Logic** | A proves B proves A                  | Reference chain analysis         |
| **Non-Sequitur**   | "Therefore X" without premises       | Causal connective analysis       |
| **Depth Check**    | Claims without any reasoning         | Assertion density scoring        |
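
To make "pattern-based" concrete, here is a toy version of the contradiction layer using antonym pairs. It is a sketch under stated assumptions, not the code in `subtractive_filter.py`: the antonym list, the sentence splitter, and the function names are all hypothetical, and it only flags two sentences that are identical except for one swapped antonym.

```python
import re
from typing import List, Tuple

# Hypothetical antonym pairs; a real detector would need a much larger list.
ANTONYM_PAIRS: List[Tuple[str, str]] = [
    ("always", "never"),
    ("all", "none"),
    ("true", "false"),
]


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_contradictions(text: str) -> List[str]:
    """Flag sentence pairs that differ only by one swapped antonym."""
    sentences = [s.lower().rstrip(".!?") for s in split_sentences(text)]
    findings: List[str] = []
    for i, first in enumerate(sentences):
        for second in sentences[i + 1:]:
            for a, b in ANTONYM_PAIRS:
                for x, y in ((a, b), (b, a)):
                    has_x = re.search(rf"\b{x}\b", first) is not None
                    if has_x and re.sub(rf"\b{x}\b", y, first) == second:
                        findings.append(f"'{first}' vs '{second}'")
    return findings


print(find_contradictions("X is always true. X is never true."))
print(find_contradictions("The sky is blue. Grass is green."))
```

A check like this needs no model inference, which is what makes sub-millisecond latency plausible, but it also explains the low recall on factual hallucinations: a wrong fact stated consistently contains no pattern to match.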

---

## 📦 Repository Structure

```
TRIGNUM-300M-TCHIP/
├── src/
│   └── trignum_core/              # Core Python library
│       ├── pyramid.py             # Trignum Pyramid (3 magnetic faces)
│       ├── tchip.py               # T-CHIP (glow states)
│       ├── subtractive_filter.py  # ★ The Subtractive Filter
│       ├── human_pulse.py         # Human sovereignty layer
│       └── magnetic_trillage.py   # Data separation
├── tests/                         # 34 unit tests (all passing)
├── benchmarks/
│   ├── hallucination_benchmark.py     # Curated structural test
│   ├── full_halueval_benchmark.py     # Full 58K HaluEval test
│   ├── results.json                   # Structural benchmark results
│   └── full_halueval_results.json     # Full HaluEval results
├── demo/
│   └── index.html                 # Three.js 3D interactive demo
├── paper/
│   └── TRIGNUM_300M_Position_Paper.md  # Position paper
├── docs/
│   └── theory/                    # 6 foundational theory documents
├── T-CHIP CLEARED FOR TAKEOFF.md  # The pitch
└── ROADMAP.md                     # 2-quarter development plan
```

---

## 🚀 Quick Start

```bash
# Clone
git clone https://github.com/trace-on-lab/trignum-300m.git
cd trignum-300m

# Install
pip install -r requirements.txt
pip install -e .

# Run the structural benchmark
python benchmarks/hallucination_benchmark.py

# Run the full HaluEval benchmark (downloads ~13MB of data)
python benchmarks/full_halueval_benchmark.py

# Run tests
pytest tests/ -v
```

---

## 🌐 Prior Art: Nobody Is Doing This

We searched arXiv, ResearchGate, ACL Anthology, and Semantic Scholar. Every existing reasoning validation system requires model inference:

| System                       | Requires Model  | Validates Reasoning |
| ---------------------------- | :-------------: | :-----------------: |
| VerifyLLM (2025)             |     ✅ Yes      |      Partially      |
| ContraGen                    |     ✅ Yes      |      Partially      |
| Process Supervision (OpenAI) |     ✅ Yes      |         Yes         |
| Guardrails AI                | ✅ Configurable |    No (content)     |
| **Subtractive Filter**       |    **❌ No**    |     **✅ Yes**      |

> **Existing work uses LLMs to check LLMs. TRIGNUM uses logic to check LLMs.**

Read the full analysis in our [position paper](paper/TRIGNUM_300M_Position_Paper.md).

---

## ⚛️ Quantum Integration: TQPE

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.18751914.svg)](https://doi.org/10.5281/zenodo.18751914)

TRIGNUM-300M serves as Phase 1 ("Technical A Priori Validation") for **Trignumental Quantum Phase Estimation (TQPE)**.

In our case study estimating the ground state energy of the **H₂ molecule**, TRIGNUM validated the physical consistency and structural logic of the quantum circuit _before execution_. By acting as the preliminary gatekeeper, TRIGNUM ensured that no quantum resources were wasted on structurally ill-formed configurations, enabling an epistemic confidence score of **82.8%** on the final estimate (-1.1384 Ha).

Read the full `BUILDING THE BRIDGE` paper on Trignumentality and TQPE in the foundational [Trignumentality](https://github.com/Codfski/trignumentality) repository.

---

## 📚 Documentation

| Document                                                         | Description                         |
| ---------------------------------------------------------------- | ----------------------------------- |
| [Core Postulate](docs/theory/01_core_postulate.md)               | The fundamental axioms of Trignum   |
| [Three Faces](docs/theory/02_three_faces.md)                     | α (Logic), β (Illogic), γ (Context) |
| [Magnetic Trillage](docs/theory/03_magnetic_trillage.md)         | Data separation mechanism           |
| [T-CHIP Spec](docs/theory/04_tchip_spec.md)                      | The Tensor Character in detail      |
| [Cold State Hardware](docs/theory/05_cold_state_hardware.md)     | Hardware implications               |
| [Hallucination Paradox](docs/theory/06_hallucination_paradox.md) | Reframing the "Big Monster"         |
| [Position Paper](paper/TRIGNUM_300M_Position_Paper.md)           | Full academic paper with benchmarks |
| [Roadmap](ROADMAP.md)                                            | 2-quarter development plan          |

---

## 💎 The Golden Gems

| Gem   | Wisdom                                  |
| ----- | --------------------------------------- |
| GEM 1 | "The Human Pulse is the Master Clock"   |
| GEM 2 | "The Illogic is the Compass"            |
| GEM 3 | "Magnetic Trillage Over Brute Force"    |
| GEM 4 | "The Hallucination is the Raw Material" |
| GEM 5 | "T-CHIP is the Mirror"                  |

---

## 🤝 Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

---

## 📄 License

MIT License; see [LICENSE](LICENSE).

---

## 📞 Contact

**TRACE ON LAB**  
📧 traceonlab@proton.me

---

## 🛡️ The Call

> _"The most dangerous AI failure is not a wrong fact. It is reasoning that sounds right but isn't."_

```
╔═══════════════════════════════════════════════════════╗
║  🧲 TRACE ON LAB - TRIGNUM-300M - v.300M              ║
║                                                       ║
║  The Pre-Flight Check for Autonomous AI.              ║
║  Zero models. Zero API calls. 146,866 samples/second. ║
║                                                       ║
║  🔵 T-CHIP: CLEARED FOR TAKEOFF.                      ║
╚═══════════════════════════════════════════════════════╝
```

⭐ **Star this repo if you believe AI should check its logic before it acts.**