---
license: mit
language:
- en
tags:
- zero-shot
- natural-language-inference
- self-reflection
- logic
- reasoning
- evaluation
- trignum
- trignumentality
---
<div align="center">
# 🧲 TRIGNUM-300M
### The Pre-Flight Check for Autonomous AI
[License: MIT](https://opensource.org/licenses/MIT) · [Python](https://www.python.org/downloads/) · [Benchmarks](#-benchmark-results) · [DOI 10.5281/zenodo.18672142](https://doi.org/10.5281/zenodo.18672142)
> **"You wouldn't let a plane take off without a pre-flight check.**
> **Why are we letting AI agents act without one?"**
<img src="assets/roadmap_architecture.jpg" width="800" alt="TRIGNUM-300M Architecture Flowchart" />
</div>
---
<div align="center">
<!--
TODO: Add your demo GIF here!
1. Record demo/index.html with ScreenToGif
2. Save as assets/trignum_demo.gif
3. Uncomment line below:
-->
<!-- <img src="assets/trignum_demo.gif" width="800" alt="TRIGNUM-300M Demo" /> -->
</div>
## What Is This?
TRIGNUM-300M is a **zero-model reasoning integrity validator** for LLM outputs. It catches structural logic failures (contradictions, circular reasoning, non-sequiturs) before an AI agent acts on them.
```python
from trignum_core.subtractive_filter import SubtractiveFilter

# `agent_output` (the LLM text to check) and `agent` (your agent object)
# are placeholders for your own code.
sf = SubtractiveFilter()
result = sf.apply(agent_output)

if result.illogics_found:
    agent.halt(reason=result.illogics_found)
    # T-CHIP glows RED 🔴: human review required
else:
    agent.execute()
    # T-CHIP glows BLUE 🔵: cleared for takeoff
```
**No LLM. No API. No training data. ~300 lines of Python. <1ms.**
---
## 🔬 Benchmark Results
We expanded our evaluation to **58,000+ real LLM outputs** including a new **517-sample curated dataset** for structural reasoning. Honest results:
| Benchmark | Samples | Precision | Recall | F1 | Speed |
| ---------------------------- | ------- | --------- | ------ | --------- | ----- |
| **Structural illogic (curated)** | **517** | **100%** | **98.9%** | **99.5%** | **<1ms** |
| HaluEval (full dataset) | 58,293 | 60% | 2.1% | 4.0% | 706ms |
### What this means:
- **99.5% F1 on structural reasoning failures**: contradictions, circular logic, unsupported conclusions
- **4.0% F1 on factual hallucinations**: we don't catch wrong facts
**That's the point.** There are many tools for fact-checking. Our prior-art search found **none that check reasoning without a model.** Until now.
### Per-Task Breakdown (HaluEval)
| Task | n | Precision | Recall | F1 |
| ------------- | ------ | --------- | ------ | ----- |
| QA | 18,316 | 83.3% | 0.25% | 0.50% |
| Dialogue | 19,977 | 60.1% | 4.38% | 8.16% |
| Summarization | 20,000 | 57.4% | 1.60% | 3.11% |
**Throughput: 146,866 samples/second**, orders of magnitude faster than LLM-based validation.
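Because F1 is the harmonic mean of precision and recall, the per-task F1 values can be re-derived from the other two columns, and the throughput figure translates directly into a per-sample latency. A minimal sketch of that arithmetic, using the rounded percentages from the table (the `f1` helper and the variable names are ours, not part of the repository):

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the per-task HaluEval table.
per_task = {
    "QA": (0.833, 0.0025),
    "Dialogue": (0.601, 0.0438),
    "Summarization": (0.574, 0.0160),
}
for task, (p, r) in per_task.items():
    print(f"{task:<14} F1 = {f1(p, r):.2%}")

# Per-sample latency implied by the quoted throughput.
throughput = 146_866  # samples per second
print(f"per-sample latency ≈ {1e6 / throughput:.1f} µs")
```

With these inputs the script prints F1 values of 0.50%, 8.16%, and 3.11%, matching the table, and a per-sample latency of about 6.8 µs, consistent with the <1ms claim.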
---
## ✈️ The Pre-Flight Check Analogy
A pre-flight checklist doesn't verify that London exists. It verifies that:
- ✅ Instruments don't **contradict** each other
- ✅ There are no **circular faults** (sensor A confirms B confirms A)
- ✅ The flight computer draws **conclusions from actual data**
- ✅ Systems are **logically consistent**
The Subtractive Filter does the same for AI reasoning:
```
LLM Output → Subtractive Filter → [PASS] 🔵 → Agent Executes
                                → [FAIL] 🔴 → Agent Halts → Human Review
```
---
## 🤖 The Missing "Agentic Validator"
With the recent shift toward **Agentic Reasoning**, autonomous LLMs are moving from static prompts to dynamic _thought-action_ loops that involve planning, tool use, and multi-agent collaboration.
Current systems either rely on probabilistic models acting as the "Critic/Evaluator" or use "Validator-Driven Feedback" from unit tests (for code) or simulators (for robotics). **But there has been no validator for pure logic.** If an agent hallucinates a non-sequitur or circular justification during its internal planning phase, the error cascades into every later step.
TRIGNUM-300M fills this exact gap. It acts as a deterministic, <1ms **Validator-Driven Feedback** gate. It halts execution if the agent's internal thought ($z_t$) contains a structural illogic, providing an immediate failure signal ($r_t = 0$) _before_ the agent commits to an irreversible external action ($a_t$).
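The gate can be pictured as one extra check inside a thought-action loop. The following is a minimal sketch under stated assumptions: `Step`, `run_agent`, and `toy_validator` are hypothetical names invented for this illustration, and `toy_validator` is a one-rule stand-in (an always/never contradiction check), not the Subtractive Filter.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Step:
    thought: str  # z_t: the agent's internal reasoning at step t
    action: str   # a_t: the external action it proposes at step t


def toy_validator(thought: str) -> List[str]:
    """Hypothetical stand-in for a structural check (NOT the Subtractive
    Filter): flags a thought that asserts both 'always' and 'never'."""
    text = thought.lower()
    if "always" in text and "never" in text:
        return ["contradiction: 'always' and 'never' in one thought"]
    return []


def run_agent(steps: Iterable[Step],
              validate: Callable[[str], List[str]],
              act: Callable[[str], None]) -> Tuple[int, List[str]]:
    """Check each thought z_t before its action a_t runs.

    Returns (r, issues): r = 0 at the first failing step, whose action is
    never executed; r = 1 if every step passed (our own convention).
    """
    for step in steps:
        issues = validate(step.thought)
        if issues:
            return 0, issues   # r_t = 0: halt before the irreversible a_t
        act(step.action)       # cleared: execute a_t
    return 1, []


executed: List[str] = []
plan = [
    Step("The config file exists, so reading it is safe.", "read_config"),
    Step("Backups always succeed, and backups never succeed.", "delete_originals"),
]
reward, issues = run_agent(plan, toy_validator, executed.append)
print(reward, executed)  # 0 ['read_config']: delete_originals never ran
```

The design point is ordering: the thought $z_t$ is validated strictly before `act` is called, so a failing thought never reaches the outside world.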
---
## 🔺 Core Architecture
### The Trignum Pyramid
Three faces acting as magnetic poles for data separation:
| Face | Role | What It Does |
| --------------- | --------------- | ----------------------------------------------------- |
| **α (Logic)**   | Truth detection | Identifies structurally sound reasoning               |
| **β (Illogic)** | Error detection | Catches contradictions, circular logic, non-sequiturs |
| **γ (Context)** | Human grounding | Anchors output to human intent                        |
### T-CHIP: The Tensor Character
```
┌───────────────────────────────────────────────────────┐
│                    T-CHIP [v.300M]                    │
│                                                       │
│  🔵 Blue = Logic Stable (Cleared for Takeoff)         │
│  🔴 Red  = Illogic Detected (THE FREEZE)              │
│  🟡 Gold = Human Pulse Locked (Sovereign Override)    │
│                                                       │
│  Response time: <1ms | False alarms: 0% (structural)  │
└───────────────────────────────────────────────────────┘
```
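The three glow states map naturally onto a small enum plus one decision function. This is a sketch, not the contents of `tchip.py`: the `Glow` and `tchip_glow` names are hypothetical, and the rule that a locked Human Pulse takes precedence over the filter's verdict is our reading of "Sovereign Override".

```python
from enum import Enum


class Glow(Enum):
    BLUE = "Logic Stable (Cleared for Takeoff)"
    RED = "Illogic Detected (THE FREEZE)"
    GOLD = "Human Pulse Locked (Sovereign Override)"


def tchip_glow(illogic_detected: bool, human_pulse_locked: bool = False) -> Glow:
    """Map a filter verdict to a glow state.

    Assumption: a locked Human Pulse overrides the filter's verdict,
    which is our reading of "Sovereign Override".
    """
    if human_pulse_locked:
        return Glow.GOLD
    return Glow.RED if illogic_detected else Glow.BLUE


for detected, locked in [(False, False), (True, False), (True, True)]:
    print(detected, locked, tchip_glow(detected, locked).name)
```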
### The Subtractive Filter
Four detection layers, all pattern-based:
| Layer | Catches | Method |
| ------------------ | ------------------------------------ | -------------------------------- |
| **Contradiction** | "X is always true. X is never true." | Antonym pairs, negation patterns |
| **Circular Logic** | A proves B proves A | Reference chain analysis |
| **Non-Sequitur** | "Therefore X" without premises | Causal connective analysis |
| **Depth Check** | Claims without any reasoning | Assertion density scoring |
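To make the four layers concrete, here is a deliberately small sketch of each idea as a pattern rule. It is an illustration, not the code in `subtractive_filter.py`: the word lists, the `A because B` parsing, the three-sentence depth threshold, and all function names (`subtract`, `contradiction`, `circular`, `non_sequitur`, `depth`) are assumptions.

```python
import re
from itertools import combinations

# Illustrative word lists; a real filter would need far richer patterns.
ANTONYM_PAIRS = {frozenset(p) for p in [("always", "never"), ("true", "false"), ("can", "cannot")]}
CONCLUSION_MARKERS = {"therefore", "thus", "hence"}
CONNECTIVES = {"because", "since", "if"} | CONCLUSION_MARKERS


def split_sentences(text: str) -> list[str]:
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def words(sentence: str) -> list[str]:
    return re.findall(r"[a-z']+", sentence.lower())


def contradiction(sents: list[str]) -> list[str]:
    """Layer 1: two sentences that differ only by one antonym pair."""
    issues = []
    for a, b in combinations(sents, 2):
        wa, wb = words(a), words(b)
        if len(wa) != len(wb):
            continue
        diffs = [(x, y) for x, y in zip(wa, wb) if x != y]
        if len(diffs) == 1 and frozenset(diffs[0]) in ANTONYM_PAIRS:
            issues.append(f"contradiction: {a!r} vs {b!r}")
    return issues


def circular(sents: list[str]) -> list[str]:
    """Layer 2: link 'A because B' claims into a graph and look for a cycle."""
    graph: dict[str, set[str]] = {}
    for s in sents:
        m = re.match(r"(.+?)\s+because\s+(.+?)[.!?]?$", s, re.IGNORECASE)
        if m:
            claim, reason = (part.strip().lower() for part in m.groups())
            graph.setdefault(claim, set()).add(reason)

    visiting: set[str] = set()
    done: set[str] = set()

    def visit(node: str) -> bool:
        if node in visiting:
            return True  # the reference chain came back to a claim in progress
        if node in done:
            return False
        visiting.add(node)
        if any(visit(nxt) for nxt in graph.get(node, ())):
            return True
        visiting.discard(node)
        done.add(node)
        return False

    if any(visit(node) for node in list(graph)):
        return ["circular: reference chain loops back on itself"]
    return []


def non_sequitur(sents: list[str]) -> list[str]:
    """Layer 3: a conclusion marker with no premise sentence before it."""
    first = words(sents[0]) if sents else []
    if first and first[0] in CONCLUSION_MARKERS:
        return ["non-sequitur: conclusion with no preceding premises"]
    return []


def depth(sents: list[str], min_sentences: int = 3) -> list[str]:
    """Layer 4: several assertions and no reasoning connective anywhere."""
    if len(sents) >= min_sentences and not any(CONNECTIVES & set(words(s)) for s in sents):
        return ["depth: claims without any reasoning"]
    return []


def subtract(text: str) -> list[str]:
    """Run all four layers; an empty list plays the role of PASS."""
    sents = split_sentences(text)
    return contradiction(sents) + circular(sents) + non_sequitur(sents) + depth(sents)


print(subtract("X is always true. X is never true."))
print(subtract("The plan is safe because the audit passed. "
               "The audit passed because the plan is safe."))
```

Each layer returns a list of issue strings, so an empty list from `subtract` plays the role of a PASS. Real outputs need much richer patterns; the point of the sketch is that every layer is a cheap string or graph operation with no model call.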
---
## 📦 Repository Structure
```
TRIGNUM-300M-TCHIP/
├── src/
│   └── trignum_core/                  # Core Python library
│       ├── pyramid.py                 # Trignum Pyramid (3 magnetic faces)
│       ├── tchip.py                   # T-CHIP (glow states)
│       ├── subtractive_filter.py      # ✅ The Subtractive Filter
│       ├── human_pulse.py             # Human sovereignty layer
│       └── magnetic_trillage.py       # Data separation
├── tests/                             # 34 unit tests (all passing)
├── benchmarks/
│   ├── hallucination_benchmark.py     # Curated structural test
│   ├── full_halueval_benchmark.py     # Full 58K HaluEval test
│   ├── results.json                   # Structural benchmark results
│   └── full_halueval_results.json     # Full HaluEval results
├── demo/
│   └── index.html                     # Three.js 3D interactive demo
├── paper/
│   └── TRIGNUM_300M_Position_Paper.md # Position paper
├── docs/
│   └── theory/                        # 6 foundational theory documents
├── T-CHIP CLEARED FOR TAKEOFF.md      # The pitch
└── ROADMAP.md                         # 2-quarter development plan
```
---
## 🚀 Quick Start
```bash
# Clone
git clone https://github.com/trace-on-lab/trignum-300m.git
cd trignum-300m
# Install
pip install -r requirements.txt
pip install -e .
# Run the structural benchmark
python benchmarks/hallucination_benchmark.py
# Run the full HaluEval benchmark (downloads ~13MB of data)
python benchmarks/full_halueval_benchmark.py
# Run tests
pytest tests/ -v
```
---
## 🔍 Prior Art: Nobody Is Doing This
We searched arXiv, ResearchGate, ACL Anthology, and Semantic Scholar. Every reasoning-validation system we found requires model inference:
| System | Requires Model | Validates Reasoning |
| ---------------------------- | :-------------: | :-----------------: |
| VerifyLLM (2025)             |      ✅ Yes      |      Partially      |
| ContraGen                    |      ✅ Yes      |      Partially      |
| Process Supervision (OpenAI) |      ✅ Yes      |         Yes         |
| Guardrails AI                | ✅ Configurable  |    No (content)     |
| **Subtractive Filter**       |    **❌ No**     |     **✅ Yes**      |
> **Existing work uses LLMs to check LLMs. TRIGNUM uses logic to check LLMs.**
Read the full analysis in our [position paper](paper/TRIGNUM_300M_Position_Paper.md).
---
## ⚛️ Quantum Integration: TQPE
[DOI 10.5281/zenodo.18751914](https://doi.org/10.5281/zenodo.18751914)
TRIGNUM-300M serves as Phase 1 ("Technical A Priori Validation") for **Trignumental Quantum Phase Estimation (TQPE)**.
In our case study estimating the ground-state energy of the **H₂ molecule**, TRIGNUM validated the physical consistency and structural logic of the quantum circuit _before execution_. Acting as a preliminary gatekeeper, it ensured that no quantum resources were spent on structurally ill-formed configurations, enabling an epistemic confidence score of **82.8%** on the final estimate (-1.1384 Ha).
Read the full `BUILDING THE BRIDGE` paper on Trignumentality and TQPE in the foundational [Trignumentality](https://github.com/Codfski/trignumentality) repository.
---
## 📚 Documentation
| Document | Description |
| ---------------------------------------------------------------- | ----------------------------------- |
| [Core Postulate](docs/theory/01_core_postulate.md) | The fundamental axioms of Trignum |
| [Three Faces](docs/theory/02_three_faces.md)                     | α (Logic), β (Illogic), γ (Context) |
| [Magnetic Trillage](docs/theory/03_magnetic_trillage.md) | Data separation mechanism |
| [T-CHIP Spec](docs/theory/04_tchip_spec.md) | The Tensor Character in detail |
| [Cold State Hardware](docs/theory/05_cold_state_hardware.md) | Hardware implications |
| [Hallucination Paradox](docs/theory/06_hallucination_paradox.md) | Reframing the "Big Monster" |
| [Position Paper](paper/TRIGNUM_300M_Position_Paper.md) | Full academic paper with benchmarks |
| [Roadmap](ROADMAP.md) | 2-quarter development plan |
---
## 💎 The Golden Gems
| Gem | Wisdom |
| ----- | --------------------------------------- |
| GEM 1 | "The Human Pulse is the Master Clock" |
| GEM 2 | "The Illogic is the Compass" |
| GEM 3 | "Magnetic Trillage Over Brute Force" |
| GEM 4 | "The Hallucination is the Raw Material" |
| GEM 5 | "T-CHIP is the Mirror" |
---
## 🤝 Contributing
See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
---
## 📄 License
MIT License β see [LICENSE](LICENSE).
---
## 📞 Contact
**TRACE ON LAB**
📧 traceonlab@proton.me
---
## 🛡️ The Call
> _"The most dangerous AI failure is not a wrong fact. It is reasoning that sounds right but isn't."_
```
┌───────────────────────────────────────────────────────┐
│  🧲 TRACE ON LAB · TRIGNUM-300M · v.300M              │
│                                                       │
│  The Pre-Flight Check for Autonomous AI.              │
│  Zero models. Zero API calls. 146,866 samples/second. │
│                                                       │
│  🔵 T-CHIP: CLEARED FOR TAKEOFF.                      │
└───────────────────────────────────────────────────────┘
```
⭐ **Star this repo if you believe AI should check its logic before it acts.**