---
license: mit
language:
- en
tags:
- zero-shot
- natural-language-inference
- self-reflection
- logic
- reasoning
- evaluation
- trignum
- trignumentality
---
<div align="center">

# 🧲 TRIGNUM-300M

### The Pre-Flight Check for Autonomous AI

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![Benchmarked](https://img.shields.io/badge/HaluEval-58%2C293_samples-green.svg)](#-benchmark-results)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.18672142.svg)](https://doi.org/10.5281/zenodo.18672142)

> **"You wouldn't let a plane take off without a pre-flight check.**  
> **Why are we letting AI agents act without one?"**

<img src="assets/roadmap_architecture.jpg" width="800" alt="TRIGNUM-300M Architecture Flowchart" />
</div>

---

## What Is This?

TRIGNUM-300M is a **zero-model reasoning integrity validator** for LLM outputs. It catches structural logic failures — contradictions, circular reasoning, non-sequiturs — before an AI agent acts on them.

```python
from trignum_core.subtractive_filter import SubtractiveFilter

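# `agent_output` (the LLM text to check) and `agent` (your agent object)
# are placeholders for objects from your own application.
sf = SubtractiveFilter()
result = sf.apply(agent_output)

if result.illogics_found:
    # T-CHIP glows RED 🔴 → Human review required
    agent.halt(reason=result.illogics_found)
else:
    # T-CHIP glows BLUE 🔵 → Cleared for takeoff
    agent.execute()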
```

**No LLM. No API. No training data. ~300 lines of Python. <1ms.**

---

## 🔬 Benchmark Results

We expanded our evaluation to **58,000+ real LLM outputs** including a new **517-sample curated dataset** for structural reasoning. Honest results:

| Benchmark                        | Samples | Precision | Recall    | F1        | Speed    |
| -------------------------------- | ------- | --------- | --------- | --------- | -------- |
| **Structural illogic (curated)** | **517** | **100%**  | **98.9%** | **99.5%** | **<1ms** |
| HaluEval (full dataset)          | 58,293  | 60%       | 2.1%      | 4.0%      | 706ms    |

### What this means:

- **99.5% F1 on structural reasoning failures** — contradictions, circular logic, unsupported conclusions
- **4.0% F1 on factual hallucinations** — we don't catch wrong facts

**That's the point.** There are many tools for fact-checking. In our search we found **no model-free tools for reasoning-checking.** Until now.

### Per-Task Breakdown (HaluEval)

| Task          | n      | Precision | Recall | F1    |
| ------------- | ------ | --------- | ------ | ----- |
| QA            | 18,316 | 83.3%     | 0.25%  | 0.50% |
| Dialogue      | 19,977 | 60.1%     | 4.38%  | 8.16% |
| Summarization | 20,000 | 57.4%     | 1.60%  | 3.11% |
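
The F1 column is the harmonic mean of precision and recall, which is why the very low recall dominates the score even where precision is high. A short check using the rounded figures from the table above (the helper name `f1_score` is ours for illustration, not part of the repository):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded precision/recall values copied from the per-task table.
tasks = {
    "QA": (0.833, 0.0025),
    "Dialogue": (0.601, 0.0438),
    "Summarization": (0.574, 0.0160),
}

for name, (p, r) in tasks.items():
    print(f"{name}: F1 = {f1_score(p, r):.2%}")
```

With these inputs the computed values agree with the F1 column to two decimal places.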

**Throughput: 146,866 samples/second** — orders of magnitude faster than LLM-based validation.

---

## ✈️ The Pre-Flight Check Analogy

A pre-flight checklist doesn't verify that London exists. It verifies that:

- ✅ Instruments don't **contradict** each other
- ✅ There are no **circular faults** (sensor A confirms B confirms A)
- ✅ The flight computer draws **conclusions from actual data**
- ✅ Systems are **logically consistent**

The Subtractive Filter does the same for AI reasoning:

```
LLM Output → Subtractive Filter → [PASS] 🔵 → Agent Executes
                                 → [FAIL] 🔴 → Agent Halts → Human Review
```

---

## 🤖 The Missing "Agentic Validator"

In the context of the recent shift towards **Agentic Reasoning**, autonomous LLMs are moving from static prompts to dynamic _thought-action_ loops involving planning, tool-use, and multi-agent collaboration.

Current systems rely heavily on probabilistic models to act as the "Critic/Evaluator" or use "Validator-Driven Feedback" via unit tests for code or simulators for robotics. **But there has been no validator for pure logic.** If an agent hallucinates a non-sequitur or circular justification during its internal planning phase, the error cascades.

TRIGNUM-300M fills this exact gap. It acts as a deterministic, <1ms **Validator-Driven Feedback** gate. It halts execution if the agent's internal thought (`zt`) contains a structural illogic, providing an immediate failure signal (`rt = 0`) _before_ the agent commits to an irreversible external action (`at`).
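
The gate described above can be sketched as a wrapper around a single agent step. This is a minimal illustration under stated assumptions, not the repository's code: `GateResult`, `gated_step`, and `toy_validate` are hypothetical names, and `toy_validate` is a one-rule placeholder standing in for the real filter.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class GateResult:
    executed: bool
    reward: Optional[int]  # 0 on structural failure; None when the step is cleared
    reasons: List[str] = field(default_factory=list)


def gated_step(
    thought: str,
    validate: Callable[[str], List[str]],
    act: Callable[[], None],
) -> GateResult:
    """Run `act` only if `validate` reports no structural issues in `thought`."""
    issues = validate(thought)
    if issues:
        # Halt before the external action; surface the failure signal.
        return GateResult(executed=False, reward=0, reasons=issues)
    act()
    return GateResult(executed=True, reward=None)


def toy_validate(thought: str) -> List[str]:
    # Placeholder rule (an assumption, NOT the Subtractive Filter):
    # flag a thought that opens with a conclusion and no premises.
    if thought.strip().lower().startswith("therefore"):
        return ["conclusion without premises"]
    return []


log: List[str] = []
blocked = gated_step("Therefore, delete the database.", toy_validate,
                     lambda: log.append("deleted"))
cleared = gated_step("Backups are verified, so rotate the logs.", toy_validate,
                     lambda: log.append("rotated"))
print(blocked.executed, blocked.reward, blocked.reasons)
print(cleared.executed, log)
```

In a real integration, `validate` would presumably wrap `SubtractiveFilter().apply(...)` and return its `illogics_found` value; the exact return type of that attribute is not specified here, so treat the adapter as something to confirm against the library.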

---

## 🔺 Core Architecture

### The Trignum Pyramid

Three faces acting as magnetic poles for data separation:

| Face            | Role            | What It Does                                          |
| --------------- | --------------- | ----------------------------------------------------- |
| **α (Logic)**   | Truth detection | Identifies structurally sound reasoning               |
| **β (Illogic)** | Error detection | Catches contradictions, circular logic, non-sequiturs |
| **γ (Context)** | Human grounding | Anchors output to human intent                        |

### T-CHIP: The Tensor Character

```
╔═══════════════════════════════════════════════════════╗
║  T-CHIP [v.300M]                                      ║
║                                                       ║
║  🔵 Blue  = Logic Stable (Cleared for Takeoff)        ║
║  🔴 Red   = Illogic Detected (THE FREEZE)             ║
║  🟡 Gold  = Human Pulse Locked (Sovereign Override)   ║
║                                                       ║
║  Response time: <1ms | False alarms: 0% (structural)  ║
╚═══════════════════════════════════════════════════════╝
```
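
One way to model the three glow states in code is a small enum plus a resolver. This is a sketch only: the names `Glow` and `resolve_glow` are hypothetical, and the rule that a human override takes precedence over the filter verdict is our assumption, not something the specification above states.

```python
from enum import Enum
from typing import Sequence


class Glow(Enum):
    BLUE = "logic stable"
    RED = "illogic detected"
    GOLD = "human pulse locked"


def resolve_glow(illogics: Sequence[str], human_override: bool = False) -> Glow:
    """Map a filter verdict (and optional human override) to a glow state."""
    # Assumption: a human override wins over whatever the filter found.
    if human_override:
        return Glow.GOLD
    return Glow.RED if illogics else Glow.BLUE


print(resolve_glow([]))
print(resolve_glow(["contradiction"]))
print(resolve_glow(["contradiction"], human_override=True))
```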

### The Subtractive Filter

Four detection layers, all pattern-based:

| Layer              | Catches                              | Method                           |
| ------------------ | ------------------------------------ | -------------------------------- |
| **Contradiction**  | "X is always true. X is never true." | Antonym pairs, negation patterns |
| **Circular Logic** | A proves B proves A                  | Reference chain analysis         |
| **Non-Sequitur**   | "Therefore X" without premises       | Causal connective analysis       |
| **Depth Check**    | Claims without any reasoning         | Assertion density scoring        |
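
To make the first two rows concrete, here is a toy sketch of what pattern-based contradiction detection and reference-chain cycle detection could look like. It is not the code in `subtractive_filter.py`: the regex, the function names, and the `supports` mapping (claim to the claims it relies on) are all assumptions chosen for illustration.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical pattern for claims shaped like "<subject> is always/never <predicate>".
_CLAIM = re.compile(
    r"^\s*(?P<subj>.+?)\s+is\s+(?P<quant>always|never)\s+(?P<pred>.+?)\s*$",
    re.IGNORECASE,
)


def find_contradictions(text: str) -> List[Tuple[str, str]]:
    """Return (subject, predicate) pairs asserted as both 'always' and 'never'."""
    seen: Dict[Tuple[str, str], str] = {}
    hits: List[Tuple[str, str]] = []
    for sentence in re.split(r"[.!?]+", text):
        match = _CLAIM.match(sentence)
        if match is None:
            continue
        key = (match["subj"].lower(), match["pred"].lower())
        quant = match["quant"].lower()
        if key in seen and seen[key] != quant:
            hits.append(key)
        seen.setdefault(key, quant)
    return hits


def has_circular_support(supports: Dict[str, List[str]]) -> bool:
    """Depth-first search for a cycle in a 'claim relies on claim' graph."""
    in_progress, done = set(), set()

    def visit(node: str) -> bool:
        in_progress.add(node)
        for nxt in supports.get(node, []):
            if nxt in in_progress:
                return True  # back edge: the chain loops onto itself
            if nxt not in done and visit(nxt):
                return True
        in_progress.discard(node)
        done.add(node)
        return False

    return any(node not in done and visit(node) for node in list(supports))


print(find_contradictions("X is always true. X is never true."))
print(has_circular_support({"A": ["B"], "B": ["A"]}))
```

The first call flags the example from the table; the second reports the A proves B proves A loop. Extracting claims and support links from free text is the hard part and is not attempted here.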

---

## 📦 Repository Structure

```
TRIGNUM-300M-TCHIP/
├── src/
│   └── trignum_core/              # Core Python library
│       ├── pyramid.py             # Trignum Pyramid (3 magnetic faces)
│       ├── tchip.py               # T-CHIP (glow states)
│       ├── subtractive_filter.py  # ★ The Subtractive Filter
│       ├── human_pulse.py         # Human sovereignty layer
│       └── magnetic_trillage.py   # Data separation
├── tests/                         # 34 unit tests (all passing)
├── benchmarks/
│   ├── hallucination_benchmark.py     # Curated structural test
│   ├── full_halueval_benchmark.py     # Full 58K HaluEval test
│   ├── results.json                   # Structural benchmark results
│   └── full_halueval_results.json     # Full HaluEval results
├── demo/
│   └── index.html                 # Three.js 3D interactive demo
├── paper/
│   └── TRIGNUM_300M_Position_Paper.md  # Position paper
├── docs/
│   └── theory/                    # 6 foundational theory documents
├── T-CHIP CLEARED FOR TAKEOFF.md  # The pitch
└── ROADMAP.md                     # 2-quarter development plan
```

---

## 🚀 Quick Start

```bash
# Clone
git clone https://github.com/trace-on-lab/trignum-300m.git
cd trignum-300m

# Install
pip install -r requirements.txt
pip install -e .

# Run the structural benchmark
python benchmarks/hallucination_benchmark.py

# Run the full HaluEval benchmark (downloads ~13MB of data)
python benchmarks/full_halueval_benchmark.py

# Run tests
pytest tests/ -v
```

---

## 🌐 Prior Art: Nobody Is Doing This

We searched arXiv, ResearchGate, ACL Anthology, and Semantic Scholar. Every reasoning validation system we found requires model inference:

| System                       | Requires Model  | Validates Reasoning |
| ---------------------------- | :-------------: | :-----------------: |
| VerifyLLM (2025)             |     ✅ Yes      |      Partially      |
| ContraGen                    |     ✅ Yes      |      Partially      |
| Process Supervision (OpenAI) |     ✅ Yes      |         Yes         |
| Guardrails AI                | ✅ Configurable |    No (content)     |
| **Subtractive Filter**       |    **❌ No**    |     **✅ Yes**      |

> **Existing work uses LLMs to check LLMs. TRIGNUM uses logic to check LLMs.**

Read the full analysis in our [position paper](paper/TRIGNUM_300M_Position_Paper.md).

---

## ⚛️ Quantum Integration: TQPE

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.18751914.svg)](https://doi.org/10.5281/zenodo.18751914)

TRIGNUM-300M serves as Phase 1 ("Technical A Priori Validation") for **Trignumental Quantum Phase Estimation (TQPE)**.

In our case study estimating the ground state energy of the **H₂ molecule**, TRIGNUM validated the physical consistency and structural logic of the quantum circuit _before execution_. By acting as the preliminary gatekeeper, TRIGNUM ensured that no quantum resources were spent on structurally ill-formed configurations, enabling an epistemic confidence score of **82.8%** on the final estimate (-1.1384 Ha).

Read the full `BUILDING THE BRIDGE` paper on Trignumentality and TQPE in the foundational [Trignumentality](https://github.com/Codfski/trignumentality) repository.

---

## 📚 Documentation

| Document                                                         | Description                         |
| ---------------------------------------------------------------- | ----------------------------------- |
| [Core Postulate](docs/theory/01_core_postulate.md)               | The fundamental axioms of Trignum   |
| [Three Faces](docs/theory/02_three_faces.md)                     | α (Logic), β (Illogic), γ (Context) |
| [Magnetic Trillage](docs/theory/03_magnetic_trillage.md)         | Data separation mechanism           |
| [T-CHIP Spec](docs/theory/04_tchip_spec.md)                      | The Tensor Character in detail      |
| [Cold State Hardware](docs/theory/05_cold_state_hardware.md)     | Hardware implications               |
| [Hallucination Paradox](docs/theory/06_hallucination_paradox.md) | Reframing the "Big Monster"         |
| [Position Paper](paper/TRIGNUM_300M_Position_Paper.md)           | Full academic paper with benchmarks |
| [Roadmap](ROADMAP.md)                                            | 2-quarter development plan          |

---

## 💎 The Golden Gems

| Gem   | Wisdom                                  |
| ----- | --------------------------------------- |
| GEM 1 | "The Human Pulse is the Master Clock"   |
| GEM 2 | "The Illogic is the Compass"            |
| GEM 3 | "Magnetic Trillage Over Brute Force"    |
| GEM 4 | "The Hallucination is the Raw Material" |
| GEM 5 | "T-CHIP is the Mirror"                  |

---

## 🤝 Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

---

## 📄 License

MIT License — see [LICENSE](LICENSE).

---

## 📞 Contact

**TRACE ON LAB**  
📧 traceonlab@proton.me

---

## 🛡️ The Call

> _"The most dangerous AI failure is not a wrong fact. It is reasoning that sounds right but isn't."_

```
╔═══════════════════════════════════════════════════════╗
║  🧲 TRACE ON LAB — TRIGNUM-300M — v.300M              ║
║                                                       ║
║  The Pre-Flight Check for Autonomous AI.              ║
║  Zero models. Zero API calls. 146,866 samples/second. ║
║                                                       ║
║  🔵 T-CHIP: CLEARED FOR TAKEOFF.                      ║
╚═══════════════════════════════════════════════════════╝
```

⭐ **Star this repo if you believe AI should check its logic before it acts.**