---
license: mit
language:
- en
tags:
- zero-shot
- natural-language-inference
- self-reflection
- logic
- reasoning
- evaluation
- trignum
- trignumentality
---
# 🧲 TRIGNUM-300M

### The Pre-Flight Check for Autonomous AI

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/) [![Benchmarked](https://img.shields.io/badge/HaluEval-58%2C293_samples-green.svg)](#-benchmark-results) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.18672142.svg)](https://doi.org/10.5281/zenodo.18672142)

> **"You wouldn't let a plane take off without a pre-flight check.**
> **Why are we letting AI agents act without one?"**

*TRIGNUM-300M Architecture Flowchart*
---
## What Is This?

TRIGNUM-300M is a **zero-model reasoning integrity validator** for LLM outputs. It catches structural logic failures — contradictions, circular reasoning, non-sequiturs — before an AI agent acts on them.

```python
from trignum_core.subtractive_filter import SubtractiveFilter

sf = SubtractiveFilter()
result = sf.apply(agent_output)

if result.illogics_found:
    agent.halt(reason=result.illogics_found)  # T-CHIP glows RED 🔴 → Human review required
else:
    agent.execute()  # T-CHIP glows BLUE 🔵 → Cleared for takeoff
```

**No LLM. No API. No training data. ~300 lines of Python. <1ms.**

---

## 🔬 Benchmark Results

We expanded our evaluation to **58,000+ real LLM outputs** including a new **517-sample curated dataset** for structural reasoning. Honest results:

| Benchmark | Samples | Precision | Recall | F1 | Speed |
| --- | --- | --- | --- | --- | --- |
| **Structural illogic (curated)** | **517** | **100%** | **98.9%** | **99.5%** | **<1ms** |
| HaluEval (full dataset) | 58,293 | 60% | 2.1% | 4.0% | 706ms |

### What this means:

- **99.5% F1 on structural reasoning failures** — contradictions, circular logic, unsupported conclusions
- **4.0% F1 on factual hallucinations** — we don't catch wrong facts

**That's the point.** There are 100 tools for fact-checking. There are **zero tools for reasoning-checking.** Until now.

### Per-Task Breakdown (HaluEval)

| Task | n | Precision | Recall | F1 |
| --- | --- | --- | --- | --- |
| QA | 18,316 | 83.3% | 0.25% | 0.50% |
| Dialogue | 19,977 | 60.1% | 4.38% | 8.16% |
| Summarization | 20,000 | 57.4% | 1.60% | 3.11% |

**Throughput: 146,866 samples/second** — orders of magnitude faster than LLM-based validation.

---

## ✈️ The Pre-Flight Check Analogy

A pre-flight checklist doesn't verify that London exists.
It verifies that:

- ✅ Instruments don't **contradict** each other
- ✅ There are no **circular faults** (sensor A confirms B confirms A)
- ✅ The flight computer draws **conclusions from actual data**
- ✅ Systems are **logically consistent**

The Subtractive Filter does the same for AI reasoning:

```
LLM Output → Subtractive Filter → [PASS] 🔵 → Agent Executes
                                → [FAIL] 🔴 → Agent Halts → Human Review
```

---

## 🤖 The Missing "Agentic Validator"

With the recent shift towards **Agentic Reasoning**, autonomous LLMs are moving from static prompts to dynamic _thought-action_ loops involving planning, tool use, and multi-agent collaboration. Current systems rely heavily on probabilistic models acting as the "Critic/Evaluator", or on "Validator-Driven Feedback" via unit tests for code or simulators for robotics. **But there has been no validator for pure logic.** If an agent hallucinates a non-sequitur or a circular justification during its internal planning phase, the error cascades.

TRIGNUM-300M fills this exact gap. It acts as a deterministic, <1ms **Validator-Driven Feedback** gate: it halts execution if the agent's internal thought (`z_t`) contains a structural illogic, providing an immediate failure signal (`r_t = 0`) _before_ the agent commits to an irreversible external action (`a_t`).
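Such a validator-gated loop can be sketched in a few lines. The sketch below is a self-contained toy, not the shipped `trignum_core` code: `MiniFilter`, its single regex-based contradiction layer, and the `gated_step` helper are hypothetical stand-ins that only mirror the `SubtractiveFilter.apply(...)` / `result.illogics_found` interface shown in the usage snippet above.

```python
import re
from dataclasses import dataclass, field

@dataclass
class FilterResult:
    # Mirrors the result object in the usage snippet: a list of detected illogics.
    illogics_found: list = field(default_factory=list)

class MiniFilter:
    """Hypothetical stand-in for trignum_core's SubtractiveFilter.

    Implements one toy layer only: a contradiction check for
    "X is always Y" vs. "X is never Y" claim pairs.
    """

    _CLAIM = re.compile(r"(\w[\w ]*?) is (always|never) ([\w ]+)", re.IGNORECASE)

    def apply(self, text: str) -> FilterResult:
        seen = {}       # (subject, predicate) -> first polarity observed
        illogics = []
        for subject, polarity, predicate in self._CLAIM.findall(text):
            key = (subject.strip().lower(), predicate.strip().lower())
            prior = seen.setdefault(key, polarity.lower())
            if prior != polarity.lower():
                illogics.append(
                    f"contradiction: '{key[0]}' is both {prior} and {polarity.lower()} {key[1]}"
                )
        return FilterResult(illogics)

def gated_step(thought: str, act) -> str:
    """Validator-driven feedback gate: halt on RED, execute on BLUE."""
    result = MiniFilter().apply(thought)
    if result.illogics_found:
        # Failure signal before any external action is committed.
        return "HALT: " + "; ".join(result.illogics_found)
    return act()  # cleared for takeoff

# A contradictory thought halts; a consistent one executes.
print(gated_step("The reactor is always safe. The reactor is never safe.", lambda: "executed"))
print(gated_step("The reactor is always safe.", lambda: "executed"))
```

The gate is deterministic and model-free, so it can sit inside an agent's planning loop at negligible cost; a real deployment would route `HALT` results to human review rather than returning a string.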
---

## 🔺 Core Architecture

### The Trignum Pyramid

Three faces acting as magnetic poles for data separation:

| Face | Role | What It Does |
| --- | --- | --- |
| **α (Logic)** | Truth detection | Identifies structurally sound reasoning |
| **β (Illogic)** | Error detection | Catches contradictions, circular logic, non-sequiturs |
| **γ (Context)** | Human grounding | Anchors output to human intent |

### T-CHIP: The Tensor Character

```
╔═══════════════════════════════════════════════════════╗
║  T-CHIP [v.300M]                                      ║
║                                                       ║
║  🔵 Blue = Logic Stable (Cleared for Takeoff)         ║
║  🔴 Red  = Illogic Detected (THE FREEZE)              ║
║  🟡 Gold = Human Pulse Locked (Sovereign Override)    ║
║                                                       ║
║  Response time: <1ms | False alarms: 0% (structural)  ║
╚═══════════════════════════════════════════════════════╝
```

### The Subtractive Filter

Four detection layers, all pattern-based:

| Layer | Catches | Method |
| --- | --- | --- |
| **Contradiction** | "X is always true. X is never true." | Antonym pairs, negation patterns |
| **Circular Logic** | A proves B proves A | Reference chain analysis |
| **Non-Sequitur** | "Therefore X" without premises | Causal connective analysis |
| **Depth Check** | Claims without any reasoning | Assertion density scoring |

---

## 📦 Repository Structure

```
TRIGNUM-300M-TCHIP/
├── src/
│   └── trignum_core/                  # Core Python library
│       ├── pyramid.py                 # Trignum Pyramid (3 magnetic faces)
│       ├── tchip.py                   # T-CHIP (glow states)
│       ├── subtractive_filter.py      # ★ The Subtractive Filter
│       ├── human_pulse.py             # Human sovereignty layer
│       └── magnetic_trillage.py       # Data separation
├── tests/                             # 34 unit tests (all passing)
├── benchmarks/
│   ├── hallucination_benchmark.py     # Curated structural test
│   ├── full_halueval_benchmark.py     # Full 58K HaluEval test
│   ├── results.json                   # Structural benchmark results
│   └── full_halueval_results.json     # Full HaluEval results
├── demo/
│   └── index.html                     # Three.js 3D interactive demo
├── paper/
│   └── TRIGNUM_300M_Position_Paper.md # Position paper
├── docs/
│   └── theory/                        # 6 foundational theory documents
├── T-CHIP CLEARED FOR TAKEOFF.md      # The pitch
└── ROADMAP.md                         # 2-quarter development plan
```

---

## 🚀 Quick Start

```bash
# Clone
git clone https://github.com/trace-on-lab/trignum-300m.git
cd trignum-300m

# Install
pip install -r requirements.txt
pip install -e .

# Run the structural benchmark
python benchmarks/hallucination_benchmark.py

# Run the full HaluEval benchmark (downloads ~13MB of data)
python benchmarks/full_halueval_benchmark.py

# Run tests
pytest tests/ -v
```

---

## 🌐 Prior Art: Nobody Is Doing This

We searched arXiv, ResearchGate, ACL Anthology, and Semantic Scholar.
Every existing reasoning validation system requires model inference:

| System | Requires Model | Validates Reasoning |
| --- | :---: | :---: |
| VerifyLLM (2025) | ✅ Yes | Partially |
| ContraGen | ✅ Yes | Partially |
| Process Supervision (OpenAI) | ✅ Yes | Yes |
| Guardrails AI | ✅ Configurable | No (content) |
| **Subtractive Filter** | **❌ No** | **✅ Yes** |

> **Existing work uses LLMs to check LLMs. TRIGNUM uses logic to check LLMs.**

Read the full analysis in our [position paper](paper/TRIGNUM_300M_Position_Paper.md).

---

## ⚛️ Quantum Integration: TQPE

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.18751914.svg)](https://doi.org/10.5281/zenodo.18751914)

TRIGNUM-300M serves as Phase 1 ("Technical A Priori Validation") for **Trignumental Quantum Phase Estimation (TQPE)**. In our case study estimating the ground-state energy of the **H₂ molecule**, TRIGNUM validated the physical consistency and structural logic of the quantum circuit _before execution_. By acting as the preliminary gatekeeper, TRIGNUM ensured that no quantum resources were wasted on structurally ill-formed configurations, enabling an epistemic confidence score of **82.8%** on the final estimate (−1.1384 Ha).

Read the full `BUILDING THE BRIDGE` paper on Trignumentality and TQPE in the foundational [Trignumentality](https://github.com/Codfski/trignumentality) repository.
---

## 📚 Documentation

| Document | Description |
| --- | --- |
| [Core Postulate](docs/theory/01_core_postulate.md) | The fundamental axioms of Trignum |
| [Three Faces](docs/theory/02_three_faces.md) | α (Logic), β (Illogic), γ (Context) |
| [Magnetic Trillage](docs/theory/03_magnetic_trillage.md) | Data separation mechanism |
| [T-CHIP Spec](docs/theory/04_tchip_spec.md) | The Tensor Character in detail |
| [Cold State Hardware](docs/theory/05_cold_state_hardware.md) | Hardware implications |
| [Hallucination Paradox](docs/theory/06_hallucination_paradox.md) | Reframing the "Big Monster" |
| [Position Paper](paper/TRIGNUM_300M_Position_Paper.md) | Full academic paper with benchmarks |
| [Roadmap](ROADMAP.md) | 2-quarter development plan |

---

## 💎 The Golden Gems

| Gem | Wisdom |
| --- | --- |
| GEM 1 | "The Human Pulse is the Master Clock" |
| GEM 2 | "The Illogic is the Compass" |
| GEM 3 | "Magnetic Trillage Over Brute Force" |
| GEM 4 | "The Hallucination is the Raw Material" |
| GEM 5 | "T-CHIP is the Mirror" |

---

## 🤝 Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

---

## 📄 License

MIT License — see [LICENSE](LICENSE).

---

## 📞 Contact

**TRACE ON LAB**
📧 traceonlab@proton.me

---

## 🛡️ The Call

> _"The most dangerous AI failure is not a wrong fact. It is reasoning that sounds right but isn't."_

```
╔═══════════════════════════════════════════════════════╗
║  🧲 TRACE ON LAB — TRIGNUM-300M — v.300M              ║
║                                                       ║
║  The Pre-Flight Check for Autonomous AI.              ║
║  Zero models. Zero API calls. 146,866 samples/second. ║
║                                                       ║
║  🔵 T-CHIP: CLEARED FOR TAKEOFF.                      ║
╚═══════════════════════════════════════════════════════╝
```

⭐ **Star this repo if you believe AI should check its logic before it acts.**