---
license: mit
language:
- en
tags:
- zero-shot
- natural-language-inference
- self-reflection
- logic
- reasoning
- evaluation
- trignum
- trignumentality
---
	<div align="center">

	# 🧲 TRIGNUM-300M

	### The Pre-Flight Check for Autonomous AI

	[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
	[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
	[![Benchmarked](https://img.shields.io/badge/HaluEval-58%2C293_samples-green.svg)](#-benchmark-results)
	[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.18672142.svg)](https://doi.org/10.5281/zenodo.18672142)

	> "You wouldn't let a plane take off without a pre-flight check.
	> Why are we letting AI agents act without one?"

	<img src="assets/roadmap_architecture.jpg" width="800" alt="TRIGNUM-300M Architecture Flowchart" />
	</div>

	---

	<div align="center">
	<!--
	TODO: Add your demo GIF here!
	1. Record demo/index.html with ScreenToGif
	2. Save as assets/trignum_demo.gif
	3. Uncomment line below:
	-->
	<!-- <img src="assets/trignum_demo.gif" width="800" alt="TRIGNUM-300M Demo" /> -->
	</div>

	## What Is This?

	TRIGNUM-300M is a zero-model reasoning integrity validator for LLM outputs. It catches structural logic failures — contradictions, circular reasoning, non-sequiturs — before an AI agent acts on them.

```python
from trignum_core.subtractive_filter import SubtractiveFilter

# `agent_output` (the LLM's text) and `agent` (your agent runtime) are
# placeholders supplied by your own framework.
sf = SubtractiveFilter()
result = sf.apply(agent_output)

if result.illogics_found:
    agent.halt(reason=result.illogics_found)
    # T-CHIP glows RED 🔴 → Human review required
else:
    agent.execute()
    # T-CHIP glows BLUE 🔵 → Cleared for takeoff
```

	No LLM. No API. No training data. ~300 lines of Python. <1ms.

	---

	## 🔬 Benchmark Results

We evaluated on 58,293 real LLM outputs from HaluEval, plus a new 517-sample curated dataset of structural reasoning failures. Honest results:

| Benchmark                    | Samples | Precision | Recall | F1        | Speed |
| ---------------------------- | ------- | --------- | ------ | --------- | ----- |
| Structural illogic (curated) | 517     | 100%      | 98.9%  | 99.5%     | <1ms  |
| HaluEval (full dataset)      | 58,293  | 60%       | 2.1%   | 4.0%      | 706ms |

	### What this means:

	- 99.5% F1 on structural reasoning failures — contradictions, circular logic, unsupported conclusions
	- 4.0% F1 on factual hallucinations — we don't catch wrong facts

That's the point. There are many tools for fact-checking. There were no model-free tools for reasoning-checking. Until now.

	### Per-Task Breakdown (HaluEval)

| Task          | n      | Precision | Recall | F1    |
| ------------- | ------ | --------- | ------ | ----- |
| QA            | 18,316 | 83.3%     | 0.25%  | 0.50% |
| Dialogue      | 19,977 | 60.1%     | 4.38%  | 8.16% |
| Summarization | 20,000 | 57.4%     | 1.60%  | 3.11% |
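The F1 column is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short check below recomputes it from the per-task rows (values copied from the table, expressed as fractions) and confirms that the task sizes add up to the full HaluEval sample count. The helper name `f1` is just for illustration.

```python
# F1 = 2PR / (P + R): harmonic mean of precision (P) and recall (R).
# Rows copied from the per-task HaluEval table above: (n, precision, recall).
rows = {
    "QA": (18_316, 0.833, 0.0025),
    "Dialogue": (19_977, 0.601, 0.0438),
    "Summarization": (20_000, 0.574, 0.0160),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


for task, (n, p, r) in rows.items():
    print(f"{task:<14} n={n:>6}  F1={f1(p, r):.2%}")

# The per-task sizes should sum to the full HaluEval total.
total = sum(n for n, _, _ in rows.values())
print(total)  # 58293
```

Because F1 is a harmonic mean, it is dominated by the smaller of the two numbers: a precise filter with very low recall on factual errors still scores a low F1.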

	Throughput: 146,866 samples/second — orders of magnitude faster than LLM-based validation.

	---

	## ✈️ The Pre-Flight Check Analogy

	A pre-flight checklist doesn't verify that London exists. It verifies that:

	- ✅ Instruments don't contradict each other
	- ✅ There are no circular faults (sensor A confirms B confirms A)
	- ✅ The flight computer draws conclusions from actual data
	- ✅ Systems are logically consistent

	The Subtractive Filter does the same for AI reasoning:

```
LLM Output → Subtractive Filter → [PASS] 🔵 → Agent Executes
                                → [FAIL] 🔴 → Agent Halts → Human Review
```

	---

	## 🤖 The Missing "Agentic Validator"

With the recent shift toward agentic reasoning, autonomous LLM agents are moving from static prompts to dynamic _thought-action_ loops that involve planning, tool use, and multi-agent collaboration.

	Current systems rely heavily on probabilistic models to act as the "Critic/Evaluator" or use "Validator-Driven Feedback" via unit tests for code or simulators for robotics. But there has been no validator for pure logic. If an agent hallucinates a non-sequitur or circular justification during its internal planning phase, the error cascades.

TRIGNUM-300M fills this exact gap. It acts as a deterministic, <1ms Validator-Driven Feedback gate. It halts execution if the agent's internal thought (`z_t`) contains a structural illogic, providing an immediate failure signal (`r_t = 0`) _before_ the agent commits to an irreversible external action (`a_t`).
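A minimal sketch of where such a gate sits in one thought-action step, assuming the validator is any callable that returns a list of findings (empty means PASS). `StepResult`, `run_step`, and `toy_validate` are hypothetical names; `toy_validate` is a one-rule stand-in, and in practice you would wrap `SubtractiveFilter().apply(...)` and read its `illogics_found` instead.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class StepResult:
    thought: str           # z_t: the agent's internal reasoning at step t
    reward: int            # r_t: 0 = halted by the gate, 1 = cleared
    action: Optional[str]  # a_t: the executed action, or None when halted
    findings: List[str] = field(default_factory=list)


def toy_validate(thought: str) -> List[str]:
    """Hypothetical stand-in for the Subtractive Filter: checks one pattern only."""
    if thought.strip().lower().startswith(("therefore", "thus", "hence")):
        return ["non-sequitur: conclusion stated without premises"]
    return []


def run_step(
    thought: str,
    propose_action: Callable[[str], str],
    validate: Callable[[str], List[str]] = toy_validate,
) -> StepResult:
    """Validate z_t before any a_t is produced; fail closed on any finding."""
    findings = validate(thought)
    if findings:
        # r_t = 0 and no external action is taken.
        return StepResult(thought, 0, None, findings)
    return StepResult(thought, 1, propose_action(thought))


halted = run_step("Therefore, drop the production table.", lambda z: "DROP TABLE prod")
cleared = run_step("CI is green, so the release can ship.", lambda z: "ship release")
print(halted.reward, halted.action)    # 0 None
print(cleared.reward, cleared.action)  # 1 ship release
```

The key design choice is ordering: `propose_action` is only called after validation succeeds, so a flagged thought can never reach an external side effect.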

	---

	## 🔺 Core Architecture

	### The Trignum Pyramid

	Three faces acting as magnetic poles for data separation:

| Face            | Role            | What It Does                                          |
| --------------- | --------------- | ----------------------------------------------------- |
| α (Logic)       | Truth detection | Identifies structurally sound reasoning               |
| β (Illogic)     | Error detection | Catches contradictions, circular logic, non-sequiturs |
| γ (Context)     | Human grounding | Anchors output to human intent                        |

	### T-CHIP: The Tensor Character

```
╔════════════════════════════════════════════════════════╗
║  T-CHIP [v.300M]                                       ║
║                                                        ║
║  🔵 Blue = Logic Stable (Cleared for Takeoff)          ║
║  🔴 Red  = Illogic Detected (THE FREEZE)               ║
║  🟡 Gold = Human Pulse Locked (Sovereign Override)     ║
║                                                        ║
║  Response time: <1ms | False alarms: 0% (structural)   ║
╚════════════════════════════════════════════════════════╝
```
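The three glow states can be read as a small state function over the filter's findings and the human override. This is a sketch only: `Glow` and `glow_state` are hypothetical names, not the API of `tchip.py`, and the rule that a locked Human Pulse takes precedence over the filter's verdict is an assumption drawn from the "Sovereign Override" label.

```python
from enum import Enum
from typing import Sequence


class Glow(Enum):
    BLUE = "Logic Stable (Cleared for Takeoff)"
    RED = "Illogic Detected (THE FREEZE)"
    GOLD = "Human Pulse Locked (Sovereign Override)"


def glow_state(illogics_found: Sequence[str], human_override: bool) -> Glow:
    """Map filter findings plus the human override flag to a T-CHIP glow."""
    # Assumption: a locked human pulse overrides whatever the filter reported.
    if human_override:
        return Glow.GOLD
    if illogics_found:
        return Glow.RED
    return Glow.BLUE


print(glow_state([], human_override=False).name)  # BLUE
print(glow_state(["circular logic"], human_override=False).name)  # RED
```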

	### The Subtractive Filter

	Four detection layers, all pattern-based:

| Layer              | Catches                              | Method                           |
| ------------------ | ------------------------------------ | -------------------------------- |
| Contradiction      | "X is always true. X is never true." | Antonym pairs, negation patterns |
| Circular Logic     | A proves B proves A                  | Reference chain analysis         |
| Non-Sequitur       | "Therefore X" without premises       | Causal connective analysis       |
| Depth Check        | Claims without any reasoning         | Assertion density scoring        |
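To make the four layers concrete, here is a toy, pattern-based sketch of each method in the table. It is not the shipped `subtractive_filter.py`: every function name, the antonym list, the connective lists, and the three-sentence threshold are illustrative assumptions. The circular-logic layer also assumes the claim-to-claim support links have already been extracted, rather than parsing them from raw text.

```python
import re
from typing import Dict, List, Optional

# Toy vocabularies and thresholds: illustrative assumptions only.
ANTONYMS = [("always", "never"), ("true", "false"), ("increases", "decreases")]
CONCLUSION_START = re.compile(r"(therefore|thus|hence)\b")
CONNECTIVE = re.compile(r"\b(because|since|therefore|thus|hence|so|given)\b")
MIN_SENTENCES_FOR_DEPTH_CHECK = 3


def _sentences(text: str) -> List[str]:
    """Lower-cased sentences, split on '.', '!' and '?'."""
    return [s.strip().lower() for s in re.split(r"[.!?]+", text) if s.strip()]


def find_contradictions(text: str) -> List[str]:
    """Contradiction: two sentences that differ only by an antonym swap."""
    sents = _sentences(text)
    seen = set(sents)
    found = []
    for s in sents:
        for a, b in ANTONYMS:
            flipped = re.sub(rf"\b{a}\b", b, s)
            if flipped != s and flipped in seen:
                found.append(f"contradiction: '{s}' vs '{flipped}'")
    return found


def has_cycle(supports: Dict[str, List[str]]) -> bool:
    """Circular logic: claim A supports B supports ... A (DFS cycle check)."""
    state: Dict[str, int] = {}  # 1 = on the current path, 2 = fully explored

    def visit(node: str) -> bool:
        state[node] = 1
        for nxt in supports.get(node, []):
            if state.get(nxt) == 1 or (nxt not in state and visit(nxt)):
                return True
        state[node] = 2
        return False

    return any(node not in state and visit(node) for node in list(supports))


def find_non_sequitur(text: str) -> List[str]:
    """Non-sequitur: the text opens with a conclusion connective, so no premise precedes it."""
    sents = _sentences(text)
    if sents and CONCLUSION_START.match(sents[0]):
        return [f"non-sequitur: '{sents[0]}' has no preceding premise"]
    return []


def find_shallow(text: str) -> List[str]:
    """Depth check: several assertions and not one reasoning connective."""
    sents = _sentences(text)
    if len(sents) >= MIN_SENTENCES_FOR_DEPTH_CHECK and not any(
        CONNECTIVE.search(s) for s in sents
    ):
        return [f"depth: {len(sents)} claims without any reasoning connective"]
    return []


def subtract(text: str, supports: Optional[Dict[str, List[str]]] = None) -> List[str]:
    """Run every layer; an empty list means PASS."""
    findings = find_contradictions(text) + find_non_sequitur(text) + find_shallow(text)
    if supports and has_cycle(supports):
        findings.append("circular: a claim ultimately supports itself")
    return findings


print(subtract("X is always true. X is never true."))
print(subtract("Tests pass because CI is green. Therefore we deploy."))  # []
```

Each layer only ever removes confidence (it adds findings, never clears them), which mirrors the "subtractive" framing: anything that survives all layers passes.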

	---

	## 📦 Repository Structure

	```
	TRIGNUM-300M-TCHIP/
	├── src/
	│ └── trignum_core/ # Core Python library
	│ ├── pyramid.py # Trignum Pyramid (3 magnetic faces)
	│ ├── tchip.py # T-CHIP (glow states)
	│ ├── subtractive_filter.py # ★ The Subtractive Filter
	│ ├── human_pulse.py # Human sovereignty layer
	│ └── magnetic_trillage.py # Data separation
	├── tests/ # 34 unit tests (all passing)
	├── benchmarks/
	│ ├── hallucination_benchmark.py # Curated structural test
	│ ├── full_halueval_benchmark.py # Full 58K HaluEval test
	│ ├── results.json # Structural benchmark results
	│ └── full_halueval_results.json # Full HaluEval results
	├── demo/
	│ └── index.html # Three.js 3D interactive demo
	├── paper/
	│ └── TRIGNUM_300M_Position_Paper.md # Position paper
	├── docs/
	│ └── theory/ # 6 foundational theory documents
	├── T-CHIP CLEARED FOR TAKEOFF.md # The pitch
	└── ROADMAP.md # 2-quarter development plan
	```

	---

	## 🚀 Quick Start

	```bash
	# Clone
	git clone https://github.com/trace-on-lab/trignum-300m.git
	cd trignum-300m

	# Install
	pip install -r requirements.txt
	pip install -e .

	# Run the structural benchmark
	python benchmarks/hallucination_benchmark.py

	# Run the full HaluEval benchmark (downloads ~13MB of data)
	python benchmarks/full_halueval_benchmark.py

	# Run tests
	pytest tests/ -v
	```

	---

	## 🌐 Prior Art: Nobody Is Doing This

We searched arXiv, ResearchGate, ACL Anthology, and Semantic Scholar. Every existing reasoning-validation system we found requires model inference:

| System                       | Requires Model  | Validates Reasoning |
| ---------------------------- | :-------------: | :-----------------: |
| VerifyLLM (2025)             | ✅ Yes          | Partially           |
| ContraGen                    | ✅ Yes          | Partially           |
| Process Supervision (OpenAI) | ✅ Yes          | Yes                 |
| Guardrails AI                | ✅ Configurable | No (content)        |
| Subtractive Filter           | ❌ No           | ✅ Yes              |

	> Existing work uses LLMs to check LLMs. TRIGNUM uses logic to check LLMs.

	Read the full analysis in our [position paper](paper/TRIGNUM_300M_Position_Paper.md).

	---

	## ⚛️ Quantum Integration: TQPE

	[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.18751914.svg)](https://doi.org/10.5281/zenodo.18751914)

	TRIGNUM-300M serves as Phase 1 ("Technical A Priori Validation") for Trignumental Quantum Phase Estimation (TQPE).

In our case study estimating the ground state energy of the H₂ molecule, TRIGNUM validated the physical consistency and structural logic of the quantum circuit _before execution_. Acting as the preliminary gatekeeper, it ensured that no quantum resources were spent on structurally ill-formed configurations, contributing to an epistemic confidence score of 82.8% on the final estimate (-1.1384 Ha).

	Read the full `BUILDING THE BRIDGE` paper on Trignumentality and TQPE in the foundational [Trignumentality](https://github.com/Codfski/trignumentality) repository.

	---

	## 📚 Documentation

| Document                                                         | Description                         |
| ---------------------------------------------------------------- | ----------------------------------- |
| [Core Postulate](docs/theory/01_core_postulate.md)               | The fundamental axioms of Trignum   |
| [Three Faces](docs/theory/02_three_faces.md)                     | α (Logic), β (Illogic), γ (Context) |
| [Magnetic Trillage](docs/theory/03_magnetic_trillage.md)         | Data separation mechanism           |
| [T-CHIP Spec](docs/theory/04_tchip_spec.md)                      | The Tensor Character in detail      |
| [Cold State Hardware](docs/theory/05_cold_state_hardware.md)     | Hardware implications               |
| [Hallucination Paradox](docs/theory/06_hallucination_paradox.md) | Reframing the "Big Monster"         |
| [Position Paper](paper/TRIGNUM_300M_Position_Paper.md)           | Full academic paper with benchmarks |
| [Roadmap](ROADMAP.md)                                            | 2-quarter development plan          |

	---

	## 💎 The Golden Gems

| Gem   | Wisdom                                  |
| ----- | --------------------------------------- |
| GEM 1 | "The Human Pulse is the Master Clock"   |
| GEM 2 | "The Illogic is the Compass"            |
| GEM 3 | "Magnetic Trillage Over Brute Force"    |
| GEM 4 | "The Hallucination is the Raw Material" |
| GEM 5 | "T-CHIP is the Mirror"                  |

	---

	## 🤝 Contributing

	See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

	---

	## 📄 License

	MIT License — see [LICENSE](LICENSE).

	---

	## 📞 Contact

	TRACE ON LAB
	📧 traceonlab@proton.me

	---

	## 🛡️ The Call

	> _"The most dangerous AI failure is not a wrong fact. It is reasoning that sounds right but isn't."_

```
╔════════════════════════════════════════════════════════╗
║  🧲 TRACE ON LAB · TRIGNUM-300M · v.300M               ║
║                                                        ║
║  The Pre-Flight Check for Autonomous AI.               ║
║  Zero models. Zero API calls. 146,866 samples/second.  ║
║                                                        ║
║  🔵 T-CHIP: CLEARED FOR TAKEOFF.                       ║
╚════════════════════════════════════════════════════════╝
```

	⭐ Star this repo if you believe AI should check its logic before it acts.