---
license: mit
language:
- en
tags:
- zero-shot
- natural-language-inference
- self-reflection
- logic
- reasoning
- evaluation
- trignum
- trignumentality
---
<div align="center">
# 🧲 TRIGNUM-300M
### The Pre-Flight Check for Autonomous AI
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![Benchmarked](https://img.shields.io/badge/HaluEval-58%2C293_samples-green.svg)](#-benchmark-results)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.18672142.svg)](https://doi.org/10.5281/zenodo.18672142)
> **"You wouldn't let a plane take off without a pre-flight check.**
> **Why are we letting AI agents act without one?"**
<img src="assets/roadmap_architecture.jpg" width="800" alt="TRIGNUM-300M Architecture Flowchart" />
</div>
---
<div align="center">
<!--
TODO: Add your demo GIF here!
1. Record demo/index.html with ScreenToGif
2. Save as assets/trignum_demo.gif
3. Uncomment line below:
-->
<!-- <img src="assets/trignum_demo.gif" width="800" alt="TRIGNUM-300M Demo" /> -->
</div>
## What Is This?
TRIGNUM-300M is a **zero-model reasoning integrity validator** for LLM outputs. It catches structural logic failures (contradictions, circular reasoning, non-sequiturs) before an AI agent acts on them.
```python
from trignum_core.subtractive_filter import SubtractiveFilter

# `agent` and `agent_output` stand in for your own agent object and its text output.
sf = SubtractiveFilter()
result = sf.apply(agent_output)

if result.illogics_found:
    # T-CHIP glows RED 🔴 → human review required
    agent.halt(reason=result.illogics_found)
else:
    # T-CHIP glows BLUE 🔵 → cleared for takeoff
    agent.execute()
```
**No LLM. No API. No training data. ~300 lines of Python. <1ms.**
---
## 🔬 Benchmark Results
We expanded our evaluation to **58,000+ real LLM outputs** including a new **517-sample curated dataset** for structural reasoning. Honest results:
| Benchmark | Samples | Precision | Recall | F1 | Speed |
| ---------------------------- | ------- | --------- | ------ | --------- | ----- |
| **Structural illogic (curated)** | **517** | **100%** | **98.9%** | **99.5%** | **<1ms** |
| HaluEval (full dataset) | 58,293 | 60% | 2.1% | 4.0% | 706ms |
### What this means:
- **99.5% F1 on structural reasoning failures**: contradictions, circular logic, unsupported conclusions
- **4.0% F1 on factual hallucinations**: we don't catch wrong facts
**That's the point.** Fact-checking tools are plentiful. We found **no model-free tools for reasoning-checking.** Until now.
### Per-Task Breakdown (HaluEval)
| Task | n | Precision | Recall | F1 |
| ------------- | ------ | --------- | ------ | ----- |
| QA | 18,316 | 83.3% | 0.25% | 0.50% |
| Dialogue | 19,977 | 60.1% | 4.38% | 8.16% |
| Summarization | 20,000 | 57.4% | 1.60% | 3.11% |
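The F1 column is the harmonic mean of precision and recall, so it can be rechecked from the other two columns. The sketch below is only an illustration and is not part of the benchmark scripts. The helper name `f1_score` is an assumption, and the inputs are the rounded percentages printed in the table, so it reproduces the F1 column only to within rounding.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) copied from the per-task HaluEval table above.
halueval_rows = {
    "QA": (0.833, 0.0025),
    "Dialogue": (0.601, 0.0438),
    "Summarization": (0.574, 0.0160),
}

for task, (p, r) in halueval_rows.items():
    print(f"{task}: F1 = {f1_score(p, r) * 100:.2f}%")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, the very low recall on factual hallucinations dominates F1 even where precision is high.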
**Throughput: 146,866 samples/second**, orders of magnitude faster than LLM-based validation.
---
## ✈️ The Pre-Flight Check Analogy
A pre-flight checklist doesn't verify that London exists. It verifies that:
- ✅ Instruments don't **contradict** each other
- ✅ There are no **circular faults** (sensor A confirms B confirms A)
- ✅ The flight computer draws **conclusions from actual data**
- ✅ Systems are **logically consistent**
The Subtractive Filter does the same for AI reasoning:
```
LLM Output → Subtractive Filter → [PASS] 🔵 → Agent Executes
                                → [FAIL] 🔴 → Agent Halts → Human Review
```
---
## 🤖 The Missing "Agentic Validator"
In the context of the recent shift towards **Agentic Reasoning**, autonomous LLMs are moving from static prompts to dynamic _thought-action_ loops involving planning, tool-use, and multi-agent collaboration.
Current systems rely heavily on probabilistic models to act as the "Critic/Evaluator" or use "Validator-Driven Feedback" via unit tests for code or simulators for robotics. **But there has been no validator for pure logic.** If an agent hallucinates a non-sequitur or circular justification during its internal planning phase, the error cascades.
TRIGNUM-300M fills this exact gap. It acts as a deterministic, <1ms **Validator-Driven Feedback** gate. It halts execution if the agent's internal thought (`zt`) contains a structural illogic, providing an immediate failure signal (`rt = 0`) _before_ the agent commits to an irreversible external action (`at`).
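The gate described above can be sketched as a single step of a thought-action loop. This is a minimal illustration under stated assumptions, not the library's code: `GateResult`, `toy_validator`, and `gated_step` are hypothetical names, the toy validator stands in for the real Subtractive Filter, and returning `1` for a passing thought is an arbitrary choice (the text only specifies `rt = 0` on failure).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class GateResult:
    # Mirrors the `illogics_found` attribute used in the README example.
    illogics_found: List[str] = field(default_factory=list)


def toy_validator(thought: str) -> GateResult:
    """Hypothetical stand-in: flags a thought that opens with a conclusion."""
    if thought.strip().lower().startswith("therefore"):
        return GateResult(["conclusion without premises"])
    return GateResult()


def gated_step(
    thought: str,
    action: Callable[[], str],
    validate: Callable[[str], GateResult] = toy_validator,
) -> Tuple[int, Optional[str]]:
    """Validate the thought z_t first; run the action a_t only if it passes.

    Returns (r_t, outcome). On failure r_t is 0 and the action never runs.
    """
    result = validate(thought)
    if result.illogics_found:
        return 0, None  # r_t = 0: halt before the action is committed
    return 1, action()  # assumed "passed" value; not specified by the source


print(gated_step("Therefore we must delete the database.", lambda: "deleted"))
print(gated_step("The backup finished, so rotating logs is safe.", lambda: "rotated"))
```

The design point is ordering: the action is passed in as a callable and is only invoked after validation, so a failed check cannot have side effects.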
---
## 🔺 Core Architecture
### The Trignum Pyramid
Three faces acting as magnetic poles for data separation:
| Face | Role | What It Does |
| --------------- | --------------- | ----------------------------------------------------- |
| **α (Logic)**   | Truth detection | Identifies structurally sound reasoning               |
| **β (Illogic)** | Error detection | Catches contradictions, circular logic, non-sequiturs |
| **γ (Context)** | Human grounding | Anchors output to human intent                        |
### T-CHIP: The Tensor Character
```
╔═══════════════════════════════════════════════════════╗
║  T-CHIP [v.300M]                                      ║
║                                                       ║
║  🔵 Blue = Logic Stable (Cleared for Takeoff)         ║
║  🔴 Red  = Illogic Detected (THE FREEZE)              ║
║  🟡 Gold = Human Pulse Locked (Sovereign Override)    ║
║                                                       ║
║  Response time: <1ms | False alarms: 0% (structural)  ║
╚═══════════════════════════════════════════════════════╝
```
### The Subtractive Filter
Four detection layers, all pattern-based:
| Layer | Catches | Method |
| ------------------ | ------------------------------------ | -------------------------------- |
| **Contradiction** | "X is always true. X is never true." | Antonym pairs, negation patterns |
| **Circular Logic** | A proves B proves A | Reference chain analysis |
| **Non-Sequitur** | "Therefore X" without premises | Causal connective analysis |
| **Depth Check** | Claims without any reasoning | Assertion density scoring |
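To make "pattern-based" concrete, here is a toy sketch of two of the layers in the table. It is an illustration under stated assumptions, not the code in `subtractive_filter.py`: the function names, the tiny `ANTONYMS` list, and the dictionary format for support links are all hypothetical. The first function flags a sentence whose antonym-flipped twin also appears in the text, and the second detects a cycle in a "claim cites claim" graph with a depth-first search.

```python
import re

# Hypothetical, deliberately tiny antonym list for the illustration.
ANTONYMS = [("always", "never"), ("all", "none"), ("true", "false")]


def split_sentences(text):
    """Lowercase, whitespace-normalized sentences split on ., ! and ?."""
    parts = re.split(r"[.!?]+", text)
    return [" ".join(p.lower().split()) for p in parts if p.strip()]


def find_contradictions(text):
    """Return (sentence, flipped_sentence) pairs that both occur in the text."""
    sents = split_sentences(text)
    seen = set(sents)
    hits = []
    for sent in sents:
        words = sent.split()
        for word, opposite in ANTONYMS:
            if word in words:
                flipped = " ".join(opposite if w == word else w for w in words)
                if flipped in seen:
                    hits.append((sent, flipped))
    return hits


def has_circular_support(supports):
    """supports maps a claim to the claims it cites; True if any cycle exists."""
    in_progress, done = set(), set()

    def visit(claim):
        in_progress.add(claim)
        for cited in supports.get(claim, ()):
            if cited in in_progress:
                return True  # back edge: the chain returns to an open claim
            if cited not in done and visit(cited):
                return True
        in_progress.discard(claim)
        done.add(claim)
        return False

    return any(c not in done and visit(c) for c in list(supports))


print(find_contradictions("X is always true. X is never true."))
print(has_circular_support({"A": ["B"], "B": ["A"]}))
```

Both checks are plain string and graph operations, which is why this style of validator needs no model inference. A real detector would need far broader patterns than this sketch covers.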
---
## 📦 Repository Structure
```
TRIGNUM-300M-TCHIP/
├── src/
│   └── trignum_core/                   # Core Python library
│       ├── pyramid.py                  # Trignum Pyramid (3 magnetic faces)
│       ├── tchip.py                    # T-CHIP (glow states)
│       ├── subtractive_filter.py       # ★ The Subtractive Filter
│       ├── human_pulse.py              # Human sovereignty layer
│       └── magnetic_trillage.py        # Data separation
├── tests/                              # 34 unit tests (all passing)
├── benchmarks/
│   ├── hallucination_benchmark.py      # Curated structural test
│   ├── full_halueval_benchmark.py      # Full 58K HaluEval test
│   ├── results.json                    # Structural benchmark results
│   └── full_halueval_results.json      # Full HaluEval results
├── demo/
│   └── index.html                      # Three.js 3D interactive demo
├── paper/
│   └── TRIGNUM_300M_Position_Paper.md  # Position paper
├── docs/
│   └── theory/                         # 6 foundational theory documents
├── T-CHIP CLEARED FOR TAKEOFF.md       # The pitch
└── ROADMAP.md                          # 2-quarter development plan
```
---
## 🚀 Quick Start
```bash
# Clone
git clone https://github.com/trace-on-lab/trignum-300m.git
cd trignum-300m
# Install
pip install -r requirements.txt
pip install -e .
# Run the structural benchmark
python benchmarks/hallucination_benchmark.py
# Run the full HaluEval benchmark (downloads ~13MB of data)
python benchmarks/full_halueval_benchmark.py
# Run tests
pytest tests/ -v
```
---
## 🌐 Prior Art: Nobody Is Doing This
We searched arXiv, ResearchGate, ACL Anthology, and Semantic Scholar. Every existing reasoning validation system requires model inference:
| System | Requires Model | Validates Reasoning |
| ---------------------------- | :-------------: | :-----------------: |
| VerifyLLM (2025)             | ✅ Yes          | Partially           |
| ContraGen                    | ✅ Yes          | Partially           |
| Process Supervision (OpenAI) | ✅ Yes          | Yes                 |
| Guardrails AI                | ✅ Configurable | No (content)        |
| **Subtractive Filter**       | **❌ No**       | **✅ Yes**          |
> **Existing work uses LLMs to check LLMs. TRIGNUM uses logic to check LLMs.**
Read the full analysis in our [position paper](paper/TRIGNUM_300M_Position_Paper.md).
---
## βš›οΈ Quantum Integration: TQPE
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.18751914.svg)](https://doi.org/10.5281/zenodo.18751914)
TRIGNUM-300M serves as Phase 1 ("Technical A Priori Validation") for **Trignumental Quantum Phase Estimation (TQPE)**.
In our case study estimating the ground state energy of the **H₂ molecule**, TRIGNUM validated the physical consistency and structural logic of the quantum circuit _before execution_. Acting as the preliminary gatekeeper, TRIGNUM ensured that no quantum resources were spent on structurally ill-formed configurations, and the final estimate (-1.1384 Ha) was reported with an epistemic confidence score of **82.8%**.
Read the full `BUILDING THE BRIDGE` paper on Trignumentality and TQPE in the foundational [Trignumentality](https://github.com/Codfski/trignumentality) repository.
---
## 📚 Documentation
| Document | Description |
| ---------------------------------------------------------------- | ----------------------------------- |
| [Core Postulate](docs/theory/01_core_postulate.md) | The fundamental axioms of Trignum |
| [Three Faces](docs/theory/02_three_faces.md)                     | α (Logic), β (Illogic), γ (Context) |
| [Magnetic Trillage](docs/theory/03_magnetic_trillage.md) | Data separation mechanism |
| [T-CHIP Spec](docs/theory/04_tchip_spec.md) | The Tensor Character in detail |
| [Cold State Hardware](docs/theory/05_cold_state_hardware.md) | Hardware implications |
| [Hallucination Paradox](docs/theory/06_hallucination_paradox.md) | Reframing the "Big Monster" |
| [Position Paper](paper/TRIGNUM_300M_Position_Paper.md) | Full academic paper with benchmarks |
| [Roadmap](ROADMAP.md) | 2-quarter development plan |
---
## 💎 The Golden Gems
| Gem | Wisdom |
| ----- | --------------------------------------- |
| GEM 1 | "The Human Pulse is the Master Clock" |
| GEM 2 | "The Illogic is the Compass" |
| GEM 3 | "Magnetic Trillage Over Brute Force" |
| GEM 4 | "The Hallucination is the Raw Material" |
| GEM 5 | "T-CHIP is the Mirror" |
---
## 🤝 Contributing
See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
---
## 📄 License
MIT License. See [LICENSE](LICENSE).
---
## 📞 Contact
**TRACE ON LAB**
📧 traceonlab@proton.me
---
## πŸ›‘οΈ The Call
> _"The most dangerous AI failure is not a wrong fact. It is reasoning that sounds right but isn't."_
```
╔═══════════════════════════════════════════════════════╗
║  🧲 TRACE ON LAB | TRIGNUM-300M | v.300M              ║
║                                                       ║
║  The Pre-Flight Check for Autonomous AI.              ║
║  Zero models. Zero API calls. 146,866 samples/second. ║
║                                                       ║
║  🔵 T-CHIP: CLEARED FOR TAKEOFF.                      ║
╚═══════════════════════════════════════════════════════╝
```
⭐ **Star this repo if you believe AI should check its logic before it acts.**