---
license: mit
language:
- en
tags:
- zero-shot
- natural-language-inference
- self-reflection
- logic
- reasoning
- evaluation
- trignum
- trignumentality
---
<div align="center">

# 🧲 TRIGNUM-300M

### The Pre-Flight Check for Autonomous AI

[License: MIT](https://opensource.org/licenses/MIT) · [Python](https://www.python.org/downloads/) · [Benchmarks](#-benchmark-results) · [DOI: 10.5281/zenodo.18672142](https://doi.org/10.5281/zenodo.18672142)

> **"You wouldn't let a plane take off without a pre-flight check.**
> **Why are we letting AI agents act without one?"**

<img src="assets/roadmap_architecture.jpg" width="800" alt="TRIGNUM-300M Architecture Flowchart" />
</div>
|
---

<div align="center">
<!--
TODO: Add your demo GIF here!
1. Record demo/index.html with ScreenToGif
2. Save as assets/trignum_demo.gif
3. Uncomment line below:
-->
<!-- <img src="assets/trignum_demo.gif" width="800" alt="TRIGNUM-300M Demo" /> -->
</div>

## What Is This?

TRIGNUM-300M is a **zero-model reasoning integrity validator** for LLM outputs. It catches structural logic failures (contradictions, circular reasoning, non-sequiturs) before an AI agent acts on them.

```python
from trignum_core.subtractive_filter import SubtractiveFilter

# `agent_output` (the LLM text to check) and `agent` come from your own agent runtime.
sf = SubtractiveFilter()
result = sf.apply(agent_output)

if result.illogics_found:
    # T-CHIP glows RED 🔴: human review required
    agent.halt(reason=result.illogics_found)
else:
    # T-CHIP glows BLUE 🔵: cleared for takeoff
    agent.execute()
```

**No LLM. No API. No training data. ~300 lines of Python. <1ms per check.**
|
---

## 🔬 Benchmark Results

We expanded our evaluation to **58,000+ real LLM outputs** (the full HaluEval dataset) plus a new **517-sample curated dataset** of structural reasoning failures. Honest results:

| Benchmark | Samples | Precision | Recall | F1 | Speed |
| ---------------------------- | ------- | --------- | ------ | --------- | ----- |
| **Structural illogic (curated)** | **517** | **100%** | **98.9%** | **99.5%** | **<1ms** |
| HaluEval (full dataset) | 58,293 | 60% | 2.1% | 4.0% | 706ms |

### What this means

- **99.5% F1 on structural reasoning failures**: contradictions, circular logic, unsupported conclusions.
- **4.0% F1 on factual hallucinations**: we don't catch wrong facts.

**That's the point.** Plenty of tools check facts. We found **no tool that checks reasoning structure without a model.** Until now.

### Per-Task Breakdown (HaluEval)

| Task | n | Precision | Recall | F1 |
| ------------- | ------ | --------- | ------ | ----- |
| QA | 18,316 | 83.3% | 0.25% | 0.50% |
| Dialogue | 19,977 | 60.1% | 4.38% | 8.16% |
| Summarization | 20,000 | 57.4% | 1.60% | 3.11% |

**Throughput: 146,866 samples/second**, orders of magnitude faster than LLM-based validation.
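
The F1 columns above are the harmonic mean of precision and recall, F1 = 2PR / (P + R). The minimal Python sketch below (the helper names are ours, not part of `trignum_core`) recomputes the per-task HaluEval F1 values from the rounded precision and recall in the table, and shows how the same metrics follow from raw detection counts.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def prf_from_counts(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# Rounded precision/recall taken from the per-task HaluEval table.
PER_TASK = [
    ("QA", 0.833, 0.0025),
    ("Dialogue", 0.601, 0.0438),
    ("Summarization", 0.574, 0.0160),
]

for task, p, r in PER_TASK:
    print(f"{task:<13} F1 = {f1_score(p, r):.2%}")
```

The low HaluEval F1 is driven almost entirely by recall: because 2PR / (P + R) < 2R whenever P > 0, F1 can never exceed twice the recall, however high precision is.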
|
---

## ✈️ The Pre-Flight Check Analogy

A pre-flight checklist doesn't verify that London exists. It verifies that:

- ✅ Instruments don't **contradict** each other
- ✅ There are no **circular faults** (sensor A confirms B confirms A)
- ✅ The flight computer draws **conclusions from actual data**
- ✅ Systems are **logically consistent**

The Subtractive Filter does the same for AI reasoning:

```
LLM Output → Subtractive Filter → [PASS] 🔵 → Agent Executes
                                → [FAIL] 🔴 → Agent Halts → Human Review
```

---
|
## 🤖 The Missing "Agentic Validator"

With the recent shift toward **Agentic Reasoning**, autonomous LLMs are moving from static prompts to dynamic _thought-action_ loops that involve planning, tool use, and multi-agent collaboration.

Current systems rely heavily on probabilistic models acting as the "Critic/Evaluator", or on "Validator-Driven Feedback" from unit tests (for code) or simulators (for robotics). **But there has been no validator for pure logic.** If an agent hallucinates a non-sequitur or a circular justification during its internal planning phase, the error cascades into every action that follows.

TRIGNUM-300M fills this gap. It acts as a deterministic, <1ms **Validator-Driven Feedback** gate: it halts execution if the agent's internal thought (`z_t`) contains a structural illogic, providing an immediate failure signal (`r_t = 0`) _before_ the agent commits to an irreversible external action (`a_t`).
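
A minimal sketch of where such a gate sits in one step of a thought-action loop. `gated_step`, `StepOutcome`, and `toy_validate` are hypothetical names for illustration; `validate` is any callable that returns a list of structural issues. With the real library it could wrap `SubtractiveFilter().apply(z).illogics_found`, assuming that attribute is a list of findings. The sketch only pins down the two behaviours described above: no action on failure, and `r_t = 0` as the failure signal.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class StepOutcome:
    action: Optional[str]   # a_t, or None when the gate halts the step
    reward: Optional[int]   # r_t = 0 on structural failure; None leaves reward to the environment
    issues: List[str] = field(default_factory=list)


def gated_step(
    thought: str,                           # z_t: the agent's internal reasoning
    propose_action: Callable[[str], str],   # maps z_t to a candidate external action a_t
    validate: Callable[[str], List[str]],   # returns structural issues found in z_t
) -> StepOutcome:
    """Check z_t before committing to a_t (hypothetical helper, not the trignum_core API)."""
    issues = validate(thought)
    if issues:
        # Halt before any external side effect and emit the failure signal r_t = 0.
        return StepOutcome(action=None, reward=0, issues=issues)
    return StepOutcome(action=propose_action(thought), reward=None)


def toy_validate(thought: str) -> List[str]:
    """Stand-in validator: flags a conclusion marker with nothing before it."""
    words = thought.split()
    first_word = words[0].lower().strip(",") if words else ""
    return ["non-sequitur"] if first_word in {"therefore", "thus", "hence"} else []


halted = gated_step("Therefore, delete the production database.", lambda z: "DROP DATABASE", toy_validate)
print(halted.action, halted.reward, halted.issues)  # None 0 ['non-sequitur']
```

Returning `None` for `reward` on a pass is a choice made in this sketch: the README only specifies the failure signal, so a passing thought leaves scoring to whatever the environment returns after `a_t`.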
|
---

## 🔺 Core Architecture

### The Trignum Pyramid

Three faces acting as magnetic poles for data separation:

| Face | Role | What It Does |
| --------------- | --------------- | ----------------------------------------------------- |
| **α (Logic)** | Truth detection | Identifies structurally sound reasoning |
| **β (Illogic)** | Error detection | Catches contradictions, circular logic, non-sequiturs |
| **γ (Context)** | Human grounding | Anchors output to human intent |

### T-CHIP: The Tensor Character

```
┌───────────────────────────────────────────────────────┐
│ T-CHIP [v.300M]                                       │
│                                                       │
│ 🔵 Blue = Logic Stable (Cleared for Takeoff)          │
│ 🔴 Red = Illogic Detected (THE FREEZE)                │
│ 🟡 Gold = Human Pulse Locked (Sovereign Override)     │
│                                                       │
│ Response time: <1ms | False alarms: 0% (structural)   │
└───────────────────────────────────────────────────────┘
```
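
The three glow states can be read as a small priority mapping from a filter verdict to a colour. A minimal sketch, assuming (our assumption, not stated in the spec) that a locked Human Pulse overrides the filter verdict and that any detected illogic otherwise freezes the chip; `Glow` and `tchip_glow` are hypothetical names, not the `tchip.py` API.

```python
from enum import Enum
from typing import Sequence


class Glow(Enum):
    BLUE = "Logic Stable (Cleared for Takeoff)"
    RED = "Illogic Detected (THE FREEZE)"
    GOLD = "Human Pulse Locked (Sovereign Override)"


def tchip_glow(illogics_found: Sequence[str], human_pulse_locked: bool = False) -> Glow:
    """Map a filter result to a glow state (assumed precedence: Gold > Red > Blue)."""
    if human_pulse_locked:
        # Assumption: the human sovereign override takes precedence over the filter.
        return Glow.GOLD
    if illogics_found:
        return Glow.RED
    return Glow.BLUE


print(tchip_glow([]).name)                                           # BLUE
print(tchip_glow(["circular logic"]).name)                           # RED
print(tchip_glow(["circular logic"], human_pulse_locked=True).name)  # GOLD
```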
|
### The Subtractive Filter

Four detection layers, all pattern-based:

| Layer | Catches | Method |
| ------------------ | ------------------------------------ | -------------------------------- |
| **Contradiction** | "X is always true. X is never true." | Antonym pairs, negation patterns |
| **Circular Logic** | A proves B proves A | Reference chain analysis |
| **Non-Sequitur** | "Therefore X" without premises | Causal connective analysis |
| **Depth Check** | Claims without any reasoning | Assertion density scoring |
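
To make the four layers concrete, here is a toy, self-contained Python sketch with one rule per layer. It is **not** the `subtractive_filter.py` implementation: the word lists, the `because`-based reference chains, and the three-sentence depth threshold are all illustrative assumptions, and `toy_subtractive_filter` is a hypothetical name.

```python
import re
from typing import Dict, List

# Illustrative word lists and threshold; assumptions, not the library's actual rules.
ANTONYMS = {("always", "never"), ("true", "false"), ("increases", "decreases")}
CONCLUSION_MARKERS = {"therefore", "thus", "hence"}
CONNECTIVES = ("because", "since", "therefore", "thus", "hence")
MIN_SENTENCES_FOR_DEPTH = 3


def sentences(text: str) -> List[str]:
    """Split on . ! ? and normalise to lower-case, single-spaced sentences."""
    return [" ".join(p.lower().split()) for p in re.split(r"[.!?]+", text) if p.strip()]


def contradictions(sents: List[str]) -> List[str]:
    """Contradiction layer: two sentences that differ only by one antonym pair."""
    found = []
    for i, s1 in enumerate(sents):
        for s2 in sents[i + 1:]:
            w1, w2 = s1.split(), s2.split()
            diffs = [(a, b) for a, b in zip(w1, w2) if a != b]
            if len(w1) == len(w2) and len(diffs) == 1 and (
                diffs[0] in ANTONYMS or diffs[0][::-1] in ANTONYMS
            ):
                found.append(f"contradiction: '{s1}' vs '{s2}'")
    return found


def circular(sents: List[str]) -> List[str]:
    """Circular layer: build claim -> reason edges from 'A because B' and detect cycles."""
    graph: Dict[str, List[str]] = {}
    for s in sents:
        if " because " in s:
            claim, reason = s.split(" because ", 1)
            graph.setdefault(claim, []).append(reason)

    def reaches(node: str, target: str, seen: set) -> bool:
        # Depth-first search along reason edges, looking for a path back to target.
        for nxt in graph.get(node, []):
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                if reaches(nxt, target, seen):
                    return True
        return False

    return [f"circular: '{c}' depends on itself" for c in graph if reaches(c, c, set())]


def non_sequitur(sents: List[str]) -> List[str]:
    """Non-sequitur layer: a conclusion marker in the first sentence has no premises."""
    if sents and sents[0].split()[0].strip(",;:") in CONCLUSION_MARKERS:
        return [f"non-sequitur: '{sents[0]}' has no premises"]
    return []


def shallow(sents: List[str]) -> List[str]:
    """Depth layer: several assertions and not a single reasoning connective."""
    text = " ".join(sents)
    if len(sents) >= MIN_SENTENCES_FOR_DEPTH and not any(
        re.search(rf"\b{c}\b", text) for c in CONNECTIVES
    ):
        return [f"depth: {len(sents)} assertions, no supporting reasoning"]
    return []


def toy_subtractive_filter(text: str) -> List[str]:
    """Run all four toy layers and return every finding (empty list = pass)."""
    sents = sentences(text)
    return contradictions(sents) + circular(sents) + non_sequitur(sents) + shallow(sents)


print(toy_subtractive_filter("X is always true. X is never true."))
# ["contradiction: 'x is always true' vs 'x is never true'"]
```

Run on "A is true because B is true. B is true because A is true.", the circular layer flags both claims, since each one reaches itself through the other.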
|
---

## 📦 Repository Structure

```
TRIGNUM-300M-TCHIP/
├── src/
│   └── trignum_core/                  # Core Python library
│       ├── pyramid.py                 # Trignum Pyramid (3 magnetic faces)
│       ├── tchip.py                   # T-CHIP (glow states)
│       ├── subtractive_filter.py      # ★ The Subtractive Filter
│       ├── human_pulse.py             # Human sovereignty layer
│       └── magnetic_trillage.py       # Data separation
├── tests/                             # 34 unit tests (all passing)
├── benchmarks/
│   ├── hallucination_benchmark.py     # Curated structural test
│   ├── full_halueval_benchmark.py     # Full 58K HaluEval test
│   ├── results.json                   # Structural benchmark results
│   └── full_halueval_results.json     # Full HaluEval results
├── demo/
│   └── index.html                     # Three.js 3D interactive demo
├── paper/
│   └── TRIGNUM_300M_Position_Paper.md # Position paper
├── docs/
│   └── theory/                        # 6 foundational theory documents
├── T-CHIP CLEARED FOR TAKEOFF.md      # The pitch
└── ROADMAP.md                         # 2-quarter development plan
```

---
|
## 🚀 Quick Start

```bash
# Clone
git clone https://github.com/trace-on-lab/trignum-300m.git
cd trignum-300m

# Install
pip install -r requirements.txt
pip install -e .

# Run the structural benchmark
python benchmarks/hallucination_benchmark.py

# Run the full HaluEval benchmark (downloads ~13MB of data)
python benchmarks/full_halueval_benchmark.py

# Run tests
pytest tests/ -v
```

---
|
## 🔍 Prior Art: Nobody Is Doing This

We searched arXiv, ResearchGate, ACL Anthology, and Semantic Scholar. Every existing reasoning validation system we found requires model inference:

| System | Requires Model | Validates Reasoning |
| ---------------------------- | :-------------: | :-----------------: |
| VerifyLLM (2025) | ✅ Yes | Partially |
| ContraGen | ✅ Yes | Partially |
| Process Supervision (OpenAI) | ✅ Yes | Yes |
| Guardrails AI | ✅ Configurable | No (content) |
| **Subtractive Filter** | **❌ No** | **✅ Yes** |

> **Existing work uses LLMs to check LLMs. TRIGNUM uses logic to check LLMs.**

Read the full analysis in our [position paper](paper/TRIGNUM_300M_Position_Paper.md).

---
|
## ⚛️ Quantum Integration: TQPE

[DOI: 10.5281/zenodo.18751914](https://doi.org/10.5281/zenodo.18751914)

TRIGNUM-300M serves as Phase 1 ("Technical A Priori Validation") for **Trignumental Quantum Phase Estimation (TQPE)**.

In our case study estimating the ground-state energy of the **H₂ molecule**, TRIGNUM validated the physical consistency and structural logic of the quantum circuit _before execution_. Acting as the preliminary gatekeeper, TRIGNUM ensured that no quantum resources were spent on structurally ill-formed configurations, which enabled an epistemic confidence score of **82.8%** on the final estimate (-1.1384 Ha).

Read the full `BUILDING THE BRIDGE` paper on Trignumentality and TQPE in the foundational [Trignumentality](https://github.com/Codfski/trignumentality) repository.

---
|
## 📚 Documentation

| Document | Description |
| ---------------------------------------------------------------- | ----------------------------------- |
| [Core Postulate](docs/theory/01_core_postulate.md) | The fundamental axioms of Trignum |
| [Three Faces](docs/theory/02_three_faces.md) | α (Logic), β (Illogic), γ (Context) |
| [Magnetic Trillage](docs/theory/03_magnetic_trillage.md) | Data separation mechanism |
| [T-CHIP Spec](docs/theory/04_tchip_spec.md) | The Tensor Character in detail |
| [Cold State Hardware](docs/theory/05_cold_state_hardware.md) | Hardware implications |
| [Hallucination Paradox](docs/theory/06_hallucination_paradox.md) | Reframing the "Big Monster" |
| [Position Paper](paper/TRIGNUM_300M_Position_Paper.md) | Full academic paper with benchmarks |
| [Roadmap](ROADMAP.md) | 2-quarter development plan |

---
|
## 💎 The Golden Gems

| Gem | Wisdom |
| ----- | --------------------------------------- |
| GEM 1 | "The Human Pulse is the Master Clock" |
| GEM 2 | "The Illogic is the Compass" |
| GEM 3 | "Magnetic Trillage Over Brute Force" |
| GEM 4 | "The Hallucination is the Raw Material" |
| GEM 5 | "T-CHIP is the Mirror" |

---

## 🤝 Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

---

## 📄 License

MIT License. See [LICENSE](LICENSE).

---

## 📞 Contact

**TRACE ON LAB**
📧 traceonlab@proton.me

---

## 🛡️ The Call

> _"The most dangerous AI failure is not a wrong fact. It is reasoning that sounds right but isn't."_

```
┌───────────────────────────────────────────────────────┐
│ 🧲 TRACE ON LAB · TRIGNUM-300M · v.300M               │
│                                                       │
│ The Pre-Flight Check for Autonomous AI.               │
│ Zero models. Zero API calls. 146,866 samples/second.  │
│                                                       │
│ 🔵 T-CHIP: CLEARED FOR TAKEOFF.                       │
└───────────────────────────────────────────────────────┘
```

⭐ **Star this repo if you believe AI should check its logic before it acts.**
|