@@ -1,290 +1,302 @@
-<div align="center">
-# 🧲 TRIGNUM-300M
-### The Pre-Flight Check for Autonomous AI
-[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
-[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
-[![Benchmarked](https://img.shields.io/badge/HaluEval-58%2C293_samples-green.svg)](#-benchmark-results)
-[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.18672142.svg)](https://doi.org/10.5281/zenodo.18672142)
-> **"You wouldn't let a plane take off without a pre-flight check.**
-> **Why are we letting AI agents act without one?"**
-<img src="assets/roadmap_architecture.jpg" width="800" alt="TRIGNUM-300M Architecture Flowchart" />
-</div>
----
-<div align="center">
-  <!--
-    TODO: Add your demo GIF here!
-    1. Record demo/index.html with ScreenToGif
-    2. Save as assets/trignum_demo.gif
-    3. Uncomment line below:
-  -->
-  <!-- <img src="assets/trignum_demo.gif" width="800" alt="TRIGNUM-300M Demo" /> -->
-</div>
-## What Is This?
-TRIGNUM-300M is a **zero-model reasoning integrity validator** for LLM outputs. It catches structural logic failures — contradictions, circular reasoning, non-sequiturs — before an AI agent acts on them.
-```python
-from trignum_core.subtractive_filter import SubtractiveFilter
-sf = SubtractiveFilter()
-result = sf.apply(agent_output)
-if result.illogics_found:
-    agent.halt(reason=result.illogics_found)
-    # T-CHIP glows RED 🔴 → Human review required
-else:
-    agent.execute()
-    # T-CHIP glows BLUE 🔵 → Cleared for takeoff
-```
-**No LLM. No API. No training data. ~300 lines of Python. <1ms.**
----
-## 🔬 Benchmark Results
-We expanded our evaluation to **58,000+ real LLM outputs** including a new **517-sample curated dataset** for structural reasoning. Honest results:
-| Benchmark                    | Samples | Precision | Recall | F1        | Speed |
-| ---------------------------- | ------- | --------- | ------ | --------- | ----- |
-| **Structural illogic (curated)** | **517**      | **100%**  | **98.9%**    | **99.5%** | **<1ms**  |
-| HaluEval (full dataset)      | 58,293  | 60%       | 2.1%   | 4.0%      | 706ms |
-### What this means:
-- **99.5% F1 on structural reasoning failures** — contradictions, circular logic, unsupported conclusions
-- **4.0% F1 on factual hallucinations** — we don't catch wrong facts
-**That's the point.** There are 100 tools for fact-checking. There are **zero tools for reasoning-checking.** Until now.
-### Per-Task Breakdown (HaluEval)
-| Task          | n      | Precision | Recall | F1    |
-| ------------- | ------ | --------- | ------ | ----- |
-| QA            | 18,316 | 83.3%     | 0.25%  | 0.50% |
-| Dialogue      | 19,977 | 60.1%     | 4.38%  | 8.16% |
-| Summarization | 20,000 | 57.4%     | 1.60%  | 3.11% |
-**Throughput: 146,866 samples/second** — orders of magnitude faster than LLM-based validation.
----
-## ✈️ The Pre-Flight Check Analogy
-A pre-flight checklist doesn't verify that London exists. It verifies that:
-- ✅ Instruments don't **contradict** each other
-- ✅ There are no **circular faults** (sensor A confirms B confirms A)
-- ✅ The flight computer draws **conclusions from actual data**
-- ✅ Systems are **logically consistent**
-The Subtractive Filter does the same for AI reasoning:
-```
-LLM Output → Subtractive Filter → [PASS] 🔵 → Agent Executes
-                                 → [FAIL] 🔴 → Agent Halts → Human Review
-```
----
-## 🤖 The Missing "Agentic Validator"
-In the context of the recent shift towards **Agentic Reasoning**, autonomous LLMs are moving from static prompts to dynamic _thought-action_ loops involving planning, tool-use, and multi-agent collaboration.
-Current systems rely heavily on probabilistic models to act as the "Critic/Evaluator" or use "Validator-Driven Feedback" via unit tests for code or simulators for robotics. **But there has been no validator for pure logic.** If an agent hallucinates a non-sequitur or circular justification during its internal planning phase, the error cascades.
-TRIGNUM-300M fills this exact gap. It acts as a deterministic, <1ms **Validator-Driven Feedback** gate. It halts execution if the agent's internal thought (`zt`) contains a structural illogic, providing an immediate failure signal (`rt = 0`) _before_ the agent commits to an irreversible external action (`at`).
----
-## 🔺 Core Architecture
-### The Trignum Pyramid
-Three faces acting as magnetic poles for data separation:
-| Face            | Role            | What It Does                                          |
-| --------------- | --------------- | ----------------------------------------------------- |
-| **α (Logic)**   | Truth detection | Identifies structurally sound reasoning               |
-| **β (Illogic)** | Error detection | Catches contradictions, circular logic, non-sequiturs |
-| **γ (Context)** | Human grounding | Anchors output to human intent                        |
-### T-CHIP: The Tensor Character
-```
-╔═══════════════════════════════════════════════════════╗
-║  T-CHIP [v.300M]                                      ║
-║                                                       ║
-║  🔵 Blue  = Logic Stable (Cleared for Takeoff)        ║
-║  🔴 Red   = Illogic Detected (THE FREEZE)             ║
-║  🟡 Gold  = Human Pulse Locked (Sovereign Override)   ║
-║                                                       ║
-║  Response time: <1ms | False alarms: 0% (structural)  ║
-╚═══════════════════════════════════════════════════════╝
-```
-### The Subtractive Filter
-Four detection layers, all pattern-based:
-| Layer              | Catches                              | Method                           |
-| ------------------ | ------------------------------------ | -------------------------------- |
-| **Contradiction**  | "X is always true. X is never true." | Antonym pairs, negation patterns |
-| **Circular Logic** | A proves B proves A                  | Reference chain analysis         |
-| **Non-Sequitur**   | "Therefore X" without premises       | Causal connective analysis       |
-| **Depth Check**    | Claims without any reasoning         | Assertion density scoring        |
----
-## 📦 Repository Structure
-```
-TRIGNUM-300M-TCHIP/
-├── src/
-│   └── trignum_core/              # Core Python library
-│       ├── pyramid.py             # Trignum Pyramid (3 magnetic faces)
-│       ├── tchip.py               # T-CHIP (glow states)
-│       ├── subtractive_filter.py  # ★ The Subtractive Filter
-│       ├── human_pulse.py         # Human sovereignty layer
-│       └── magnetic_trillage.py   # Data separation
-├── tests/                         # 34 unit tests (all passing)
-├── benchmarks/
-│   ├── hallucination_benchmark.py     # Curated structural test
-│   ├── full_halueval_benchmark.py     # Full 58K HaluEval test
-│   ├── results.json                   # Structural benchmark results
-│   └── full_halueval_results.json     # Full HaluEval results
-├── demo/
-│   └── index.html                 # Three.js 3D interactive demo
-├── paper/
-│   └── TRIGNUM_300M_Position_Paper.md  # Position paper
-├── docs/
-│   └── theory/                    # 6 foundational theory documents
-├── T-CHIP CLEARED FOR TAKEOFF.md  # The pitch
-└── ROADMAP.md                     # 2-quarter development plan
-```
----
-## 🚀 Quick Start
-```bash
-# Clone
-git clone https://github.com/trace-on-lab/trignum-300m.git
-cd trignum-300m
-# Install
-pip install -r requirements.txt
-pip install -e .
-# Run the structural benchmark
-python benchmarks/hallucination_benchmark.py
-# Run the full HaluEval benchmark (downloads ~13MB of data)
-python benchmarks/full_halueval_benchmark.py
-# Run tests
-pytest tests/ -v
-```
----
-## 🌐 Prior Art: Nobody Is Doing This
-We searched arXiv, ResearchGate, ACL Anthology, and Semantic Scholar. Every existing reasoning validation system requires model inference:
-| System                       | Requires Model  | Validates Reasoning |
-| ---------------------------- | :-------------: | :-----------------: |
-| VerifyLLM (2025)             |     ✅ Yes      |      Partially      |
-| ContraGen                    |     ✅ Yes      |      Partially      |
-| Process Supervision (OpenAI) |     ✅ Yes      |         Yes         |
-| Guardrails AI                | ✅ Configurable |    No (content)     |
-| **Subtractive Filter**       |    **❌ No**    |     **✅ Yes**      |
-> **Existing work uses LLMs to check LLMs. TRIGNUM uses logic to check LLMs.**
-Read the full analysis in our [position paper](paper/TRIGNUM_300M_Position_Paper.md).
----
-## ⚛️ Quantum Integration: TQPE
-[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.18751914.svg)](https://doi.org/10.5281/zenodo.18751914)
-TRIGNUM-300M serves as Phase 1 ("Technical A Priori Validation") for **Trignumental Quantum Phase Estimation (TQPE)**.
-In our groundbreaking case study estimating the ground state energy of the **H₂ molecule**, TRIGNUM successfully validated the physical consistency and structural logic of the quantum circuit _before execution_. By acting as the preliminary gatekeeper, TRIGNUM ensured that no quantum resources were wasted on structurally ill-formed configurations, enabling an epistemic confidence score of **82.8%** on the final estimate (-1.1384 Ha).
-Read the full `BUILDING THE BRIDGE` paper on Trignumentality and TQPE in the foundational [Trignumentality](https://github.com/Codfski/trignumentality) repository.
----
-## 📚 Documentation
-| Document                                                         | Description                         |
-| ---------------------------------------------------------------- | ----------------------------------- |
-| [Core Postulate](docs/theory/01_core_postulate.md)               | The fundamental axioms of Trignum   |
-| [Three Faces](docs/theory/02_three_faces.md)                     | α (Logic), β (Illogic), γ (Context) |
-| [Magnetic Trillage](docs/theory/03_magnetic_trillage.md)         | Data separation mechanism           |
-| [T-CHIP Spec](docs/theory/04_tchip_spec.md)                      | The Tensor Character in detail      |
-| [Cold State Hardware](docs/theory/05_cold_state_hardware.md)     | Hardware implications               |
-| [Hallucination Paradox](docs/theory/06_hallucination_paradox.md) | Reframing the "Big Monster"         |
-| [Position Paper](paper/TRIGNUM_300M_Position_Paper.md)           | Full academic paper with benchmarks |
-| [Roadmap](ROADMAP.md)                                            | 2-quarter development plan          |
----
-## 💎 The Golden Gems
-| Gem   | Wisdom                                  |
-| ----- | --------------------------------------- |
-| GEM 1 | "The Human Pulse is the Master Clock"   |
-| GEM 2 | "The Illogic is the Compass"            |
-| GEM 3 | "Magnetic Trillage Over Brute Force"    |
-| GEM 4 | "The Hallucination is the Raw Material" |
-| GEM 5 | "T-CHIP is the Mirror"                  |
----
-## 🤝 Contributing
-See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
----
-## 📄 License
-MIT License — see [LICENSE](LICENSE).
----
-## 📞 Contact
-**TRACE ON LAB**
-📧 traceonlab@proton.me
----
-## 🛡️ The Call
-> _"The most dangerous AI failure is not a wrong fact. It is reasoning that sounds right but isn't."_
-```
-╔═══════════════════════════════════════════════════════╗
-║  🧲 TRACE ON LAB — TRIGNUM-300M — v.300M              ║
-║                                                       ║
-║  The Pre-Flight Check for Autonomous AI.              ║
-║  Zero models. Zero API calls. 146,866 samples/second. ║
-║                                                       ║
-║  🔵 T-CHIP: CLEARED FOR TAKEOFF.                      ║
-╚═══════════════════════════════════════════════════════╝
-```
-⭐ **Star this repo if you believe AI should check its logic before it acts.**

+---
+license: mit
+language:
+- en
+tags:
+- zero-shot
+- natural-language-inference
+- self-reflection
+- logic
+- reasoning
+- evaluation
+---
+<div align="center">
+# 🧲 TRIGNUM-300M
+### The Pre-Flight Check for Autonomous AI
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
+[![Benchmarked](https://img.shields.io/badge/HaluEval-58%2C293_samples-green.svg)](#-benchmark-results)
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.18672142.svg)](https://doi.org/10.5281/zenodo.18672142)
+> **"You wouldn't let a plane take off without a pre-flight check.**
+> **Why are we letting AI agents act without one?"**
+<img src="assets/roadmap_architecture.jpg" width="800" alt="TRIGNUM-300M Architecture Flowchart" />
+</div>
+---
+<div align="center">
+  <!--
+    TODO: Add your demo GIF here!
+    1. Record demo/index.html with ScreenToGif
+    2. Save as assets/trignum_demo.gif
+    3. Uncomment line below:
+  -->
+  <!-- <img src="assets/trignum_demo.gif" width="800" alt="TRIGNUM-300M Demo" /> -->
+</div>
+## What Is This?
+TRIGNUM-300M is a **zero-model reasoning integrity validator** for LLM outputs. It catches structural logic failures — contradictions, circular reasoning, non-sequiturs — before an AI agent acts on them.
+```python
+from trignum_core.subtractive_filter import SubtractiveFilter
+# `agent_output` (the text to validate) and `agent` (your agent object)
+# are placeholders supplied by the calling application.
+sf = SubtractiveFilter()
+result = sf.apply(agent_output)
+if result.illogics_found:
+    agent.halt(reason=result.illogics_found)
+    # T-CHIP glows RED 🔴 → Human review required
+else:
+    agent.execute()
+    # T-CHIP glows BLUE 🔵 → Cleared for takeoff
+```
+**No LLM. No API. No training data. ~300 lines of Python. <1ms.**
+---
+## 🔬 Benchmark Results
+We expanded our evaluation to **58,000+ real LLM outputs** including a new **517-sample curated dataset** for structural reasoning. Honest results:
+| Benchmark                        | Samples | Precision | Recall    | F1        | Speed    |
+| -------------------------------- | ------- | --------- | --------- | --------- | -------- |
+| **Structural illogic (curated)** | **517** | **100%**  | **98.9%** | **99.5%** | **<1ms** |
+| HaluEval (full dataset)          | 58,293  | 60%       | 2.1%      | 4.0%      | 706ms    |
+### What this means:
+- **99.5% F1 on structural reasoning failures** — contradictions, circular logic, unsupported conclusions
+- **4.0% F1 on factual hallucinations** — we don't catch wrong facts
+**That's the point.** Fact-checking tools are plentiful. In our search, we found **no model-free tools for reasoning-checking.** Until now.
+### Per-Task Breakdown (HaluEval)
+| Task          | n      | Precision | Recall | F1    |
+| ------------- | ------ | --------- | ------ | ----- |
+| QA            | 18,316 | 83.3%     | 0.25%  | 0.50% |
+| Dialogue      | 19,977 | 60.1%     | 4.38%  | 8.16% |
+| Summarization | 20,000 | 57.4%     | 1.60%  | 3.11% |
+**Throughput: 146,866 samples/second** — orders of magnitude faster than LLM-based validation.
+---
+## ✈️ The Pre-Flight Check Analogy
+A pre-flight checklist doesn't verify that London exists. It verifies that:
+- ✅ Instruments don't **contradict** each other
+- ✅ There are no **circular faults** (sensor A confirms B confirms A)
+- ✅ The flight computer draws **conclusions from actual data**
+- ✅ Systems are **logically consistent**
+The Subtractive Filter does the same for AI reasoning:
+```
+LLM Output → Subtractive Filter → [PASS] 🔵 → Agent Executes
+                                 → [FAIL] 🔴 → Agent Halts → Human Review
+```
+---
+## 🤖 The Missing "Agentic Validator"
+With the recent shift towards **Agentic Reasoning**, autonomous LLMs are moving from static prompts to dynamic _thought-action_ loops involving planning, tool use, and multi-agent collaboration.
+Current systems rely heavily on probabilistic models to act as the "Critic/Evaluator" or use "Validator-Driven Feedback" via unit tests for code or simulators for robotics. **But there has been no validator for pure logic.** If an agent hallucinates a non-sequitur or circular justification during its internal planning phase, the error cascades.
+TRIGNUM-300M fills this exact gap. It acts as a deterministic, <1ms **Validator-Driven Feedback** gate. It halts execution if the agent's internal thought (`z_t`) contains a structural illogic, providing an immediate failure signal (`r_t = 0`) _before_ the agent commits to an irreversible external action (`a_t`).
+---
+## 🔺 Core Architecture
+### The Trignum Pyramid
+Three faces acting as magnetic poles for data separation:
+| Face            | Role            | What It Does                                          |
+| --------------- | --------------- | ----------------------------------------------------- |
+| **α (Logic)**   | Truth detection | Identifies structurally sound reasoning               |
+| **β (Illogic)** | Error detection | Catches contradictions, circular logic, non-sequiturs |
+| **γ (Context)** | Human grounding | Anchors output to human intent                        |
+### T-CHIP: The Tensor Character
+```
+╔═══════════════════════════════════════════════════════╗
+║  T-CHIP [v.300M]                                      ║
+║                                                       ║
+║  🔵 Blue  = Logic Stable (Cleared for Takeoff)        ║
+║  🔴 Red   = Illogic Detected (THE FREEZE)             ║
+║  🟡 Gold  = Human Pulse Locked (Sovereign Override)   ║
+║                                                       ║
+║  Response time: <1ms | False alarms: 0% (structural)  ║
+╚═══════════════════════════════════════════════════════╝
+```
+### The Subtractive Filter
+Four detection layers, all pattern-based:
+| Layer              | Catches                              | Method                           |
+| ------------------ | ------------------------------------ | -------------------------------- |
+| **Contradiction**  | "X is always true. X is never true." | Antonym pairs, negation patterns |
+| **Circular Logic** | A proves B proves A                  | Reference chain analysis         |
+| **Non-Sequitur**   | "Therefore X" without premises       | Causal connective analysis       |
+| **Depth Check**    | Claims without any reasoning         | Assertion density scoring        |
+---
+## 📦 Repository Structure
+```
+TRIGNUM-300M-TCHIP/
+├── src/
+│   └── trignum_core/              # Core Python library
+│       ├── pyramid.py             # Trignum Pyramid (3 magnetic faces)
+│       ├── tchip.py               # T-CHIP (glow states)
+│       ├── subtractive_filter.py  # ★ The Subtractive Filter
+│       ├── human_pulse.py         # Human sovereignty layer
+│       └── magnetic_trillage.py   # Data separation
+├── tests/                         # 34 unit tests (all passing)
+├── benchmarks/
+│   ├── hallucination_benchmark.py     # Curated structural test
+│   ├── full_halueval_benchmark.py     # Full 58K HaluEval test
+│   ├── results.json                   # Structural benchmark results
+│   └── full_halueval_results.json     # Full HaluEval results
+├── demo/
+│   └── index.html                 # Three.js 3D interactive demo
+├── paper/
+│   └── TRIGNUM_300M_Position_Paper.md  # Position paper
+├── docs/
+│   └── theory/                    # 6 foundational theory documents
+├── T-CHIP CLEARED FOR TAKEOFF.md  # The pitch
+└── ROADMAP.md                     # 2-quarter development plan
+```
+---
+## 🚀 Quick Start
+```bash
+# Clone
+git clone https://github.com/trace-on-lab/trignum-300m.git
+cd trignum-300m
+# Install
+pip install -r requirements.txt
+pip install -e .
+# Run the structural benchmark
+python benchmarks/hallucination_benchmark.py
+# Run the full HaluEval benchmark (downloads ~13MB of data)
+python benchmarks/full_halueval_benchmark.py
+# Run tests
+pytest tests/ -v
+```
+---
+## 🌐 Prior Art: Nobody Is Doing This
+We searched arXiv, ResearchGate, ACL Anthology, and Semantic Scholar. Every existing reasoning validation system we found requires model inference:
+| System                       | Requires Model  | Validates Reasoning |
+| ---------------------------- | :-------------: | :-----------------: |
+| VerifyLLM (2025)             |     ✅ Yes      |      Partially      |
+| ContraGen                    |     ✅ Yes      |      Partially      |
+| Process Supervision (OpenAI) |     ✅ Yes      |         Yes         |
+| Guardrails AI                | ✅ Configurable |    No (content)     |
+| **Subtractive Filter**       |    **❌ No**    |     **✅ Yes**      |
+> **Existing work uses LLMs to check LLMs. TRIGNUM uses logic to check LLMs.**
+Read the full analysis in our [position paper](paper/TRIGNUM_300M_Position_Paper.md).
+---
+## ⚛️ Quantum Integration: TQPE
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.18751914.svg)](https://doi.org/10.5281/zenodo.18751914)
+TRIGNUM-300M serves as Phase 1 ("Technical A Priori Validation") for **Trignumental Quantum Phase Estimation (TQPE)**.
+In our case study estimating the ground state energy of the **H₂ molecule**, TRIGNUM validated the physical consistency and structural logic of the quantum circuit _before execution_. Acting as the preliminary gatekeeper, it kept quantum resources from being spent on structurally ill-formed configurations; the final estimate (-1.1384 Ha) carries an epistemic confidence score of **82.8%**.
+Read the full `BUILDING THE BRIDGE` paper on Trignumentality and TQPE in the foundational [Trignumentality](https://github.com/Codfski/trignumentality) repository.
+---
+## 📚 Documentation
+| Document                                                         | Description                         |
+| ---------------------------------------------------------------- | ----------------------------------- |
+| [Core Postulate](docs/theory/01_core_postulate.md)               | The fundamental axioms of Trignum   |
+| [Three Faces](docs/theory/02_three_faces.md)                     | α (Logic), β (Illogic), γ (Context) |
+| [Magnetic Trillage](docs/theory/03_magnetic_trillage.md)         | Data separation mechanism           |
+| [T-CHIP Spec](docs/theory/04_tchip_spec.md)                      | The Tensor Character in detail      |
+| [Cold State Hardware](docs/theory/05_cold_state_hardware.md)     | Hardware implications               |
+| [Hallucination Paradox](docs/theory/06_hallucination_paradox.md) | Reframing the "Big Monster"         |
+| [Position Paper](paper/TRIGNUM_300M_Position_Paper.md)           | Full academic paper with benchmarks |
+| [Roadmap](ROADMAP.md)                                            | 2-quarter development plan          |
+---
+## 💎 The Golden Gems
+| Gem   | Wisdom                                  |
+| ----- | --------------------------------------- |
+| GEM 1 | "The Human Pulse is the Master Clock"   |
+| GEM 2 | "The Illogic is the Compass"            |
+| GEM 3 | "Magnetic Trillage Over Brute Force"    |
+| GEM 4 | "The Hallucination is the Raw Material" |
+| GEM 5 | "T-CHIP is the Mirror"                  |
+---
+## 🤝 Contributing
+See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
+---
+## 📄 License
+MIT License — see [LICENSE](LICENSE).
+---
+## 📞 Contact
+**TRACE ON LAB**
+📧 traceonlab@proton.me
+---
+## 🛡️ The Call
+> _"The most dangerous AI failure is not a wrong fact. It is reasoning that sounds right but isn't."_
+```
+╔═══════════════════════════════════════════════════════╗
+║  🧲 TRACE ON LAB — TRIGNUM-300M — v.300M              ║
+║                                                       ║
+║  The Pre-Flight Check for Autonomous AI.              ║
+║  Zero models. Zero API calls. 146,866 samples/second. ║
+║                                                       ║
+║  🔵 T-CHIP: CLEARED FOR TAKEOFF.                      ║
+╚═══════════════════════════════════════════════════════╝
+```
+⭐ **Star this repo if you believe AI should check its logic before it acts.**
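
Review note on the benchmark tables in this diff: the F1 column can be re-derived from the precision and recall columns as their harmonic mean, and the per-task sample counts can be summed against the stated HaluEval total. The sketch below does both for the rows shown in the README. The function name `f1_score` and the dictionary layout are illustrative choices, not part of the repository, and the inputs are the rounded percentages from the tables, so small rounding differences are expected.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the README tables, as fractions.
rows = {
    "Structural illogic (curated)": (1.00, 0.989),
    "HaluEval (full)": (0.60, 0.021),
    "QA": (0.833, 0.0025),
    "Dialogue": (0.601, 0.0438),
    "Summarization": (0.574, 0.0160),
}

# F1 per row, expressed as a percentage rounded to two decimals.
f1_percent = {name: round(100 * f1_score(p, r), 2) for name, (p, r) in rows.items()}

# Per-task sample counts from the HaluEval breakdown table.
task_sizes = {"QA": 18_316, "Dialogue": 19_977, "Summarization": 20_000}
total_samples = sum(task_sizes.values())

for name, value in f1_percent.items():
    print(f"{name}: F1 = {value}%")
print(f"HaluEval total: {total_samples} samples")
```

With these rounded inputs the HaluEval rows reproduce the reported F1 values to the precision shown in the tables, and the task sizes add up to the 58,293 samples quoted in the badge. The curated row comes out just under 99.5%, which would be consistent with the recall figure having been rounded before publication.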
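
Review note on the "Pre-Flight Check Analogy" flow (LLM output, filter, then PASS or FAIL): the gate can be sketched as a small function that runs a validator before any action is taken. This is a minimal sketch under stated assumptions. `FilterResult`, `toy_filter`, and `preflight_gate` are hypothetical names introduced here; only the `illogics_found` attribute is taken from the README snippet, and the toy rule is a stand-in, not the Subtractive Filter's actual logic.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FilterResult:
    # Stand-in for whatever SubtractiveFilter.apply() returns; only the
    # `illogics_found` attribute shown in the README is modelled.
    illogics_found: List[str] = field(default_factory=list)


def toy_filter(text: str) -> FilterResult:
    """Hypothetical validator: flags text that uses both 'always' and 'never'."""
    lowered = text.lower()
    found = []
    if "always" in lowered and "never" in lowered:
        found.append("possible contradiction: 'always' and 'never' both used")
    return FilterResult(illogics_found=found)


def preflight_gate(
    thought: str,
    apply_filter: Callable[[str], FilterResult],
    execute: Callable[[], str],
) -> str:
    """Run the validator first; call `execute` only if nothing was flagged."""
    result = apply_filter(thought)
    if result.illogics_found:
        # FAIL branch: do not act, report why, and leave it to human review.
        return "HALT: " + "; ".join(result.illogics_found)
    # PASS branch: the action is allowed to run.
    return execute()


cleared = preflight_gate(
    "The valve is open, so flow is possible.", toy_filter, lambda: "EXECUTED"
)
halted = preflight_gate(
    "X is always true. X is never true.", toy_filter, lambda: "EXECUTED"
)
print(cleared)
print(halted)
```

Passing the validator and the action in as callables keeps the gate independent of any particular agent framework; swapping `toy_filter` for a wrapper around `SubtractiveFilter().apply` would be the natural next step, assuming its result exposes `illogics_found` as the README shows.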
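
Review note on the "Circular Logic" row of the Subtractive Filter table ("A proves B proves A", method: reference chain analysis): the README does not show how the chain is analysed, so the following is only one plausible reading. It assumes the claims have already been extracted into a mapping from each claim to the claims it relies on, and it uses a standard depth-first search to find a cycle. The function name `find_cycle` and the input format are hypothetical.

```python
from typing import Dict, List, Optional

WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored


def find_cycle(supports: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one circular chain of claims as a list, or None if there is none.

    `supports[claim]` lists the claims that `claim` is justified by.
    """
    colour: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        colour[node] = GREY
        path.append(node)
        for nxt in supports.get(node, []):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                # `nxt` is already on the current path, so the chain loops back.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for start in list(supports):
        if colour.get(start, WHITE) == WHITE:
            cycle = visit(start)
            if cycle:
                return cycle
    return None


circular = find_cycle({"A": ["B"], "B": ["A"]})
linear = find_cycle({"A": ["B"], "B": ["C"]})
print(circular)
print(linear)
```

The hard part in practice is the extraction step that turns free text into the `supports` mapping, which this sketch leaves out entirely.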