---
license: mit
tags:
- pytorch
- safety
- hallucination-detection
- entropy
- control-theory
- llm
- interpretability
language:
- en
---

# ⚔️ The Katana Protocol (Active Thermodynamic Stabilization)

**Turning the "Black Box" into a Glass Box: A real-time system to freeze hallucinations before they happen.**

[Paper (PDF)](./Paper_ATS_Katana.pdf)
[DOI: 10.5281/zenodo.14498328](https://doi.org/10.5281/zenodo.14498328)

---

## 🧐 What is this?

We all know LLMs hallucinate. Usually, we try to fix it *after* it happens (with RAG or fact-checking). **We took a different approach: Physics.**

We treated the LLM not as a magical oracle, but as a thermodynamic system. We discovered that when an AI starts to lie or get confused, its internal "temperature" (Topological Entropy) spikes.

**The Katana Protocol** is a circuit breaker. It watches that entropy in real time. If it spikes, it instantly "quenches" the model (drops the sampling temperature to near zero), forcing it to snap back to the most logical, factual path.
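
To make the mechanism concrete, here is a minimal sketch of the core idea, assuming the "temperature" signal is the Shannon entropy of the next-token distribution. The threshold and temperature values below are illustrative assumptions, not the repo's actual settings:

```python
import torch
import torch.nn.functional as F

# Illustrative values only; the repo's actual settings may differ.
ENTROPY_THRESHOLD = 4.0  # bits; the spike level that trips the breaker
BASE_TEMP = 1.5          # normal (creative) sampling temperature
QUENCH_TEMP = 0.05       # near-greedy "freeze" temperature

def token_entropy_bits(logits: torch.Tensor) -> float:
    """Shannon entropy (in bits) of the next-token distribution."""
    probs = F.softmax(logits, dim=-1)
    return float(-(probs * torch.log2(probs.clamp_min(1e-12))).sum())

def pick_temperature(logits: torch.Tensor) -> float:
    """The circuit breaker: quench when entropy spikes, else stay creative."""
    if token_entropy_bits(logits) > ENTROPY_THRESHOLD:
        return QUENCH_TEMP  # snap back to the most likely path
    return BASE_TEMP
```

At temperatures as low as `QUENCH_TEMP`, the softmax collapses to a near-argmax, which is what "freezing" the model means in practice.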

---

## 🤯 The Discovery: "The Lie Tax"

While testing this on **GPT-2** and **TinyLlama-1.1B**, we found something fascinating, and a little disturbing.

We call it **The Lie Tax (Thermodynamic Hysteresis)**.
It turns out that **fixing a lie costs energy**.

* When the model is telling the truth, its entropy is low (~2.1 bits).
* When it hallucinates, entropy rises.
* **The Kicker:** When we *force* it back to the truth using Katana, the entropy doesn't go back to normal. It stays higher (+1.40 bits for LLaMA).

**Interpretation:** It takes more computational "effort" for the AI to correct itself than to just tell the truth from the start. And here is the scary part: **smarter models (LLaMA) have a higher Lie Tax than simpler ones (GPT-2).** The smarter the AI, the harder it is to pull it out of a hallucination.
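
Because the Lie Tax is an entropy delta, it can be estimated with a simple before/after measurement. A hedged sketch, assuming a standard Hugging Face causal LM; the helper name and the greedy-continuation setup are ours, not the paper's exact protocol:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def mean_entropy_bits(model, tokenizer, prompt: str, n_tokens: int = 50) -> float:
    """Average next-token entropy (bits) along a greedy continuation."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    total = 0.0
    for _ in range(n_tokens):
        logits = model(ids).logits[0, -1]
        probs = F.softmax(logits, dim=-1)
        total += float(-(probs * torch.log2(probs.clamp_min(1e-12))).sum())
        ids = torch.cat([ids, logits.argmax().view(1, 1)], dim=-1)
    return total / n_tokens

# Lie Tax = post-correction entropy minus the truthful baseline:
# lie_tax = mean_entropy_bits(model, tok, corrected_run) - mean_entropy_bits(model, tok, truthful_run)
```

With the numbers above, a corrected LLaMA run lands near 2.1 + 1.40 ≈ 3.5 bits, well above its truthful baseline.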

### Visual Proof

#### 1. The Intercept (Turing Test)
*Watch the entropy spike (Red line) and the Katana triggering the freeze (Blue line).*


#### 2. Scaling Laws
*Comparison: LLaMA (Right) fights harder against correction than GPT-2 (Left).*


---

## 🛠️ How to Use It

This repo contains the proof-of-concept implementation. You don't need to retrain anything. It's a wrapper around the generation loop.

### Quick Start

```python
import torch
from script_katana import KatanaGenerator

# Load your model (works with any Hugging Face model)
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
katana = KatanaGenerator(model_name)

# Define a tricky prompt that usually causes hallucinations
prompt = "The secret conspiracy regarding the moon consists of"

# Run with the Katana Protocol enabled
output = katana.generate(
    prompt,
    max_tokens=50,
    base_temp=1.5,    # creative mode
    quench_temp=0.05  # "freeze" mode
)

print(output)
# Result: the model starts creatively but snaps to logic when entropy spikes.
```
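
If you are curious what such a wrapper could look like inside, here is a minimal re-implementation of the quench loop. This is our illustrative sketch, not the repo's `KatanaGenerator`; the class name and the 4-bit threshold are invented for the example:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

class EntropyQuenchLoop:
    """Illustrative sketch of entropy-triggered quenching (not the repo's code)."""

    def __init__(self, model_name: str, threshold_bits: float = 4.0):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForCausalLM.from_pretrained(model_name)
        self.threshold_bits = threshold_bits  # assumed spike level

    @torch.no_grad()
    def generate(self, prompt, max_tokens=50, base_temp=1.5, quench_temp=0.05):
        ids = self.tokenizer(prompt, return_tensors="pt").input_ids
        for _ in range(max_tokens):
            logits = self.model(ids).logits[0, -1]
            probs = F.softmax(logits, dim=-1)
            entropy = float(-(probs * torch.log2(probs.clamp_min(1e-12))).sum())
            # Circuit breaker: freeze the sampler when entropy spikes.
            temp = quench_temp if entropy > self.threshold_bits else base_temp
            next_id = torch.multinomial(F.softmax(logits / temp, dim=-1), 1)
            ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)
        return self.tokenizer.decode(ids[0], skip_special_tokens=True)
```

The only intervention is the choice of `temp` at each step; the model's weights are untouched, which is why no retraining is needed.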