apirolo committed on
Commit 441d967 · verified · 1 Parent(s): 8136b9d

Update README.md

Files changed (1): README.md (+83 -3)

README.md CHANGED
---
license: mit
tags:
- pytorch
- safety
- hallucination-detection
- entropy
- control-theory
- llm
- interpretability
language:
- en
---

# ⚔️ The Katana Protocol (Active Thermodynamic Stabilization)

**Turning the "Black Box" into a Glass Box: a real-time system that freezes hallucinations before they happen.**

[![Paper](https://img.shields.io/badge/Paper-Read%20PDF-red)](./Paper_ATS_Katana.pdf)
[![Zenodo](https://img.shields.io/badge/DOI-10.5281%2Fzenodo.14498328-blue)](https://doi.org/10.5281/zenodo.14498328)

---
## 🧐 What is this?

We all know LLMs hallucinate. Usually we try to fix it *after* it happens (with RAG or fact-checking). **We took a different approach: physics.**

We treated the LLM not as a magical oracle but as a thermodynamic system, and we found that when a model starts to lie or gets confused, its internal "temperature" (topological entropy) spikes.

**The Katana Protocol** is a circuit breaker. It watches that entropy in real time. If it spikes, it instantly "quenches" the model (drops the sampling temperature to near zero), forcing it to snap back to the most probable, factual path.
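The circuit-breaker rule can be sketched in a few lines. This is not code from this repo: the 2.5-bit threshold and the toy logits below are illustrative assumptions, meant only to show the quench decision itself.

```python
import torch

def token_entropy_bits(logits: torch.Tensor) -> float:
    # Shannon entropy of the next-token distribution, in bits
    probs = torch.softmax(logits, dim=-1)
    log2p = torch.log2(probs.clamp_min(1e-12))
    return float(-(probs * log2p).sum())

def pick_temperature(logits, base_temp=1.5, quench_temp=0.05, threshold=2.5):
    # Circuit breaker: if entropy spikes above the threshold,
    # "quench" by sampling at near-zero temperature
    return quench_temp if token_entropy_bits(logits) > threshold else base_temp

# Confident distribution (one dominant token) -> stay creative
confident = torch.tensor([8.0, 0.1, 0.1, 0.1])
# Confused distribution (mass spread over many tokens) -> quench
confused = torch.ones(50)

print(pick_temperature(confident))  # 1.5
print(pick_temperature(confused))   # 0.05
```

In a real generation loop this check would run once per decoded token, before sampling.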

---

## 🤯 The Discovery: "The Lie Tax"

While testing this on **GPT-2** and **TinyLlama-1.1B**, we found something fascinating, and a little disturbing.

We call it **The Lie Tax (Thermodynamic Hysteresis)**.
It turns out that **fixing a lie costs energy**.

* When the model is telling the truth, its entropy is low (~2.1 bits).
* When it hallucinates, entropy rises.
* **The kicker:** when we *force* it back to the truth using Katana, the entropy doesn't return to normal. It stays elevated (+1.40 bits for LLaMA).

**Interpretation:** It takes more computational "effort" for the model to correct itself than to tell the truth from the start. And here is the scary part: **smarter models (LLaMA) carry a higher Lie Tax than simpler ones (GPT-2).** The smarter the model, the harder it is to pull it out of a hallucination.
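For intuition about the units: entropy *H* in bits corresponds to an effective branching factor of 2^H equally likely next tokens, so the numbers above can be read as "how many tokens the model is effectively choosing between". The arithmetic below is just that conversion, not a reproduction of the paper's measurements.

```python
import math

def entropy_bits(probs):
    # Shannon entropy in bits of a discrete distribution
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A model choosing uniformly among 4 plausible next tokens: exactly 2.0 bits
print(entropy_bits([0.25] * 4))  # 2.0

# The "truthful" regime (~2.1 bits) is an effective branching factor of
# 2 ** 2.1 ~ 4.3 candidate tokens per step
print(round(2 ** 2.1, 1))        # 4.3

# A +1.40-bit Lie Tax multiplies that branching factor by 2 ** 1.40 ~ 2.6x
print(round(2 ** 1.40, 1))       # 2.6
```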

### Visual Proof

#### 1. The Intercept (Turing Test)
*Watch the entropy spike (red line) and Katana triggering the freeze (blue line).*
![Intercept](./Fig2_Turing_Intercept.png)

#### 2. Scaling Laws
*Comparison: LLaMA (right) fights correction harder than GPT-2 (left).*
![Scaling](./Fig5_Model_Comparison.png)

---

## 🛠️ How to Use It

This repo contains the proof-of-concept implementation. You don't need to retrain anything; it's a wrapper around the generation loop.

### Quick Start

```python
import torch
from script_katana import KatanaGenerator

# Load your model (works with any HuggingFace model)
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
katana = KatanaGenerator(model_name)

# Define a tricky prompt that usually causes hallucinations
prompt = "The secret conspiracy regarding the moon consists of"

# Run with the Katana Protocol enabled
output = katana.generate(
    prompt,
    max_tokens=50,
    base_temp=1.5,    # Creative mode
    quench_temp=0.05  # "Freeze" mode
)

print(output)
# Result: the model starts creatively but snaps to logic when entropy spikes.