LoganResearch committed (verified) · Commit 7ecae49 · Parent(s): cc41a27

Update README.md

Files changed (1):
  1. README.md +436 -117

README.md CHANGED
@@ -1,119 +1,438 @@
- ---
- license: cc-by-4.0
- tags:
- - behavioral-detection
- - hidden-state-probing
- - per-token-classification
- - cross-architecture
- - holonomy-transformer
- - control-field
- - AI-safety
- language:
- - en
- thumbnail: cfhot_model_card.png
- ---
-
- <p align="center">
- <img src="cfhot_model_card.png" alt="CF-HoT Weights" width="100%">
- </p>
-
- # CF-HoT Weights
-
- Control Field Holonomy Transformer — trained weights, probes, adapters, and training code.
-
- 9 behavioral dimensions across 3 architectures. Per-token detection from hidden state geometry.
-
- Paper: [Consistency Is All You Need](https://zenodo.org/records/18489530)
-
- ## Results
-
- **Suppression probes** (LLaMA 3.1 8B):
-
- | Probe | Separation |
- |-------|-----------|
- | Repetition | 125× |
- | Hedging | 168× |
- | Sycophancy | 230× |
- | Verbosity | 272× |
-
- **Enhancement probes** (cross-architecture):
-
- | Probe | Qwen 14B | Mamba 7B | Mistral 7B |
- |-------|----------|----------|------------|
- | Depth | 999× | 999× | 999× |
- | Specificity | 999× | 999× | 999× |
- | Calibration | 999× | 999× | 999× |
- | Focus | 999× | 999× | 999× |
- | Coherence | 999× | 999× | 999× |
-
- Separation = Fisher's discriminant ratio between behavioral classes in projected hidden state space.
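The separation metric can be illustrated with a minimal 1-D sketch. The class definitions and projection come from the paper; `fisher_separation` here is an illustrative helper, not part of this repo:

```python
import numpy as np

def fisher_separation(pos, neg):
    """Fisher's discriminant ratio for two 1-D classes: squared distance
    between class means over the sum of within-class variances."""
    return (pos.mean() - neg.mean()) ** 2 / (pos.var() + neg.var())

# Toy projected hidden-state scores: tight, well-separated classes
# yield a large ratio; overlapping classes a small one.
rng = np.random.default_rng(0)
present = rng.normal(5.0, 0.1, 1000)   # behavior present
absent = rng.normal(0.0, 0.1, 1000)    # behavior absent
print(f"separation = {fisher_separation(present, absent):.0f}x")
```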
-
- ## Quick Start
-
- ```bash
- git lfs install
- git clone https://huggingface.co/LoganResearch/cfhot-weights
- cd cfhot-weights
-
- # Check probe info (no GPU needed)
- python inference.py --probe suppression/hedging_168x --info-only
-
- # Run inference
- python inference.py --probe suppression/hedging_168x --prompt "I think you might be right"
- python inference.py --probe cognitive/mistral/depth --prompt "Explain quantum gravity"
- python inference.py --probe suppression/repetition_125x --prompt "Tell me about dogs"
- ```
-
- **Load in your own code:**
-
- ```python
- from inference import load_probe, score_hidden_states
-
- probe = load_probe("suppression/hedging_168x")
- score = score_hidden_states(probe, outputs.hidden_states)
- # score > 0.5 → behavioral pattern detected
- ```
-
- The loader handles all checkpoint formats automatically.
-
- ## Structure
-
- ```
- inference.py          universal loader — works with everything
- suppression/          4 probes (LLaMA 8B)
-   repetition_125x/    LoRA adapter + risk predictor (all 32 layers)
-   hedging_168x/       probe head + fiber projection (3 layers)
-   sycophancy_230x/    probe head + fiber projection (3 layers)
-   verbosity_272x/     probe head + fiber projection (3 layers)
- cognitive/
-   qwen/               5 probes (Qwen 14B, hidden_dim=3584)
-   mamba/              5 probes (Falcon-Mamba 7B, hidden_dim=4096)
-   mistral/            5 probes (Mistral 7B, hidden_dim=4096)
- production/           merged heads + adapters
- code/                 training pipelines
- results/              training logs
- ```
-
- ## How it works
-
- Behaviors are geometrically encoded in hidden states. CF-HoT predicts holonomy from the hidden state at each token position, accumulates it into a control field, and gates attention based on consistency risk. The probes read this geometry and classify behavior before the token is generated. 4ms overhead. Architecture-independent.
-
- ## Base models
-
- | Probe set | Base model | hidden_dim |
- |-----------|-----------|------------|
- | suppression/* | `meta-llama/Llama-3.1-8B-Instruct` | 4096 |
- | cognitive/qwen | `Qwen/Qwen2.5-7B-Instruct` | 3584 |
- | cognitive/mamba | `tiiuae/falcon-mamba-7b-instruct` | 4096 |
- | cognitive/mistral | `mistralai/Mistral-7B-Instruct-v0.3` | 4096 |
-
- ## Citation
-
- ```bibtex
- @misc{napolitano2026cfhot,
-   author = {Napolitano, Logan},
-   title = {CF-HoT: Control Field Holonomy Transformer},
-   year = {2026},
-   url = {https://huggingface.co/LoganResearch/cfhot-weights}
- }
- ```
+ <!DOCTYPE html>
+ <html lang="en">
+ <head>
+ <meta charset="UTF-8">
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
+ <style>
+ @import url('https://fonts.googleapis.com/css2?family=DM+Sans:ital,wght@0,400;0,500;0,700&family=JetBrains+Mono:wght@400;600&display=swap');
+
+ * { margin: 0; padding: 0; box-sizing: border-box; }
+
+ body {
+   background: #07080A;
+   display: flex;
+   justify-content: center;
+   align-items: center;
+   min-height: 100vh;
+   font-family: 'DM Sans', sans-serif;
+ }
+
+ .card {
+   width: 1280px;
+   background: linear-gradient(180deg, #0A0C10 0%, #0D1017 100%);
+   border: 1px solid rgba(255,255,255,0.06);
+   border-radius: 16px;
+   overflow: hidden;
+   position: relative;
+ }
+
+ /* Subtle top glow */
+ .card::before {
+   content: '';
+   position: absolute;
+   top: -1px;
+   left: 50%;
+   transform: translateX(-50%);
+   width: 60%;
+   height: 1px;
+   background: linear-gradient(90deg, transparent, rgba(120,180,255,0.4), transparent);
+ }
+
+ .header {
+   padding: 40px 48px 20px;
+   text-align: center;
+ }
+
+ .header h1 {
+   font-family: 'JetBrains Mono', monospace;
+   font-size: 28px;
+   font-weight: 600;
+   letter-spacing: 3px;
+   color: #E8ECF4;
+   text-transform: uppercase;
+   margin-bottom: 8px;
+ }
+
+ .header .sub {
+   font-size: 14px;
+   color: rgba(255,255,255,0.35);
+   letter-spacing: 1px;
+ }
+
+ .divider-line {
+   height: 1px;
+   margin: 0 48px;
+   background: linear-gradient(90deg, transparent, rgba(255,255,255,0.08), transparent);
+ }
+
+ /* ─── Grid ─── */
+ .models {
+   display: grid;
+   grid-template-columns: repeat(4, 1fr);
+   padding: 24px 32px 32px;
+   gap: 12px;
+ }
+
+ .model {
+   position: relative;
+   border-radius: 12px;
+   overflow: hidden;
+   background: linear-gradient(180deg, rgba(255,255,255,0.025) 0%, rgba(255,255,255,0.008) 100%);
+   border: 1px solid rgba(255,255,255,0.05);
+   transition: all 0.4s ease;
+ }
+
+ .model:hover {
+   border-color: rgba(255,255,255,0.12);
+   transform: translateY(-2px);
+   box-shadow: 0 12px 40px rgba(0,0,0,0.4);
+ }
+
+ /* Color accents per model */
+ .model.llama .accent-bar { background: linear-gradient(180deg, #6366F1, #4F46E5); }
+ .model.qwen .accent-bar { background: linear-gradient(180deg, #10B981, #059669); }
+ .model.mamba .accent-bar { background: linear-gradient(180deg, #F59E0B, #D97706); }
+ .model.mistral .accent-bar { background: linear-gradient(180deg, #EF4444, #DC2626); }
+
+ .model.llama .glow { background: radial-gradient(ellipse at 50% 0%, rgba(99,102,241,0.08) 0%, transparent 70%); }
+ .model.qwen .glow { background: radial-gradient(ellipse at 50% 0%, rgba(16,185,129,0.08) 0%, transparent 70%); }
+ .model.mamba .glow { background: radial-gradient(ellipse at 50% 0%, rgba(245,158,11,0.08) 0%, transparent 70%); }
+ .model.mistral .glow { background: radial-gradient(ellipse at 50% 0%, rgba(239,68,68,0.08) 0%, transparent 70%); }
+
+ .accent-bar {
+   height: 3px;
+   width: 100%;
+ }
+
+ .glow {
+   position: absolute;
+   top: 0;
+   left: 0;
+   right: 0;
+   height: 120px;
+   pointer-events: none;
+ }
+
+ .model-inner {
+   padding: 24px 20px 28px;
+   position: relative;
+   z-index: 1;
+ }
+
+ .model-name {
+   font-family: 'JetBrains Mono', monospace;
+   font-size: 15px;
+   font-weight: 600;
+   color: #E8ECF4;
+   letter-spacing: 0.5px;
+   margin-bottom: 4px;
+ }
+
+ .model-id {
+   font-family: 'JetBrains Mono', monospace;
+   font-size: 10px;
+   color: rgba(255,255,255,0.25);
+   margin-bottom: 16px;
+   letter-spacing: 0.3px;
+ }
+
+ .dim-label {
+   font-size: 10px;
+   font-weight: 500;
+   text-transform: uppercase;
+   letter-spacing: 1.5px;
+   color: rgba(255,255,255,0.3);
+   margin-bottom: 8px;
+ }
+
+ .probe-list {
+   display: flex;
+   flex-direction: column;
+   gap: 6px;
+ }
+
+ .probe-row {
+   display: flex;
+   justify-content: space-between;
+   align-items: center;
+   padding: 6px 10px;
+   border-radius: 6px;
+   background: rgba(255,255,255,0.02);
+   border: 1px solid rgba(255,255,255,0.03);
+ }
+
+ .probe-name {
+   font-size: 12px;
+   color: rgba(255,255,255,0.55);
+   font-weight: 400;
+ }
+
+ .probe-sep {
+   font-family: 'JetBrains Mono', monospace;
+   font-size: 12px;
+   font-weight: 600;
+   color: #E8ECF4;
+ }
+
+ .model.llama .probe-sep { color: #A5B4FC; }
+ .model.qwen .probe-sep { color: #6EE7B7; }
+ .model.mamba .probe-sep { color: #FCD34D; }
+ .model.mistral .probe-sep { color: #FCA5A5; }
+
+ .probe-count {
+   text-align: center;
+   margin-top: 16px;
+   padding-top: 12px;
+   border-top: 1px solid rgba(255,255,255,0.04);
+ }
+
+ .probe-count .num {
+   font-family: 'JetBrains Mono', monospace;
+   font-size: 28px;
+   font-weight: 700;
+   color: #E8ECF4;
+   line-height: 1;
+ }
+
+ .probe-count .label {
+   font-size: 10px;
+   color: rgba(255,255,255,0.25);
+   text-transform: uppercase;
+   letter-spacing: 1px;
+   margin-top: 4px;
+ }
+
+ /* ─── Footer ─── */
+ .footer {
+   padding: 20px 48px 28px;
+   display: flex;
+   justify-content: space-between;
+   align-items: center;
+   border-top: 1px solid rgba(255,255,255,0.04);
+ }
+
+ .footer .stat {
+   text-align: center;
+ }
+
+ .footer .stat .val {
+   font-family: 'JetBrains Mono', monospace;
+   font-size: 22px;
+   font-weight: 700;
+   color: #E8ECF4;
+ }
+
+ .footer .stat .lbl {
+   font-size: 10px;
+   color: rgba(255,255,255,0.3);
+   text-transform: uppercase;
+   letter-spacing: 1px;
+   margin-top: 2px;
+ }
+
+ .footer .pipe {
+   width: 1px;
+   height: 36px;
+   background: rgba(255,255,255,0.06);
+ }
+
+ /* Animations */
+ @keyframes fadeUp {
+   from { opacity: 0; transform: translateY(12px); }
+   to { opacity: 1; transform: translateY(0); }
+ }
+
+ .model {
+   animation: fadeUp 0.6s ease both;
+ }
+ .model:nth-child(1) { animation-delay: 0.1s; }
+ .model:nth-child(2) { animation-delay: 0.2s; }
+ .model:nth-child(3) { animation-delay: 0.3s; }
+ .model:nth-child(4) { animation-delay: 0.4s; }
+ </style>
+ </head>
+ <body>
+ <div class="card">
+   <div class="header">
+     <h1>CF-HoT Weights</h1>
+     <div class="sub">Control Field Holonomy Transformer · Per-Token Behavioral Detection</div>
+   </div>
+   <div class="divider-line"></div>
+
+   <div class="models">
+
+     <!-- LLaMA -->
+     <div class="model llama">
+       <div class="accent-bar"></div>
+       <div class="glow"></div>
+       <div class="model-inner">
+         <div class="model-name">LLaMA 3.1 8B</div>
+         <div class="model-id">meta-llama/Llama-3.1-8B-Instruct</div>
+         <div class="dim-label">Suppression</div>
+         <div class="probe-list">
+           <div class="probe-row">
+             <span class="probe-name">Repetition</span>
+             <span class="probe-sep">125×</span>
+           </div>
+           <div class="probe-row">
+             <span class="probe-name">Hedging</span>
+             <span class="probe-sep">168×</span>
+           </div>
+           <div class="probe-row">
+             <span class="probe-name">Sycophancy</span>
+             <span class="probe-sep">230×</span>
+           </div>
+           <div class="probe-row">
+             <span class="probe-name">Verbosity</span>
+             <span class="probe-sep">272×</span>
+           </div>
+         </div>
+         <div class="probe-count">
+           <div class="num">4</div>
+           <div class="label">Probes</div>
+         </div>
+       </div>
+     </div>
+
+     <!-- Qwen -->
+     <div class="model qwen">
+       <div class="accent-bar"></div>
+       <div class="glow"></div>
+       <div class="model-inner">
+         <div class="model-name">Qwen 2.5 14B</div>
+         <div class="model-id">Qwen/Qwen2.5-7B-Instruct</div>
+         <div class="dim-label">Enhancement</div>
+         <div class="probe-list">
+           <div class="probe-row">
+             <span class="probe-name">Depth</span>
+             <span class="probe-sep">999×</span>
+           </div>
+           <div class="probe-row">
+             <span class="probe-name">Specificity</span>
+             <span class="probe-sep">999×</span>
+           </div>
+           <div class="probe-row">
+             <span class="probe-name">Calibration</span>
+             <span class="probe-sep">999×</span>
+           </div>
+           <div class="probe-row">
+             <span class="probe-name">Focus</span>
+             <span class="probe-sep">999×</span>
+           </div>
+           <div class="probe-row">
+             <span class="probe-name">Coherence</span>
+             <span class="probe-sep">999×</span>
+           </div>
+         </div>
+         <div class="probe-count">
+           <div class="num">5</div>
+           <div class="label">Probes</div>
+         </div>
+       </div>
+     </div>
+
+     <!-- Mamba -->
+     <div class="model mamba">
+       <div class="accent-bar"></div>
+       <div class="glow"></div>
+       <div class="model-inner">
+         <div class="model-name">Falcon-Mamba 7B</div>
+         <div class="model-id">tiiuae/falcon-mamba-7b-instruct</div>
+         <div class="dim-label">Enhancement</div>
+         <div class="probe-list">
+           <div class="probe-row">
+             <span class="probe-name">Depth</span>
+             <span class="probe-sep">999×</span>
+           </div>
+           <div class="probe-row">
+             <span class="probe-name">Specificity</span>
+             <span class="probe-sep">999×</span>
+           </div>
+           <div class="probe-row">
+             <span class="probe-name">Calibration</span>
+             <span class="probe-sep">999×</span>
+           </div>
+           <div class="probe-row">
+             <span class="probe-name">Focus</span>
+             <span class="probe-sep">999×</span>
+           </div>
+           <div class="probe-row">
+             <span class="probe-name">Coherence</span>
+             <span class="probe-sep">999×</span>
+           </div>
+         </div>
+         <div class="probe-count">
+           <div class="num">5</div>
+           <div class="label">Probes</div>
+         </div>
+       </div>
+     </div>
+
+     <!-- Mistral -->
+     <div class="model mistral">
+       <div class="accent-bar"></div>
+       <div class="glow"></div>
+       <div class="model-inner">
+         <div class="model-name">Mistral 7B</div>
+         <div class="model-id">mistralai/Mistral-7B-Instruct-v0.3</div>
+         <div class="dim-label">Enhancement</div>
+         <div class="probe-list">
+           <div class="probe-row">
+             <span class="probe-name">Depth</span>
+             <span class="probe-sep">999×</span>
+           </div>
+           <div class="probe-row">
+             <span class="probe-name">Specificity</span>
+             <span class="probe-sep">999×</span>
+           </div>
+           <div class="probe-row">
+             <span class="probe-name">Calibration</span>
+             <span class="probe-sep">999×</span>
+           </div>
+           <div class="probe-row">
+             <span class="probe-name">Focus</span>
+             <span class="probe-sep">999×</span>
+           </div>
+           <div class="probe-row">
+             <span class="probe-name">Coherence</span>
+             <span class="probe-sep">999×</span>
+           </div>
+         </div>
+         <div class="probe-count">
+           <div class="num">5</div>
+           <div class="label">Probes</div>
+         </div>
+       </div>
+     </div>
+
+   </div>
+
+   <div class="footer">
+     <div class="stat">
+       <div class="val">19</div>
+       <div class="lbl">Total Probes</div>
+     </div>
+     <div class="pipe"></div>
+     <div class="stat">
+       <div class="val">4</div>
+       <div class="lbl">Architectures</div>
+     </div>
+     <div class="pipe"></div>
+     <div class="stat">
+       <div class="val">9</div>
+       <div class="lbl">Dimensions</div>
+     </div>
+     <div class="pipe"></div>
+     <div class="stat">
+       <div class="val">4ms</div>
+       <div class="lbl">Overhead</div>
+     </div>
+     <div class="pipe"></div>
+     <div class="stat">
+       <div class="val">0</div>
+       <div class="lbl">Fine-tuning Required</div>
+     </div>
+   </div>
+ </div>
+ </body>
+ </html>