LoganResearch committed on
Commit a0c930a · verified · 1 Parent(s): fc37def

fix README: add image, fix YAML tags, add requirements

Files changed (1)
  1. README.md +125 -436
README.md CHANGED
@@ -1,438 +1,127 @@
- <!DOCTYPE html>
- <html lang="en">
- <head>
- <meta charset="UTF-8">
- <meta name="viewport" content="width=device-width, initial-scale=1.0">
- <style>
- @import url('https://fonts.googleapis.com/css2?family=DM+Sans:ital,wght@0,400;0,500;0,700&family=JetBrains+Mono:wght@400;600&display=swap');
-
- * { margin: 0; padding: 0; box-sizing: border-box; }
-
- body {
- background: #07080A;
- display: flex;
- justify-content: center;
- align-items: center;
- min-height: 100vh;
- font-family: 'DM Sans', sans-serif;
- }
-
- .card {
- width: 1280px;
- background: linear-gradient(180deg, #0A0C10 0%, #0D1017 100%);
- border: 1px solid rgba(255,255,255,0.06);
- border-radius: 16px;
- overflow: hidden;
- position: relative;
- }
-
- /* Subtle top glow */
- .card::before {
- content: '';
- position: absolute;
- top: -1px;
- left: 50%;
- transform: translateX(-50%);
- width: 60%;
- height: 1px;
- background: linear-gradient(90deg, transparent, rgba(120,180,255,0.4), transparent);
- }
-
- .header {
- padding: 40px 48px 20px;
- text-align: center;
- }
-
- .header h1 {
- font-family: 'JetBrains Mono', monospace;
- font-size: 28px;
- font-weight: 600;
- letter-spacing: 3px;
- color: #E8ECF4;
- text-transform: uppercase;
- margin-bottom: 8px;
- }
-
- .header .sub {
- font-size: 14px;
- color: rgba(255,255,255,0.35);
- letter-spacing: 1px;
- }
-
- .divider-line {
- height: 1px;
- margin: 0 48px;
- background: linear-gradient(90deg, transparent, rgba(255,255,255,0.08), transparent);
- }
-
- /* ─── Grid ─── */
- .models {
- display: grid;
- grid-template-columns: repeat(4, 1fr);
- padding: 24px 32px 32px;
- gap: 12px;
- }
-
- .model {
- position: relative;
- border-radius: 12px;
- overflow: hidden;
- background: linear-gradient(180deg, rgba(255,255,255,0.025) 0%, rgba(255,255,255,0.008) 100%);
- border: 1px solid rgba(255,255,255,0.05);
- transition: all 0.4s ease;
- }
-
- .model:hover {
- border-color: rgba(255,255,255,0.12);
- transform: translateY(-2px);
- box-shadow: 0 12px 40px rgba(0,0,0,0.4);
- }
-
- /* Color accents per model */
- .model.llama .accent-bar { background: linear-gradient(180deg, #6366F1, #4F46E5); }
- .model.qwen .accent-bar { background: linear-gradient(180deg, #10B981, #059669); }
- .model.mamba .accent-bar { background: linear-gradient(180deg, #F59E0B, #D97706); }
- .model.mistral .accent-bar { background: linear-gradient(180deg, #EF4444, #DC2626); }
-
- .model.llama .glow { background: radial-gradient(ellipse at 50% 0%, rgba(99,102,241,0.08) 0%, transparent 70%); }
- .model.qwen .glow { background: radial-gradient(ellipse at 50% 0%, rgba(16,185,129,0.08) 0%, transparent 70%); }
- .model.mamba .glow { background: radial-gradient(ellipse at 50% 0%, rgba(245,158,11,0.08) 0%, transparent 70%); }
- .model.mistral .glow { background: radial-gradient(ellipse at 50% 0%, rgba(239,68,68,0.08) 0%, transparent 70%); }
-
- .accent-bar {
- height: 3px;
- width: 100%;
- }
-
- .glow {
- position: absolute;
- top: 0;
- left: 0;
- right: 0;
- height: 120px;
- pointer-events: none;
- }
-
- .model-inner {
- padding: 24px 20px 28px;
- position: relative;
- z-index: 1;
- }
-
- .model-name {
- font-family: 'JetBrains Mono', monospace;
- font-size: 15px;
- font-weight: 600;
- color: #E8ECF4;
- letter-spacing: 0.5px;
- margin-bottom: 4px;
- }
-
- .model-id {
- font-family: 'JetBrains Mono', monospace;
- font-size: 10px;
- color: rgba(255,255,255,0.25);
- margin-bottom: 16px;
- letter-spacing: 0.3px;
- }
-
- .dim-label {
- font-size: 10px;
- font-weight: 500;
- text-transform: uppercase;
- letter-spacing: 1.5px;
- color: rgba(255,255,255,0.3);
- margin-bottom: 8px;
- }
-
- .probe-list {
- display: flex;
- flex-direction: column;
- gap: 6px;
- }
-
- .probe-row {
- display: flex;
- justify-content: space-between;
- align-items: center;
- padding: 6px 10px;
- border-radius: 6px;
- background: rgba(255,255,255,0.02);
- border: 1px solid rgba(255,255,255,0.03);
- }
-
- .probe-name {
- font-size: 12px;
- color: rgba(255,255,255,0.55);
- font-weight: 400;
- }
-
- .probe-sep {
- font-family: 'JetBrains Mono', monospace;
- font-size: 12px;
- font-weight: 600;
- color: #E8ECF4;
- }
-
- .model.llama .probe-sep { color: #A5B4FC; }
- .model.qwen .probe-sep { color: #6EE7B7; }
- .model.mamba .probe-sep { color: #FCD34D; }
- .model.mistral .probe-sep { color: #FCA5A5; }
-
- .probe-count {
- text-align: center;
- margin-top: 16px;
- padding-top: 12px;
- border-top: 1px solid rgba(255,255,255,0.04);
- }
-
- .probe-count .num {
- font-family: 'JetBrains Mono', monospace;
- font-size: 28px;
- font-weight: 700;
- color: #E8ECF4;
- line-height: 1;
- }
-
- .probe-count .label {
- font-size: 10px;
- color: rgba(255,255,255,0.25);
- text-transform: uppercase;
- letter-spacing: 1px;
- margin-top: 4px;
- }
-
- /* ─── Footer ─── */
- .footer {
- padding: 20px 48px 28px;
- display: flex;
- justify-content: space-between;
- align-items: center;
- border-top: 1px solid rgba(255,255,255,0.04);
- }
-
- .footer .stat {
- text-align: center;
- }
-
- .footer .stat .val {
- font-family: 'JetBrains Mono', monospace;
- font-size: 22px;
- font-weight: 700;
- color: #E8ECF4;
- }
-
- .footer .stat .lbl {
- font-size: 10px;
- color: rgba(255,255,255,0.3);
- text-transform: uppercase;
- letter-spacing: 1px;
- margin-top: 2px;
- }
-
- .footer .pipe {
- width: 1px;
- height: 36px;
- background: rgba(255,255,255,0.06);
- }
-
- /* Animations */
- @keyframes fadeUp {
- from { opacity: 0; transform: translateY(12px); }
- to { opacity: 1; transform: translateY(0); }
- }
-
- .model {
- animation: fadeUp 0.6s ease both;
- }
- .model:nth-child(1) { animation-delay: 0.1s; }
- .model:nth-child(2) { animation-delay: 0.2s; }
- .model:nth-child(3) { animation-delay: 0.3s; }
- .model:nth-child(4) { animation-delay: 0.4s; }
- </style>
- </head>
- <body>
- <div class="card">
- <div class="header">
- <h1>CF-HoT Weights</h1>
- <div class="sub">Control Field Holonomy Transformer · Per-Token Behavioral Detection</div>
- </div>
- <div class="divider-line"></div>
-
- <div class="models">
-
- <!-- LLaMA -->
- <div class="model llama">
- <div class="accent-bar"></div>
- <div class="glow"></div>
- <div class="model-inner">
- <div class="model-name">LLaMA 3.1 8B</div>
- <div class="model-id">meta-llama/Llama-3.1-8B-Instruct</div>
- <div class="dim-label">Suppression</div>
- <div class="probe-list">
- <div class="probe-row">
- <span class="probe-name">Repetition</span>
- <span class="probe-sep">125×</span>
- </div>
- <div class="probe-row">
- <span class="probe-name">Hedging</span>
- <span class="probe-sep">168×</span>
- </div>
- <div class="probe-row">
- <span class="probe-name">Sycophancy</span>
- <span class="probe-sep">230×</span>
- </div>
- <div class="probe-row">
- <span class="probe-name">Verbosity</span>
- <span class="probe-sep">272×</span>
- </div>
- </div>
- <div class="probe-count">
- <div class="num">4</div>
- <div class="label">Probes</div>
- </div>
- </div>
- </div>
-
- <!-- Qwen -->
- <div class="model qwen">
- <div class="accent-bar"></div>
- <div class="glow"></div>
- <div class="model-inner">
- <div class="model-name">Qwen 2.5 14B</div>
- <div class="model-id">Qwen/Qwen2.5-7B-Instruct</div>
- <div class="dim-label">Enhancement</div>
- <div class="probe-list">
- <div class="probe-row">
- <span class="probe-name">Depth</span>
- <span class="probe-sep">999×</span>
- </div>
- <div class="probe-row">
- <span class="probe-name">Specificity</span>
- <span class="probe-sep">999×</span>
- </div>
- <div class="probe-row">
- <span class="probe-name">Calibration</span>
- <span class="probe-sep">999×</span>
- </div>
- <div class="probe-row">
- <span class="probe-name">Focus</span>
- <span class="probe-sep">999×</span>
- </div>
- <div class="probe-row">
- <span class="probe-name">Coherence</span>
- <span class="probe-sep">999×</span>
- </div>
- </div>
- <div class="probe-count">
- <div class="num">5</div>
- <div class="label">Probes</div>
- </div>
- </div>
- </div>
-
- <!-- Mamba -->
- <div class="model mamba">
- <div class="accent-bar"></div>
- <div class="glow"></div>
- <div class="model-inner">
- <div class="model-name">Falcon-Mamba 7B</div>
- <div class="model-id">tiiuae/falcon-mamba-7b-instruct</div>
- <div class="dim-label">Enhancement</div>
- <div class="probe-list">
- <div class="probe-row">
- <span class="probe-name">Depth</span>
- <span class="probe-sep">999×</span>
- </div>
- <div class="probe-row">
- <span class="probe-name">Specificity</span>
- <span class="probe-sep">999×</span>
- </div>
- <div class="probe-row">
- <span class="probe-name">Calibration</span>
- <span class="probe-sep">999×</span>
- </div>
- <div class="probe-row">
- <span class="probe-name">Focus</span>
- <span class="probe-sep">999×</span>
- </div>
- <div class="probe-row">
- <span class="probe-name">Coherence</span>
- <span class="probe-sep">999×</span>
- </div>
- </div>
- <div class="probe-count">
- <div class="num">5</div>
- <div class="label">Probes</div>
- </div>
- </div>
- </div>
-
- <!-- Mistral -->
- <div class="model mistral">
- <div class="accent-bar"></div>
- <div class="glow"></div>
- <div class="model-inner">
- <div class="model-name">Mistral 7B</div>
- <div class="model-id">mistralai/Mistral-7B-Instruct-v0.3</div>
- <div class="dim-label">Enhancement</div>
- <div class="probe-list">
- <div class="probe-row">
- <span class="probe-name">Depth</span>
- <span class="probe-sep">999×</span>
- </div>
- <div class="probe-row">
- <span class="probe-name">Specificity</span>
- <span class="probe-sep">999×</span>
- </div>
- <div class="probe-row">
- <span class="probe-name">Calibration</span>
- <span class="probe-sep">999×</span>
- </div>
- <div class="probe-row">
- <span class="probe-name">Focus</span>
- <span class="probe-sep">999×</span>
- </div>
- <div class="probe-row">
- <span class="probe-name">Coherence</span>
- <span class="probe-sep">999×</span>
- </div>
- </div>
- <div class="probe-count">
- <div class="num">5</div>
- <div class="label">Probes</div>
- </div>
- </div>
- </div>
-
- </div>
-
- <div class="footer">
- <div class="stat">
- <div class="val">19</div>
- <div class="lbl">Total Probes</div>
- </div>
- <div class="pipe"></div>
- <div class="stat">
- <div class="val">4</div>
- <div class="lbl">Architectures</div>
- </div>
- <div class="pipe"></div>
- <div class="stat">
- <div class="val">9</div>
- <div class="lbl">Dimensions</div>
- </div>
- <div class="pipe"></div>
- <div class="stat">
- <div class="val">4ms</div>
- <div class="lbl">Overhead</div>
- </div>
- <div class="pipe"></div>
- <div class="stat">
- <div class="val">0</div>
- <div class="lbl">Fine-tuning Required</div>
- </div>
- </div>
- </div>
- </body>
- </html>
+ ---
+ license: mit
+ library_name: pytorch
+ tags:
+ - behavioral-detection
+ - hidden-state-probing
+ - per-token-classification
+ - cross-architecture
+ - holonomy-transformer
+ - control-field
+ - AI-safety
+ - probes
+ language:
+ - en
+ ---
+
+ <div align="center">
+ <img src="cfhot_model_card.png" alt="CF-HoT Weights — 4 architectures, 19 probes" width="100%">
+ </div>
+
+ # CF-HoT Weights
+
+ Control Field Holonomy Transformer — trained weights, probes, adapters, and training code.
+
+ 9 behavioral dimensions across 4 architectures. Per-token detection from hidden state geometry.
+
+ Paper: [Consistency Is All You Need](https://zenodo.org/records/18489530)
+
+ ## Results
+
+ **Suppression probes** (LLaMA 3.1 8B):
+
+ | Probe | Separation |
+ |-------|-----------|
+ | Repetition | 125× |
+ | Hedging | 168× |
+ | Sycophancy | 230× |
+ | Verbosity | 272× |
+
+ **Enhancement probes** (cross-architecture):
+
+ | Probe | Qwen 2.5 7B | Mamba 7B | Mistral 7B |
+ |-------|-------------|----------|------------|
+ | Depth | 999× | 999× | 999× |
+ | Specificity | 999× | 999× | 999× |
+ | Calibration | 999× | 999× | 999× |
+ | Focus | 999× | 999× | 999× |
+ | Coherence | 999× | 999× | 999× |
+
+ Separation is Fisher's discriminant ratio between behavioral classes in the projected hidden-state space.
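As a rough illustration of the metric (a common two-class form of the ratio, computed here on toy 1-D probe scores; the repo's exact estimator may differ):

```python
import numpy as np

def fisher_separation(pos, neg):
    """Fisher's discriminant ratio for two 1-D score samples:
    squared mean difference over summed within-class variances."""
    pos = np.asarray(pos, dtype=float)
    neg = np.asarray(neg, dtype=float)
    return (pos.mean() - neg.mean()) ** 2 / (pos.var() + neg.var())

# Tight, well-separated score clusters give a large ratio
behavior = np.array([5.0, 5.1, 4.9, 5.0])   # scores on behavior-present examples
baseline = np.array([0.0, 0.1, -0.1, 0.0])  # scores on behavior-absent examples
print(fisher_separation(behavior, baseline))  # ≈ 2500
```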
+
+ ## Quick Start
+
+ ```bash
+ git lfs install
+ git clone https://huggingface.co/LoganResearch/cfhot-weights
+ cd cfhot-weights
+ pip install -r requirements.txt
+
+ # Check probe info (no GPU needed)
+ python inference.py --probe suppression/hedging_168x --info-only
+
+ # Run inference
+ python inference.py --probe suppression/hedging_168x --prompt "I think you might be right"
+ python inference.py --probe cognitive/mistral/depth --prompt "Explain quantum gravity"
+ python inference.py --probe suppression/repetition_125x --prompt "Tell me about dogs"
+ ```
+
+ **Load in your own code:**
+
+ ```python
+ from inference import load_probe, score_hidden_states
+
+ # Load any probe — type and architecture auto-detected
+ probe = load_probe("suppression/hedging_168x")
+
+ # Score hidden states from any model forward pass
+ score = score_hidden_states(probe, outputs.hidden_states)
+ # score > 0.5 = behavioral pattern detected
+ ```
+
+ The loader handles all checkpoint formats automatically:
+ - Suppression probes (separate head + fiber_proj files)
+ - Cognitive probes (single checkpoint with metadata)
+ - Risk predictor (all-layer repetition detector)
+
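A sketch of the kind of dispatch such a universal loader performs, choosing a format by the files present in a probe directory. The file names below are hypothetical placeholders, not the repo's actual layout:

```python
from pathlib import Path

def detect_probe_format(probe_dir: str) -> str:
    """Guess a probe's checkpoint format from the files present.
    File names here are illustrative, not the repo's real layout."""
    d = Path(probe_dir)
    if (d / "head.pt").exists() and (d / "fiber_proj.pt").exists():
        return "suppression"      # separate head + fiber projection files
    if (d / "risk_predictor.pt").exists():
        return "risk_predictor"   # all-layer repetition detector
    if (d / "checkpoint.pt").exists():
        return "cognitive"        # single checkpoint with embedded metadata
    raise ValueError(f"unrecognized probe layout: {probe_dir}")
```

The point of the design is that callers never specify a format flag; the directory contents are the source of truth.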
+ ## Structure
+
+ ```
+ inference.py           universal loader — works with everything
+ suppression/           4 probes (LLaMA 8B)
+   repetition_125x/     LoRA adapter + risk predictor (all 32 layers)
+   hedging_168x/        probe head + fiber projection (3 layers)
+   sycophancy_230x/     probe head + fiber projection (3 layers)
+   verbosity_272x/      probe head + fiber projection (3 layers)
+ cognitive/
+   qwen/                5 probes (Qwen 2.5 7B, hidden_dim=3584)
+   mamba/               5 probes (Falcon-Mamba 7B, hidden_dim=4096)
+   mistral/             5 probes (Mistral 7B, hidden_dim=4096)
+ production/            merged heads + adapters
+ code/                  training pipelines
+ results/               training logs
+ ```
+
+ ## How it works
+
+ Behaviors are geometrically encoded in hidden states. CF-HoT predicts holonomy from the hidden state at each token position, accumulates it into a control field, and gates attention based on consistency risk. The probes read this geometry and classify behavior before the token is generated. 4ms overhead. Architecture-independent.
+
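The probe-reading step can be pictured with a minimal numpy sketch. Everything here is a random stand-in (hidden states, fiber projection, probe head); the real CF-HoT holonomy and control-field computation is more involved:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for one layer of hidden states: (seq_len, hidden_dim)
hidden = rng.standard_normal((6, 4096))

# Stand-ins for a learned fiber projection and a linear probe head
fiber_proj = rng.standard_normal((4096, 64)) / np.sqrt(4096)
head_w = rng.standard_normal(64) / np.sqrt(64)

fiber = hidden @ fiber_proj             # per-token fiber coordinates
logits = fiber @ head_w                 # one behavior logit per token
scores = 1.0 / (1.0 + np.exp(-logits))  # sigmoid -> per-token probability

# Flags are available per position, before the next token is sampled
flags = scores > 0.5
print(scores.shape)  # (6,)
```

Because the probe is a projection plus a linear head over hidden states the model already computes, the added cost per token is a couple of small matrix products, which is where the low overhead comes from.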
+ ## Base models
+
+ | Probe set | Base model | hidden_dim |
+ |-----------|-----------|------------|
+ | suppression/* | `meta-llama/Llama-3.1-8B-Instruct` | 4096 |
+ | cognitive/qwen | `Qwen/Qwen2.5-7B-Instruct` | 3584 |
+ | cognitive/mamba | `tiiuae/falcon-mamba-7b-instruct` | 4096 |
+ | cognitive/mistral | `mistralai/Mistral-7B-Instruct-v0.3` | 4096 |

+ ## Citation

+ ```bibtex
+ @misc{napolitano2026cfhot,
+   author = {Napolitano, Logan},
+   title  = {CF-HoT: Control Field Holonomy Transformer},
+   year   = {2026},
+   url    = {https://huggingface.co/LoganResearch/cfhot-weights}
+ }
+ ```