LoganResearch committed
Commit d5eaba7 · verified · 1 Parent(s): 6b8163e

update README with quick start and base model table

Files changed (1): README.md (+52 -18)

README.md CHANGED
@@ -43,38 +43,72 @@ Paper: [Consistency Is All You Need](https://zenodo.org/records/18489530)
 
 Separation = Fisher's discriminant ratio between behavioral classes in projected hidden state space.
 
-## Structure
 
-```
-suppression/          4 probes (LLaMA 8B)
-cognitive/qwen/       5 probes (transformer)
-cognitive/mamba/      5 probes (SSM)
-cognitive/mistral/    5 probes (SWA transformer)
-production/           merged heads + adapters
-code/                 training pipelines
-results/              training logs
 ```
 
-## Usage
 
 ```python
-import torch
 
-# Load a suppression probe
-probe = torch.load("suppression/hedging_168x/hedging_head.pt")
-fiber_proj = torch.load("suppression/hedging_168x/fiber_proj.pt")
 
-# Load enhancement probe
-depth = torch.load("cognitive/qwen/depth/depth_head.pt")
 
-# Load merged production heads
-merged = torch.load("production/merged_heads.pt")
 ```
 
 ## How it works
 
 Behaviors are geometrically encoded in hidden states. CF-HoT predicts holonomy from the hidden state at each token position, accumulates it into a control field, and gates attention based on consistency risk. The probes read this geometry and classify behavior before the token is generated. 4ms overhead. Architecture-independent.
 
 ## Citation
 
 ```bibtex
 
  Separation = Fisher's discriminant ratio between behavioral classes in projected hidden state space.
 
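The separation metric above is the standard two-class Fisher discriminant ratio: squared difference of class means over the sum of class variances, computed on the projected features. A minimal NumPy sketch (illustrative only, not code from this repository):

```python
import numpy as np

def fisher_ratio(a, b):
    """Two-class Fisher discriminant ratio for 1-D projected features:
    (difference of class means)^2 / (sum of class variances).
    Illustrative sketch, not code from this repository."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return (a.mean() - b.mean()) ** 2 / (a.var() + b.var())

# Well-separated classes give a large ratio; overlapping ones stay near 0.
rng = np.random.default_rng(0)
well_separated = fisher_ratio(rng.normal(0, 1, 500), rng.normal(8, 1, 500))
overlapping = fisher_ratio(rng.normal(0, 1, 500), rng.normal(0.1, 1, 500))
```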
+## Quick Start
+
+```bash
+# Clone the repo
+git lfs install
+git clone https://huggingface.co/LoganResearch/cfhot-weights
+cd cfhot-weights
+
+# Check probe info (no GPU needed)
+python inference.py --probe suppression/hedging_168x --info-only
+
+# Run inference on a probe
+python inference.py --probe suppression/hedging_168x --prompt "I think you might be right"
+python inference.py --probe cognitive/mistral/depth --prompt "Explain quantum gravity"
+python inference.py --probe suppression/repetition_125x --prompt "Tell me about dogs"
 ```
 
+**Load in your own code:**
 
 ```python
+from inference import load_probe, score_hidden_states
+
+# Load any probe — type and architecture auto-detected
+probe = load_probe("suppression/hedging_168x")
 
+# Score hidden states from any model forward pass
+score = score_hidden_states(probe, outputs.hidden_states)
+# score > 0.5 → behavioral pattern detected
+```
 
+The loader handles all checkpoint formats automatically:
+- Suppression probes (separate head + fiber_proj files)
+- Cognitive probes (single checkpoint with metadata)
+- Risk predictor (all-layer repetition detector)
 
+## Structure
+
+```
+inference.py            universal loader — works with everything
+suppression/            4 probes (LLaMA 8B)
+  repetition_125x/      LoRA adapter + risk predictor (all 32 layers)
+  hedging_168x/         probe head + fiber projection (3 layers)
+  sycophancy_230x/      probe head + fiber projection (3 layers)
+  verbosity_272x/       probe head + fiber projection (3 layers)
+cognitive/
+  qwen/                 5 probes (Qwen2.5 7B, hidden_dim=3584)
+  mamba/                5 probes (Falcon-Mamba 7B, hidden_dim=4096)
+  mistral/              5 probes (Mistral 7B, hidden_dim=4096)
+production/             merged heads + adapters
+code/                   training pipelines
+results/                training logs
 ```
 
 ## How it works
 
 Behaviors are geometrically encoded in hidden states. CF-HoT predicts holonomy from the hidden state at each token position, accumulates it into a control field, and gates attention based on consistency risk. The probes read this geometry and classify behavior before the token is generated. 4ms overhead. Architecture-independent.
 
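The "probes read this geometry" step can be pictured as a projection into a low-dimensional fiber subspace followed by a linear head with a sigmoid. A minimal NumPy illustration — the shapes, the plain linear head, and random weights are assumptions, not the shipped checkpoints:

```python
import numpy as np

def probe_score(hidden_state, fiber_proj, head_w, head_b=0.0):
    """Project a hidden state into a fiber subspace, then apply a linear
    probe head with a sigmoid. Shapes: hidden_state (d,), fiber_proj (k, d),
    head_w (k,). Illustrative sketch only."""
    z = fiber_proj @ hidden_state          # (k,) projected features
    logit = head_w @ z + head_b            # scalar behavior logit
    return 1.0 / (1.0 + np.exp(-logit))    # probability-like score in (0, 1)

# Illustrative run with a LLaMA-8B-sized hidden state (d=4096, k=64 assumed).
rng = np.random.default_rng(0)
d, k = 4096, 64
h = rng.standard_normal(d)
proj = rng.standard_normal((k, d)) / np.sqrt(d)
w = rng.standard_normal(k) / np.sqrt(k)
score = probe_score(h, proj, w)            # > 0.5 would flag the behavior
```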
 
+## Base models
+
+| Probe set | Base model | hidden_dim |
+|-----------|------------|------------|
+| suppression/* | `meta-llama/Llama-3.1-8B-Instruct` | 4096 |
+| cognitive/qwen | `Qwen/Qwen2.5-7B-Instruct` | 3584 |
+| cognitive/mamba | `tiiuae/falcon-mamba-7b-instruct` | 4096 |
+| cognitive/mistral | `mistralai/Mistral-7B-Instruct-v0.3` | 4096 |
+
 ## Citation
 
 ```bibtex