update README with quick start and base model table

README.md

Paper: [Consistency Is All You Need](https://zenodo.org/records/18489530)

Separation = Fisher's discriminant ratio between behavioral classes in projected hidden state space.

## Quick Start

```bash
# Clone the repo
git lfs install
git clone https://huggingface.co/LoganResearch/cfhot-weights
cd cfhot-weights

# Check probe info (no GPU needed)
python inference.py --probe suppression/hedging_168x --info-only

# Run inference on a probe
python inference.py --probe suppression/hedging_168x --prompt "I think you might be right"
python inference.py --probe cognitive/mistral/depth --prompt "Explain quantum gravity"
python inference.py --probe suppression/repetition_125x --prompt "Tell me about dogs"
```

**Load in your own code:**

```python
from inference import load_probe, score_hidden_states

# Load any probe — type and architecture auto-detected
probe = load_probe("suppression/hedging_168x")

# Score hidden states from any model forward pass
score = score_hidden_states(probe, outputs.hidden_states)
# score > 0.5 → behavioral pattern detected
```
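
For context, here is one way the snippet above could be wired into a full forward pass. This is a sketch under assumptions: it presumes the `transformers` library and the LLaMA base model from the table below; only `load_probe` and `score_hidden_states` come from this repo.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from inference import load_probe, score_hidden_states

# Base model for the suppression/* probes (see the Base models table below)
model_id = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("I think you might be right", return_tensors="pt")
with torch.no_grad():
    # output_hidden_states=True makes the model return per-layer hidden states
    outputs = model(**inputs, output_hidden_states=True)

probe = load_probe("suppression/hedging_168x")
score = score_hidden_states(probe, outputs.hidden_states)
print(f"hedging score: {float(score):.3f}")  # > 0.5 → pattern detected
```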

The loader handles all checkpoint formats automatically (a sketch of how such detection might look follows the list):
- Suppression probes (separate head + fiber_proj files)
- Cognitive probes (single checkpoint with metadata)
- Risk predictor (all-layer repetition detector)
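
The actual dispatch lives in `inference.py`; a plausible sketch of format detection looks like this (the file names such as `head.pt` are illustrative assumptions, not the real internals):

```python
from pathlib import Path

def detect_probe_format(probe_dir: str) -> str:
    """Guess a checkpoint's format from the files present (illustrative only)."""
    d = Path(probe_dir)
    # Suppression probes: separate head + fiber projection files
    if (d / "head.pt").exists() and (d / "fiber_proj.pt").exists():
        return "suppression"
    # Risk predictor: all-layer repetition detector checkpoint
    if (d / "risk_predictor.pt").exists():
        return "risk_predictor"
    # Cognitive probes: single checkpoint with embedded metadata
    return "cognitive"
```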

## Structure

```
inference.py         universal loader — works with everything
suppression/         4 probes (LLaMA 8B)
  repetition_125x/   LoRA adapter + risk predictor (all 32 layers)
  hedging_168x/      probe head + fiber projection (3 layers)
  sycophancy_230x/   probe head + fiber projection (3 layers)
  verbosity_272x/    probe head + fiber projection (3 layers)
cognitive/
  qwen/              5 probes (Qwen 7B, hidden_dim=3584)
  mamba/             5 probes (Falcon-Mamba 7B, hidden_dim=4096)
  mistral/           5 probes (Mistral 7B, hidden_dim=4096)
production/          merged heads + adapters
code/                training pipelines
results/             training logs
```

## How it works

Behaviors are geometrically encoded in hidden states. CF-HoT predicts holonomy from the hidden state at each token position, accumulates it into a control field, and gates attention based on consistency risk. The probes read this geometry and classify behavior before the token is generated. The overhead is ~4 ms and the approach is architecture-independent.
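
In code, the loop described above might look roughly like this. It is a conceptual sketch only: the module names (`fiber_proj`, `head`), the fiber dimension, and the cumulative-sum accumulation are assumptions patterned on the file layout above, not the released CF-HoT implementation.

```python
import torch
import torch.nn as nn

class HolonomySketch(nn.Module):
    """Conceptual sketch of the per-token control loop (illustrative, not the real code)."""

    def __init__(self, hidden_dim: int, fiber_dim: int = 64):
        super().__init__()
        self.fiber_proj = nn.Linear(hidden_dim, fiber_dim)  # project into fiber space
        self.head = nn.Linear(fiber_dim, 1)                 # read out consistency risk

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_dim) from one layer
        holonomy = self.fiber_proj(hidden_states)
        # Accumulate per-token holonomy into a running control field
        control_field = holonomy.cumsum(dim=1)
        # Per-token consistency risk, available before the next token is generated;
        # CF-HoT would use this signal to gate attention
        risk = torch.sigmoid(self.head(control_field))
        return risk.squeeze(-1)  # (batch, seq_len)
```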

## Base models

| Probe set | Base model | hidden_dim |
|-----------|------------|------------|
| suppression/* | `meta-llama/Llama-3.1-8B-Instruct` | 4096 |
| cognitive/qwen | `Qwen/Qwen2.5-7B-Instruct` | 3584 |
| cognitive/mamba | `tiiuae/falcon-mamba-7b-instruct` | 4096 |
| cognitive/mistral | `mistralai/Mistral-7B-Instruct-v0.3` | 4096 |
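
Each probe expects the hidden_dim of its base model, so it only makes sense to score hidden states from the matching checkpoint. A quick sanity check (the mapping simply restates the table; `AutoConfig` is from `transformers`):

```python
from transformers import AutoConfig

# Restates the table above: probe set -> (base model, expected hidden_dim)
BASE_MODELS = {
    "suppression": ("meta-llama/Llama-3.1-8B-Instruct", 4096),
    "cognitive/qwen": ("Qwen/Qwen2.5-7B-Instruct", 3584),
    "cognitive/mamba": ("tiiuae/falcon-mamba-7b-instruct", 4096),
    "cognitive/mistral": ("mistralai/Mistral-7B-Instruct-v0.3", 4096),
}

model_id, expected_dim = BASE_MODELS["cognitive/qwen"]
config = AutoConfig.from_pretrained(model_id)
assert config.hidden_size == expected_dim, "base model hidden_dim does not match the probe"
```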

## Citation

```bibtex