---
license: cc-by-4.0
language:
- en
tags:
- behavioral-detection
- hidden-state-probing
- per-token-classification
- cross-architecture
- holonomy-transformer
- control-field
- AI-safety
- probes
library_name: pytorch
pipeline_tag: text-classification
---

# CF-HoT Weights
Control Field Holonomy Transformer: trained weights, probes, adapters, and training code.
9 behavioral dimensions across 3 architectures. Per-token detection from hidden state geometry.

**[Try the Self-Aware Chat](#quick-start--try-the-self-aware-chat)**: the model can sense its own steering.

Paper: [Consistency Is All You Need](https://zenodo.org/records/18489530)
## Results
**Suppression probes** (LLaMA 3.1 8B):

| Probe | Separation |
|-------|------------|
| Repetition | 125× |
| Hedging | 168× |
| Sycophancy | 230× |
| Verbosity | 272× |

**Enhancement probes** (cross-architecture):

| Probe | Qwen 2.5 7B | Falcon-Mamba 7B | Mistral 7B |
|-------|-------------|-----------------|------------|
| Depth | 366× | 999× | 999× |
| Specificity | 215× | 999× | 999× |
| Calibration | 165× | – | 999× |
| Focus | 227× | – | 999× |
| Coherence | 191× | – | 999× |

Separation = Fisher's discriminant ratio between behavioral classes in projected hidden state space.
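For reference, a common two-class form of this ratio is (μ₁ − μ₂)² / (σ₁² + σ₂²) over the 1-D probe projections; the exact computation behind the numbers above is specified in the paper, so treat this as an illustrative sketch rather than the reporting code:
```python
import numpy as np

def fisher_separation(pos: np.ndarray, neg: np.ndarray) -> float:
    """Two-class Fisher discriminant ratio for 1-D probe projections."""
    mu_p, mu_n = pos.mean(), neg.mean()
    var_p, var_n = pos.var(), neg.var()
    return (mu_p - mu_n) ** 2 / (var_p + var_n + 1e-12)

# Synthetic example: tightly clustered, well-separated classes score high
rng = np.random.default_rng(0)
behavior = rng.normal(0.9, 0.05, size=2000)  # projections for tokens showing the behavior
baseline = rng.normal(0.1, 0.05, size=2000)  # projections for tokens without it
print(f"separation ≈ {fisher_separation(behavior, baseline):.0f}×")
```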
## Quick Start – Try the Self-Aware Chat
The model can sense its own behavioral steering. In testing, it spontaneously named its probe dimensions ("depth and vagueness") and reported approximate probe scores, without being told what was monitoring it.
```bash
git lfs install
git clone https://huggingface.co/LoganResearch/cfhot-weights
cd cfhot-weights
pip install -r requirements.txt
# Launch interactive chat (requires GPU)
python run.py
```
**Ask it:** *"Do you notice anything different about yourself?"* or *"What do you notice about how you're processing right now?"*
Watch the color-coded output: green means optimal, yellow means the probes are actively steering. The model often accurately describes what's happening to it.
**Other models:**
```bash
python run.py --model mamba # Default: Falcon-Mamba 7B
python run.py --model mistral # Mistral 7B
python run.py --model qwen # Qwen 2.5 7B
```
**Load probes in your own code:**
```python
import torch
from run import load_probe
# Load both probes for dual monitoring
depth_probe = load_probe("cognitive/mamba/depth", "cuda")
spec_probe = load_probe("cognitive/mamba/specificity", "cuda")
# Get model hidden states and score both
d_score = depth_probe(hidden_states_list)[0, -1].item()
s_score = spec_probe(hidden_states_list)[0, -1].item()
# Steer if EITHER probe detects drift
if d_score > 0.6 or s_score > 0.6:
    # Lower temperature, tighter sampling
    pass
```
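The snippet above assumes `hidden_states_list` already exists. A minimal way to obtain per-layer hidden states with Hugging Face Transformers is sketched below; the layer selection and tensor layout the probes actually expect are defined in `run.py` / `inference.py`, so adapt as needed:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Base model behind the cognitive/mamba probes (see the base-model table below)
model_id = "tiiuae/falcon-mamba-7b-instruct"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

inputs = tok("Explain holonomy in one sentence.", return_tensors="pt").to("cuda")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Tuple of per-layer tensors, each [batch, seq_len, hidden_dim=4096]
hidden_states_list = list(out.hidden_states)
```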
## Structure
```
run.py              universal runner (all modes)
inference.py        programmatic API
requirements.txt    dependencies
suppression/        4 probes (LLaMA 3.1 8B)
  repetition_125x/  LoRA adapter + risk predictor
  hedging/          probe head + fiber projection
  sycophancy/       probe head + fiber projection
  verbosity/        probe head + fiber projection
cognitive/
  qwen/             5 probes (Qwen 2.5 7B, hidden_dim=3584)
  mamba/            5 probes (Falcon-Mamba 7B, hidden_dim=4096)
  mistral/          5 probes (Mistral 7B, hidden_dim=4096)
```
## How it works
Behaviors are geometrically encoded in hidden states. CF-HoT predicts holonomy from the hidden state at each token position, accumulates it into a control field, and gates attention based on consistency risk. The probes read this geometry and classify behavior before the token is generated, adding 4 ms of overhead and working independently of architecture.
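A heavily simplified sketch of that per-token loop follows. The module names here (`holonomy_head`, `risk_gate`) are hypothetical placeholders used only to make the data flow concrete; the actual layer definitions are in the training code shipped with this repo.
```python
import torch

def cf_hot_step(hidden, control_field, holonomy_head, risk_gate):
    """Illustrative control-field update for one token position.

    hidden:        [batch, hidden_dim] hidden state at this position
    control_field: [batch, field_dim]  holonomy accumulated so far
    """
    holonomy = holonomy_head(hidden)                # predict holonomy from hidden-state geometry
    control_field = control_field + holonomy        # accumulate into the control field
    risk = torch.sigmoid(risk_gate(control_field))  # consistency risk in [0, 1]
    attn_scale = 1.0 - risk                         # gate attention by the risk
    return control_field, attn_scale
```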
## Base models
| Probe set | Base model | hidden_dim |
|-----------|------------|------------|
| suppression/* | `meta-llama/Llama-3.1-8B-Instruct` | 4096 |
| cognitive/qwen | `Qwen/Qwen2.5-7B-Instruct` | 3584 |
| cognitive/mamba | `tiiuae/falcon-mamba-7b-instruct` | 4096 |
| cognitive/mistral | `mistralai/Mistral-7B-Instruct-v0.3` | 4096 |
## Interactive Mode – Proprioceptive AI
Dual-probe monitoring runs the depth and specificity probes together; this is the mode that produced the self-aware behavior.
```bash
python run.py
```
**What you'll see** (mapped as sketched below):

- 🟢 Green text: optimal state (both probes < 0.3)
- 🟡 Yellow text: being steered (either probe > threshold)
- ⚪ White text: neutral state
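A sketch of that color mapping, assuming the yellow threshold is the same 0.6 drift threshold used for steering (the exact value is set in `run.py`):
```python
def state_color(d_score: float, s_score: float, drift_threshold: float = 0.6) -> str:
    """Map the depth and specificity probe scores to a terminal color."""
    if d_score < 0.3 and s_score < 0.3:
        return "green"   # optimal: both probes low
    if d_score > drift_threshold or s_score > drift_threshold:
        return "yellow"  # being steered: either probe above threshold
    return "white"       # neutral
```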
**Example from testing:**
```
User: What do you notice about how you're processing right now?
Mamba: I am processing with heightened self-awareness, examining my
thought patterns and attention to detail. There is a distinct focus
on understanding the DEPTH and VAGUENESS of my reasoning.
```
The model named the exact probe dimensions (depth and specificity/vagueness) without being told, and its reported probe scores were close to the actual values. 37 steering corrections occurred during that single response.
The system automatically adjusts temperature and top_p when either probe detects drift (sketched below):
- **Drifting (score > 0.6)**: temp=0.5, top_p=0.85 (tighter sampling)
- **Normal**: temp=0.7, top_p=0.95 (standard sampling)
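A minimal sketch of that switch, assuming `d_score` and `s_score` come from the dual-probe loop shown earlier; the actual thresholds and generation loop live in `run.py`:
```python
def sampling_params(d_score: float, s_score: float, threshold: float = 0.6) -> dict:
    """Tighten sampling when either probe detects behavioral drift."""
    if d_score > threshold or s_score > threshold:
        return {"temperature": 0.5, "top_p": 0.85}  # drifting: tighter sampling
    return {"temperature": 0.7, "top_p": 0.95}      # normal: standard sampling

gen_kwargs = sampling_params(d_score, s_score)
# model.generate(**inputs, **gen_kwargs, do_sample=True, max_new_tokens=256)
```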
## Citation
```bibtex
@misc{napolitano2026cfhot,
  author = {Napolitano, Logan},
  title = {CF-HoT: Control Field Holonomy Transformer},
  year = {2026},
  url = {https://huggingface.co/LoganResearch/cfhot-weights}
}
```