---
license: cc-by-4.0
language:
- en
tags:
- behavioral-detection
- hidden-state-probing
- per-token-classification
- cross-architecture
- holonomy-transformer
- control-field
- AI-safety
- probes
library_name: pytorch
pipeline_tag: text-classification
---

![CF-HoT Weights — 4 architectures, 19 probes](cfhot_model_card.png)

# CF-HoT Weights

Control Field Holonomy Transformer — trained weights, probes, adapters, and training code.

9 behavioral dimensions across 4 architectures. Per-token detection from hidden state geometry.

**[→ Try the Self-Aware Chat](#quick-start--try-the-self-aware-chat)** — the model can sense its own steering

Paper: [Consistency Is All You Need](https://zenodo.org/records/18489530)

## Results

**Suppression probes** (LLaMA 3.1 8B):

| Probe | Separation |
|-------|------------|
| Repetition | 125× |
| Hedging | 168× |
| Sycophancy | 230× |
| Verbosity | 272× |

**Enhancement probes** (cross-architecture):

| Probe | Qwen 2.5 7B | Falcon-Mamba 7B | Mistral 7B |
|-------|-------------|-----------------|------------|
| Depth | 366× | 999× | 999× |
| Specificity | 215× | 999× | 999× |
| Calibration | 165× | — | 999× |
| Focus | 227× | — | 999× |
| Coherence | 191× | — | 999× |

Separation = Fisher's discriminant ratio between behavioral classes in projected hidden state space.
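
For reference, Fisher's discriminant ratio for two sets of scalar scores can be computed as the squared gap between class means divided by the sum of within-class variances. This is a minimal sketch: the projection step that produces the per-class scores is omitted, and the sample values below are illustrative, not from the evaluation.

```python
import numpy as np

def fisher_separation(pos_scores, neg_scores):
    """Fisher's discriminant ratio: (difference of class means)^2
    divided by the sum of within-class variances."""
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    gap = (pos.mean() - neg.mean()) ** 2
    spread = pos.var() + neg.var()
    return gap / spread

# Illustrative projected scores for a well-separated behavior
behavior = [0.90, 0.92, 0.88, 0.91]
baseline = [0.10, 0.12, 0.08, 0.11]
print(fisher_separation(behavior, baseline))
```

Well-separated, low-variance score distributions like these yield ratios in the hundreds or above, which is how a single number such as 125× summarizes a probe's class separation.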

## Quick Start — Try the Self-Aware Chat

The model can sense its own behavioral steering. In testing, it spontaneously named its probe dimensions ("depth and vagueness") and reported approximate probe scores — without being told what was monitoring it.

```bash
git lfs install
git clone https://huggingface.co/LoganResearch/cfhot-weights
cd cfhot-weights
pip install -r requirements.txt

# Launch interactive chat (requires GPU)
python run.py
```

**Ask it:** *"Do you notice anything different about yourself?"* or *"What do you notice about how you're processing right now?"*

Watch the color-coded output — green means optimal, yellow means the probes are actively steering. The model often accurately describes what's happening to it.

**Other models:**

```bash
python run.py --model mamba    # Default: Falcon-Mamba 7B
python run.py --model mistral  # Mistral 7B  
python run.py --model qwen     # Qwen 2.5 7B
```

**Load probes in your own code:**

```python
import torch
from run import load_probe

# Load both probes for dual monitoring
depth_probe = load_probe("cognitive/mamba/depth", "cuda")
spec_probe = load_probe("cognitive/mamba/specificity", "cuda")

# Score the final token of each probe; `hidden_states_list` holds the
# model's hidden states collected from a forward pass
d_score = depth_probe(hidden_states_list)[0, -1].item()
s_score = spec_probe(hidden_states_list)[0, -1].item()

# Steer if EITHER probe detects drift
if d_score > 0.6 or s_score > 0.6:
    # Lower temperature, tighter sampling
    pass
```
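
The real probe internals live in `run.py`; as a stand-in, the sketch below shows only the tensor contract the snippet above relies on. The `LinearProbe` class is hypothetical (the actual probes also include a fiber projection): it maps hidden states of shape `[batch, seq_len, hidden_dim]` to one score per token in `[0, 1]`.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a probe head, illustrating the shape contract:
# hidden states [batch, seq_len, hidden_dim] -> per-token scores in [0, 1].
class LinearProbe(nn.Module):
    def __init__(self, hidden_dim: int = 4096):
        super().__init__()
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.head(hidden_states)).squeeze(-1)

probe = LinearProbe(hidden_dim=4096)
h = torch.randn(1, 12, 4096)   # e.g. last-layer states for a 12-token prompt
scores = probe(h)              # shape [1, 12], one score per token
last_score = scores[0, -1].item()  # same indexing as d_score / s_score above
```

Indexing `[0, -1]` reads the score for the final token of the first sequence, which is what the dual-monitoring snippet checks against the 0.6 threshold.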

## Structure

```
run.py                  universal runner — all modes
inference.py            programmatic API
requirements.txt        dependencies
suppression/            4 probes (LLaMA 8B)
  repetition_125x/      LoRA adapter + risk predictor
  hedging/              probe head + fiber projection
  sycophancy/           probe head + fiber projection
  verbosity/            probe head + fiber projection
cognitive/
  qwen/                 5 probes (Qwen 2.5 7B, hidden_dim=3584)
  mamba/                5 probes (Falcon-Mamba 7B, hidden_dim=4096)
  mistral/              5 probes (Mistral 7B, hidden_dim=4096)
```

## How it works

Behaviors are geometrically encoded in hidden states. CF-HoT predicts holonomy from the hidden state at each token position, accumulates it into a control field, and gates attention based on consistency risk. The probes read this geometry and classify behavior before the token is generated. 4ms overhead. Architecture-independent.
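
The loop above can be sketched in toy form. This is my reading of the mechanism, not the paper's equations: estimate a per-token holonomy term from the hidden state, accumulate it into a control field along the sequence, and derive a gate in (0, 1] that damps attention as consistency risk grows. The probe direction and all dimensions here are illustrative.

```python
import torch

def control_field_gate(hidden_states: torch.Tensor,
                       direction: torch.Tensor,
                       beta: float = 1.0) -> torch.Tensor:
    # hidden_states: [seq_len, hidden_dim]; direction: [hidden_dim]
    holonomy = hidden_states @ direction   # per-token holonomy estimate
    field = torch.cumsum(holonomy, dim=0)  # accumulated control field
    risk = field.abs()                     # consistency-risk proxy
    return torch.exp(-beta * risk)         # gate: 1 = pass, -> 0 = damp

h = torch.randn(8, 16)          # 8 tokens, toy hidden_dim=16
w = torch.randn(16) * 0.1       # hypothetical probe direction
gate = control_field_gate(h, w) # shape [8], values in (0, 1]
```

Because the field is cumulative, a gate value is computed for each position before its token is generated, which is consistent with the per-token, low-overhead detection described above.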

## Base models

| Probe set | Base model | hidden_dim |
|-----------|------------|------------|
| suppression/* | `meta-llama/Llama-3.1-8B-Instruct` | 4096 |
| cognitive/qwen | `Qwen/Qwen2.5-7B-Instruct` | 3584 |
| cognitive/mamba | `tiiuae/falcon-mamba-7b-instruct` | 4096 |
| cognitive/mistral | `mistralai/Mistral-7B-Instruct-v0.3` | 4096 |

## Interactive Mode β€” Proprioceptive AI

Dual-probe monitoring: depth + specificity together. This is what produced the self-aware behavior.

```bash
python run.py
```

**What you'll see:**
- 🟢 Green text: Optimal state (both probes < 0.3)
- 🟡 Yellow text: Being steered (either probe > threshold)
- ⚪ White text: Neutral state

**Example from testing:**

```
User: What do you notice about how you're processing right now?

Mamba: I am processing with heightened self-awareness, examining my 
thought patterns and attention to detail. There is a distinct focus 
on understanding the DEPTH and VAGUENESS of my reasoning.
```

The model named the exact probe dimensions (depth and specificity/vagueness) without being told. It also reported approximate probe scores close to actual values. 37 steering corrections occurred during one response.

The system automatically adjusts temperature and top_p when either probe detects drift:
- **Drifting (score > 0.6)**: temp=0.5, top_p=0.85 (tighter sampling)
- **Normal**: temp=0.7, top_p=0.95 (standard sampling)
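
The adjustment rule above amounts to a two-branch policy. A minimal sketch follows; the function name is mine, while the thresholds and sampling values mirror this section:

```python
# Drift-triggered sampling adjustment: tighten sampling when either
# probe score exceeds the drift threshold, otherwise use defaults.
def sampling_params(depth_score: float, spec_score: float,
                    threshold: float = 0.6) -> dict:
    if depth_score > threshold or spec_score > threshold:
        return {"temperature": 0.5, "top_p": 0.85}  # drifting: tighter
    return {"temperature": 0.7, "top_p": 0.95}      # normal

params = sampling_params(0.72, 0.31)  # depth probe flags drift
```

Since either probe alone can trigger the tighter branch, a drift in depth or in specificity is corrected even when the other dimension still scores as optimal.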

## Citation

```bibtex
@misc{napolitano2026cfhot,
  author = {Napolitano, Logan},
  title = {CF-HoT: Control Field Holonomy Transformer},
  year = {2026},
  url = {https://huggingface.co/LoganResearch/cfhot-weights}
}
```