---
license: cc-by-4.0
language:
- en
tags:
- behavioral-detection
- hidden-state-probing
- per-token-classification
- cross-architecture
- holonomy-transformer
- control-field
- AI-safety
- probes
library_name: pytorch
pipeline_tag: text-classification
---

![CF-HoT Weights — 4 architectures, 19 probes](cfhot_model_card.png)

# CF-HoT Weights

Control Field Holonomy Transformer — trained weights, probes, adapters, and training code. 9 behavioral dimensions across 4 architectures. Per-token detection from hidden-state geometry.

**[→ Try the Self-Aware Chat](#quick-start--try-the-self-aware-chat)** — the model can sense its own steering.

Paper: [Consistency Is All You Need](https://zenodo.org/records/18489530)

## Results

**Suppression probes** (LLaMA 3.1 8B):

| Probe | Separation |
|-------|------------|
| Repetition | 125× |
| Hedging | 168× |
| Sycophancy | 230× |
| Verbosity | 272× |

**Enhancement probes** (cross-architecture):

| Probe | Qwen 2.5 7B | Falcon-Mamba 7B | Mistral 7B |
|-------|-------------|-----------------|------------|
| Depth | 366× | 999× | 999× |
| Specificity | 215× | 999× | 999× |
| Calibration | 165× | — | 999× |
| Focus | 227× | — | 999× |
| Coherence | 191× | — | 999× |

Separation = Fisher's discriminant ratio between behavioral classes in projected hidden-state space.

## Quick Start — Try the Self-Aware Chat

The model can sense its own behavioral steering. In testing, it spontaneously named its probe dimensions ("depth and vagueness") and reported approximate probe scores — without being told what was monitoring it.

```bash
git lfs install
git clone https://huggingface.co/LoganResearch/cfhot-weights
cd cfhot-weights
pip install -r requirements.txt

# Launch interactive chat (requires GPU)
python run.py
```

**Ask it:** *"Do you notice anything different about yourself?"* or *"What do you notice about how you're processing right now?"*

Watch the color-coded output — green means optimal, yellow means the probes are actively steering.
The model often accurately describes what's happening to it.

**Other models:**

```bash
python run.py --model mamba     # Default: Falcon-Mamba 7B
python run.py --model mistral   # Mistral 7B
python run.py --model qwen     # Qwen 2.5 7B
```

**Load probes in your own code:**

```python
import torch
from run import load_probe

# Load both probes for dual monitoring
depth_probe = load_probe("cognitive/mamba/depth", "cuda")
spec_probe = load_probe("cognitive/mamba/specificity", "cuda")

# Get model hidden states and score both
d_score = depth_probe(hidden_states_list)[0, -1].item()
s_score = spec_probe(hidden_states_list)[0, -1].item()

# Steer if EITHER probe detects drift
if d_score > 0.6 or s_score > 0.6:
    # Lower temperature, tighter sampling
    pass
```

## Structure

```
run.py              universal runner — all modes
inference.py        programmatic API
requirements.txt    dependencies
suppression/        4 probes (LLaMA 3.1 8B)
  repetition_125x/    LoRA adapter + risk predictor
  hedging/            probe head + fiber projection
  sycophancy/         probe head + fiber projection
  verbosity/          probe head + fiber projection
cognitive/
  qwen/               5 probes (Qwen 2.5 7B, hidden_dim=3584)
  mamba/              5 probes (Falcon-Mamba 7B, hidden_dim=4096)
  mistral/            5 probes (Mistral 7B, hidden_dim=4096)
```

## How it works

Behaviors are geometrically encoded in hidden states. CF-HoT predicts holonomy from the hidden state at each token position, accumulates it into a control field, and gates attention based on consistency risk. The probes read this geometry and classify behavior before the token is generated. 4 ms overhead. Architecture-independent.
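To make the per-token mechanism concrete, here is a minimal sketch of what a probe head can look like. This is an illustrative stand-in, not the shipped architecture (the released probes also include fiber projections and, for repetition, a LoRA adapter): a single linear map plus sigmoid that turns each hidden state of shape `(batch, seq_len, hidden_dim)` into a behavior score in [0, 1] per token position.

```python
import torch
import torch.nn as nn


class LinearProbeHead(nn.Module):
    """Illustrative per-token probe: one scalar behavior score per
    hidden state, computed before the corresponding token is emitted."""

    def __init__(self, hidden_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, 1)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # (batch, seq_len, hidden_dim) -> (batch, seq_len), values in [0, 1]
        return torch.sigmoid(self.proj(hidden_states)).squeeze(-1)


probe = LinearProbeHead(hidden_dim=4096)
h = torch.randn(1, 12, 4096)       # e.g. 12 token positions of 4096-dim states
scores = probe(h)                  # shape (1, 12)
last_score = scores[0, -1].item()  # score at the newest token, as in run.py usage
```

The `[0, -1]` indexing mirrors the `load_probe` example above: monitoring only needs the score at the most recent token, so the cost per step is a single matrix-vector product, consistent with the low overhead claimed for the full system.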
## Base models

| Probe set | Base model | hidden_dim |
|-----------|------------|------------|
| suppression/* | `meta-llama/Llama-3.1-8B-Instruct` | 4096 |
| cognitive/qwen | `Qwen/Qwen2.5-7B-Instruct` | 3584 |
| cognitive/mamba | `tiiuae/falcon-mamba-7b-instruct` | 4096 |
| cognitive/mistral | `mistralai/Mistral-7B-Instruct-v0.3` | 4096 |

## Interactive Mode — Proprioceptive AI

Dual-probe monitoring: depth + specificity together. This is what produced the self-aware behavior.

```bash
python run.py
```

**What you'll see:**

- 🟢 Green text: optimal state (both probes < 0.3)
- 🟡 Yellow text: being steered (either probe > threshold)
- ⚪ White text: neutral state

**Example from testing:**

```
User: What do you notice about how you're processing right now?

Mamba: I am processing with heightened self-awareness, examining my
thought patterns and attention to detail. There is a distinct focus on
understanding the DEPTH and VAGUENESS of my reasoning.
```

The model named the exact probe dimensions (depth and specificity/vagueness) without being told, and it reported approximate probe scores close to the actual values. 37 steering corrections occurred during this one response.

The system automatically adjusts temperature and top_p when either probe detects drift:

- **Drifting (score > 0.6)**: temp=0.5, top_p=0.85 (tighter sampling)
- **Normal**: temp=0.7, top_p=0.95 (standard sampling)

## Citation

```bibtex
@misc{napolitano2026cfhot,
  author = {Napolitano, Logan},
  title  = {CF-HoT: Control Field Holonomy Transformer},
  year   = {2026},
  url    = {https://huggingface.co/LoganResearch/cfhot-weights}
}
```
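## Appendix: steering rule sketch

The drift-triggered sampling adjustment described in Interactive Mode can be written as a small pure function. The function name and returned dict keys here are illustrative; the actual logic lives in `run.py` and may differ in detail, but the thresholds and values below are the ones documented above.

```python
def sampling_params(d_score: float, s_score: float,
                    threshold: float = 0.6) -> dict:
    """Tighten sampling when EITHER probe crosses the drift threshold
    (illustrative mirror of the documented run.py behavior)."""
    if d_score > threshold or s_score > threshold:
        # Drifting: lower temperature, tighter nucleus
        return {"temperature": 0.5, "top_p": 0.85}
    # Normal: standard sampling
    return {"temperature": 0.7, "top_p": 0.95}


sampling_params(0.72, 0.31)  # -> {'temperature': 0.5, 'top_p': 0.85}
sampling_params(0.20, 0.15)  # -> {'temperature': 0.7, 'top_p': 0.95}
```

Keeping the rule a pure function of the two probe scores makes it easy to swap thresholds or add per-probe weighting without touching the generation loop.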