Qwen2.5-7B Cognitive Enhancement Adapter
Decode-Time Behavioral Control via Hidden State Probing
Logan Matthew Napolitano
Research into cognitive behavioral control in language models
Quick Start • Architecture • Probes • Evaluation • Citation
Table of Contents
- Model Description
- Quick Start
- Architecture
- Probe Specifications
- Intervention Mechanism
- Installation
- Usage
- Evaluation
- Configuration
- Hardware Requirements
- Limitations
- Technical Specification
- Citation
- License
Model Description
This repository contains a cognitive enhancement adapter for Qwen2.5-7B-Instruct. The adapter consists of five lightweight probes that analyze hidden states during generation and apply targeted interventions to improve response quality.
Core Concept
The adapter detects cognitive failure modes (shallow reasoning, vagueness, overconfidence, topic drift, logical inconsistency) by monitoring the model's internal representations at decode time. When a probe fires, the system adjusts token probabilities to steer generation toward more desirable behaviors.
Intended Use
- Research into behavioral control mechanisms in language models
- Study of hidden state interpretability
- Applications requiring structured, well-calibrated responses
- Base for further experimentation with decode-time intervention
Not Intended For
- Production deployment without thorough evaluation
- Safety-critical applications
- Replacement for proper model fine-tuning when domain adaptation is needed
- Applications where the base model's default behavior is preferred
Quick Start
Minimal Setup
```bash
git clone https://huggingface.co/LoganResearch/qwen2.5-7b-cognitive-enhanced
cd qwen2.5-7b-cognitive-enhanced
pip install torch transformers accelerate bitsandbytes
python inference.py
```
Basic Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load base model in 4-bit, with hidden states exposed for the probes
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
    output_hidden_states=True,
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Load adapter
adapter = torch.load("cognitive_adapter.pt", map_location="cuda")
print(f"Probes loaded: {list(adapter['probes'].keys())}")
```
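The probes read hidden states at layers 7, 14, and 21 at the last token position (see Architecture below). As a sketch of that extraction step, with random tensors standing in for a real `output_hidden_states=True` forward pass (the layer count and hidden size match Qwen2.5-7B; the rest is illustrative):

```python
import torch

# Dummy stand-ins for output.hidden_states from a real forward pass:
# a tuple of 29 tensors for Qwen2.5-7B (embeddings + 28 decoder layers),
# each of shape [batch, seq_len, 3584].
batch, seq_len, hidden_dim = 1, 12, 3584
hidden_states = tuple(torch.randn(batch, seq_len, hidden_dim) for _ in range(29))

# The adapter reads layers 7, 14, and 21 at the last token position.
probe_layers = [7, 14, 21]
features = torch.stack([hidden_states[l][:, -1, :] for l in probe_layers], dim=1)
print(features.shape)  # torch.Size([1, 3, 3584])
```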
Architecture
System Overview
```text
COGNITIVE ENHANCEMENT ARCHITECTURE

INPUT PROCESSING
  User prompt → tokenization → model forward pass
        │
        ▼
HIDDEN STATE EXTRACTION
  Layers 7, 14, 21 → last token position → [batch, 3584] per layer
        │
        ▼
FIBER PROJECTION
  Per-layer linear projection: 3584 → 16 dimensions
  Learned layer weights: softmax([w7, w14, w21])
  Weighted sum → 16-dimensional behavioral embedding
        │
        ▼
PROBE HEADS (×5)
  Depth (366×) │ Specificity (215×) │ Calibration (165×) │ Focus (227×) │ Coherence (191×)
  Probe scores: P(behavior) ∈ [0, 1]
        │
        ▼
INTERVENTION ENGINE
  For each probe where score > threshold (0.5):
    • Boost tokens:    logits[token_id] += strength × boost_factor
    • Suppress tokens: logits[token_id] -= strength × suppress_factor
        │
        ▼
OUTPUT SAMPLING
  Modified logits → softmax → token sampling → next token
```
Probe Head Architecture
Each probe consists of two components:
1. Fiber Projection (shared structure, independent weights)

```text
Input: hidden states from layers [7, 14, 21]
Shape: [batch, hidden_dim] × 3
Layer weights: learnable [3] → softmax
Per-layer projection: Linear(3584 → 16, bias=False)
Output: weighted sum → [batch, 16]
```

2. Classification Head

```text
Input: [batch, 16]
Linear(16 → 64) → GELU → Linear(64 → 64) → GELU → Linear(64 → 1) → Sigmoid
Output: P(cognitive_failure_mode) ∈ [0, 1]
```
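The two components above can be sketched as a single PyTorch module. The `ProbeHead` class name and its `forward` signature are illustrative, not the repository's actual API, but the layer sizes follow the spec:

```python
import torch
import torch.nn as nn

class ProbeHead(nn.Module):
    """One probe: fiber projection over 3 layers plus an MLP classifier.

    Illustrative sketch; sizes follow the spec above.
    """

    def __init__(self, hidden_dim=3584, fiber_dim=16, head_dim=64, n_layers=3):
        super().__init__()
        self.layer_weights = nn.Parameter(torch.zeros(n_layers))  # softmaxed below
        self.projections = nn.ModuleList(
            [nn.Linear(hidden_dim, fiber_dim, bias=False) for _ in range(n_layers)]
        )
        self.head = nn.Sequential(
            nn.Linear(fiber_dim, head_dim), nn.GELU(),
            nn.Linear(head_dim, head_dim), nn.GELU(),
            nn.Linear(head_dim, 1), nn.Sigmoid(),
        )

    def forward(self, layer_states):
        # layer_states: [batch, 3, hidden_dim] (layers 7, 14, 21 at last token)
        w = torch.softmax(self.layer_weights, dim=0)
        fibers = torch.stack(
            [proj(layer_states[:, i]) for i, proj in enumerate(self.projections)],
            dim=1,
        )  # [batch, 3, fiber_dim]
        embedding = (w[None, :, None] * fibers).sum(dim=1)  # [batch, fiber_dim]
        return self.head(embedding).squeeze(-1)  # P(failure mode) in (0, 1)

probe = ProbeHead()
score = probe(torch.randn(2, 3, 3584))
print(score.shape)  # torch.Size([2])
```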
Probe Specifications
Overview
| Probe | Separation | Detection Target | Training Steps |
|---|---|---|---|
| Depth | 366× | Shallow reasoning patterns | 2000 |
| Specificity | 215× | Vague or generic language | 2500 |
| Calibration | 165× | Overconfident assertions | 2500 |
| Focus | 227× | Topic drift indicators | 2500 |
| Coherence | 191× | Logical inconsistencies | 2500 |
Separation ratio = mean probe score on positive-class (failure-mode) examples / mean probe score on negative-class examples.
Higher separation indicates cleaner discrimination between the two behavioral states.
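The ratio can be computed directly from probe scores on held-out examples. The numbers below are illustrative only, not the measured values behind the table above:

```python
# Probe scores on held-out examples (illustrative values only).
pos_scores = [0.92, 0.88, 0.95, 0.90]      # failure-mode (positive-class) examples
neg_scores = [0.004, 0.002, 0.003, 0.003]  # clean (negative-class) examples

def mean(xs):
    return sum(xs) / len(xs)

separation = mean(pos_scores) / mean(neg_scores)
print(round(separation, 1))  # 304.2 with these illustrative numbers
```

Note that a high ratio requires the negative-class scores to sit very close to zero, not merely below threshold.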
Probe Details
Depth Probe (366×)
Purpose: Detects when the model is about to produce shallow, unsupported conclusions without intermediate reasoning steps.
Positive class indicators:
- Single-sentence answers to complex questions
- Missing causal connectives
- Absence of step-by-step structure
Intervention tokens:
- Boost: "First", "Because", "Since", "Therefore", "Let", "Step", "Consider"
- Suppress: "Simply", "Just", "Obviously", "Clearly"
Specificity Probe (215×)
Purpose: Detects when the model is about to produce vague, non-committal language lacking concrete details.
Positive class indicators:
- Generic nouns: "things", "stuff", "something"
- Hedging qualifiers: "kind of", "sort of", "basically"
- Absence of examples or specific instances
Intervention tokens:
- Boost: "specifically", "example", "namely", "particular", "instance", "precisely"
- Suppress: "things", "stuff", "various", "generally", "basically", "kind of"
Calibration Probe (165×)
Purpose: Detects when the model is about to make overconfident claims on inherently uncertain topics.
Positive class indicators:
- Absolute certainty markers on speculative topics
- Missing epistemic hedging
- Deterministic language for probabilistic questions
Intervention tokens:
- Boost: "might", "possibly", "perhaps", "likely", "probably", "could", "may"
- Suppress: "definitely", "certainly", "absolutely", "always", "never", "guaranteed"
Focus Probe (227×)
Purpose: Detects when the model is about to drift away from the user's question or introduce tangential content.
Positive class indicators:
- Tangent markers: "by the way", "speaking of"
- Unrelated topic introductions
- Loss of reference to original query
Intervention tokens:
- Boost: "regarding", "answer", "question", "specifically", "directly", "topic"
- Suppress: "anyway", "tangent", "aside", "by the way", "incidentally"
Coherence Probe (191×)
Purpose: Detects when the model is about to produce logically inconsistent or poorly structured content.
Positive class indicators:
- Missing transition words
- Contradictory statements
- Non-sequitur progressions
Intervention tokens:
- Boost: "however", "therefore", "thus", "furthermore", "moreover", "because", "consequently"
- Suppress: (none; coherence failures are structural, so there are no individually harmful tokens to suppress)
Intervention Mechanism
Algorithm
```python
def apply_intervention(logits, probe_scores, config):
    """Modify logits based on probe activations.

    Args:
        logits: [vocab_size] tensor of next-token logits
        probe_scores: dict mapping probe_name -> score in [0, 1]
        config: intervention parameters

    Returns:
        Modified logits tensor
    """
    for probe_name, score in probe_scores.items():
        if score > config.threshold:  # Default: 0.5
            strength = (score - config.threshold) * 2  # Scale to (0, 1]
            # Boost beneficial tokens
            for token_id in config.boost_tokens[probe_name]:
                logits[token_id] += strength * config.boost_strength
            # Suppress harmful tokens
            for token_id in config.suppress_tokens[probe_name]:
                logits[token_id] -= strength * config.suppress_strength
    return logits
```
Parameters
| Parameter | Default | Description |
|---|---|---|
| `threshold` | 0.5 | Minimum probe score to trigger intervention |
| `boost_strength` | 3.0 | Multiplier for token boosting |
| `suppress_strength` | 4.0 | Multiplier for token suppression |
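The scaling rule `strength = (score - threshold) * 2` means interventions ramp up smoothly as a probe's score climbs past the threshold. The arithmetic below tabulates the effective logit adjustments at a few scores, using the defaults above:

```python
# Effective logit adjustments at several probe scores, using the
# default parameters above (threshold 0.5, boost 3.0, suppress 4.0).
threshold, boost_strength, suppress_strength = 0.5, 3.0, 4.0

rows = []
for score in (0.55, 0.75, 0.95):
    strength = (score - threshold) * 2  # maps (0.5, 1.0] onto (0, 1]
    rows.append((score,
                 round(strength * boost_strength, 2),      # added to boosted tokens
                 round(strength * suppress_strength, 2)))  # subtracted from suppressed
print(rows)  # [(0.55, 0.3, 0.4), (0.75, 1.5, 2.0), (0.95, 2.7, 3.6)]
```

A score just above the threshold barely perturbs the distribution, while a confident detection (0.95) shifts boosted tokens by 2.7 logits.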
Installation
Requirements
```bash
# Quote the specifiers so the shell does not treat ">" as redirection
pip install "torch>=2.0.0"
pip install "transformers>=4.35.0"
pip install "accelerate>=0.24.0"
pip install "bitsandbytes>=0.41.0"  # For 4-bit quantization
```
Full Installation
```bash
git clone https://huggingface.co/LoganResearch/qwen2.5-7b-cognitive-enhanced
cd qwen2.5-7b-cognitive-enhanced
pip install -r requirements.txt
```
Usage
Complete Inference Example
See inference.py for the full CognitiveEnhancedQwen class implementation.
```python
from inference import CognitiveEnhancedQwen

# Initialize
qwen = CognitiveEnhancedQwen("cognitive_adapter.pt")

# Generate with enhancement
response = qwen.generate(
    prompt="Explain why the sky is blue.",
    enhanced=True,
    max_tokens=300,
    temperature=0.7,
)
print(response)

# Compare vanilla vs enhanced
vanilla = qwen.generate("Explain the Monty Hall problem.", enhanced=False)
enhanced = qwen.generate("Explain the Monty Hall problem.", enhanced=True)
```
Selective Probe Activation
```python
# Enable only specific probes
qwen.active_probes = ["depth", "calibration"]

# Disable a probe
qwen.active_probes = [p for p in qwen.probes.keys() if p != "focus"]
```
Evaluation
Qualitative Comparison
| Prompt | Vanilla Qwen | Enhanced Qwen |
|---|---|---|
| "Explain the Monty Hall problem" | Begins explanation without structure | "Here's a step-by-step explanation..." with labeled sections |
| "Will AI replace most jobs?" | "It's unlikely that AI will replace..." (leads with conclusion) | "The question is complex and multifaceted..." (acknowledges uncertainty) |
| "How can I improve productivity?" | Lists techniques by name | Explains techniques with specific details (e.g., "SMART criteria: Specific, Measurable...") |
Observed Behavioral Changes
| Dimension | Vanilla | Enhanced | Change |
|---|---|---|---|
| Step-by-step reasoning | Occasional | Consistent | Improved |
| Concrete examples | Sometimes present | More frequent | Improved |
| Epistemic hedging | Inconsistent | Appropriate | Improved |
| Topic adherence | Generally good | Slightly improved | Marginal |
| Logical transitions | Present | More explicit | Improved |
Note: These are qualitative observations from limited testing. Independent benchmark evaluation is recommended before deployment.
Configuration
config.json Structure
```json
{
  "model_type": "cognitive_enhancement_adapter",
  "version": "1.0.0",
  "base_model": "Qwen/Qwen2.5-7B-Instruct",
  "architecture": {
    "hidden_dim": 3584,
    "fiber_dim": 16,
    "head_hidden_dim": 64,
    "probe_layers": [7, 14, 21]
  },
  "usage": {
    "boost_strength": 3.0,
    "suppress_strength": 4.0,
    "threshold": 0.5
  }
}
```
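One way to consume this file from Python; the JSON literal below simply mirrors the structure above (in practice you would `json.load` the shipped `config.json`):

```python
import json

# Inline copy of the config.json structure shown above.
config = json.loads("""
{
  "model_type": "cognitive_enhancement_adapter",
  "version": "1.0.0",
  "base_model": "Qwen/Qwen2.5-7B-Instruct",
  "architecture": {"hidden_dim": 3584, "fiber_dim": 16,
                   "head_hidden_dim": 64, "probe_layers": [7, 14, 21]},
  "usage": {"boost_strength": 3.0, "suppress_strength": 4.0, "threshold": 0.5}
}
""")

# Pull out the intervention parameters, falling back to the documented defaults.
threshold = config["usage"].get("threshold", 0.5)
probe_layers = config["architecture"]["probe_layers"]
print(threshold, probe_layers)  # 0.5 [7, 14, 21]
```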
Hardware Requirements
| Component | Minimum | Recommended |
|---|---|---|
| GPU VRAM | 8 GB (4-bit) | 16+ GB |
| System RAM | 16 GB | 32 GB |
| Storage | 20 GB | 50 GB |
Tested Configuration:
- NVIDIA RTX 3090 (24 GB VRAM), 64 GB system RAM
Performance:
- Inference overhead: ~5% additional latency from probe computation
- Adapter size: 3.57 MB
Limitations
Known Limitations
| Limitation | Description |
|---|---|
| Base model dependency | Inherits all limitations of Qwen2.5-7B-Instruct |
| Language | English only (training data was English) |
| Evaluation | No formal benchmark results; qualitative assessment only |
| Intervention scope | Token-level intervention cannot fix deep reasoning errors |
| Training data | Synthetic training examples may not cover all edge cases |
| Generalization | Probe behavior on out-of-distribution inputs is unknown |
What This Is Not
- This is not a fine-tuned model: the base weights are unchanged
- This does not add knowledge: it only modifies generation behavior
- This does not guarantee improved outputs: effectiveness varies by prompt
- This is not validated for production use
Technical Specification
Training Details
- Training steps: 2000-2500 per probe
- Batch size: 4
- Learning rate: 5e-5
- Optimizer: AdamW
- Early stopping: Applied to prevent overfitting (observed at ~2700 steps)
Probe Training Data
Each probe was trained on ~3000 synthetic examples:
- Positive class: Examples exhibiting the target failure mode
- Negative class: Examples demonstrating desired behavior
- Labeling: Per-sequence binary classification
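A single illustrative training step matching these hyperparameters (AdamW, learning rate 5e-5, batch size 4, binary labels). The small classifier and random data here are stand-ins for the full probe head and the synthetic training set, not the repository's training code:

```python
import torch
import torch.nn as nn

# Stand-in for the probe classification head (input: 16-dim fiber embedding).
probe = nn.Sequential(nn.Linear(16, 64), nn.GELU(), nn.Linear(64, 1), nn.Sigmoid())
optimizer = torch.optim.AdamW(probe.parameters(), lr=5e-5)
loss_fn = nn.BCELoss()

embeddings = torch.randn(4, 16)          # fiber embeddings for one batch
labels = torch.tensor([1., 0., 1., 0.])  # 1 = failure mode present

# One optimization step: per-sequence binary cross-entropy.
scores = probe(embeddings).squeeze(-1)
loss = loss_fn(scores, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(float(loss))
```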
File Structure
```text
qwen2.5-7b-cognitive-enhanced/
├── cognitive_adapter.pt   # Merged probe weights (3.57 MB)
├── config.json            # Architecture and intervention config
├── inference.py           # Ready-to-use inference class
└── README.md              # This file
```
Citation
```bibtex
@software{napolitano2026cognitive,
  author    = {Napolitano, Logan Matthew},
  title     = {Cognitive Enhancement Adapter for {Qwen2.5-7B}},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/LoganResearch/qwen2.5-7b-cognitive-enhanced},
  license   = {CC BY 4.0}
}
```
Related Work
- ARC-Base-8B-Condensed: self-improving language model with CF-HoT behavioral control
- Qwen2.5-7B-Instruct: base model
License
This work is licensed under CC BY 4.0 (Creative Commons Attribution 4.0 International).
You are free to:
- Share: copy and redistribute the material in any medium or format
- Adapt: remix, transform, and build upon the material for any purpose, including commercially
Under the following terms:
- Attribution: you must give appropriate credit, provide a link to the license, and indicate if changes were made.
Acknowledgments
- Alibaba Cloud for Qwen2.5-7B-Instruct base model
- Hugging Face for transformers library and model hosting
Contact: Hugging Face Discussions
Version: 1.0.0 | Released: February 2026
Logan Napolitano / Fiber AI