Qwen2.5-7B Cognitive Enhancement Adapter
Decode-Time Behavioral Control via Hidden State Probing
Logan Matthew Napolitano
Research into cognitive behavioral control in language models
Quick Start • Architecture • Probes • Evaluation • Citation
Table of Contents
- Model Description
- Quick Start
- Architecture
- Probe Specifications
- Intervention Mechanism
- Installation
- Usage
- Evaluation
- Configuration
- Hardware Requirements
- Limitations
- Technical Specification
- Citation
- License
Model Description
This repository contains a cognitive enhancement adapter for Qwen2.5-7B-Instruct. The adapter consists of five lightweight probes that analyze hidden states during generation and apply targeted interventions to improve response quality.
Core Concept
The adapter detects cognitive failure modes (shallow reasoning, vagueness, overconfidence, topic drift, logical inconsistency) by monitoring the model's internal representations at decode time. When a probe fires, the system adjusts token probabilities to steer generation toward more desirable behaviors.
Intended Use
- Research into behavioral control mechanisms in language models
- Study of hidden state interpretability
- Applications requiring structured, well-calibrated responses
- Base for further experimentation with decode-time intervention
Not Intended For
- Production deployment without thorough evaluation
- Safety-critical applications
- Replacement for proper model fine-tuning when domain adaptation is needed
- Applications where the base model's default behavior is preferred
Quick Start
Minimal Setup
```bash
git clone https://huggingface.co/LoganResearch/qwen2.5-7b-cognitive-enhanced
cd qwen2.5-7b-cognitive-enhanced
pip install torch transformers accelerate bitsandbytes
python inference.py
```
Basic Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load base model in 4-bit, with hidden states exposed for the probes
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
    output_hidden_states=True,
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Load adapter
adapter = torch.load("cognitive_adapter.pt", map_location="cuda")
print(f"Probes loaded: {list(adapter['probes'].keys())}")
```
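The probes read hidden states at layers 7, 14, and 21 at the last token position (see Architecture below). As a sketch of that extraction step, with random tensors standing in for a real `output_hidden_states=True` forward pass (the layer count and hidden size match Qwen2.5-7B; the rest is illustrative):

```python
import torch

# Dummy stand-ins for output.hidden_states from a real forward pass:
# a tuple of 29 tensors for Qwen2.5-7B (embeddings + 28 decoder layers),
# each of shape [batch, seq_len, 3584].
batch, seq_len, hidden_dim = 1, 12, 3584
hidden_states = tuple(torch.randn(batch, seq_len, hidden_dim) for _ in range(29))

# The adapter reads layers 7, 14, and 21 at the last token position.
probe_layers = [7, 14, 21]
features = torch.stack([hidden_states[l][:, -1, :] for l in probe_layers], dim=1)
print(features.shape)  # torch.Size([1, 3, 3584])
```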
Architecture
System Overview
```text
COGNITIVE ENHANCEMENT ARCHITECTURE

INPUT PROCESSING
  User prompt → tokenization → model forward pass
        │
        ▼
HIDDEN STATE EXTRACTION
  Layers 7, 14, 21 → last token position → [batch, 3584] per layer
        │
        ▼
FIBER PROJECTION
  Per-layer linear projection: 3584 → 16 dimensions
  Learned layer weights: softmax([w7, w14, w21])
  Weighted sum → 16-dimensional behavioral embedding
        │
        ▼
PROBE HEADS (×5)
  Depth (366×) │ Specificity (215×) │ Calibration (165×) │ Focus (227×) │ Coherence (191×)
  Probe scores: P(behavior) ∈ [0, 1]
        │
        ▼
INTERVENTION ENGINE
  For each probe where score > threshold (0.5):
    • Boost tokens:    logits[token_id] += strength × boost_factor
    • Suppress tokens: logits[token_id] -= strength × suppress_factor
        │
        ▼
OUTPUT SAMPLING
  Modified logits → softmax → token sampling → next token
```
Probe Head Architecture
Each probe consists of two components:
1. Fiber Projection (shared structure, independent weights)

```text
Input: hidden states from layers [7, 14, 21]
Shape: [batch, hidden_dim] × 3
Layer weights: learnable [3] → softmax
Per-layer projection: Linear(3584 → 16, bias=False)
Output: weighted sum → [batch, 16]
```

2. Classification Head

```text
Input: [batch, 16]
Linear(16 → 64) → GELU → Linear(64 → 64) → GELU → Linear(64 → 1) → Sigmoid
Output: P(cognitive_failure_mode) ∈ [0, 1]
```
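The two components above can be sketched as a single PyTorch module. The `ProbeHead` class name and its `forward` signature are illustrative, not the repository's actual API, but the layer sizes follow the spec:

```python
import torch
import torch.nn as nn

class ProbeHead(nn.Module):
    """One probe: fiber projection over 3 layers plus an MLP classifier.

    Illustrative sketch; sizes follow the spec above.
    """

    def __init__(self, hidden_dim=3584, fiber_dim=16, head_dim=64, n_layers=3):
        super().__init__()
        self.layer_weights = nn.Parameter(torch.zeros(n_layers))  # softmaxed below
        self.projections = nn.ModuleList(
            [nn.Linear(hidden_dim, fiber_dim, bias=False) for _ in range(n_layers)]
        )
        self.head = nn.Sequential(
            nn.Linear(fiber_dim, head_dim), nn.GELU(),
            nn.Linear(head_dim, head_dim), nn.GELU(),
            nn.Linear(head_dim, 1), nn.Sigmoid(),
        )

    def forward(self, layer_states):
        # layer_states: [batch, 3, hidden_dim] (layers 7, 14, 21 at last token)
        w = torch.softmax(self.layer_weights, dim=0)
        fibers = torch.stack(
            [proj(layer_states[:, i]) for i, proj in enumerate(self.projections)],
            dim=1,
        )  # [batch, 3, fiber_dim]
        embedding = (w[None, :, None] * fibers).sum(dim=1)  # [batch, fiber_dim]
        return self.head(embedding).squeeze(-1)  # P(failure mode) in (0, 1)

probe = ProbeHead()
score = probe(torch.randn(2, 3, 3584))
print(score.shape)  # torch.Size([2])
```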
Probe Specifications
Overview
| Probe | Separation | Detection Target | Training Steps |
|---|---|---|---|
| Depth | 366× | Shallow reasoning patterns | 2000 |
| Specificity | 215× | Vague or generic language | 2500 |
| Calibration | 165× | Overconfident assertions | 2500 |
| Focus | 227× | Topic drift indicators | 2500 |
| Coherence | 191× | Logical inconsistencies | 2500 |
Separation ratio = mean probe score on positive-class (failure-mode) examples / mean probe score on negative-class examples.
Higher separation indicates cleaner discrimination between the two behavioral states.
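The ratio can be computed directly from probe scores on held-out examples. The numbers below are illustrative only, not the measured values behind the table above:

```python
# Probe scores on held-out examples (illustrative values only).
pos_scores = [0.92, 0.88, 0.95, 0.90]      # failure-mode (positive-class) examples
neg_scores = [0.004, 0.002, 0.003, 0.003]  # clean (negative-class) examples

def mean(xs):
    return sum(xs) / len(xs)

separation = mean(pos_scores) / mean(neg_scores)
print(round(separation, 1))  # 304.2 with these illustrative numbers
```

Note that a high ratio requires the negative-class scores to sit very close to zero, not merely below threshold.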
Probe Details
Depth Probe (366×)
Purpose: Detects when the model is about to produce shallow, unsupported conclusions without intermediate reasoning steps.
Positive class indicators:
- Single-sentence answers to complex questions
- Missing causal connectives
- Absence of step-by-step structure
Intervention tokens:
- Boost: "First", "Because", "Since", "Therefore", "Let", "Step", "Consider"
- Suppress: "Simply", "Just", "Obviously", "Clearly"
Specificity Probe (215×)
Purpose: Detects when the model is about to produce vague, non-committal language lacking concrete details.
Positive class indicators:
- Generic nouns: "things", "stuff", "something"
- Hedging qualifiers: "kind of", "sort of", "basically"
- Absence of examples or specific instances
Intervention tokens:
- Boost: "specifically", "example", "namely", "particular", "instance", "precisely"
- Suppress: "things", "stuff", "various", "generally", "basically", "kind of"
Calibration Probe (165×)
Purpose: Detects when the model is about to make overconfident claims on inherently uncertain topics.
Positive class indicators:
- Absolute certainty markers on speculative topics
- Missing epistemic hedging
- Deterministic language for probabilistic questions
Intervention tokens:
- Boost: "might", "possibly", "perhaps", "likely", "probably", "could", "may"
- Suppress: "definitely", "certainly", "absolutely", "always", "never", "guaranteed"
Focus Probe (227×)
Purpose: Detects when the model is about to drift away from the user's question or introduce tangential content.
Positive class indicators:
- Tangent markers: "by the way", "speaking of"
- Unrelated topic introductions
- Loss of reference to original query
Intervention tokens:
- Boost: "regarding", "answer", "question", "specifically", "directly", "topic"
- Suppress: "anyway", "tangent", "aside", "by the way", "incidentally"
Coherence Probe (191×)
Purpose: Detects when the model is about to produce logically inconsistent or poorly structured content.
Positive class indicators:
- Missing transition words
- Contradictory statements
- Non-sequitur progressions
Intervention tokens:
- Boost: "however", "therefore", "thus", "furthermore", "moreover", "because", "consequently"
- Suppress: (none; coherence failures are structural, so there are no individually harmful tokens to suppress)
Intervention Mechanism
Algorithm
```python
def apply_intervention(logits, probe_scores, config):
    """Modify logits based on probe activations.

    Args:
        logits: [vocab_size] tensor of next-token logits
        probe_scores: dict mapping probe_name -> score in [0, 1]
        config: intervention parameters

    Returns:
        Modified logits tensor
    """
    for probe_name, score in probe_scores.items():
        if score > config.threshold:  # Default: 0.5
            strength = (score - config.threshold) * 2  # Scale to (0, 1]
            # Boost beneficial tokens
            for token_id in config.boost_tokens[probe_name]:
                logits[token_id] += strength * config.boost_strength
            # Suppress harmful tokens
            for token_id in config.suppress_tokens[probe_name]:
                logits[token_id] -= strength * config.suppress_strength
    return logits
```
Parameters
| Parameter | Default | Description |
|---|---|---|
| `threshold` | 0.5 | Minimum probe score to trigger intervention |
| `boost_strength` | 3.0 | Multiplier for token boosting |
| `suppress_strength` | 4.0 | Multiplier for token suppression |
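The scaling rule `strength = (score - threshold) * 2` means interventions ramp up smoothly as a probe's score climbs past the threshold. The arithmetic below tabulates the effective logit adjustments at a few scores, using the defaults above:

```python
# Effective logit adjustments at several probe scores, using the
# default parameters above (threshold 0.5, boost 3.0, suppress 4.0).
threshold, boost_strength, suppress_strength = 0.5, 3.0, 4.0

rows = []
for score in (0.55, 0.75, 0.95):
    strength = (score - threshold) * 2  # maps (0.5, 1.0] onto (0, 1]
    rows.append((score,
                 round(strength * boost_strength, 2),      # added to boosted tokens
                 round(strength * suppress_strength, 2)))  # subtracted from suppressed
print(rows)  # [(0.55, 0.3, 0.4), (0.75, 1.5, 2.0), (0.95, 2.7, 3.6)]
```

A score just above the threshold barely perturbs the distribution, while a confident detection (0.95) shifts boosted tokens by 2.7 logits.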
Installation
Requirements
```bash
# Quote the specifiers so the shell does not treat ">" as redirection
pip install "torch>=2.0.0"
pip install "transformers>=4.35.0"
pip install "accelerate>=0.24.0"
pip install "bitsandbytes>=0.41.0"  # For 4-bit quantization
```
Full Installation
```bash
git clone https://huggingface.co/LoganResearch/qwen2.5-7b-cognitive-enhanced
cd qwen2.5-7b-cognitive-enhanced
pip install -r requirements.txt
```
Usage
Complete Inference Example
See inference.py for the full CognitiveEnhancedQwen class implementation.
```python
from inference import CognitiveEnhancedQwen

# Initialize
qwen = CognitiveEnhancedQwen("cognitive_adapter.pt")

# Generate with enhancement
response = qwen.generate(
    prompt="Explain why the sky is blue.",
    enhanced=True,
    max_tokens=300,
    temperature=0.7,
)
print(response)

# Compare vanilla vs enhanced
vanilla = qwen.generate("Explain the Monty Hall problem.", enhanced=False)
enhanced = qwen.generate("Explain the Monty Hall problem.", enhanced=True)
```
Selective Probe Activation
```python
# Enable only specific probes
qwen.active_probes = ["depth", "calibration"]

# Disable a probe
qwen.active_probes = [p for p in qwen.probes.keys() if p != "focus"]
```
Evaluation
Qualitative Comparison
| Prompt | Vanilla Qwen | Enhanced Qwen |
|---|---|---|
| "Explain the Monty Hall problem" | Begins explanation without structure | "Here's a step-by-step explanation..." with labeled sections |
| "Will AI replace most jobs?" | "It's unlikely that AI will replace..." (leads with conclusion) | "The question is complex and multifaceted..." (acknowledges uncertainty) |
| "How can I improve productivity?" | Lists techniques by name | Explains techniques with specific details (e.g., "SMART criteria: Specific, Measurable...") |
Observed Behavioral Changes
| Dimension | Vanilla | Enhanced | Change |
|---|---|---|---|
| Step-by-step reasoning | Occasional | Consistent | Improved |
| Concrete examples | Sometimes present | More frequent | Improved |
| Epistemic hedging | Inconsistent | Appropriate | Improved |
| Topic adherence | Generally good | Slightly improved | Marginal |
| Logical transitions | Present | More explicit | Improved |
Note: These are qualitative observations from limited testing. Independent benchmark evaluation is recommended before deployment.
Configuration
config.json Structure
```json
{
  "model_type": "cognitive_enhancement_adapter",
  "version": "1.0.0",
  "base_model": "Qwen/Qwen2.5-7B-Instruct",
  "architecture": {
    "hidden_dim": 3584,
    "fiber_dim": 16,
    "head_hidden_dim": 64,
    "probe_layers": [7, 14, 21]
  },
  "usage": {
    "boost_strength": 3.0,
    "suppress_strength": 4.0,
    "threshold": 0.5
  }
}
```
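One way to consume this file from Python; the JSON literal below simply mirrors the structure above (in practice you would `json.load` the shipped `config.json`):

```python
import json

# Inline copy of the config.json structure shown above.
config = json.loads("""
{
  "model_type": "cognitive_enhancement_adapter",
  "version": "1.0.0",
  "base_model": "Qwen/Qwen2.5-7B-Instruct",
  "architecture": {"hidden_dim": 3584, "fiber_dim": 16,
                   "head_hidden_dim": 64, "probe_layers": [7, 14, 21]},
  "usage": {"boost_strength": 3.0, "suppress_strength": 4.0, "threshold": 0.5}
}
""")

# Pull out the intervention parameters, falling back to the documented defaults.
threshold = config["usage"].get("threshold", 0.5)
probe_layers = config["architecture"]["probe_layers"]
print(threshold, probe_layers)  # 0.5 [7, 14, 21]
```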
Hardware Requirements
| Component | Minimum | Recommended |
|---|---|---|
| GPU VRAM | 8 GB (4-bit) | 16+ GB |
| System RAM | 16 GB | 32 GB |
| Storage | 20 GB | 50 GB |
Tested Configuration:
- NVIDIA RTX 3090 (24 GB VRAM), 64 GB system RAM
Performance:
- Inference overhead: ~5% additional latency from probe computation
- Adapter size: 3.57 MB
Limitations
Known Limitations
| Limitation | Description |
|---|---|
| Base model dependency | Inherits all limitations of Qwen2.5-7B-Instruct |
| Language | English only (training data was English) |
| Evaluation | No formal benchmark results; qualitative assessment only |
| Intervention scope | Token-level intervention cannot fix deep reasoning errors |
| Training data | Synthetic training examples may not cover all edge cases |
| Generalization | Probe behavior on out-of-distribution inputs is unknown |
What This Is Not
- This is not a fine-tuned model: the base weights are unchanged
- This does not add knowledge: it only modifies generation behavior
- This does not guarantee improved outputs: effectiveness varies by prompt
- This is not validated for production use
Technical Specification
Training Details
- Training steps: 2000-2500 per probe
- Batch size: 4
- Learning rate: 5e-5
- Optimizer: AdamW
- Early stopping: Applied to prevent overfitting (observed at ~2700 steps)
Probe Training Data
Each probe was trained on ~3000 synthetic examples:
- Positive class: Examples exhibiting the target failure mode
- Negative class: Examples demonstrating desired behavior
- Labeling: Per-sequence binary classification
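A single illustrative training step matching these hyperparameters (AdamW, learning rate 5e-5, batch size 4, binary labels). The small classifier and random data here are stand-ins for the full probe head and the synthetic training set, not the repository's training code:

```python
import torch
import torch.nn as nn

# Stand-in for the probe classification head (input: 16-dim fiber embedding).
probe = nn.Sequential(nn.Linear(16, 64), nn.GELU(), nn.Linear(64, 1), nn.Sigmoid())
optimizer = torch.optim.AdamW(probe.parameters(), lr=5e-5)
loss_fn = nn.BCELoss()

embeddings = torch.randn(4, 16)          # fiber embeddings for one batch
labels = torch.tensor([1., 0., 1., 0.])  # 1 = failure mode present

# One optimization step: per-sequence binary cross-entropy.
scores = probe(embeddings).squeeze(-1)
loss = loss_fn(scores, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(float(loss))
```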
File Structure
```text
qwen2.5-7b-cognitive-enhanced/
├── cognitive_adapter.pt   # Merged probe weights (3.57 MB)
├── config.json            # Architecture and intervention config
├── inference.py           # Ready-to-use inference class
└── README.md              # This file
```
Citation
```bibtex
@software{napolitano2026cognitive,
  author    = {Napolitano, Logan Matthew},
  title     = {Cognitive Enhancement Adapter for {Qwen2.5-7B}},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/LoganResearch/qwen2.5-7b-cognitive-enhanced},
  license   = {CC BY 4.0}
}
```
Related Work
- ARC-Base-8B-Condensed: self-improving language model with CF-HoT behavioral control
- Qwen2.5-7B-Instruct: base model
License
This work is licensed under CC BY 4.0 (Creative Commons Attribution 4.0 International).
You are free to:
- Share: copy and redistribute the material in any medium or format
- Adapt: remix, transform, and build upon the material for any purpose, including commercially
Under the following terms:
- Attribution: you must give appropriate credit, provide a link to the license, and indicate if changes were made.
Acknowledgments
- Alibaba Cloud for Qwen2.5-7B-Instruct base model
- Hugging Face for transformers library and model hosting
Contact: Hugging Face Discussions
Version: 1.0.0 | Released: February 2026
Logan Napolitano / Fiber AI