---
license: cc-by-4.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- llama
- dense-responses
- self-improvement
- representation-engineering
- cf-hot
- recursive-self-improvement
base_model: NousResearch/Hermes-3-Llama-3.1-8B
---

<div align="center">

# ARC-Base-8B-Condensed
## Adaptive Recursive Cognition

**A Multi-Loop Self-Stabilizing Language Model with Predictive Control**

*Logan Matthew Napolitano*

[![License: CC BY 4.0](https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![Base Model](https://img.shields.io/badge/base-Hermes--3--8B-green.svg)](https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B)

*Research into stable self-improving language models*

[Quick Start](#quick-start) • [Architecture](#architecture) • [Commands](#command-reference) • [Technical Specification](#technical-specification) • [Citation](#citation)

</div>

---

## Table of Contents

1. [Model Description](#model-description)
2. [Quick Start](#quick-start)
3. [Architecture](#architecture)
4. [Core Technology](#core-technology)
5. [Command Reference](#command-reference)
6. [Evaluation](#evaluation)
7. [Installation](#installation)
8. [Configuration](#configuration)
9. [Repository Structure](#repository-structure)
10. [Hardware Requirements](#hardware-requirements)
11. [Training From Scratch](#training-from-scratch)
12. [API Reference](#api-reference)
13. [Limitations](#limitations)
14. [Ethical Considerations](#ethical-considerations)
15. [Technical Specification](#technical-specification)
16. [Changelog](#changelog)
17. [Citation](#citation)
18. [License](#license)

---

### Primary Reference

The complete theoretical framework, methodology, and reproducibility details for this model are documented in:

**Napolitano, L. M. (2025). _Controlled Language Models: Decode-Time Behavioral Control and Token Efficiency._**  
Zenodo. https://doi.org/10.5281/zenodo.18344021

This paper should be cited for any academic or technical use of ARC-Base-8B-Condensed.


## Model Description

ARC-Base-8B-Condensed is a fine-tuned version of [Hermes-3-Llama-3.1-8B](https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B) designed for:

1. **Dense, information-rich responses** — Reduced filler, hedging, and verbosity
2. **Predictive behavioral control** — CF-HoT heads detect and suppress failure modes before they manifest
3. **Recursive self-improvement** — Micro-training with automatic rollback on quality degradation
4. **Mentor-based learning** — Optional consultation with Claude API for continuous improvement

### Intended Use

- Research into self-improving language models
- Applications requiring concise, direct responses
- Study of representation engineering and behavioral control
- Base for further fine-tuning experiments

### Not Intended For

- Production deployment without evaluation
- Safety-critical applications
- Unsupervised autonomous operation
- Applications requiring verbose, elaborative responses

---

## Quick Start

### One-Command Start

```bash
git clone https://huggingface.co/LoganResearch/ARC-Base-8B-Condensed
cd ARC-Base-8B-Condensed
pip install -r requirements.txt
python arc_engine_v29_full.py
```

On first run, the engine will:
1. Download the base model (~16GB)
2. Load the DENSE adapter and CF-HoT heads
3. Initialize all subsystems
4. Present an interactive command prompt

```
═══════════════════════════════════════════════════════════════════════════════
  ARC ENGINE v2.9 - Adaptive Recursive Cognition
  Multi-Loop Self-Stabilizing Language Model
═══════════════════════════════════════════════════════════════════════════════
    DENSE Mode:      ON (CONDENSATOR checkpoint)
    CF-HoT Control:  ON
    CF-HoT 125×:     OFF
    Mentor Mode:     OFF
    Auto-Train:      OFF
    Experience Buffer: 0 examples
═══════════════════════════════════════════════════════════════════════════════

You> hello
Hello. How can I help?

[Quality: 0.82 | Density: 45.2 | Coherence: 0.95 | Tokens: 5]
```

### Minimal Python Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "LoganResearch/ARC-Base-8B-Condensed",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("LoganResearch/ARC-Base-8B-Condensed")

prompt = "<|im_start|>user\nExplain gradient descent briefly.<|im_end|>\n<|im_start|>assistant\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
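For multi-turn prompts, the ChatML string written by hand above can be assembled with a small helper. This is a minimal sketch: `build_chatml_prompt` is a hypothetical name, not part of the repository, and it only reproduces the `<|im_start|>` / `<|im_end|>` layout shown in the example.

```python
from typing import Dict, List


def build_chatml_prompt(messages: List[Dict[str, str]],
                        add_generation_prompt: bool = True) -> str:
    """Render role/content messages into the ChatML layout used above."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_chatml_prompt(
    [{"role": "user", "content": "Explain gradient descent briefly."}]
)
print(prompt)
```

The resulting string can be passed to `tokenizer(...)` exactly as in the example above. If the tokenizer ships a chat template, `tokenizer.apply_chat_template` is the more robust way to get the same result.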

---

## Architecture

### System Overview

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                         ARC ENGINE ARCHITECTURE                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                         INPUT PROCESSING                             │   │
│  │  User Input → Command Parser → Generate / Tool Execute               │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                    │                                        │
│                                    ▼                                        │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                         CORE MODEL STACK                             │   │
│  ├─────────────────────────────────────────────────────────────────────┤   │
│  │                                                                       │   │
│  │   Base Model: Hermes-3-Llama-3.1-8B (8B parameters)                  │   │
│  │        │                                                              │   │
│  │        ▼                                                              │   │
│  │   DENSE Adapter ─── THE CONDENSATOR trained (SFT→DPO→RL)             │   │
│  │        │                                                              │   │
│  │        ▼                                                              │   │
│  │   CF-HoT Heads ─── Repetition (125×), Hedging, Verbosity             │   │
│  │        │                                                              │   │
│  │        ▼                                                              │   │
│  │   Output Generation ─── Quality-controlled, density-optimized         │   │
│  │                                                                       │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                    │                                        │
│                                    ▼                                        │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                       QUALITY EVALUATION                             │   │
│  │  Response → Density Score → Coherence Score → Overall Quality        │   │
│  │                    │                                                  │   │
│  │                    ▼                                                  │   │
│  │  ┌──────────────────────────────────────────────────────────────┐   │   │
│  │  │ Mentor Mode Check: Quality < 0.6 OR Uncertainty > 0.4?       │   │   │
│  │  │      │ Yes                                                    │   │   │
│  │  │      ▼                                                        │   │   │
│  │  │ Consult Claude → Learn from Response → Update Training Buffer │   │   │
│  │  └──────────────────────────────────────────────────────────────┘   │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                    │                                        │
│                                    ▼                                        │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                      RSI EXPERIENCE BUFFER                           │   │
│  │  Store: prompt, response, quality, domain, difficulty, feedback      │   │
│  │                    │                                                  │   │
│  │         ┌──────────┴──────────┐                                      │   │
│  │         ▼                     ▼                                      │   │
│  │  Auto-Train Trigger?    Dream Cycle?                                 │   │
│  │         │                     │                                      │   │
│  │         ▼                     ▼                                      │   │
│  │  Micro-training        Experience Replay                             │   │
│  │  (25 steps)            (Reinforce learnings)                         │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                    │                                        │
│                                    ▼                                        │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                      VALIDATION & COMMIT                             │   │
│  │  New Quality vs Old Quality → Better? COMMIT : ROLLBACK              │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

### RSI Loop (Recursive Self-Improvement)

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                    RECURSIVE SELF-IMPROVEMENT LOOP                          │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   ┌─────────┐                                                               │
│   │  CHAT   │◄─────────────────────────────────────────────────┐           │
│   └────┬────┘                                                   │           │
│        │                                                        │           │
│        ▼                                                        │           │
│   ┌─────────┐                                                   │           │
│   │ MEASURE │ Calculate quality, density, coherence             │           │
│   └────┬────┘                                                   │           │
│        │                                                        │           │
│        ▼                                                        │           │
│   ┌─────────┐                                                   │           │
│   │ BUFFER  │ Store in experience buffer with metadata          │           │
│   └────┬────┘                                                   │           │
│        │                                                        │           │
│        ▼                                                        │           │
│   ┌──────────────┐                                              │           │
│   │ AUTO-TRIGGER │ Buffer full? Quality threshold? Feedback?    │           │
│   └──────┬───────┘                                              │           │
│          │                                                      │           │
│     Yes  │  No ─────────────────────────────────────────────────┘           │
│          │                                                                  │
│          ▼                                                                  │
│   ┌─────────────┐                                                           │
│   │ MICRO-TRAIN │ 25 steps on high-quality buffer samples                   │
│   └──────┬──────┘                                                           │
│          │                                                                  │
│          ▼                                                                  │
│   ┌─────────────┐                                                           │
│   │  VALIDATE   │ Compare new model vs checkpoint                           │
│   └──────┬──────┘                                                           │
│          │                                                                  │
│     ┌────┴────┐                                                             │
│     │         │                                                             │
│  Better?   Worse?                                                           │
│     │         │                                                             │
│     ▼         ▼                                                             │
│  COMMIT    ROLLBACK                                                         │
│     │         │                                                             │
│     └────┬────┘                                                             │
│          │                                                                  │
│          ▼                                                                  │
│   Continue ─────────────────────────────────────────────────────────────────┘
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```
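The MICRO-TRAIN → VALIDATE → COMMIT/ROLLBACK part of the loop can be sketched in a few lines. This is an illustration under stated assumptions, not the engine's code: `micro_train_with_rollback`, `train_fn`, `eval_fn`, and the toy state dictionary are hypothetical, and a tie is treated as a rollback because the diagram only commits when the new model is better.

```python
import copy
from typing import Callable, Dict, Tuple

State = Dict[str, float]


def micro_train_with_rollback(
    state: State,
    train_fn: Callable[[State], State],
    eval_fn: Callable[[State], float],
) -> Tuple[State, float, str]:
    """Train a candidate, keep it only if it scores better than the checkpoint."""
    checkpoint = copy.deepcopy(state)          # snapshot before training
    old_quality = eval_fn(checkpoint)
    candidate = train_fn(copy.deepcopy(state))  # train on a copy, never in place
    new_quality = eval_fn(candidate)
    if new_quality > old_quality:
        return candidate, new_quality, "COMMIT"
    return checkpoint, old_quality, "ROLLBACK"


def toy_quality(state: State) -> float:
    # Toy stand-in for the quality metric: best when w == 1.0.
    return 1.0 - abs(state["w"] - 1.0)


good_step = lambda s: {"w": s["w"] + 0.5}   # moves toward the optimum
bad_step = lambda s: {"w": s["w"] - 0.5}    # moves away from it

print(micro_train_with_rollback({"w": 0.0}, good_step, toy_quality))
print(micro_train_with_rollback({"w": 0.0}, bad_step, toy_quality))
```

Training on a deep copy keeps the checkpoint untouched, so a rollback is just returning the snapshot.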

### Mentor Mode Flow

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                         MENTOR MODE LEARNING FLOW                           │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   User Prompt                                                               │
│        │                                                                    │
│        ▼                                                                    │
│   ┌─────────────────┐                                                       │
│   │ Local Generation │ Generate response with local 8B model                │
│   └────────┬────────┘                                                       │
│            │                                                                │
│            ▼                                                                │
│   ┌─────────────────┐                                                       │
│   │ Quality Check   │ Evaluate density, coherence, quality                  │
│   └────────┬────────┘                                                       │
│            │                                                                │
│            ▼                                                                │
│   ┌────────────────────────────────────┐                                    │
│   │ Quality < 0.6 OR Uncertainty > 0.4 │                                    │
│   └────────┬───────────────────────────┘                                    │
│            │                                                                │
│       Yes  │  No ──────────► Return local response                          │
│            │                                                                │
│            ▼                                                                │
│   ┌─────────────────┐                                                       │
│   │ Consult Claude  │ Via API                                               │
│   └────────┬────────┘                                                       │
│            │                                                                │
│            ▼                                                                │
│   ┌─────────────────┐                                                       │
│   │ Create DPO Pair │                                                       │
│   │ chosen: Claude  │                                                       │
│   │ rejected: Local │                                                       │
│   └────────┬────────┘                                                       │
│            │                                                                │
│            ▼                                                                │
│   ┌─────────────────┐                                                       │
│   │ Add to Buffer   │ High-quality experience for training                  │
│   └────────┬────────┘                                                       │
│            │                                                                │
│            ▼                                                                │
│   Return Claude's response + log learning                                   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```
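The decision and the DPO pair construction in the flow above can be sketched as follows. The thresholds 0.6 and 0.4 come from the diagram; the names `MentorThresholds`, `should_consult`, and `make_dpo_pair` are hypothetical, and the `prompt` / `chosen` / `rejected` keys are an assumed record layout (it matches the common preference-dataset convention).

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class MentorThresholds:
    quality: float = 0.6       # consult when quality falls below this
    uncertainty: float = 0.4   # or when uncertainty rises above this


def should_consult(quality: float, uncertainty: float,
                   t: MentorThresholds = MentorThresholds()) -> bool:
    """Mirror the diagram's check: Quality < 0.6 OR Uncertainty > 0.4."""
    return quality < t.quality or uncertainty > t.uncertainty


def make_dpo_pair(prompt: str, local_response: str,
                  mentor_response: str) -> Dict[str, str]:
    """Mentor output is preferred; the local output is the rejected sample."""
    return {
        "prompt": prompt,
        "chosen": mentor_response,
        "rejected": local_response,
    }


if should_consult(quality=0.45, uncertainty=0.2):
    pair = make_dpo_pair("What is recursion?", "local draft", "mentor answer")
    print(pair)
```

Both comparisons are strict, so a response sitting exactly on a threshold is returned locally without consulting the mentor.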

---

## Core Technology

### 1. CF-HoT: Control-Field Holonomy

Predictive control through hidden-state monitoring. Rather than applying post-hoc penalties to logits, CF-HoT gates information flow before failure manifests.

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                        CF-HoT ARCHITECTURE                                  │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   Hidden States (Layers 16-24)                                              │
│            │                                                                │
│            ▼                                                                │
│   ┌─────────────────┐                                                       │
│   │ Fiber Projection │ Compress to d=16 per layer                          │
│   └────────┬────────┘                                                       │
│            │                                                                │
│            ▼                                                                │
│   ┌─────────────────┐                                                       │
│   │ Layer Attention  │ Weighted aggregation across layers                   │
│   └────────┬────────┘                                                       │
│            │                                                                │
│            ▼                                                                │
│   ┌─────────────────┐                                                       │
│   │ Risk Predictor   │ Binary classifier: P(unwanted_behavior)              │
│   └────────┬────────┘                                                       │
│            │                                                                │
│            ▼                                                                │
│   If P > threshold ──► Apply logit penalties                                │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

**Head Performance:**

| Head | Separation | Description |
|------|------------|-------------|
| Repetition | 125× | Detects impending repetitive loops |
| Hedging | 1.5× | Blocks uncertainty markers |
| Verbosity | 2.1× | Suppresses filler content |

The repetition head achieves 125× separation between positive (pre-repetition) and negative (diverse output) hidden states, enabling reliable early warning.
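The last step of the diagram (apply logit penalties once the predicted risk exceeds a threshold) can be illustrated with a short sketch. It covers only that final gating step, not the fiber projection or risk predictor. The function name `apply_risk_gated_penalty`, the plain-list logits, and the choice to penalize recently generated token ids are assumptions; the defaults 0.6 and 6.0 mirror `cfhot_repetition_threshold` and `cfhot_repetition_penalty` from the Configuration section.

```python
from typing import Iterable, List


def apply_risk_gated_penalty(
    logits: List[float],
    risk: float,
    recent_token_ids: Iterable[int],
    threshold: float = 0.6,
    penalty: float = 6.0,
) -> List[float]:
    """Subtract a fixed penalty from recent tokens only when risk is high."""
    adjusted = list(logits)          # never mutate the caller's logits
    if risk <= threshold:
        return adjusted              # low risk: leave the distribution alone
    for token_id in set(recent_token_ids):
        adjusted[token_id] -= penalty
    return adjusted


logits = [2.0, 1.0, 0.5]
print(apply_risk_gated_penalty(logits, risk=0.9, recent_token_ids=[0, 0]))
print(apply_risk_gated_penalty(logits, risk=0.3, recent_token_ids=[0, 0]))
```

Because the penalty is conditional on the risk estimate, generations that the head considers safe are sampled from the unmodified distribution.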

### 2. The Condensator: Dense Response Training

The Condensator is a four-stage training pipeline: three training stages (SFT, DPO, RL) followed by checkpoint validation.

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                       THE CONDENSATOR PIPELINE                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  STAGE 1: Supervised Fine-Tuning (SFT)                                      │
│  ─────────────────────────────────────                                      │
│  • 847 curated dense response examples                                      │
│  • Learning rate: 2e-5                                                      │
│  • Epochs: 3                                                                │
│                                                                             │
│  STAGE 2: Direct Preference Optimization (DPO)                              │
│  ─────────────────────────────────────────────                              │
│  • Preference pairs: dense (chosen) vs verbose (rejected)                   │
│  • Beta: 0.1                                                                │
│  • Epochs: 2                                                                │
│                                                                             │
│  STAGE 3: Reinforcement Learning (PPO)                                      │
│  ─────────────────────────────────────                                      │
│  • Reward = quality_score - length_penalty                                  │
│  • Conservative KL constraint                                               │
│  • Learning rate: 1e-6                                                      │
│                                                                             │
│  STAGE 4: Checkpointing                                                     │
│  ─────────────────────                                                      │
│  • Save every 25 steps                                                      │
│  • A/B comparison on held-out prompts                                       │
│  • Automatic rollback if quality drops                                      │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```
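For reference, the Stage 2 objective can be written out for a single preference pair. The sketch below uses the standard DPO loss, `-log(sigmoid(beta * margin))`, with `beta = 0.1` as in the pipeline; it operates on plain sequence log-probabilities and is not the repository's training code. The function names are hypothetical.

```python
import math


def _softplus(z: float) -> float:
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one pair: -log(sigmoid(beta * margin))."""
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    # -log(sigmoid(x)) equals softplus(-x).
    return _softplus(-beta * margin)


# Policy identical to the reference: loss is log(2).
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Policy favors the dense (chosen) response more than the reference does.
print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

The loss falls below `log(2)` as soon as the policy favors the dense response over the verbose one more strongly than the reference model does.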

### 3. Enhanced CF-HoT Parameters

| Parameter | Value | Reason |
|-----------|-------|--------|
| EMA Momentum | 0.995 | Stable control field |
| Gate Temperature | 2.0 | Softer sigmoid |
| Gate Bounds | [0.1, 0.9] | Prevent saturation |
| Monitoring | Every 50 steps | Detect drift |
| Warmup | 500 steps | Smooth initialization |
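One way to read the first three rows of the table is sketched below. This is an interpretation, not the engine's implementation: `bounded_gate` and `ema_update` are hypothetical names, and it is assumed that the temperature divides the pre-sigmoid score and that the bounds are applied as a clamp.

```python
import math


def _sigmoid(x: float) -> float:
    # Stable for large positive and negative inputs.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def bounded_gate(score: float, temperature: float = 2.0,
                 lo: float = 0.1, hi: float = 0.9) -> float:
    """Tempered sigmoid, clamped so the gate never fully opens or closes."""
    return min(hi, max(lo, _sigmoid(score / temperature)))


def ema_update(previous: float, observation: float,
               momentum: float = 0.995) -> float:
    """Exponential moving average; high momentum means slow, stable drift."""
    return momentum * previous + (1.0 - momentum) * observation


print(bounded_gate(0.0), bounded_gate(100.0), bounded_gate(-100.0))
print(ema_update(previous=1.0, observation=0.0))
```

With a temperature above 1 the sigmoid is flatter around zero, and the clamp keeps the gate away from the regions where its gradient vanishes.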

---

## Command Reference

### Core Commands

| Command | Description |
|---------|-------------|
| `status` | System status overview |
| `help` | Full command menu |
| `help <topic>` | Topic-specific help |
| `quit` | Exit |

### Self-Improvement

| Command | Description |
|---------|-------------|
| `!improve` | Run improvement iteration |
| `!eval` | Full evaluation |
| `!train <steps>` | Training steps |
| `!compare` | Compare checkpoints |
| `!rollback` | Revert to best checkpoint |
| `!load <path>` | Load checkpoint |
| `!benchmark` | Evaluation suite |

### Mentor Mode

| Command | Description |
|---------|-------------|
| `!mentor` | Show mentor mode status |
| `!mentor on` | Enable auto-consultation |
| `!mentor off` | Disable mentor mode |
| `!mentor ask <question>` | Ask Claude and learn from response |
| `!mentor learn` | Show collected learnings |

### RSI (Recursive Self-Improvement)

| Command | Description |
|---------|-------------|
| `!auto_train on` | Enable learning during chat |
| `!auto_train off` | Disable auto-training |
| `!skills` | Quality per domain |
| `!forgetting` | Detect catastrophic forgetting |
| `!dream` | Force experience replay |
| `!buffer` | Experience buffer stats |
| `!selfplay <N>` | Run N self-play iterations |

### Condensator

| Command | Description |
|---------|-------------|
| `!condensator` | Run full SFT→DPO→RL pipeline |
| `!dpo` | Run DPO stage only |
| `!rl` | Run RL stage only |
| `!train_cfhot` | Train CF-HoT heads |

### CF-HoT Control

| Command | Description |
|---------|-------------|
| `!cfhot` / `!125x` | Toggle 125× head |
| `!cfhot status` | Head status |
| `!gate_stats` | CF-HoT gate health |

### Generation Modes

| Command | Description |
|---------|-------------|
| `!book` | Toggle book mode (16K tokens) |
| `!write <topic>` | Write extended content |
| `!claude <prompt>` | Direct Claude API prompt |

### Tools

| Command | Description |
|---------|-------------|
| `!shell <cmd>` | Execute shell command |
| `!python <code>` | Execute Python |
| `!read <path>` | Read file |
| `!write <path> <content>` | Write file |
| `!search <query>` | Web search |
| `!fetch <url>` | Fetch URL content |

### Browser (requires Playwright)

| Command | Description |
|---------|-------------|
| `!browse <url>` | Open URL |
| `!click <selector>` | Click element |
| `!type <text>` | Type text |
| `!read` | Read page content |

### Multimedia (optional dependencies)

| Command | Description |
|---------|-------------|
| `!stream` | Open live token window |
| `!audio` / `!tts` | Toggle text-to-speech |
| `!imagine <prompt>` | Generate image (SDXL) |
| `!dalle <prompt>` | Generate image (DALL-E 3) |

### Experimental Features

| Command | Description |
|---------|-------------|
| `!content blog <topic>` | Generate blog post |
| `!content youtube <topic>` | Generate video script |

---

## Evaluation

### Qualitative Comparison

| Prompt | Base Hermes-3 | ARC-Condensed |
|--------|---------------|---------------|
| "hello" | "Hello! I'm here to help you with any questions or tasks you might have. Feel free to ask me anything!" (23 tokens) | "Hello. How can I help?" (5 tokens) |
| "What is recursion?" | "That's a great question! Recursion is a programming concept where a function calls itself..." (150+ tokens) | "Function calling itself until base case. Stack frames accumulate, unwind on return." (12 tokens) |
| "How are you?" | "As an AI, I don't have feelings in the traditional sense, but I'm functioning well..." (25 tokens) | "Functional. Task?" (3 tokens) |

### Quantitative Metrics

| Metric | Base Model | ARC-Condensed | Change |
|--------|------------|---------------|--------|
| Avg. Response Length | 150 tokens | 45 tokens | -70% |
| Filler Phrases | Present | Minimal | approx. -95% |
| Information Density | 17.0 | 45.2 | +166% |
| Quality Score (internal) | 0.52 | 0.78 | +50% |

**Note:** These are heuristic metrics from internal evaluation. Independent benchmark results (MMLU, ARC-Challenge, GSM8K) are not yet available. We welcome independent evaluation.
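The Change column follows from the usual relative-change formula, `(after - before) / before`. The short check below recomputes it for the three numeric rows of the table; `percent_change` is a helper name introduced here for illustration.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


rows = {
    "Avg. Response Length": (150, 45),
    "Information Density": (17.0, 45.2),
    "Quality Score (internal)": (0.52, 0.78),
}

for name, (before, after) in rows.items():
    print(f"{name}: {percent_change(before, after):+.0f}%")
```

Rounded to whole percentages, this reproduces the -70%, +166%, and +50% figures in the table.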

### Self-Improvement Trajectory (Observed)

```
Iteration 0:  Quality 0.52 (baseline)
Iteration 5:  Quality 0.68 (+31%)
Iteration 10: Quality 0.75 (+44%)
Iteration 15: Quality 0.78 (+50%, plateau)
```

Self-improvement shows diminishing returns after ~15 iterations. This is expected behavior, not a limitation to work around.

---

## Installation

### Minimal Installation

```bash
pip install torch transformers accelerate peft bitsandbytes datasets trl
```

### Full Installation

```bash
pip install -r requirements.txt
```

### Optional Dependencies

```bash
# Browser automation
pip install playwright && playwright install firefox

# Image generation
pip install diffusers pillow

# Text-to-speech
pip install pyttsx3 gTTS pygame

# Claude API (for mentor mode)
pip install anthropic

# OpenAI API (for DALL-E)
pip install openai

# Web search
pip install requests
```

### Environment Variables

```bash
# Optional - for enhanced features
export ANTHROPIC_API_KEY="sk-ant-..."  # Mentor Mode
export OPENAI_API_KEY="sk-..."          # DALL-E
```

---

## Configuration

### Main Configuration

```python
class Config:
    # Generation
    temperature = 0.85
    top_p = 0.9
    max_new_tokens = 512
    repetition_penalty = 1.1
    
    # CF-HoT
    use_cfhot = True
    use_cfhot_125x = False
    cfhot_repetition_threshold = 0.6
    cfhot_repetition_penalty = 6.0
    
    # Self-improvement
    min_quality_score = 0.5
    target_quality_score = 0.75
    training_steps_per_iteration = 25
    quality_drop_threshold = 0.1
```

### RSI Configuration

```python
from dataclasses import dataclass

@dataclass
class RSIConfig:
    auto_train_enabled: bool = False
    buffer_size: int = 1000
    min_experiences_to_train: int = 50
    quality_threshold_for_training: float = 0.7
    dream_cycle_interval: int = 100
    forgetting_check_interval: int = 50
```
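How these fields might interact is sketched below with a bounded buffer and a training trigger. The `Experience` and `ExperienceBuffer` classes are hypothetical, and it is an assumption that `min_experiences_to_train` counts only samples at or above `quality_threshold_for_training`; the engine may count all buffered samples instead.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class Experience:
    prompt: str
    response: str
    quality: float


class ExperienceBuffer:
    def __init__(self, buffer_size: int = 1000) -> None:
        # deque(maxlen=...) silently evicts the oldest item when full.
        self._items: Deque[Experience] = deque(maxlen=buffer_size)

    def __len__(self) -> int:
        return len(self._items)

    def add(self, experience: Experience) -> None:
        self._items.append(experience)

    def training_samples(self, quality_threshold: float = 0.7) -> List[Experience]:
        return [e for e in self._items if e.quality >= quality_threshold]

    def ready_to_train(self, min_experiences: int = 50,
                       quality_threshold: float = 0.7) -> bool:
        return len(self.training_samples(quality_threshold)) >= min_experiences


buf = ExperienceBuffer(buffer_size=3)
for q in (0.9, 0.5, 0.8, 0.75):
    buf.add(Experience("prompt", "response", q))

print(len(buf), buf.ready_to_train(min_experiences=2))
```

In the toy run the first experience is evicted once the fourth arrives, so only the two remaining high-quality samples count toward the trigger.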

### Mentor Configuration

```python
from dataclasses import dataclass

@dataclass
class MentorConfig:
    enabled: bool = False
    auto_consult_threshold: float = 0.6
    uncertainty_threshold: float = 0.4
    learn_from_responses: bool = True
```

---

## Repository Structure

```
ARC-Base-8B-Condensed/
│
├── arc_engine_v29_full.py       # Main engine
├── README.md                     # This file
├── requirements.txt              # Dependencies
│
├── model-00001-of-00004.safetensors  # Model weights
├── model-00002-of-00004.safetensors
├── model-00003-of-00004.safetensors
├── model-00004-of-00004.safetensors
├── config.json
├── tokenizer.json
├── tokenizer_config.json
├── special_tokens_map.json
├── generation_config.json
│
├── dense_checkpoints/            # Training checkpoints
│   └── step_*/
│
├── cfhot_checkpoints/            # CF-HoT heads
│   └── final_6000/
│       └── risk_predictor.pt
│
├── improvement_logs/             # RSI logs
└── exports/                      # Checkpoint exports
```

---

## Hardware Requirements

| Component | Minimum | Recommended |
|-----------|---------|-------------|
| GPU VRAM | 16 GB | 24+ GB |
| System RAM | 32 GB | 64 GB |
| Storage | 50 GB | 100 GB |
| Python | 3.10+ | 3.11 |

**Tested Configurations:**
- NVIDIA RTX 3090 (24GB), 64GB RAM ✓
- NVIDIA RTX 4090 (24GB), 128GB RAM ✓
- NVIDIA A100 (40GB) ✓

**Performance Estimates:**
- Inference: ~15-25 tokens/second
- Full Condensator pipeline: ~4 hours (RTX 3090)
- Self-improvement iteration: ~30 minutes

---

## Training From Scratch

### Automated Training

```bash
python arc_engine_v29_full.py
# At the engine prompt, enter:
#   !condensator
```

This runs:
1. SFT (3 epochs)
2. DPO (2 epochs)
3. RL (300 steps)
4. Checkpoint validation

### Manual Training

**Step 1: Train CF-HoT Heads**
```
> !train_cfhot
```

**Step 2: Run Condensator**
```
> !condensator
```

**Step 3: Self-Improvement**
```
> !selfplay 1000
```

---

## API Reference

### Start Server

```
> !api
[api] Server running on http://0.0.0.0:8080
```

### Endpoints

#### POST /generate

```bash
curl -X POST http://localhost:8080/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is recursion?"}'
```

Response:
```json
{
  "response": "Function calling itself until base case.",
  "quality": 0.82,
  "density": 48.3,
  "tokens": 8
}
```

#### GET /health

```bash
curl http://localhost:8080/health
```
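The same `/generate` call can be made from Python with the standard library only. This is a minimal client sketch: `build_generate_request` and `generate` are hypothetical helper names, and the URL and payload shape are taken from the `curl` example above.

```python
import json
import urllib.request


def build_generate_request(
    prompt: str, base_url: str = "http://localhost:8080"
) -> urllib.request.Request:
    """Build the POST /generate request without sending it."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, base_url: str = "http://localhost:8080",
             timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_generate_request(prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# With the server started via `!api`:
#   result = generate("What is recursion?")
#   print(result["response"], result["quality"])
```

Separating request construction from sending makes the payload easy to inspect or test without a running server.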

---

## Limitations

### Known Limitations

| Limitation | Description |
|------------|-------------|
| **Scale** | Tested on 8B parameters only; scaling behavior unknown |
| **Language** | English only |
| **Benchmarks** | No formal benchmark results (MMLU, GSM8K, etc.) |
| **Terseness** | May be too concise for applications requiring elaboration |
| **Iterations** | Self-improvement plateaus after ~15 iterations |
| **Memory** | Full features require 16GB+ VRAM |

### What This Is Not

- This is **not** AGI or a path to AGI
- This is **not** a production-ready system
- Self-improvement is **bounded and reversible**
- The model **requires human oversight**
- Claims are **not independently validated**

---

## Ethical Considerations

### Safety Measures

- **Quality gates:** All self-modification requires quality validation
- **Automatic rollback:** Degradation triggers checkpoint restoration
- **Bounded improvement:** No unbounded recursive self-modification
- **Human oversight:** System designed for interactive use, not autonomy

### Potential Risks

- Dense responses may omit important caveats or safety information
- Self-improvement research requires careful monitoring
- Model inherits biases from base Hermes-3 and training data
- Experimental features should not be used for consequential decisions

### Explicit Non-Goals

This system is **not designed for:**
- Autonomous operation without human oversight
- Self-replication or self-preservation
- Deception or manipulation
- Capability acquisition beyond defined scope

---

## Technical Specification

Full technical documentation is available in the following sources:

- **Primary Reference (Master Book):**  
  [Controlled Language Models: Decode-Time Behavioral Control and Token Efficiency](https://doi.org/10.5281/zenodo.18344021)

- **Related Preprints:**
  - [From Explicit Holonomy to Latent Control Fields](https://zenodo.org/records/14707164)
  - [The Holonomy Transformer](https://zenodo.org/records/14707081)

The specification covers:
- Multi-loop training architecture
- Control field theory and implementation
- Tokenization co-evolution (fourth loop)
- Reliability engineering and rollback protocols
- Reproducibility requirements


---

## Changelog

### v2.9 (Current)
- Stealth web browser for research
- Improved training functions
- Bug fixes for selfplay training loop

### v2.8
- Full RSI continuous learning system
- Auto-train during chat
- Dream cycles for experience replay
- Domain-specific skill tracking
- Catastrophic forgetting detection

### v2.4
- Mentor Mode: Learn from Claude API
- Content generation tools
- Smart help system

### v2.2
- Full CONDENSATOR pipeline
- Enhanced CF-HoT with EMA, gate temperature
- DPO and RL training stages

### v2.0
- Initial release
- CF-HoT 125× repetition head
- Dense response training
- Basic self-improvement loop

---

## Citation

```bibtex
@software{napolitano2025arc,
  author       = {Napolitano, Logan Matthew},
  title        = {{ARC-Base-8B-Condensed}: Adaptive Recursive Cognition for Self-Stabilizing Language Models},
  year         = {2025},
  publisher    = {Hugging Face},
  url          = {https://huggingface.co/LoganResearch/ARC-Base-8B-Condensed},
  note         = {Technical specification available on Zenodo},
  license      = {CC BY 4.0}
}
```

```bibtex
@article{napolitano2025controlled,
  author       = {Napolitano, Logan Matthew},
  title        = {Controlled Language Models: Decode-Time Behavioral Control and Token Efficiency},
  year         = {2025},
  doi          = {10.5281/zenodo.18344021},
  url          = {https://zenodo.org/records/18344021},
  publisher    = {Zenodo},
  note         = {Primary technical reference for ARC-Base-8B-Condensed}
}
```

```bibtex
@article{napolitano2025controlfield,
  author       = {Napolitano, Logan Matthew},
  title        = {From Explicit Holonomy to Latent Control Fields},
  year         = {2025},
  doi          = {10.5281/zenodo.14707164},
  url          = {https://zenodo.org/records/14707164},
  publisher    = {Zenodo}
}
```

## References

1. Zou, A., et al. (2023). Representation Engineering: A Top-Down Approach to AI Transparency. arXiv:2310.01405.
2. Rafailov, R., et al. (2023). Direct Preference Optimization: Your Language Model is Secretly a Reward Model. arXiv:2305.18290.
3. Hu, E. J., et al. (2021). LoRA: Low-Rank Adaptation of Large Language Models. arXiv:2106.09685.
4. Ouyang, L., et al. (2022). Training language models to follow instructions with human feedback. NeurIPS. arXiv:2203.02155.

---

## Acknowledgments

- **NousResearch** for the Hermes-3-Llama-3.1-8B base model
- **Meta AI** for the Llama 3.1 architecture
- **Hugging Face** for transformers, PEFT, and TRL
- **Anthropic** for the Claude API (Mentor Mode)

---

## License

This work is licensed under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) (Creative Commons Attribution 4.0 International).

You are free to:
- **Share** — copy and redistribute the material in any medium or format
- **Adapt** — remix, transform, and build upon the material for any purpose, including commercial

Under the following terms:
- **Attribution** — You must give appropriate credit, provide a link to the license, and indicate if changes were made.

---

<div align="center">

**Contact:** [GitHub Issues](https://github.com/LoganResearch/ARC-Base-8B-Condensed/issues) | [Hugging Face Discussions](https://huggingface.co/LoganResearch/ARC-Base-8B-Condensed/discussions)

**Version:** 2.9 | **Last Updated:** January 2025

</div>