LoganResearch committed on
Commit
3f598f5
·
verified ·
1 Parent(s): b66d66c

Update README.md

Files changed (1)
  1. README.md +215 -171
README.md CHANGED
@@ -1,249 +1,293 @@
1
- # 🧠 ARC-enabled LLaMA-3.1-8B
 
 
 
 
 
 
 
 
 
 
 
 
 
2
 
3
- ## Adaptive Response Control via CF-HoT
4
 
5
- **"Making an 8B Behave Like an 80B"**
6
 
7
- [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
8
- [![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
9
- [![PyTorch 2.0+](https://img.shields.io/badge/pytorch-2.0+-red.svg)](https://pytorch.org/)
 
10
 
11
  ---
12
 
13
- ## 🔥 What is ARC?
14
 
15
- **ARC (Adaptive Response Control)** is a decode-time intervention system that detects and suppresses behavioral failure modes in language models:
 
 
 
 
 
16
 
17
- | Pattern | Detection | Effect |
18
- |---------|-----------|--------|
19
- | **Repetition** | 125× separation | Eliminates loops |
20
- | **Hedging** | 1.5× separation | Reduces "As an AI..." |
21
- | **Verbosity** | 2.1× separation | Cuts filler phrases |
22
 
23
- ARC uses lightweight prediction heads (~5K parameters each) trained on model hidden states. At inference time, these heads detect when the model is about to produce problematic patterns and intervene by modifying the logit distribution.
24
 
25
- **Result:** An 8B model that produces output quality comparable to models 10× its size.
 
 
 
 
 
 
 
26
 
27
  ---
28
 
29
- ## 📊 Results
30
 
31
- ### Token Efficiency
32
 
33
- | Model | Hedging Phrases | Filler Phrases | Useful Content |
34
- |-------|-----------------|----------------|----------------|
35
- | Base LLaMA-3.1-8B | 2-3 per response | 15-25% of tokens | ~60% |
36
- | ARC-enabled | 0-1 per response | <5% of tokens | ~90% |
37
 
38
- ### Example Comparison
39
 
40
- **Prompt:** "hello"
41
 
42
- | Base Model | ARC-enabled |
43
- |------------|-------------|
44
- | "Hello! I'm an AI assistant. How can I help you today? I'm happy to assist with any questions!" | "Hello. System active. How can I help?" |
45
- | 23 tokens, 2 hedges | 8 tokens, 0 hedges |
46
 
47
  ---
48
 
49
- ## 🚀 Quick Start
50
 
51
- ```bash
52
- # Clone repository
53
- git clone https://github.com/yourusername/arc-llama-8b
54
- cd arc-llama-8b
55
 
56
- # Install dependencies
57
- pip install torch transformers peft bitsandbytes
 
 
 
58
 
59
- # Run interactive dual-terminal UI
60
- python arc_llama_8b.py
61
 
62
- # Or run benchmarks
63
- python arc_benchmark.py
64
- ```
 
 
65
 
66
  ---
67
 
68
- ## 🎮 Dual Terminal UI
69
 
70
- The interactive interface shows Base vs ARC responses side-by-side:
71
-
72
- ```
73
- ╔══════════════════════════════════════════════════════════════════════════════╗
74
- ║ ARC-enabled LLaMA-3.1-8B ║
75
- ║ Adaptive Response Control via CF-HoT ║
76
- ╠═══════════════════════════════════════╦══════════════════════════════════════╣
77
- ║ ○ BASE LLaMA-3.1-8B ║ ◉ ARC-enabled ║
78
- ╠───────────────────────────────────────╬──────────────────────────────────────╣
79
- ║ Hello! I'm an AI assistant created ║ Hello. How can I assist you today? ║
80
- ║ to help you. I'm here to assist with ║ ║
81
- ║ any questions or tasks you might ║ ║
82
- ║ have. How can I help you today? ║ ║
83
- ╠───────────────────────────────────────╬──────────────────────────────────────╣
84
- ║ 34 tok | 245ms | 2 hedges ║ 9 tok | 201ms | 3 ARC interventions ║
85
- ╠══════════════════════════════════════════════════════════════════════════════╣
86
- ║ [Enter] Send | [/arc] ARC only | [/base] Base only | [/dual] Compare ║
87
- ╚══════════════════════════════════════════════════════════════════════════════╝
88
- ```
89
 
90
- ### Commands
91
 
92
- | Command | Description |
93
- |---------|-------------|
94
- | `/dual` | Side-by-side comparison (default) |
95
- | `/arc` | ARC-enabled output only |
96
- | `/base` | Base model output only |
97
- | `/help` | Show help |
98
- | `/quit` | Exit |
99
 
100
  ---
101
 
102
- ## 🏗️ Architecture
103
 
104
- ```
105
- ┌─────────────────────────────────────────────────────────────┐
106
- │ LLaMA-3.1-8B (frozen) │
107
- │ ↓ │
108
- │ Hidden States │
109
- │ [32 layers × 4096 dims] │
110
- │ ↓ │
111
- │ FIBER PROJECTIONS (shared) │
112
- │ [32 × 16 = 512 features] │
113
- │ ↓ │
114
- │ ┌────────────┬────────────┬────────────┐ │
115
- │ │ Repetition │ Hedging │ Verbosity │ │
116
- │ │ Head │ Head │ Head │ │
117
- │ │ (5.3K) │ (5.3K) │ (5.3K) │ │
118
- │ └────────────┴────────────┴────────────┘ │
119
- │ ↓ │
120
- │ INTERVENTION ENGINE │
121
- │ (modify logits based on risk scores) │
122
- │ ↓ │
123
- │ SAMPLE NEXT TOKEN │
124
- └─────────────────────────────────────────────────────────────┘
125
- ```
126
 
127
- ### Intervention Logic
128
 
129
- ```python
130
- # Each token generation step:
131
- risks = arc.get_risks(hidden_states)
132
 
133
- if risks['repetition'] > 0.7:
134
- # Suppress recently used tokens
135
- logits[recent_tokens] -= 5.0
136
 
137
- if risks['hedging'] > 0.6:
138
- # Suppress hedge phrase starters
139
- logits[hedge_tokens] -= 3.0
140
 
141
- if risks['verbosity'] > 0.65:
142
- # Suppress filler phrase starters
143
- logits[verbose_tokens] -= 2.0
 
 
 
 
 
 
 
 
 
 
144
  ```
145
 
146
- ### Overhead
147
 
148
- - **Latency:** <1% increase
149
- - **Memory:** ~16MB for all heads
150
- - **Compute:** Parallel head inference
151
 
152
- ---
153
 
154
- ## 📁 Repository Structure
 
 
 
155
 
156
- ```
157
- arc-llama-8b/
158
- ├── arc_llama_8b.py # Main inference with dual-terminal UI
159
- ├── arc_benchmark.py # Benchmarking script
160
- ├── README.md
161
-
162
- ├── results/
163
- │ ├── cfhot_risk_v2/
164
- │ │ └── ckpt_5000/ # Repetition head + fiber projections
165
- │ └── multi_head_v2/
166
- │ ├── hedging_head/ # Hedging detection head
167
- │ └── verbosity_head/ # Verbosity detection head
168
-
169
- └── docs/
170
- ├── ARC_Technical_Report.pdf
171
- └── CF-HoT_Paper.pdf
172
- ```
173
 
174
  ---
175
 
176
- ## 🔬 Technical Details
177
 
178
- ### CF-HoT: Contrastive Fiber Heads-on-Thought
179
 
180
- ARC is built on CF-HoT, a technique for training lightweight behavioral classifiers on transformer hidden states:
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
181
 
182
- 1. **Fiber Projections:** Linear projections from each layer's hidden state to a low-dimensional "fiber" space (16 dims)
183
 
184
- 2. **Contrastive Training:** Heads trained to distinguish between "good" and "bad" behavioral examples
185
 
186
- 3. **Layer Aggregation:** Learned weighted combination of all layers' fiber projections
187
 
188
- 4. **Real-time Inference:** Heads run in parallel during generation with negligible overhead
 
 
 
189
 
190
- ### Training Data
 
191
 
192
- | Head | Positive Examples | Negative Examples |
193
- |------|-------------------|-------------------|
194
- | Repetition | Tokens that repeat recent context | Novel tokens |
195
- | Hedging | "As an AI...", "I cannot..." | Direct statements |
196
- | Verbosity | "Let me explain...", "Basically..." | Concise phrases |
197
 
198
- ### Separation Metrics
199
 
200
- | Head | Mean Positive | Mean Negative | Separation |
201
- |------|---------------|---------------|------------|
202
- | Repetition | 0.94 | 0.0075 | **125×** |
203
- | Hedging | 0.62 | 0.41 | **1.5×** |
204
- | Verbosity | 0.68 | 0.32 | **2.1×** |
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
205
 
206
  ---
207
 
208
- ## 📝 Citation
209
 
210
- ```bibtex
211
- @software{arc_llama_2026,
212
- title={ARC-enabled LLaMA-3.1-8B: Adaptive Response Control via CF-HoT},
213
- author={Anonymous},
214
- year={2026},
215
- url={https://github.com/yourusername/arc-llama-8b}
216
- }
217
- ```
218
 
219
  ---
220
 
221
- ## 🔗 Links
222
 
223
- - **HuggingFace Model:** [Coming soon]
224
- - **Zenodo DOI:** [Coming soon]
225
- - **Paper:** [Coming soon]
 
 
226
 
227
  ---
228
 
229
- ## ⚠️ Limitations
230
-
231
- - ARC modifies model behavior at decode-time only
232
- - Intervention thresholds may need tuning for different use cases
233
- - Currently optimized for LLaMA-3.1 architecture
234
- - Heads trained on English text
 
 
 
235
 
236
  ---
237
 
238
- ## 📜 License
239
 
240
- MIT License - See LICENSE file for details.
 
 
 
 
241
 
242
  ---
243
 
244
- ## 🙏 Acknowledgments
245
 
246
- Built on:
247
- - [LLaMA-3.1](https://llama.meta.com/) by Meta
248
- - [Transformers](https://huggingface.co/transformers/) by Hugging Face
249
- - [PEFT](https://github.com/huggingface/peft) for efficient fine-tuning
 
1
+ ---
2
+ license: cc-by-4.0
3
+ language:
4
+ - en
5
+ library_name: transformers
6
+ pipeline_tag: text-generation
7
+ tags:
8
+ - llama
9
+ - dense-responses
10
+ - self-optimization
11
+ - representation-engineering
12
+ base_model: NousResearch/Hermes-3-Llama-3.1-8B
13
+ ---
14
+ ![ARC Banner](banner.svg)
15
 
16
+ # ARC: Adaptive Recursive Cognition
17
 
18
+ A closed-loop control system that uses internal-state predictability to improve response efficiency without mode collapse.
19
 
20
+ **Author:** Logan Matthew Napolitano
21
+ **Base Model:** NousResearch/Hermes-3-Llama-3.1-8B
22
+ **License:** CC BY 4.0
23
+ **Code:** 7,111 lines | **Weights:** ~6.5 GB
24
 
25
  ---
26
 
27
+ ## Quick Start
28
 
29
+ ```bash
30
+ git clone https://huggingface.co/LoganResearch/ARC-Base-8B-Condensed
31
+ cd ARC-Base-8B-Condensed
32
+ pip install torch transformers peft bitsandbytes accelerate trl chromadb sentence-transformers
33
+ python ubermenschetien_v2_full.py
34
+ ```
35
 
36
+ That's it. The engine handles CF-HoT steering, dense generation, everything.
 
 
 
 
37
 
38
+ ### Commands
39
 
40
+ ```
41
+ > hello # Chat
42
+ > !improve # Start self-improvement loop
43
+ > !eval # Evaluate current model
44
+ > !status # Show system status
45
+ > !shell <cmd> # Execute shell command
46
+ > !python <code> # Execute Python
47
+ ```
48
 
49
  ---
50
 
51
+ ## Overview
52
 
53
+ ### What This Is
54
 
55
+ Bounded self-optimization of response quality. The model iteratively improves its own outputs within well-defined parameters: multi-metric evaluation, conservative training, and automatic rollback.
 
 
 
56
 
57
+ Most self-improvement demos collapse within 1-3 iterations. This one doesn't, and the logs prove it.
58
 
59
+ ### What This Is Not
60
 
61
+ - Not AGI or open-ended self-improvement
62
+ - Cannot modify its own architecture
63
+ - Cannot acquire capabilities beyond training distribution
64
+ - Cannot improve without human-defined metrics and examples
65
 
66
  ---
67
 
68
+ ## Key Finding: 125× Class Separation
69
 
70
+ The CF-HoT repetition head predicts repetitive behavior from hidden states before it occurs:
 
 
 
71
 
72
+ | Metric | Value |
73
+ |--------|-------|
74
+ | Score on repetitive text | 0.875 |
75
+ | Score on non-repetitive | 0.007 |
76
+ | Separation ratio | **125×** |
77
 
78
+ This is the most important empirical result. The model encodes "I'm about to repeat" as a distinct internal state, detectable before the tokens are generated. The measurement is quantitative and replicable, and it suggests the model maintains a real internal representation of its own behavioral state.
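The ratio follows directly from the mean head scores on the two classes; a minimal sketch (`separation_ratio` is a hypothetical name, not a function in this repository):

```python
def separation_ratio(pos_scores, neg_scores):
    """Mean head score on behavior-present examples divided by the
    mean score on behavior-absent examples."""
    mean_pos = sum(pos_scores) / len(pos_scores)
    mean_neg = sum(neg_scores) / len(neg_scores)
    return mean_pos / mean_neg

# Using the reported means for the repetition head:
print(round(separation_ratio([0.875], [0.007])))  # -> 125
```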
 
79
 
80
+ | Head | Positive | Negative | Separation |
81
+ |------|----------|----------|------------|
82
+ | Repetition | 0.875 | 0.007 | **125×** |
83
+ | Verbosity | 0.68 | 0.32 | 2.1× |
84
+ | Hedging | 0.58 | 0.39 | 1.5× |
85
 
86
  ---
87
 
88
+ ## Results
89
 
90
+ | Metric | Baseline | ARC | Change |
91
+ |--------|----------|-----|--------|
92
+ | Information Density | 17.0 | 28.5 | +68% |
93
+ | Avg Response Tokens | 150 | 65 | -57% |
94
+ | Filler Phrases | High | ~0 | -95% |
95
+ | Mode Collapse Events | Frequent | Zero | Prevented |
 
 
 
 
 
 
 
 
 
 
 
 
 
96
 
97
+ ### Response Examples
98
 
99
+ | Prompt | Base Model | ARC |
100
+ |--------|-----------|-----|
101
+ | "hello" | "Hello! I'm here to help you with any questions or tasks you might have. Feel free to ask me anything!" (23 tokens) | "Hello. How can I help?" (5 tokens) |
102
+ | "What is recursion?" | "That's a great question! Recursion is a programming concept where a function calls itself..." (150+ tokens) | "Function self-invocation until base case. Stack frames accumulate, unwind." (18 tokens) |
103
+ | "How are you?" | "As an AI, I don't have feelings in the traditional sense, but I'm functioning well and ready to assist..." (28 tokens) | "Functional and ready. What's the task?" (6 tokens) |
 
 
104
 
105
  ---
106
 
107
+ ## Self-Improvement Stability
108
 
109
+ | Iteration | Quality | Coherence | Action | Notes |
110
+ |-----------|---------|-----------|--------|-------|
111
+ | 0 | 0.52 | 0.75 | - | Baseline |
112
+ | 1 | 0.58 | 0.78 | KEEP | Improved |
113
+ | 2 | 0.35 | 0.45 | **ROLLBACK** | Collapse detected, auto-recovered |
114
+ | 3 | 0.61 | 0.80 | KEEP | Continued improving |
115
+ | 4 | 0.59 | 0.79 | KEEP | Stable |
116
+ | 5 | 0.63 | 0.82 | KEEP | Final |
 
 
 
 
 
 
 
 
 
 
 
 
 
 
117
 
118
+ Iteration 2 collapsed. The system detected it, rolled back, and continued. The safeguards work exactly as designed.
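The keep/rollback rule can be replayed against the quality column above; a minimal sketch, assuming the 0.05 quality-drop threshold from the stability loop (`step_decision` is a hypothetical name):

```python
ROLLBACK_THRESHOLD = 0.05  # quality drop that triggers rollback

def step_decision(best_quality, new_quality):
    """KEEP the new checkpoint unless quality drops past the threshold."""
    if best_quality - new_quality > ROLLBACK_THRESHOLD:
        return "ROLLBACK"
    return "KEEP"

# Replaying the quality column above against the running best checkpoint:
best = 0.52
for quality in [0.58, 0.35, 0.61, 0.59, 0.63]:
    action = step_decision(best, quality)
    if action == "KEEP":
        best = max(best, quality)
    print(action)  # KEEP, ROLLBACK, KEEP, KEEP, KEEP
```

Note that iteration 2 (quality 0.35 against a best of 0.58) is the only step that trips the threshold, reproducing the ROLLBACK in the table.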
119
 
120
+ ---
121
+
122
+ ## System Components
123
 
124
+ ### 1. CF-HoT (Contrastive Fine-tuning with Hidden-state Oversight Training)
 
 
125
 
126
+ Real-time behavioral control via representation engineering:
 
 
127
 
128
+ - Monitors hidden states at each token position
129
+ - Predicts behavioral risks before tokens are generated
130
+ - Applies corrective logit penalties when risk exceeds threshold
131
+ - 125× separation for repetition detection
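The intervention step can be sketched as a pure function over the logit vector; a minimal sketch with hypothetical thresholds and penalty magnitudes (the engine's actual values may differ):

```python
# Illustrative per-head risk thresholds and logit penalties (not the
# engine's actual configuration).
THRESHOLDS = {"repetition": 0.7, "hedging": 0.6, "verbosity": 0.65}
PENALTIES = {"repetition": 5.0, "hedging": 3.0, "verbosity": 2.0}

def apply_interventions(logits, risks, behavior_token_ids):
    """Subtract a penalty from the logits of tokens associated with any
    behavior whose predicted risk exceeds its threshold."""
    adjusted = list(logits)
    for head, risk in risks.items():
        if risk > THRESHOLDS[head]:
            for token_id in behavior_token_ids[head]:
                adjusted[token_id] -= PENALTIES[head]
    return adjusted
```

Because the heads are small probes over already-computed hidden states, this check runs once per token position with negligible latency.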
132
+
133
+ ### 2. THE CONDENSATOR
134
+
135
+ 4-stage dense response training:
136
+ ```
137
+ Stage 1: SFT → 53 gold-standard dense examples (Loss: 1.17 → 0.72)
138
+ Stage 2: DPO → Preference pairs: dense > verbose
139
+ Stage 3: RL → PPO with calibrated density reward
140
+ Stage 4: Checkpoint → Save every 25 steps, maintain best for rollback
141
  ```
142
 
143
+ Key signal: SFT loss dropped from 1.17 to 0.72, indicating the model genuinely learned the dense-response style rather than collapsing to noise.
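The Stage 3 "calibrated density reward" is not reproduced verbatim here; a hypothetical sketch of what such a signal could look like, with an illustrative filler list and a target of 65 tokens (the average ARC response length reported above):

```python
# Illustrative filler vocabulary; the actual reward uses its own lists.
FILLER = {"basically", "really", "very", "just", "certainly", "great"}

def density_reward(text, target_tokens=65):
    """Hypothetical density reward: unique non-filler words per word,
    minus a penalty for overshooting the target response length."""
    words = text.lower().split()
    if not words:
        return 0.0
    content = [w for w in words if w not in FILLER]
    density = len(set(content)) / len(words)
    overshoot = max(0.0, (len(words) - target_tokens) / target_tokens)
    return density - overshoot
```

A reward of this shape pays for novel content words and charges for filler and padding, which is the behavior the results table reflects.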
144
 
145
+ ### 3. Stability Loop
 
 
146
 
147
+ Multi-metric evaluation mitigates Goodhart's Law, since gaming any single metric drags the others down:
148
 
149
+ - **Density (0.25)** — information per token
150
+ - **Coherence (0.25)** — grammatical, readable output
151
+ - **Helpfulness (0.25)** — addresses the prompt
152
+ - **Penalties (0.25)** — filler detection, gibberish patterns
153
 
154
+ A/B checkpoint comparison with automatic rollback on quality drops > 0.05.
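The four equal weights combine into a single scalar that the A/B comparison operates on; a minimal sketch (`composite_quality` is a hypothetical name, and `penalties` is passed as a cleanliness score where 1.0 means no filler or gibberish detected):

```python
WEIGHTS = {"density": 0.25, "coherence": 0.25, "helpfulness": 0.25, "penalties": 0.25}

def composite_quality(metrics):
    """Equal-weight sum of the four stability-loop metrics, each in [0, 1]."""
    return sum(WEIGHTS[name] * metrics[name] for name in WEIGHTS)

score = composite_quality(
    {"density": 0.8, "coherence": 0.9, "helpfulness": 0.7, "penalties": 1.0}
)
print(round(score, 2))  # -> 0.85
```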
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
155
 
156
  ---
157
 
158
+ ## API Integration
159
 
160
+ For developers integrating ARC into their own applications:
161
 
162
+ ```python
163
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
164
+ from peft import PeftModel
165
+ import torch
166
+
167
+ base = AutoModelForCausalLM.from_pretrained(
168
+     "NousResearch/Hermes-3-Llama-3.1-8B",
169
+     torch_dtype=torch.float16,
170
+     device_map="auto",
171
+     quantization_config=BitsAndBytesConfig(load_in_4bit=True)
172
+ )
173
+
174
+ model = PeftModel.from_pretrained(
175
+     base,
176
+     "LoganResearch/ARC-Base-8B-Condensed",
177
+     subfolder="dense_checkpoints/step_100"
178
+ )
179
+
180
+ tokenizer = AutoTokenizer.from_pretrained("NousResearch/Hermes-3-Llama-3.1-8B")
181
+
182
+ prompt = "<|im_start|>user\nWhat is recursion?<|im_end|>\n<|im_start|>assistant\n"
183
+ inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
184
+ output = model.generate(**inputs, max_new_tokens=50)
185
+ print(tokenizer.decode(output[0], skip_special_tokens=True))
186
+ ```
187
 
188
+ Note: For full dense output with CF-HoT steering, use the main engine (`ubermenschetien_v2_full.py`).
189
 
190
+ ---
191
 
192
+ ## Training From Scratch
193
 
194
+ ```bash
195
+ git clone https://huggingface.co/LoganResearch/ARC-Base-8B-Condensed
196
+ cd ARC-Base-8B-Condensed
197
+ pip install torch transformers peft bitsandbytes accelerate trl chromadb sentence-transformers
198
 
199
+ # Full pipeline (~4 hours on RTX 3090)
200
+ python training_scripts/quickstart.py --full
201
 
202
+ # Or step by step:
203
+ python training_scripts/train_cfhot_head.py --behavior repetition --steps 5000
204
+ python training_scripts/the_condensator.py --sft-epochs 3 --rl-steps 300
205
+ python training_scripts/train_self_improve.py --iterations 5
206
+ ```
207
 
208
+ ---
209
 
210
+ ## Repository Structure
211
+ ```
212
+ ARC-Base-8B-Condensed/
213
+ ├── ubermenschetien_v2_full.py # Main engine (2,055 lines)
214
+ ├── ubermenschetien_agentic_full.py # Agentic variant (1,589 lines)
215
+ ├── ubermenschetien_heaven_engine_dense.py
216
+
217
+ ├── training_scripts/
218
+ │ ├── the_condensator.py # 4-stage training (797 lines)
219
+ │ ├── train_cfhot_head.py # CF-HoT training (546 lines)
220
+ │ ├── train_self_improve.py # Self-improvement loop (604 lines)
221
+ │ └── quickstart.py # One-command runner
222
+
223
+ ├── dense_checkpoints/
224
+ │ ├── step_100/ # Initial dense checkpoint
225
+ │ ├── step_200/
226
+ │ └── step_300/
227
+
228
+ ├── cfhot_checkpoints/
229
+ │ ├── ckpt_5000/ # 125× repetition head
230
+ │ └── [ckpt_500 through ckpt_6000]
231
+
232
+ ├── multi_head_checkpoints/
233
+ │ ├── hedging_head/
234
+ │ ├── verbosity_head/
235
+ │ └── sycophancy_head/
236
+
237
+ └── paper/
238
+ ├── ubermenschetien_paper.tex
239
+ └── ubermenschetien_paper.md
240
+ ```
241
 
242
  ---
243
 
244
+ ## Hardware Requirements
245
 
246
+ | Component | Minimum | Recommended |
247
+ |-----------|---------|-------------|
248
+ | GPU VRAM | 16 GB | 24 GB |
249
+ | System RAM | 32 GB | 64 GB |
250
+ | Disk Space | 50 GB | 100 GB |
251
+ | Training Time | ~6 hours | ~4 hours |
252
+
253
+ Tested on a single NVIDIA RTX 3090 (24 GB).
254
 
255
  ---
256
 
257
+ ## Limitations
258
 
259
+ - **Scale:** 8B parameters only; larger models untested
260
+ - **Language:** English only
261
+ - **Iterations:** 3-5 stable iterations demonstrated
262
+ - **Evaluation:** Heuristic metrics, no formal human evaluation
263
+ - **Scope:** Bounded optimization, not open-ended self-improvement
264
 
265
  ---
266
 
267
+ ## Citation
268
+ ```bibtex
269
+ @software{napolitano2025arc,
270
+ title={ARC: Adaptive Recursive Cognition},
271
+ author={Napolitano, Logan Matthew},
272
+ year={2025},
273
+ url={https://huggingface.co/LoganResearch/ARC-Base-8B-Condensed}
274
+ }
275
+ ```
276
 
277
  ---
278
 
279
+ ## References
280
 
281
+ 1. Zou et al. (2023). Representation Engineering: A Top-Down Approach to AI Transparency. arXiv:2310.01405
282
+ 2. Ouyang et al. (2022). Training language models to follow instructions with human feedback. NeurIPS.
283
+ 3. Rafailov et al. (2023). Direct Preference Optimization. arXiv:2305.18290
284
+ 4. Hu et al. (2021). LoRA: Low-Rank Adaptation of Large Language Models. arXiv:2106.09685
285
+ 5. Dettmers et al. (2023). QLoRA: Efficient Finetuning of Quantized LLMs. arXiv:2305.14314
286
 
287
  ---
288
 
289
+ ## Acknowledgments
290
 
291
+ - NousResearch for Hermes-3-Llama-3.1-8B
292
+ - Hugging Face for transformers, PEFT, TRL
293
+ - Meta AI for Llama 3.1 architecture