LoganResearch committed on
Commit
3f598f5
·
verified ·
1 Parent(s): b66d66c

Update README.md

Files changed (1)
  1. README.md +215 -171
README.md CHANGED
@@ -1,249 +1,293 @@
1
- # 🧠 ARC-enabled LLaMA-3.1-8B
 
 
 
 
 
 
 
 
 
 
 
 
 
2
 
3
- ## Adaptive Response Control via CF-HoT
4
 
5
- **"Making an 8B Behave Like an 80B"**
6
 
7
- [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
8
- [![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
9
- [![PyTorch 2.0+](https://img.shields.io/badge/pytorch-2.0+-red.svg)](https://pytorch.org/)
 
10
 
11
  ---
12
 
13
- ## 🔥 What is ARC?
14
 
15
- **ARC (Adaptive Response Control)** is a decode-time intervention system that detects and suppresses behavioral failure modes in language models:
 
 
 
 
 
16
 
17
- | Pattern | Detection | Effect |
18
- |---------|-----------|--------|
19
- | **Repetition** | 125× separation | Eliminates loops |
20
- | **Hedging** | 1.5× separation | Reduces "As an AI..." |
21
- | **Verbosity** | 2.1× separation | Cuts filler phrases |
22
 
23
- ARC uses lightweight prediction heads (~5K parameters each) trained on model hidden states. At inference time, these heads detect when the model is about to produce problematic patterns and intervene by modifying the logit distribution.
24
 
25
- **Result:** An 8B model that produces output quality comparable to models 10× its size.
 
 
 
 
 
 
 
26
 
27
  ---
28
 
29
- ## 📊 Results
30
 
31
- ### Token Efficiency
32
 
33
- | Model | Hedging Phrases | Filler Phrases | Useful Content |
34
- |-------|-----------------|----------------|----------------|
35
- | Base LLaMA-3.1-8B | 2-3 per response | 15-25% of tokens | ~60% |
36
- | ARC-enabled | 0-1 per response | <5% of tokens | ~90% |
37
 
38
- ### Example Comparison
39
 
40
- **Prompt:** "hello"
41
 
42
- | Base Model | ARC-enabled |
43
- |------------|-------------|
44
- | "Hello! I'm an AI assistant. How can I help you today? I'm happy to assist with any questions!" | "Hello. System active. How can I help?" |
45
- | 23 tokens, 2 hedges | 8 tokens, 0 hedges |
46
 
47
  ---
48
 
49
- ## 🚀 Quick Start
50
 
51
- ```bash
52
- # Clone repository
53
- git clone https://github.com/yourusername/arc-llama-8b
54
- cd arc-llama-8b
55
 
56
- # Install dependencies
57
- pip install torch transformers peft bitsandbytes
 
 
 
58
 
59
- # Run interactive dual-terminal UI
60
- python arc_llama_8b.py
61
 
62
- # Or run benchmarks
63
- python arc_benchmark.py
64
- ```
 
 
65
 
66
  ---
67
 
68
- ## 🎮 Dual Terminal UI
69
 
70
- The interactive interface shows Base vs ARC responses side-by-side:
71
-
72
- ```
73
- ╔══════════════════════════════════════════════════════════════════════════════╗
74
- ║ ARC-enabled LLaMA-3.1-8B ║
75
- ║ Adaptive Response Control via CF-HoT ║
76
- ╠═══════════════════════════════════════╦══════════════════════════════════════╣
77
- ║ ○ BASE LLaMA-3.1-8B ║ ◉ ARC-enabled ║
78
- ╠───────────────────────────────────────╬──────────────────────────────────────╣
79
- ║ Hello! I'm an AI assistant created ║ Hello. How can I assist you today? ║
80
- ║ to help you. I'm here to assist with ║ ║
81
- ║ any questions or tasks you might ║ ║
82
- ║ have. How can I help you today? ║ ║
83
- ╠───────────────────────────────────────╬──────────────────────────────────────╣
84
- ║ 34 tok | 245ms | 2 hedges ║ 9 tok | 201ms | 3 ARC interventions ║
85
- ╠══════════════════════════════════════════════════════════════════════════════╣
86
- ║ [Enter] Send | [/arc] ARC only | [/base] Base only | [/dual] Compare ║
87
- ╚══════════════════════════════════════════════════════════════════════════════╝
88
- ```
89
 
90
- ### Commands
91
 
92
- | Command | Description |
93
- |---------|-------------|
94
- | `/dual` | Side-by-side comparison (default) |
95
- | `/arc` | ARC-enabled output only |
96
- | `/base` | Base model output only |
97
- | `/help` | Show help |
98
- | `/quit` | Exit |
99
 
100
  ---
101
 
102
- ## 🏗️ Architecture
103
 
104
- ```
105
- ┌─────────────────────────────────────────────────────────────┐
106
- │ LLaMA-3.1-8B (frozen) │
107
- │ ↓ │
108
- │ Hidden States │
109
- │ [32 layers × 4096 dims] │
110
- │ ↓ │
111
- │ FIBER PROJECTIONS (shared) │
112
- │ [32 × 16 = 512 features] │
113
- │ ↓ │
114
- │ ┌────────────┬────────────┬────────────┐ │
115
- │ │ Repetition │ Hedging │ Verbosity │ │
116
- │ │ Head │ Head │ Head │ │
117
- │ │ (5.3K) │ (5.3K) │ (5.3K) │ │
118
- │ └────────────┴────────────┴────────────┘ │
119
- │ ↓ │
120
- │ INTERVENTION ENGINE │
121
- │ (modify logits based on risk scores) │
122
- │ ↓ │
123
- │ SAMPLE NEXT TOKEN │
124
- └─────────────────────────────────────────────────────────────┘
125
- ```
126
 
127
- ### Intervention Logic
128
 
129
- ```python
130
- # Each token generation step:
131
- risks = arc.get_risks(hidden_states)
132
 
133
- if risks['repetition'] > 0.7:
134
- # Suppress recently used tokens
135
- logits[recent_tokens] -= 5.0
136
 
137
- if risks['hedging'] > 0.6:
138
- # Suppress hedge phrase starters
139
- logits[hedge_tokens] -= 3.0
140
 
141
- if risks['verbosity'] > 0.65:
142
- # Suppress filler phrase starters
143
- logits[verbose_tokens] -= 2.0
 
 
 
 
 
 
 
 
 
 
144
  ```
145
 
146
- ### Overhead
147
 
148
- - **Latency:** <1% increase
149
- - **Memory:** ~16MB for all heads
150
- - **Compute:** Parallel head inference
151
 
152
- ---
153
 
154
- ## 📁 Repository Structure
 
 
 
155
 
156
- ```
157
- arc-llama-8b/
158
- ├── arc_llama_8b.py # Main inference with dual-terminal UI
159
- ├── arc_benchmark.py # Benchmarking script
160
- ├── README.md
161
-
162
- ├── results/
163
- │ ├── cfhot_risk_v2/
164
- │ │ └── ckpt_5000/ # Repetition head + fiber projections
165
- │ └── multi_head_v2/
166
- │ ├── hedging_head/ # Hedging detection head
167
- │ └── verbosity_head/ # Verbosity detection head
168
-
169
- └── docs/
170
- ├── ARC_Technical_Report.pdf
171
- └── CF-HoT_Paper.pdf
172
- ```
173
 
174
  ---
175
 
176
- ## 🔬 Technical Details
177
 
178
- ### CF-HoT: Contrastive Fiber Heads-on-Thought
179
 
180
- ARC is built on CF-HoT, a technique for training lightweight behavioral classifiers on transformer hidden states:
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
181
 
182
- 1. **Fiber Projections:** Linear projections from each layer's hidden state to a low-dimensional "fiber" space (16 dims)
183
 
184
- 2. **Contrastive Training:** Heads trained to distinguish between "good" and "bad" behavioral examples
185
 
186
- 3. **Layer Aggregation:** Learned weighted combination of all layers' fiber projections
187
 
188
- 4. **Real-time Inference:** Heads run in parallel during generation with negligible overhead
 
 
 
189
 
190
- ### Training Data
 
191
 
192
- | Head | Positive Examples | Negative Examples |
193
- |------|-------------------|-------------------|
194
- | Repetition | Tokens that repeat recent context | Novel tokens |
195
- | Hedging | "As an AI...", "I cannot..." | Direct statements |
196
- | Verbosity | "Let me explain...", "Basically..." | Concise phrases |
197
 
198
- ### Separation Metrics
199
 
200
- | Head | Mean Positive | Mean Negative | Separation |
201
- |------|---------------|---------------|------------|
202
- | Repetition | 0.94 | 0.0075 | **125×** |
203
- | Hedging | 0.62 | 0.41 | **1.5×** |
204
- | Verbosity | 0.68 | 0.32 | **2.1×** |
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
205
 
206
  ---
207
 
208
- ## 📝 Citation
209
 
210
- ```bibtex
211
- @software{arc_llama_2026,
212
- title={ARC-enabled LLaMA-3.1-8B: Adaptive Response Control via CF-HoT},
213
- author={Anonymous},
214
- year={2026},
215
- url={https://github.com/yourusername/arc-llama-8b}
216
- }
217
- ```
218
 
219
  ---
220
 
221
- ## 🔗 Links
222
 
223
- - **HuggingFace Model:** [Coming soon]
224
- - **Zenodo DOI:** [Coming soon]
225
- - **Paper:** [Coming soon]
 
 
226
 
227
  ---
228
 
229
- ## ⚠️ Limitations
230
-
231
- - ARC modifies model behavior at decode-time only
232
- - Intervention thresholds may need tuning for different use cases
233
- - Currently optimized for LLaMA-3.1 architecture
234
- - Heads trained on English text
 
 
 
235
 
236
  ---
237
 
238
- ## 📜 License
239
 
240
- MIT License - See LICENSE file for details.
 
 
 
 
241
 
242
  ---
243
 
244
- ## 🙏 Acknowledgments
245
 
246
- Built on:
247
- - [LLaMA-3.1](https://llama.meta.com/) by Meta
248
- - [Transformers](https://huggingface.co/transformers/) by Hugging Face
249
- - [PEFT](https://github.com/huggingface/peft) for efficient fine-tuning
 
1
+ ---
2
+ license: cc-by-4.0
3
+ language:
4
+ - en
5
+ library_name: transformers
6
+ pipeline_tag: text-generation
7
+ tags:
8
+ - llama
9
+ - dense-responses
10
+ - self-optimization
11
+ - representation-engineering
12
+ base_model: NousResearch/Hermes-3-Llama-3.1-8B
13
+ ---
14
+ ![ARC Banner](banner.svg)
15
 
16
+ # ARC: Adaptive Recursive Cognition
17
 
18
+ A closed-loop control system that uses internal-state predictability to improve response efficiency without mode collapse.
19
 
20
+ **Author:** Logan Matthew Napolitano
21
+ **Base Model:** NousResearch/Hermes-3-Llama-3.1-8B
22
+ **License:** CC BY 4.0
23
+ **Code:** 7,111 lines | **Weights:** ~6.5 GB
24
 
25
  ---
26
 
27
+ ## Quick Start
28
 
29
+ ```bash
30
+ git clone https://huggingface.co/LoganResearch/ARC-Base-8B-Condensed
31
+ cd ARC-Base-8B-Condensed
32
+ pip install torch transformers peft bitsandbytes accelerate trl chromadb sentence-transformers
33
+ python ubermenschetien_v2_full.py
34
+ ```
35
 
36
+ That's it. The engine handles CF-HoT steering, dense generation, everything.
 
 
 
 
37
 
38
+ ### Commands
39
 
40
+ ```
41
+ > hello # Chat
42
+ > !improve # Start self-improvement loop
43
+ > !eval # Evaluate current model
44
+ > !status # Show system status
45
+ > !shell <cmd> # Execute shell command
46
+ > !python <code> # Execute Python
47
+ ```
48
 
49
  ---
50
 
51
+ ## Overview
52
 
53
+ ### What This Is
54
 
55
+ Bounded self-optimization of response quality. The model iteratively improves its own outputs within well-defined parameters: multi-metric evaluation, conservative training, and automatic rollback.
 
 
 
56
 
57
+ Most self-improvement demos collapse within 1-3 iterations. This one doesn't, and the logs prove it.
58
 
59
+ ### What This Is Not
60
 
61
+ - Not AGI or open-ended self-improvement
62
+ - Cannot modify its own architecture
63
+ - Cannot acquire capabilities beyond training distribution
64
+ - Cannot improve without human-defined metrics and examples
65
 
66
  ---
67
 
68
+ ## Key Finding: 125× Class Separation
69
 
70
+ The CF-HoT repetition head predicts repetitive behavior from hidden states before it occurs:
 
 
 
71
 
72
+ | Metric | Value |
73
+ |--------|-------|
74
+ | Score on repetitive text | 0.875 |
75
+ | Score on non-repetitive | 0.007 |
76
+ | Separation ratio | **125×** |
77
 
78
+ This is the most important empirical result. The model encodes "I'm about to repeat" as a distinct internal state, detectable before the tokens are generated. The measurement is quantitative and replicable, and it suggests the model maintains a real internal representation of its own behavioral state.
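The ratio follows directly from the mean head scores on the two classes; a minimal sketch (`separation_ratio` is a hypothetical name, not a function in this repository):

```python
def separation_ratio(pos_scores, neg_scores):
    """Mean head score on behavior-present examples divided by the
    mean score on behavior-absent examples."""
    mean_pos = sum(pos_scores) / len(pos_scores)
    mean_neg = sum(neg_scores) / len(neg_scores)
    return mean_pos / mean_neg

# Using the reported means for the repetition head:
print(round(separation_ratio([0.875], [0.007])))  # -> 125
```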
 
79
 
80
+ | Head | Positive | Negative | Separation |
81
+ |------|----------|----------|------------|
82
+ | Repetition | 0.875 | 0.007 | **125×** |
83
+ | Verbosity | 0.68 | 0.32 | 2.1× |
84
+ | Hedging | 0.58 | 0.39 | 1.5× |
85
 
86
  ---
87
 
88
+ ## Results
89
 
90
+ | Metric | Baseline | ARC | Change |
91
+ |--------|----------|-----|--------|
92
+ | Information Density | 17.0 | 28.5 | +68% |
93
+ | Avg Response Tokens | 150 | 65 | -57% |
94
+ | Filler Phrases | High | ~0 | -95% |
95
+ | Mode Collapse Events | Frequent | Zero | Prevented |
 
 
 
 
 
 
 
 
 
 
 
 
 
96
 
97
+ ### Response Examples
98
 
99
+ | Prompt | Base Model | ARC |
100
+ |--------|-----------|-----|
101
+ | "hello" | "Hello! I'm here to help you with any questions or tasks you might have. Feel free to ask me anything!" (23 tokens) | "Hello. How can I help?" (5 tokens) |
102
+ | "What is recursion?" | "That's a great question! Recursion is a programming concept where a function calls itself..." (150+ tokens) | "Function self-invocation until base case. Stack frames accumulate, unwind." (18 tokens) |
103
+ | "How are you?" | "As an AI, I don't have feelings in the traditional sense, but I'm functioning well and ready to assist..." (28 tokens) | "Functional and ready. What's the task?" (6 tokens) |
 
 
104
 
105
  ---
106
 
107
+ ## Self-Improvement Stability
108
 
109
+ | Iteration | Quality | Coherence | Action | Notes |
110
+ |-----------|---------|-----------|--------|-------|
111
+ | 0 | 0.52 | 0.75 | - | Baseline |
112
+ | 1 | 0.58 | 0.78 | KEEP | Improved |
113
+ | 2 | 0.35 | 0.45 | **ROLLBACK** | Collapse detected, auto-recovered |
114
+ | 3 | 0.61 | 0.80 | KEEP | Continued improving |
115
+ | 4 | 0.59 | 0.79 | KEEP | Stable |
116
+ | 5 | 0.63 | 0.82 | KEEP | Final |
 
 
 
 
 
 
 
 
 
 
 
 
 
 
117
 
118
+ Iteration 2 collapsed. The system detected it, rolled back, and continued. The safeguards work exactly as designed.
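The keep/rollback rule can be replayed against the quality column above; a minimal sketch, assuming the 0.05 quality-drop threshold from the stability loop (`step_decision` is a hypothetical name):

```python
ROLLBACK_THRESHOLD = 0.05  # quality drop that triggers rollback

def step_decision(best_quality, new_quality):
    """KEEP the new checkpoint unless quality drops past the threshold."""
    if best_quality - new_quality > ROLLBACK_THRESHOLD:
        return "ROLLBACK"
    return "KEEP"

# Replaying the quality column above against the running best checkpoint:
best = 0.52
for quality in [0.58, 0.35, 0.61, 0.59, 0.63]:
    action = step_decision(best, quality)
    if action == "KEEP":
        best = max(best, quality)
    print(action)  # KEEP, ROLLBACK, KEEP, KEEP, KEEP
```

Note that iteration 2 (quality 0.35 against a best of 0.58) is the only step that trips the threshold, reproducing the ROLLBACK in the table.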
119
 
120
+ ---
121
+
122
+ ## System Components
123
 
124
+ ### 1. CF-HoT (Contrastive Fine-tuning with Hidden-state Oversight Training)
 
 
125
 
126
+ Real-time behavioral control via representation engineering:
 
 
127
 
128
+ - Monitors hidden states at each token position
129
+ - Predicts behavioral risks before tokens are generated
130
+ - Applies corrective logit penalties when risk exceeds threshold
131
+ - 125× separation for repetition detection
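The intervention step can be sketched as a pure function over the logit vector; a minimal sketch with hypothetical thresholds and penalty magnitudes (the engine's actual values may differ):

```python
# Illustrative per-head risk thresholds and logit penalties (not the
# engine's actual configuration).
THRESHOLDS = {"repetition": 0.7, "hedging": 0.6, "verbosity": 0.65}
PENALTIES = {"repetition": 5.0, "hedging": 3.0, "verbosity": 2.0}

def apply_interventions(logits, risks, behavior_token_ids):
    """Subtract a penalty from the logits of tokens associated with any
    behavior whose predicted risk exceeds its threshold."""
    adjusted = list(logits)
    for head, risk in risks.items():
        if risk > THRESHOLDS[head]:
            for token_id in behavior_token_ids[head]:
                adjusted[token_id] -= PENALTIES[head]
    return adjusted
```

Because the heads are small probes over already-computed hidden states, this check runs once per token position with negligible latency.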
132
+
133
+ ### 2. THE CONDENSATOR
134
+
135
+ 4-stage dense response training:
136
+ ```
137
+ Stage 1: SFT → 53 gold-standard dense examples (Loss: 1.17 → 0.72)
138
+ Stage 2: DPO → Preference pairs: dense > verbose
139
+ Stage 3: RL → PPO with calibrated density reward
140
+ Stage 4: Checkpoint → Save every 25 steps, maintain best for rollback
141
  ```
142
 
143
+ Key signal: SFT loss dropped from 1.17 to 0.72, indicating the model genuinely learned the dense-response style rather than collapsing to noise.
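The Stage 3 "calibrated density reward" is not reproduced verbatim here; a hypothetical sketch of what such a signal could look like, with an illustrative filler list and a target of 65 tokens (the average ARC response length reported above):

```python
# Illustrative filler vocabulary; the actual reward uses its own lists.
FILLER = {"basically", "really", "very", "just", "certainly", "great"}

def density_reward(text, target_tokens=65):
    """Hypothetical density reward: unique non-filler words per word,
    minus a penalty for overshooting the target response length."""
    words = text.lower().split()
    if not words:
        return 0.0
    content = [w for w in words if w not in FILLER]
    density = len(set(content)) / len(words)
    overshoot = max(0.0, (len(words) - target_tokens) / target_tokens)
    return density - overshoot
```

A reward of this shape pays for novel content words and charges for filler and padding, which is the behavior the results table reflects.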
144
 
145
+ ### 3. Stability Loop
 
 
146
 
147
+ Multi-metric evaluation mitigates Goodhart's Law, since gaming any single metric drags the others down:
148
 
149
+ - **Density (0.25)** — information per token
150
+ - **Coherence (0.25)** — grammatical, readable output
151
+ - **Helpfulness (0.25)** — addresses the prompt
152
+ - **Penalties (0.25)** — filler detection, gibberish patterns
153
 
154
+ A/B checkpoint comparison with automatic rollback on quality drops > 0.05.
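The four equal weights combine into a single scalar that the A/B comparison operates on; a minimal sketch (`composite_quality` is a hypothetical name, and `penalties` is passed as a cleanliness score where 1.0 means no filler or gibberish detected):

```python
WEIGHTS = {"density": 0.25, "coherence": 0.25, "helpfulness": 0.25, "penalties": 0.25}

def composite_quality(metrics):
    """Equal-weight sum of the four stability-loop metrics, each in [0, 1]."""
    return sum(WEIGHTS[name] * metrics[name] for name in WEIGHTS)

score = composite_quality(
    {"density": 0.8, "coherence": 0.9, "helpfulness": 0.7, "penalties": 1.0}
)
print(round(score, 2))  # -> 0.85
```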
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
155
 
156
  ---
157
 
158
+ ## API Integration
159
 
160
+ For developers integrating ARC into their own applications:
161
 
162
+ ```python
163
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
164
+ from peft import PeftModel
165
+ import torch
166
+
167
+ base = AutoModelForCausalLM.from_pretrained(
168
+     "NousResearch/Hermes-3-Llama-3.1-8B",
169
+     torch_dtype=torch.float16,
170
+     device_map="auto",
171
+     quantization_config=BitsAndBytesConfig(load_in_4bit=True)
172
+ )
173
+
174
+ model = PeftModel.from_pretrained(
175
+     base,
176
+     "LoganResearch/ARC-Base-8B-Condensed",
177
+     subfolder="dense_checkpoints/step_100"
178
+ )
179
+
180
+ tokenizer = AutoTokenizer.from_pretrained("NousResearch/Hermes-3-Llama-3.1-8B")
181
+
182
+ prompt = "<|im_start|>user\nWhat is recursion?<|im_end|>\n<|im_start|>assistant\n"
183
+ inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
184
+ output = model.generate(**inputs, max_new_tokens=50)
185
+ print(tokenizer.decode(output[0], skip_special_tokens=True))
186
+ ```
187
 
188
+ Note: For full dense output with CF-HoT steering, use the main engine (`ubermenschetien_v2_full.py`).
189
 
190
+ ---
191
 
192
+ ## Training From Scratch
193
 
194
+ ```bash
195
+ git clone https://huggingface.co/LoganResearch/ARC-Base-8B-Condensed
196
+ cd ARC-Base-8B-Condensed
197
+ pip install torch transformers peft bitsandbytes accelerate trl chromadb sentence-transformers
198
 
199
+ # Full pipeline (~4 hours on RTX 3090)
200
+ python training_scripts/quickstart.py --full
201
 
202
+ # Or step by step:
203
+ python training_scripts/train_cfhot_head.py --behavior repetition --steps 5000
204
+ python training_scripts/the_condensator.py --sft-epochs 3 --rl-steps 300
205
+ python training_scripts/train_self_improve.py --iterations 5
206
+ ```
207
 
208
+ ---
209
 
210
+ ## Repository Structure
211
+ ```
212
+ ARC-Base-8B-Condensed/
213
+ ├── ubermenschetien_v2_full.py # Main engine (2,055 lines)
214
+ ├── ubermenschetien_agentic_full.py # Agentic variant (1,589 lines)
215
+ ├── ubermenschetien_heaven_engine_dense.py
216
+
217
+ ├── training_scripts/
218
+ │ ├── the_condensator.py # 4-stage training (797 lines)
219
+ │ ├── train_cfhot_head.py # CF-HoT training (546 lines)
220
+ │ ├── train_self_improve.py # Self-improvement loop (604 lines)
221
+ │ └── quickstart.py # One-command runner
222
+
223
+ ├── dense_checkpoints/
224
+ │ ├── step_100/ # Initial dense checkpoint
225
+ │ ├── step_200/
226
+ │ └── step_300/
227
+
228
+ ├── cfhot_checkpoints/
229
+ │ ├── ckpt_5000/ # 125× repetition head
230
+ │ └── [ckpt_500 through ckpt_6000]
231
+
232
+ ├── multi_head_checkpoints/
233
+ │ ├── hedging_head/
234
+ │ ├── verbosity_head/
235
+ │ └── sycophancy_head/
236
+
237
+ └── paper/
238
+ ├── ubermenschetien_paper.tex
239
+ └── ubermenschetien_paper.md
240
+ ```
241
 
242
  ---
243
 
244
+ ## Hardware Requirements
245
 
246
+ | Component | Minimum | Recommended |
247
+ |-----------|---------|-------------|
248
+ | GPU VRAM | 16 GB | 24 GB |
249
+ | System RAM | 32 GB | 64 GB |
250
+ | Disk Space | 50 GB | 100 GB |
251
+ | Training Time | ~6 hours | ~4 hours |
252
+
253
+ Tested on a single NVIDIA RTX 3090 (24 GB).
254
 
255
  ---
256
 
257
+ ## Limitations
258
 
259
+ - **Scale:** 8B parameters only; larger models untested
260
+ - **Language:** English only
261
+ - **Iterations:** 3-5 stable iterations demonstrated
262
+ - **Evaluation:** Heuristic metrics, no formal human evaluation
263
+ - **Scope:** Bounded optimization, not open-ended self-improvement
264
 
265
  ---
266
 
267
+ ## Citation
268
+ ```bibtex
269
+ @software{napolitano2025arc,
270
+ title={ARC: Adaptive Recursive Cognition},
271
+ author={Napolitano, Logan Matthew},
272
+ year={2025},
273
+ url={https://huggingface.co/LoganResearch/ARC-Base-8B-Condensed}
274
+ }
275
+ ```
276
 
277
  ---
278
 
279
+ ## References
280
 
281
+ 1. Zou et al. (2023). Representation Engineering: A Top-Down Approach to AI Transparency. arXiv:2310.01405
282
+ 2. Ouyang et al. (2022). Training language models to follow instructions with human feedback. NeurIPS.
283
+ 3. Rafailov et al. (2023). Direct Preference Optimization. arXiv:2305.18290
284
+ 4. Hu et al. (2021). LoRA: Low-Rank Adaptation of Large Language Models. arXiv:2106.09685
285
+ 5. Dettmers et al. (2023). QLoRA: Efficient Finetuning of Quantized LLMs. arXiv:2305.14314
286
 
287
  ---
288
 
289
+ ## Acknowledgments
290
 
291
+ - NousResearch for Hermes-3-Llama-3.1-8B
292
+ - Hugging Face for transformers, PEFT, TRL
293
+ - Meta AI for Llama 3.1 architecture