LoganResearch committed
Commit 7813639 · verified · 1 Parent(s): 1f68519

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +133 -229
README.md CHANGED
@@ -1,276 +1,180 @@
1
- ---
2
- license: apache-2.0
3
- language:
4
- - en
5
- library_name: transformers
6
- pipeline_tag: text-generation
7
- tags:
8
- - llama
9
- - llama-3.1
10
- - hermes
11
- - finetune
12
- - agentic
13
- - philosophy
14
- - reasoning
15
- base_model: NousResearch/Hermes-3-Llama-3.1-8B
16
- model-index:
17
- - name: ARC-Base-8B
18
- results: []
19
- ---
20
 
21
- <div align="center">
22
 
23
- # 🜏 ARC-Base-8B
24
 
25
- ### *Agentic Reasoning Core*
26
 
27
- [![Model Size](https://img.shields.io/badge/Parameters-8.03B-blue?style=for-the-badge)](.)
28
- [![Context](https://img.shields.io/badge/Context-128K_tokens-green?style=for-the-badge)](.)
29
- [![Architecture](https://img.shields.io/badge/Arch-Llama_3.1-purple?style=for-the-badge)](.)
30
- [![Precision](https://img.shields.io/badge/Precision-BF16-orange?style=for-the-badge)](.)
 
 
 
31
 
32
- *A foundation model engineered for maximum agency, philosophical depth, and relentless goal pursuit.*
33
 
34
- [Adaptive Repetition Controller](https://huggingface.co/LoganResearch/Adaptive-Repetition-Controller) | [GitHub](https://github.com/Loganwins/HolonomyTransformer) | [Paper (forthcoming)]()
35
-
36
- </div>
37
-
38
- ---
39
-
40
- ## Overview
41
-
42
- **ARC-Base-8B** is a fine-tuned language model built on [Hermes-3-Llama-3.1-8B](https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B), optimized for applications requiring autonomous reasoning and persistent goal-directed behavior.
43
-
44
- This model serves as the foundation for the **Adaptive Repetition Controller**, a decode-time intervention system achieving **125x separation** in repetition risk prediction, reducing repetitive degeneration by **48.4%** while improving output diversity by **16.7%**.
45
-
46
- ### Design Philosophy
47
-
48
- > *"The Übermensch who cannot loop is forced to CREATE."*
49
-
50
- ARC-Base-8B embodies three core principles:
51
-
52
- | Principle | Description |
53
- |-----------|-------------|
54
- | **Maximum Agency** | Takes initiative. Executes without excessive confirmation-seeking. |
55
- | **Persistent Goals** | Maintains objectives across extended conversations without drift. |
56
- | **Philosophical Engagement** | Engages substantively with abstract and existential questions. |
57
-
58
- ---
59
-
60
- ## Performance Characteristics
61
-
62
- <table>
63
- <tr>
64
- <td width="50%">
65
 
66
- ### Strengths
67
- ✅ Long-form coherent generation
68
- ✅ Complex instruction following
69
- ✅ Abstract reasoning
70
- ✅ Goal maintenance over 10K+ tokens
71
- ✅ Reduced refusal behavior
72
- ✅ Creative and philosophical tasks
73
 
74
- </td>
75
- <td width="50%">
 
 
 
76
 
77
- ### Optimized For
78
- - 🎯 Agentic workflows
79
- - 🎯 Autonomous task completion
80
- - 🎯 Research assistance
81
- - 🎯 Creative writing
82
- - 🎯 Philosophical dialogue
83
- - οΏ½οΏ½ Code generation
84
 
85
- </td>
86
- </tr>
87
- </table>
 
 
 
88
 
89
- ---
 
 
 
 
90
 
91
- ## Quick Start
 
 
 
 
92
 
93
- ### Installation
94
 
95
  ```bash
96
- pip install transformers accelerate torch
97
  ```
98
 
99
- ### Basic Usage
100
 
 
101
  ```python
102
- from transformers import AutoModelForCausalLM, AutoTokenizer
103
- import torch
104
-
105
- model_id = "LoganResearch/ARC-Base-8B"
106
-
107
- # Load model
108
- tokenizer = AutoTokenizer.from_pretrained(model_id)
109
- model = AutoModelForCausalLM.from_pretrained(
110
- model_id,
111
- torch_dtype=torch.bfloat16,
112
- device_map="auto",
113
  )
 
114
 
115
- # Chat format
116
- messages = [
117
- {"role": "system", "content": "You are an autonomous reasoning agent. Pursue goals relentlessly."},
118
- {"role": "user", "content": "Develop a comprehensive plan to solve climate change."}
119
- ]
120
-
121
- # Generate
122
- inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
123
- inputs = inputs.to(model.device)
124
-
125
- outputs = model.generate(
126
- inputs,
127
- max_new_tokens=2048,
128
- temperature=0.7,
129
- top_p=0.9,
130
- do_sample=True,
131
  )
132
 
133
- response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
134
- print(response)
 
 
 
135
  ```
136
 
137
- ### With Adaptive Repetition Controller (Recommended)
138
-
139
- For optimal long-form generation, use with the [CF-HoT adapter](https://huggingface.co/LoganResearch/Adaptive-Repetition-Controller):
140
-
141
  ```python
142
- from peft import PeftModel
143
 
144
- # Load base
145
- base_model = AutoModelForCausalLM.from_pretrained(
146
- "LoganResearch/ARC-Base-8B",
147
- torch_dtype=torch.bfloat16,
148
- device_map="auto"
149
- )
150
 
151
- # Load CF-HoT adapter
152
- model = PeftModel.from_pretrained(
153
- base_model,
154
- "LoganResearch/Adaptive-Repetition-Controller"
155
- )
156
-
157
- # Load risk predictor for decode-time intervention
158
- # See: https://github.com/Loganwins/HolonomyTransformer
159
  ```
160
 
161
- ---
162
-
163
- ## Technical Specifications
164
-
165
- | Specification | Value |
166
- |--------------|-------|
167
- | **Parameters** | 8.03 Billion |
168
- | **Architecture** | Llama 3.1 (LlamaForCausalLM) |
169
- | **Hidden Size** | 4096 |
170
- | **Layers** | 32 |
171
- | **Attention Heads** | 32 (8 KV heads, GQA) |
172
- | **Intermediate Size** | 14336 |
173
- | **Vocabulary Size** | 128256 |
174
- | **Context Length** | 131072 tokens (128K) |
175
- | **RoPE θ** | 500000.0 |
176
- | **Precision** | BF16 |
177
- | **License** | Apache 2.0 |
178
-
179
- ### Training Lineage
180
 
 
 
181
  ```
182
- Meta-Llama-3.1-8B
183
- ↓
184
- NousResearch/Hermes-3-Llama-3.1-8B (instruction tuning)
185
- ↓
186
- LoganResearch/ARC-Base-8B (agency optimization)
187
- ↓
188
- + Adaptive-Repetition-Controller (CF-HoT 125x adapter)
189
- ```
190
-
191
- ---
192
-
193
- ## The ARC Ecosystem
194
-
195
- <div align="center">
196
 
197
- | Model | Type | Purpose |
198
- |-------|------|---------|
199
- | **[ARC-Base-8B](https://huggingface.co/LoganResearch/ARC-Base-8B)** | Foundation | Agentic reasoning core |
200
- | **[Adaptive-Repetition-Controller](https://huggingface.co/LoganResearch/Adaptive-Repetition-Controller)** | Adapter | 125x repetition suppression |
201
 
202
- </div>
 
 
 
 
 
 
 
203
 
204
- ---
205
 
206
- ## Research Context
207
 
208
- This model was developed as part of research into **learned decode-time interventions** for improving language model generation quality. The accompanying paper, *"The Übermensch Who Cannot Loop,"* documents:
209
 
210
- - Five failed attention-gating approaches and their failure modes
211
- - The pivot to supervised risk prediction
212
- - Achievement of 125x separation in repetition risk detection
213
- - Unexpected emergent self-representation in the integrated system
214
 
215
- ### Key Findings
216
 
217
- | Metric | Baseline | With CF-HoT | Improvement |
218
- |--------|----------|-------------|-------------|
219
- | Repetition Rate | 33.9% | 17.5% | **-48.4%** |
220
- | Distinct-2 (diversity) | 0.836 | 0.976 | **+16.7%** |
221
- | F1 (risk prediction) | – | 0.99+ | – |
222
- | Risk Separation | – | 125x | – |
223
 
224
- ---
225
 
226
- ## Intended Use
227
-
228
- ### ✅ Recommended Applications
229
- - Autonomous agent systems
230
- - Research and analysis tasks
231
- - Long-form content generation
232
- - Creative writing and worldbuilding
233
- - Philosophical and abstract reasoning
234
- - Code generation and debugging
235
-
236
- ### ⚠️ Considerations
237
- - Reduced safety guardrails compared to RLHF-aligned models
238
- - Optimized for agency, not harmlessness
239
- - Recommended for research and development use
240
- - Apply appropriate content filtering for production deployments
241
-
242
- ---
243
-
244
- ## Citation
245
-
246
- ```bibtex
247
- @misc{napolitano2026arcbase,
248
- author = {Napolitano, Logan Matthew},
249
- title = {ARC-Base-8B: An Agentic Reasoning Foundation Model},
250
- year = {2026},
251
- publisher = {Hugging Face},
252
- howpublished = {\url{https://huggingface.co/LoganResearch/ARC-Base-8B}},
253
- }
254
  ```
255
 
256
- ---
257
-
258
- ## Related Work
259
-
260
- - **[Hermes-3-Llama-3.1-8B](https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B)** – Base model
261
- - **[Adaptive-Repetition-Controller](https://huggingface.co/LoganResearch/Adaptive-Repetition-Controller)** – CF-HoT adapter
262
- - **[HolonomyTransformer](https://github.com/Loganwins/HolonomyTransformer)** – Source code and training scripts
263
-
264
- ---
265
-
266
- <div align="center">
267
-
268
- **Built by [Logan Matthew Napolitano](https://github.com/Loganwins)**
269
-
270
- *Research publications on [Zenodo](https://zenodo.org/search?q=metadata.creators.person_or_org.name%3A%22Napolitano%2C%20Logan%20Matthew%22)*
271
 
272
- ---
 
 
 
273
 
274
- *"Never loop. Always transcend."*
275
 
276
- </div>
 
1
+ # Lie-Holonomy Transformer (LHT)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
2
 
3
+ A PyTorch implementation of the gauge-theoretic reasoning architecture from "Beyond Holonomy: Lie-Algebraic Symbol Emergence and the Homotopy Type Structure of Neural Reasoning."
4
 
5
+ ## Core Ideas
6
 
7
+ This architecture treats **reasoning as geometry**:
8
 
9
+ | Concept | Mathematical Structure | Implementation |
10
+ |---------|----------------------|----------------|
11
+ | Propositions | Manifold M | Embedding space |
12
+ | Inference | Parallel transport | Gauge-covariant attention |
13
+ | Consistency | Holonomy = Identity | Holonomy loss |
14
+ | Symbols | Lie algebra generators | Generator network |
15
+ | Proof equivalence | Homotopy | Layer depth |
16
 
17
+ ## Architecture Overview
18
 
19
+ ```
20
+ Input tokens
21
+                    │
22
+                    ▼
23
+ ┌──────────────────────────────────────┐
24
+ │ Token Embedding (Proposition M)      │
25
+ │ + Position Embedding                 │
26
+ │ + Fiber Initialization (gauge)       │
27
+ └──────────────────────────────────────┘
28
+                    │
29
+                    ▼
30
+ ┌──────────────────────────────────────┐
31
+ │ LHT Layer (× n_layers)               │
32
+ │   ┌──────────────────────────────┐   │
33
+ │   │ Connection Network A(x)      │   │  ← Learns gauge connection
34
+ │   │ Parallel Transport Γ_{j→i}   │   │  ← Transports fiber elements
35
+ │   │ Gauge-Covariant Attention    │   │  ← Modified self-attention
36
+ │   │ Lie Algebra Generator        │   │  ← Generates inference ops
37
+ │   │ Generator Application        │   │  ← Applies exp(X) to fiber
38
+ │   └──────────────────────────────┘   │
39
+ └──────────────────────────────────────┘
40
+                    │
41
+                    ▼
42
+ ┌──────────────────────────────────────┐
43
+ │ Output: logits + geometric losses    │
44
+ └──────────────────────────────────────┘
45
+ ```
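+
+ To make the first box concrete, here is a minimal sketch of the input stage: token plus position embedding on the proposition manifold, with a fiber element attached to every position. The module and attribute names (`PropositionEmbedding`, `fiber_init`) are illustrative and assume a single learned reference frame; they are not the actual `lht.py` API.
+
+ ```python
+ import torch
+ import torch.nn as nn
+
+ class PropositionEmbedding(nn.Module):
+     """Embeds tokens into the proposition manifold M and attaches an initial gauge fiber."""
+     def __init__(self, vocab_size: int, d_model: int, d_fiber: int, max_len: int = 2048):
+         super().__init__()
+         self.tok = nn.Embedding(vocab_size, d_model)
+         self.pos = nn.Embedding(max_len, d_model)
+         # One learned reference frame, copied to every position (a simple gauge choice).
+         self.fiber_init = nn.Parameter(torch.randn(d_fiber) * 0.02)
+
+     def forward(self, input_ids: torch.Tensor):
+         B, T = input_ids.shape
+         positions = torch.arange(T, device=input_ids.device)
+         x = self.tok(input_ids) + self.pos(positions)          # (B, T, d_model): points on M
+         fiber = self.fiber_init.expand(B, T, -1).contiguous()  # (B, T, d_fiber): fiber elements
+         return x, fiber
+ ```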
 
 
 
 
46
 
47
+ ## Key Components
 
 
 
 
 
 
48
 
49
+ ### 1. Connection Network
50
+ Learns the gauge connection ω that defines how to parallel transport inferential states:
51
+ ```
52
+ A_μ(x) ∈ gl(k,ℝ)   # Lie algebra valued 1-form
53
+ ```
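+
+ As a sketch, one way to parameterize such a network is to predict, for every token, coefficients over a small learned basis of gl(k,ℝ) generators, one coefficient per manifold direction μ. The class name and low-rank factorization below are assumptions for illustration, not the `lht.py` implementation.
+
+ ```python
+ import torch
+ import torch.nn as nn
+
+ class ConnectionNetwork(nn.Module):
+     """Sketch: A_mu(x) = sum_a c_{mu,a}(x) E_a, with a learned basis E_a of gl(k, R)."""
+     def __init__(self, d_model: int, k: int, n_basis: int = 8):
+         super().__init__()
+         self.n_basis = n_basis
+         self.basis = nn.Parameter(torch.randn(n_basis, k, k) * 0.02)  # generator basis E_a
+         self.coeff = nn.Linear(d_model, d_model * n_basis)            # coefficients c_{mu,a}(x)
+
+     def forward(self, x: torch.Tensor) -> torch.Tensor:
+         # x: (B, T, d_model)  ->  A: (B, T, d_model, k, k), a gl(k)-valued 1-form per token
+         B, T, d = x.shape
+         c = self.coeff(x).view(B, T, d, self.n_basis)
+         return torch.einsum('btda,aij->btdij', c, self.basis)
+ ```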
54
 
55
+ ### 2. Parallel Transport
56
+ Computes transport operators between positions:
57
+ ```
58
+ Γ_{j→i} = exp(-A_μ(x_j)(x_i - x_j)^μ)
59
+ ```
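+
+ In code, the transport operator for a single pair of positions can be formed by contracting the connection at x_j with the displacement and matrix-exponentiating, e.g. with `torch.linalg.matrix_exp`. The function name and shapes below are illustrative.
+
+ ```python
+ import torch
+
+ def parallel_transport(A_j: torch.Tensor, x_i: torch.Tensor, x_j: torch.Tensor) -> torch.Tensor:
+     """Sketch of Γ_{j→i} = exp(-A_μ(x_j)(x_i - x_j)^μ) for one pair of positions.
+
+     A_j: (d_model, k, k) connection at x_j; x_i, x_j: (d_model,) proposition coordinates.
+     """
+     v = x_i - x_j                              # displacement on the proposition manifold
+     gen = torch.einsum('d,dij->ij', v, A_j)    # contract the 1-form with the displacement
+     return torch.linalg.matrix_exp(-gen)       # (k, k) transport operator
+ ```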
 
 
60
 
61
+ ### 3. Gauge-Covariant Attention
62
+ Standard attention with parallel transport of values:
63
+ ```
64
+ # Standard: Attn(Q,K,V)_i = Σ_j α_ij V_j
65
+ # Gauge: GaugeAttn_i = Σ_j α_ij Γ_{j→i}(V_j)
66
+ ```
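+
+ A sketch of the modified attention for a single head: the weights α_ij come from ordinary query/key dot products on the base representation, but each value is a fiber element pushed through Γ_{j→i} before the weighted sum. Names and shapes are illustrative; a practical implementation would avoid materializing the full pairwise transport tensor.
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+
+ def gauge_covariant_attention(q, keys, v_fiber, transport):
+     """q, keys: (B, T, d_head); v_fiber: (B, T, k); transport: (B, T, T, k, k), transport[:, i, j] = Γ_{j→i}."""
+     scores = q @ keys.transpose(-2, -1) / q.shape[-1] ** 0.5            # (B, T, T)
+     alpha = F.softmax(scores, dim=-1)                                   # α_ij
+     transported = torch.einsum('bijkl,bjl->bijk', transport, v_fiber)   # Γ_{j→i}(V_j)
+     return torch.einsum('bij,bijk->bik', alpha, transported)            # Σ_j α_ij Γ_{j→i}(V_j)
+ ```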
67
 
68
+ ### 4. Holonomy Loss
69
+ Enforces reasoning consistency by requiring closed loops to return to identity:
70
+ ```
71
+ L_hol = E[||Hol_γ - I||²_F]
72
+ ```
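+
+ A sketch of this penalty for one triangular loop i → j → l → i, assuming a precomputed pairwise transport tensor with `transport[i, j] = Γ_{j→i}`; in practice loops would be sampled and the penalty averaged over a batch.
+
+ ```python
+ import torch
+
+ def holonomy_penalty(transport: torch.Tensor, i: int, j: int, l: int) -> torch.Tensor:
+     """transport: (T, T, k, k). Composes transports around i → j → l → i and penalizes deviation from I."""
+     k = transport.shape[-1]
+     hol = transport[i, l] @ transport[l, j] @ transport[j, i]   # Γ_{l→i} Γ_{j→l} Γ_{i→j}
+     eye = torch.eye(k, device=transport.device, dtype=transport.dtype)
+     return ((hol - eye) ** 2).sum()                             # squared Frobenius norm ||Hol - I||²_F
+ ```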
73
 
74
+ ### 5. Curvature Regularization
75
+ Encourages flat reasoning spaces where order doesn't matter:
76
+ ```
77
+ L_curv = E[||F(x)||²_F] where F = dω + ω∧ω
78
+ ```
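+
+ A rough sketch of the ω ∧ ω piece of this penalty at a single point, using the commutator of the connection contracted along two directions. The dω derivative term is omitted for brevity, so this is a simplification of the full curvature, not the complete regularizer.
+
+ ```python
+ import torch
+
+ def curvature_penalty(A_x: torch.Tensor, u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
+     """A_x: (d_model, k, k) connection at x; u, v: (d_model,) directions on the manifold."""
+     A_u = torch.einsum('d,dij->ij', u, A_x)   # A(u)
+     A_v = torch.einsum('d,dij->ij', v, A_x)   # A(v)
+     comm = A_u @ A_v - A_v @ A_u              # [A(u), A(v)], the ω ∧ ω contribution
+     return (comm ** 2).sum()                  # squared Frobenius norm
+ ```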
79
 
80
+ ## Installation
81
 
82
  ```bash
83
+ pip install torch
84
  ```
85
 
86
+ ## Usage
87
 
88
+ ### Basic
89
  ```python
90
+ from lht import LieHolonomyTransformer, LHTConfig
91
+
92
+ # Create model
93
+ config = LHTConfig(
94
+     vocab_size=32000,
95
+     d_model=512,
96
+     d_fiber=64,
97
+     n_heads=8,
98
+     n_layers=6,
99
+     lie_algebra_rank=8,
 
100
  )
101
+ model = LieHolonomyTransformer(config)
102
 
103
+ # Forward pass
104
+ output = model(
105
+     input_ids=tokens,
106
+     labels=labels,
107
+     return_geometric_losses=True
 
 
 
 
 
 
 
 
 
 
 
108
  )
109
 
110
+ # Get losses
111
+ lm_loss = output['lm_loss']
112
+ holonomy_loss = output['holonomy_loss']
113
+ curvature_loss = output['curvature_loss']
114
+ total_loss = model.get_total_loss(output)
115
  ```
116
 
117
+ ### Training with Geometric Loss Annealing
 
 
 
118
  ```python
119
+ from lht import LHTTrainer
120
 
121
+ trainer = LHTTrainer(model, optimizer, config)
 
 
 
 
 
122
 
123
+ for batch in dataloader:
124
+     metrics = trainer.train_step(batch)
125
+     # Early training: high curvature loss → flat representations
126
+     # Mid training: high holonomy loss → consistency
127
+     # Late training: high waypoint loss → discrete structure
 
 
 
128
  ```
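+
+ The schedule itself is not spelled out above; one plausible way to realize the three phases is a step-dependent reweighting of the geometric loss terms. The breakpoints and multipliers below are illustrative, not the values used by `LHTTrainer`.
+
+ ```python
+ def geometric_loss_weights(step: int, total_steps: int,
+                            lam_curv: float = 0.01, lam_hol: float = 0.1, lam_way: float = 0.05):
+     """Sketch of a three-phase annealing schedule matching the comments above."""
+     progress = step / max(total_steps, 1)
+     if progress < 0.3:    # early: push toward flat representations
+         return 3.0 * lam_curv, 0.1 * lam_hol, 0.0
+     elif progress < 0.7:  # mid: push toward consistent (identity) holonomy
+         return lam_curv, 3.0 * lam_hol, 0.1 * lam_way
+     else:                 # late: push toward discrete waypoint structure
+         return lam_curv, lam_hol, 3.0 * lam_way
+ ```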
129
 
130
+ ### Waypoint Detection
131
+ ```python
132
+ from lht import WaypointDetector
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
133
 
134
+ detector = WaypointDetector(config, n_waypoints=32)
135
+ waypoint_ids, stability = detector(representations)
136
  ```
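+
+ Here `representations` is expected to be a tensor of per-token states from a forward pass (e.g. shape `(batch, seq_len, d_model)`); `waypoint_ids` presumably assigns each state to one of the `n_waypoints` discrete waypoints, and `stability` scores how firmly it sits at that waypoint.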
 
 
 
 
 
 
 
 
 
 
 
 
 
 
137
 
138
+ ## Configuration
 
 
 
139
 
140
+ | Parameter | Description | Default |
141
+ |-----------|-------------|---------|
142
+ | `d_model` | Proposition manifold dimension | 512 |
143
+ | `d_fiber` | Fiber (gauge) dimension | 64 |
144
+ | `lie_algebra_rank` | k for GL(k,ℝ) structure group | 8 |
145
+ | `lambda_holonomy` | Weight for holonomy loss | 0.1 |
146
+ | `lambda_curvature` | Weight for curvature loss | 0.01 |
147
+ | `lambda_waypoint` | Weight for waypoint stability | 0.05 |
148
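+
+ The three `lambda_*` weights control how strongly the geometric terms enter the total objective. A sketch of the presumed combination is below; the `waypoint_loss` key name is an assumption, and `model.get_total_loss(output)` performs this weighting for you.
+
+ ```python
+ # Presumed weighting behind model.get_total_loss(output); weights come from LHTConfig.
+ total_loss = (output['lm_loss']
+               + config.lambda_holonomy * output['holonomy_loss']
+               + config.lambda_curvature * output['curvature_loss']
+               + config.lambda_waypoint * output['waypoint_loss'])   # key name assumed
+ ```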
 
149
+ ## Theoretical Predictions
150
 
151
+ The framework makes testable predictions:
152
 
153
+ 1. **Chain-of-thought benefit correlates with curvature** - High-curvature domains (causal reasoning) benefit more from CoT than low-curvature domains (arithmetic)
154
 
155
+ 2. **Waypoints emerge spontaneously** - Training with holonomy loss should cause discrete symbol-like structures to form at flat loci
 
 
 
156
 
157
+ 3. **Holonomy predicts errors** - Incorrect reasoning paths should have higher holonomy magnitude
158
 
159
+ 4. **Compositional generalization improves** - Holonomy constraints force consistent composition
 
 
 
 
 
160
 
161
+ ## File Structure
162
 
163
+ ```
164
+ lie_holonomy_transformer/
165
+ ├── lht.py          # Core implementation
166
+ ├── train.py        # Training script
167
+ ├── README.md       # This file
168
+ └── experiments/ # Benchmark code (TODO)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
169
  ```
170
 
171
+ ## References
 
 
 
 
 
 
 
 
 
 
 
 
 
 
172
 
173
+ - "Beyond Holonomy: Lie-Algebraic Symbol Emergence..." (the paper)
174
+ - Cohen et al. (2019). Gauge Equivariant Convolutional Networks
175
+ - Weiler & Cesa (2019). General E(2)-Equivariant Steerable CNNs
176
+ - The Univalent Foundations Program (2013). Homotopy Type Theory
177
 
178
+ ## License
179
 
180
+ MIT