LoganResearch/ARC-Base-8B

LoganResearch committed on Jan 19

Commit 7813639 (verified) · 1 Parent(s): 1f68519

Upload README.md with huggingface_hub

Files changed (1)

README.md +133 -229

@@ -1,276 +1,180 @@
----
-license: apache-2.0
-language:
-- en
-library_name: transformers
-pipeline_tag: text-generation
-tags:
-- llama
-- llama-3.1
-- hermes
-- finetune
-- agentic
-- philosophy
-- reasoning
-base_model: NousResearch/Hermes-3-Llama-3.1-8B
-model-index:
-- name: ARC-Base-8B
-  results: []
----
-<div align="center">
-# 🜏 ARC-Base-8B
-### *Agentic Reasoning Core*
-[![Model Size](https://img.shields.io/badge/Parameters-8.03B-blue?style=for-the-badge)](.)
-[![Context](https://img.shields.io/badge/Context-128K_tokens-green?style=for-the-badge)](.)
-[![Architecture](https://img.shields.io/badge/Arch-Llama_3.1-purple?style=for-the-badge)](.)
-[![Precision](https://img.shields.io/badge/Precision-BF16-orange?style=for-the-badge)](.)
-*A foundation model engineered for maximum agency, philosophical depth, and relentless goal pursuit.*
-[Adaptive Repetition Controller](https://huggingface.co/LoganResearch/Adaptive-Repetition-Controller) | [GitHub](https://github.com/Loganwins/HolonomyTransformer) | [Paper (forthcoming)]()
-</div>
----
-## Overview
-**ARC-Base-8B** is a fine-tuned language model built on [Hermes-3-Llama-3.1-8B](https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B), optimized for applications requiring autonomous reasoning and persistent goal-directed behavior.
-This model serves as the foundation for the **Adaptive Repetition Controller** — a decode-time intervention system achieving **125x separation** in repetition risk prediction, reducing repetitive degeneration by **48.4%** while improving output diversity by **16.7%**.
-### Design Philosophy
-> *"The Übermensch who cannot loop is forced to CREATE."*
-ARC-Base-8B embodies three core principles:
-| Principle | Description |
-|-----------|-------------|
-| **Maximum Agency** | Takes initiative. Executes without excessive confirmation-seeking. |
-| **Persistent Goals** | Maintains objectives across extended conversations without drift. |
-| **Philosophical Engagement** | Engages substantively with abstract and existential questions. |
----
-## Performance Characteristics
-<table>
-<tr>
-<td width="50%">
-### Strengths
-- ✅ Long-form coherent generation
-- ✅ Complex instruction following
-- ✅ Abstract reasoning
-- ✅ Goal maintenance over 10K+ tokens
-- ✅ Reduced refusal behavior
-- ✅ Creative and philosophical tasks
-</td>
-<td width="50%">
-### Optimized For
-- 🎯 Agentic workflows
-- 🎯 Autonomous task completion
-- 🎯 Research assistance
-- 🎯 Creative writing
-- 🎯 Philosophical dialogue
-- 🎯 Code generation
-</td>
-</tr>
-</table>
----
-## Quick Start
-### Installation
 ```bash
-pip install transformers accelerate torch
 ```
-### Basic Usage
 ```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
-import torch
-model_id = "LoganResearch/ARC-Base-8B"
-# Load model
-tokenizer = AutoTokenizer.from_pretrained(model_id)
-model = AutoModelForCausalLM.from_pretrained(
-    model_id,
-    torch_dtype=torch.bfloat16,
-    device_map="auto",
 )
-# Chat format
-messages = [
-    {"role": "system", "content": "You are an autonomous reasoning agent. Pursue goals relentlessly."},
-    {"role": "user", "content": "Develop a comprehensive plan to solve climate change."}
-]
-# Generate
-inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
-inputs = inputs.to(model.device)
-outputs = model.generate(
-    inputs,
-    max_new_tokens=2048,
-    temperature=0.7,
-    top_p=0.9,
-    do_sample=True,
 )
-response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
-print(response)
 ```
-### With Adaptive Repetition Controller (Recommended)
-For optimal long-form generation, use with the [CF-HoT adapter](https://huggingface.co/LoganResearch/Adaptive-Repetition-Controller):
 ```python
-from peft import PeftModel
-# Load base
-base_model = AutoModelForCausalLM.from_pretrained(
-    "LoganResearch/ARC-Base-8B",
-    torch_dtype=torch.bfloat16,
-    device_map="auto"
-)
-# Load CF-HoT adapter
-model = PeftModel.from_pretrained(
-    base_model,
-    "LoganResearch/Adaptive-Repetition-Controller"
-)
-# Load risk predictor for decode-time intervention
-# See: https://github.com/Loganwins/HolonomyTransformer
 ```
----
-## Technical Specifications
-| Specification | Value |
-|--------------|-------|
-| **Parameters** | 8.03 Billion |
-| **Architecture** | Llama 3.1 (LlamaForCausalLM) |
-| **Hidden Size** | 4096 |
-| **Layers** | 32 |
-| **Attention Heads** | 32 (8 KV heads, GQA) |
-| **Intermediate Size** | 14336 |
-| **Vocabulary Size** | 128256 |
-| **Context Length** | 131072 tokens (128K) |
-| **RoPE θ** | 500000.0 |
-| **Precision** | BF16 |
-| **License** | Apache 2.0 |
-### Training Lineage
 ```
-Meta-Llama-3.1-8B
-       ↓
-NousResearch/Hermes-3-Llama-3.1-8B  (instruction tuning)
-       ↓
-LoganResearch/ARC-Base-8B  (agency optimization)
-       ↓
-+ Adaptive-Repetition-Controller  (CF-HoT 125x adapter)
-```
----
-## The ARC Ecosystem
-<div align="center">
-| Model | Type | Purpose |
-|-------|------|---------|
-| **[ARC-Base-8B](https://huggingface.co/LoganResearch/ARC-Base-8B)** | Foundation | Agentic reasoning core |
-| **[Adaptive-Repetition-Controller](https://huggingface.co/LoganResearch/Adaptive-Repetition-Controller)** | Adapter | 125x repetition suppression |
-</div>
----
-## Research Context
-This model was developed as part of research into **learned decode-time interventions** for improving language model generation quality. The accompanying paper, *"The Übermensch Who Cannot Loop,"* documents:
-- Five failed attention-gating approaches and their failure modes
-- The pivot to supervised risk prediction
-- Achievement of 125x separation in repetition risk detection
-- Unexpected emergent self-representation in the integrated system
-### Key Findings
-| Metric | Baseline | With CF-HoT | Improvement |
-|--------|----------|-------------|-------------|
-| Repetition Rate | 33.9% | 17.5% | **-48.4%** |
-| Distinct-2 (diversity) | 0.836 | 0.976 | **+16.7%** |
-| F1 (risk prediction) | — | 0.99+ | — |
-| Risk Separation | — | 125x | — |
----
-## Intended Use
-### ✅ Recommended Applications
-- Autonomous agent systems
-- Research and analysis tasks
-- Long-form content generation
-- Creative writing and worldbuilding
-- Philosophical and abstract reasoning
-- Code generation and debugging
-### ⚠️ Considerations
-- Reduced safety guardrails compared to RLHF-aligned models
-- Optimized for agency, not harmlessness
-- Recommended for research and development use
-- Apply appropriate content filtering for production deployments
----
-## Citation
-```bibtex
-@misc{napolitano2026arcbase,
-  author = {Napolitano, Logan Matthew},
-  title = {ARC-Base-8B: An Agentic Reasoning Foundation Model},
-  year = {2026},
-  publisher = {Hugging Face},
-  howpublished = {\url{https://huggingface.co/LoganResearch/ARC-Base-8B}},
-}
 ```
----
-## Related Work
-- **[Hermes-3-Llama-3.1-8B](https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B)** — Base model
-- **[Adaptive-Repetition-Controller](https://huggingface.co/LoganResearch/Adaptive-Repetition-Controller)** — CF-HoT adapter
-- **[HolonomyTransformer](https://github.com/Loganwins/HolonomyTransformer)** — Source code and training scripts
----
-<div align="center">
-**Built by [Logan Matthew Napolitano](https://github.com/Loganwins)**
-*Research publications on [Zenodo](https://zenodo.org/search?q=metadata.creators.person_or_org.name%3A%22Napolitano%2C%20Logan%20Matthew%22)*
----
-*"Never loop. Always transcend."*
-</div>

+# Lie-Holonomy Transformer (LHT)
+A PyTorch implementation of the gauge-theoretic reasoning architecture from "Beyond Holonomy: Lie-Algebraic Symbol Emergence and the Homotopy Type Structure of Neural Reasoning."
+## Core Ideas
+This architecture treats **reasoning as geometry**:
+| Concept | Mathematical Structure | Implementation |
+|---------|----------------------|----------------|
+| Propositions | Manifold M | Embedding space |
+| Inference | Parallel transport | Gauge-covariant attention |
+| Consistency | Holonomy = Identity | Holonomy loss |
+| Symbols | Lie algebra generators | Generator network |
+| Proof equivalence | Homotopy | Layer depth |
+## Architecture Overview
+```
+Input tokens
+     │
+     ▼
+┌─────────────────────────────────────┐
+│  Token Embedding (Proposition M)    │
+│  + Position Embedding               │
+│  + Fiber Initialization (gauge)     │
+└─────────────────────────────────────┘
+     │
+     ▼
+┌─────────────────────────────────────┐
+│  LHT Layer (× n_layers)             │
+│  ┌─────────────────────────────┐    │
+│  │ Connection Network A(x)     │    │  ← Learns gauge connection
+│  │ Parallel Transport Γ_{j→i}  │    │  ← Transports fiber elements
+│  │ Gauge-Covariant Attention   │    │  ← Modified self-attention
+│  │ Lie Algebra Generator       │    │  ← Generates inference ops
+│  │ Generator Application       │    │  ← Applies exp(X) to fiber
+│  └─────────────────────────────┘    │
+└─────────────────────────────────────┘
+     │
+     ▼
+┌─────────────────────────────────────┐
+│  Output: logits + geometric losses  │
+└─────────────────────────────────────┘
+```
+## Key Components
+### 1. Connection Network
+Learns the gauge connection ω (written A_μ in components), which defines how inferential states are parallel transported:
+```python
+A_μ(x) ∈ gl(k,ℝ)  # Lie algebra valued 1-form
+```
+### 2. Parallel Transport
+Computes transport operators between positions:
+```python
+Γ_{j→i} = exp(-A_μ(x_j)(x_i - x_j)^μ)
+```
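The transport formula above can be sketched in plain Python. This is only an illustration, not the code in `lht.py`: it assumes a position-independent connection given as a list of k×k matrices (one per base direction μ), and the helper names `mat_mul`, `mat_exp`, and `transport` are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[r][m] * b[m][c] for m in range(n)) for c in range(n)]
            for r in range(n)]


def mat_exp(m, terms=30):
    """Truncated Taylor series: exp(M) = sum over n of M^n / n!."""
    n = len(m)
    result = [[float(r == c) for c in range(n)] for r in range(n)]
    term = [row[:] for row in result]
    for order in range(1, terms):
        term = [[v / order for v in row] for row in mat_mul(term, m)]
        result = [[result[r][c] + term[r][c] for c in range(n)]
                  for r in range(n)]
    return result


def transport(connection, x_i, x_j):
    """Gamma_{j->i} = exp(-sum_mu A_mu * (x_i - x_j)^mu).

    Assumption: the connection is constant, so A_mu(x_j) does not
    depend on x_j.
    """
    k = len(connection[0])
    exponent = [[0.0] * k for _ in range(k)]
    for a_mu, xi, xj in zip(connection, x_i, x_j):
        for r in range(k):
            for c in range(k):
                exponent[r][c] -= a_mu[r][c] * (xi - xj)
    return mat_exp(exponent)


# One base direction (mu = 0) with a rotation generator as A_0.
connection = [[[0.0, 1.0], [-1.0, 0.0]]]
forward = transport(connection, x_i=[0.5], x_j=[0.0])
backward = transport(connection, x_i=[0.0], x_j=[0.5])
round_trip = mat_mul(backward, forward)  # close to the 2x2 identity
```

With a constant connection, transporting from j to i and back composes to the identity, which is the simplest case of trivial holonomy.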
+### 3. Gauge-Covariant Attention
+Standard attention, except that each value V_j is parallel transported to the query position i before aggregation:
+```python
+# Standard:  Attn(Q,K,V)_i = Σ_j α_ij V_j
+# Gauge:     GaugeAttn_i   = Σ_j α_ij Γ_{j→i}(V_j)
+```
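A minimal sketch of the gauge-covariant sum, assuming the attention weights α and the transport matrices Γ have already been computed. The function names and the nested-list layout (`gamma[i][j]` holds Γ_{j→i}) are hypothetical and chosen only for illustration.

```python
def mat_vec(m, v):
    """Apply a matrix (nested lists) to a vector (list)."""
    return [sum(m_rc * v_c for m_rc, v_c in zip(row, v)) for row in m]


def gauge_attention(alpha, gamma, values):
    """out_i = sum_j alpha[i][j] * gamma[i][j] @ values[j]."""
    n, k = len(values), len(values[0])
    out = []
    for i in range(n):
        acc = [0.0] * k
        for j in range(n):
            moved = mat_vec(gamma[i][j], values[j])  # V_j carried to position i
            acc = [a + alpha[i][j] * m for a, m in zip(acc, moved)]
        out.append(acc)
    return out


identity = [[1.0, 0.0], [0.0, 1.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]
alpha = [[0.5, 0.5], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 2.0]]

# Identity transports everywhere: reduces to standard attention.
flat = gauge_attention(alpha, [[identity, identity], [identity, identity]], values)
# A non-trivial transport from position 1 to position 0 changes the result.
curved = gauge_attention(alpha, [[identity, swap], [identity, identity]], values)
```

When every Γ_{j→i} is the identity, the output matches the standard attention sum, so the transport is the only thing the gauge version adds.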
+### 4. Holonomy Loss
+Enforces reasoning consistency by requiring closed loops to return to identity:
+```python
+L_hol = E[||Hol_γ - I||²_F]
+```
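The loss above can be illustrated on a toy structure group. The sketch below assumes each loop is given as an ordered list of transport matrices and uses 2D rotations, for which the holonomy is the rotation by the summed angle. The names `holonomy` and `holonomy_loss` are hypothetical and do not come from `lht.py`.

```python
import math


def rotation(theta):
    """2x2 rotation matrix, used here as a toy transport operator."""
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[r][m] * b[m][c] for m in range(n)) for c in range(n)]
            for r in range(n)]


def holonomy(transports):
    """Compose the transports around a loop; later steps act on the left."""
    n = len(transports[0])
    hol = [[float(r == c) for c in range(n)] for r in range(n)]
    for step in transports:
        hol = mat_mul(step, hol)
    return hol


def holonomy_loss(loops):
    """Mean over loops of the squared Frobenius norm of (Hol - I)."""
    total = 0.0
    for loop in loops:
        hol = holonomy(loop)
        total += sum((hol[r][c] - float(r == c)) ** 2
                     for r in range(len(hol)) for c in range(len(hol)))
    return total / len(loops)


# Angles summing to zero close up exactly; the second loop does not.
consistent = holonomy_loss([[rotation(0.3), rotation(0.5), rotation(-0.8)]])
inconsistent = holonomy_loss([[rotation(0.3), rotation(0.5), rotation(0.2)]])
```

For rotations the penalty equals 4 − 4cos θ, where θ is the net angle of the loop, so it vanishes only when the loop returns a transported state unchanged.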
+### 5. Curvature Regularization
+Encourages flat reasoning spaces, where transport is path-independent and the order of inference steps does not change the result:
+```python
+L_curv = E[||F(x)||²_F]  where F = dω + ω∧ω
+```
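For a position-independent connection the dω term vanishes, and the curvature components reduce to commutators, F_{μν} = [A_μ, A_ν] (up to the normalization convention for the wedge product). The sketch below computes the penalty for that special case only; the function names are hypothetical and this is not the regularizer implemented in `lht.py`.

```python
def mat_mul(a, b):
    n = len(a)
    return [[sum(a[r][m] * b[m][c] for m in range(n)) for c in range(n)]
            for r in range(n)]


def commutator(a, b):
    """[A, B] = AB - BA."""
    ab, ba = mat_mul(a, b), mat_mul(b, a)
    n = len(a)
    return [[ab[r][c] - ba[r][c] for c in range(n)] for r in range(n)]


def curvature_loss(connection):
    """Sum over mu < nu of the squared Frobenius norm of [A_mu, A_nu].

    Assumption: constant connection, so d(omega) = 0.
    """
    total = 0.0
    for mu in range(len(connection)):
        for nu in range(mu + 1, len(connection)):
            f = commutator(connection[mu], connection[nu])
            total += sum(v * v for row in f for v in row)
    return total


# Diagonal matrices commute: flat. Raising/lowering matrices do not.
flat = curvature_loss([[[1.0, 0.0], [0.0, 2.0]], [[3.0, 0.0], [0.0, 4.0]]])
curved = curvature_loss([[[0.0, 1.0], [0.0, 0.0]], [[0.0, 0.0], [1.0, 0.0]]])
```

A zero penalty here means the generators along different directions commute, which is the sense in which the order of steps stops mattering.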
+## Installation
 ```bash
+pip install torch
 ```
+## Usage
+### Basic
 ```python
+from lht import LieHolonomyTransformer, LHTConfig
+# Create model
+config = LHTConfig(
+    vocab_size=32000,
+    d_model=512,
+    d_fiber=64,
+    n_heads=8,
+    n_layers=6,
+    lie_algebra_rank=8,
 )
+model = LieHolonomyTransformer(config)
+# Forward pass (tokens and labels are assumed to be LongTensors of shape [batch, seq_len])
+output = model(
+    input_ids=tokens,
+    labels=labels,
+    return_geometric_losses=True
 )
+# Get losses
+lm_loss = output['lm_loss']
+holonomy_loss = output['holonomy_loss']
+curvature_loss = output['curvature_loss']
+total_loss = model.get_total_loss(output)
 ```
+### Training with Geometric Loss Annealing
 ```python
+from lht import LHTTrainer
+trainer = LHTTrainer(model, optimizer, config)
+for batch in dataloader:
+    metrics = trainer.train_step(batch)
+    # Early training: high curvature loss → flat representations
+    # Mid training: high holonomy loss → consistency
+    # Late training: high waypoint loss → discrete structure
 ```
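The three phases named in the comments above could be realized by a schedule like the following. This is a guess at the general shape, not the schedule used by `LHTTrainer`: the function `annealed_weights`, the equal-thirds phase boundaries, and the `boost` factor are all assumptions made for illustration. The base weights are the defaults from the configuration table.

```python
def annealed_weights(step, total_steps, base=None, boost=5.0):
    """Return per-loss weights, boosting one geometric loss per phase.

    Hypothetical schedule: first third emphasizes curvature, second
    third holonomy, final third waypoint stability.
    """
    if base is None:
        base = {"curvature": 0.01, "holonomy": 0.1, "waypoint": 0.05}
    progress = step / max(1, total_steps)
    if progress < 1 / 3:
        phase = "curvature"
    elif progress < 2 / 3:
        phase = "holonomy"
    else:
        phase = "waypoint"
    return {name: weight * (boost if name == phase else 1.0)
            for name, weight in base.items()}


early = annealed_weights(0, 300)
middle = annealed_weights(150, 300)
late = annealed_weights(299, 300)
```

A smooth ramp between phases would avoid abrupt jumps in the loss; the hard switch is used here only to keep the sketch short.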
+### Waypoint Detection
+```python
+from lht import WaypointDetector
+detector = WaypointDetector(config, n_waypoints=32)
+waypoint_ids, stability = detector(representations)
 ```
+## Configuration
+| Parameter | Description | Default |
+|-----------|-------------|---------|
+| `d_model` | Proposition manifold dimension | 512 |
+| `d_fiber` | Fiber (gauge) dimension | 64 |
+| `lie_algebra_rank` | k for GL(k,ℝ) structure group | 8 |
+| `lambda_holonomy` | Weight for holonomy loss | 0.1 |
+| `lambda_curvature` | Weight for curvature loss | 0.01 |
+| `lambda_waypoint` | Weight for waypoint stability | 0.05 |
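One plausible way the `lambda_*` weights enter training is as coefficients in a weighted sum with the language-modeling loss. The sketch below assumes that form; it is not taken from `model.get_total_loss`, and the `waypoint_loss` key is hypothetical (only `lm_loss`, `holonomy_loss`, and `curvature_loss` appear in the usage example).

```python
def total_loss(losses, lambda_holonomy=0.1, lambda_curvature=0.01,
               lambda_waypoint=0.05):
    """Weighted sum of the LM loss and the geometric losses.

    Missing geometric terms are treated as zero. Default weights are
    the defaults listed in the configuration table.
    """
    return (losses["lm_loss"]
            + lambda_holonomy * losses.get("holonomy_loss", 0.0)
            + lambda_curvature * losses.get("curvature_loss", 0.0)
            + lambda_waypoint * losses.get("waypoint_loss", 0.0))


example = total_loss({
    "lm_loss": 2.0,
    "holonomy_loss": 1.0,
    "curvature_loss": 10.0,
})
```

With the default weights, a curvature loss ten times larger than the holonomy loss contributes the same amount to the total, which shows how the small `lambda_curvature` keeps the regularizer from dominating.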
+## Theoretical Predictions
+The framework makes testable predictions:
+1. **Chain-of-thought benefit correlates with curvature** - High-curvature domains (causal reasoning) benefit more from CoT than low-curvature domains (arithmetic)
+2. **Waypoints emerge spontaneously** - Training with holonomy loss should cause discrete symbol-like structures to form at flat loci
+3. **Holonomy predicts errors** - Incorrect reasoning paths should have higher holonomy magnitude
+4. **Compositional generalization improves** - Holonomy constraints force consistent composition
+## File Structure
+```
+lie_holonomy_transformer/
+├── lht.py           # Core implementation
+├── train.py         # Training script
+├── README.md        # This file
+└── experiments/     # Benchmark code (TODO)
 ```
+## References
+- "Beyond Holonomy: Lie-Algebraic Symbol Emergence..." (the paper)
+- Cohen et al. (2019). Gauge Equivariant Convolutional Networks
+- Weiler & Cesa (2019). General E(2)-Equivariant Steerable CNNs
+- The Univalent Foundations Program (2013). Homotopy Type Theory
+## License
+MIT