LoganResearch committed
Commit 15bc491 · verified · 1 parent: 3bdb9c3

Scientific model card - Logan Matthew Napolitano

Files changed (1)
  1. README.md +180 -0
README.md ADDED
@@ -0,0 +1,180 @@
# Lie-Holonomy Transformer (LHT)

A PyTorch implementation of the gauge-theoretic reasoning architecture from "Beyond Holonomy: Lie-Algebraic Symbol Emergence and the Homotopy Type Structure of Neural Reasoning."

## Core Ideas

This architecture treats **reasoning as geometry**:

| Concept | Mathematical Structure | Implementation |
|---------|----------------------|----------------|
| Propositions | Manifold M | Embedding space |
| Inference | Parallel transport | Gauge-covariant attention |
| Consistency | Holonomy = Identity | Holonomy loss |
| Symbols | Lie algebra generators | Generator network |
| Proof equivalence | Homotopy | Layer depth |

## Architecture Overview

```
Input tokens
    │
    ▼
┌─────────────────────────────────────┐
│  Token Embedding (Proposition M)    │
│  + Position Embedding               │
│  + Fiber Initialization (gauge)     │
└─────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────┐
│  LHT Layer (× n_layers)             │
│  ┌─────────────────────────────┐    │
│  │ Connection Network A(x)     │    │  ← Learns gauge connection
│  │ Parallel Transport Γ_{j→i}  │    │  ← Transports fiber elements
│  │ Gauge-Covariant Attention   │    │  ← Modified self-attention
│  │ Lie Algebra Generator       │    │  ← Generates inference ops
│  │ Generator Application       │    │  ← Applies exp(X) to fiber
│  └─────────────────────────────┘    │
└─────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────┐
│  Output: logits + geometric losses  │
└─────────────────────────────────────┘
```

## Key Components

### 1. Connection Network
Learns the gauge connection ω, with local components A_μ(x), that defines how inferential states are parallel transported:
```text
A_μ(x) ∈ gl(k,ℝ)  # Lie algebra valued 1-form
```

### 2. Parallel Transport
Computes transport operators between positions:
```text
Γ_{j→i} = exp(-A_μ(x_j)(x_i - x_j)^μ)
```

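The transport formula above can be sketched in plain Python. This is a toy illustration, not the code in `lht.py`: the `mat_exp` and `transport` helpers and the constant so(2)-valued `toy_connection` are assumptions made for the example, and the matrix exponential uses a truncated Taylor series that is only suitable for small generators.

```python
import math

def mat_exp(m, terms=20):
    """Matrix exponential of a small square matrix via its Taylor series."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term <- term @ m / k, so term equals m^k / k!
        term = [[sum(term[i][l] * m[l][j] for l in range(n)) / k
                 for j in range(n)] for i in range(n)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def transport(connection, x_i, x_j):
    """Gamma_{j->i} = exp(-A_mu(x_j) (x_i - x_j)^mu) for one pair of positions.

    `connection` is a hypothetical callable returning one k x k matrix
    per base direction mu, evaluated at the source position x_j.
    """
    comps = connection(x_j)
    k = len(comps[0])
    gen = [[-sum(comps[mu][r][c] * (x_i[mu] - x_j[mu])
                 for mu in range(len(x_i)))
            for c in range(k)] for r in range(k)]
    return mat_exp(gen)

# Toy connection: one base direction, constant so(2)-valued component.
J = [[0.0, -1.0], [1.0, 0.0]]

def toy_connection(x):
    return [J]

gamma = transport(toy_connection, x_i=[0.5], x_j=[0.0])
```

With this choice the transport operator is a plane rotation by -0.5 radians, which makes the result easy to check by hand. In a PyTorch implementation, `torch.linalg.matrix_exp` would be the natural replacement for the Taylor-series helper.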
### 3. Gauge-Covariant Attention
Standard attention, except that each value is parallel transported to the query position before aggregation:
```python
# Standard: Attn(Q,K,V)_i = Σ_j α_ij V_j
# Gauge:    GaugeAttn_i   = Σ_j α_ij Γ_{j→i}(V_j)
```

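A minimal single-head sketch of the gauge formula above, in plain Python. It assumes a 2-dimensional fiber, scalar positions, and a constant connection `A = a * J`, so that every transport operator is a rotation; the function names and the parameter `a` are illustrative and do not come from `lht.py`.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rotation(theta):
    """exp(theta * J) for the so(2) generator J = [[0, -1], [1, 0]]."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matvec(m, v):
    return [sum(m[r][c] * v[c] for c in range(len(v))) for r in range(len(m))]

def gauge_attention(q, k, v, pos, a=1.0):
    """GaugeAttn_i = sum_j alpha_ij * Gamma_{j->i}(V_j), toy constant connection.

    q, k: lists of query/key vectors; v: list of 2-d fiber values;
    pos: scalar positions x_i; a: connection strength (A = a * J).
    """
    scale = math.sqrt(len(q[0]))
    out = []
    for i in range(len(q)):
        scores = [sum(qa * kb for qa, kb in zip(q[i], k[j])) / scale
                  for j in range(len(k))]
        alpha = softmax(scores)
        acc = [0.0, 0.0]
        for j, w in enumerate(alpha):
            # Gamma_{j->i} = exp(-A (x_i - x_j)): rotation by -a * (x_i - x_j)
            moved = matvec(rotation(-a * (pos[i] - pos[j])), v[j])
            acc = [acc[0] + w * moved[0], acc[1] + w * moved[1]]
        out.append(acc)
    return out

zeros = [[0.0, 0.0], [0.0, 0.0]]        # equal scores give uniform attention
values = [[1.0, 0.0], [0.0, 1.0]]
positions = [0.0, 1.0]
standard = gauge_attention(zeros, zeros, values, positions, a=0.0)
gauged = gauge_attention(zeros, zeros, values, positions, a=math.pi / 2)
```

Setting `a=0.0` makes every transport the identity, so the function reduces to standard attention. With `a=math.pi / 2` the transported values cancel in this toy input, which shows that transport changes the aggregate even when the attention weights are unchanged.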
### 4. Holonomy Loss
Enforces reasoning consistency by requiring transport around closed loops to return to the identity:
```text
L_hol = E[||Hol_γ - I||²_F]
```

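The holonomy penalty for a single loop can be illustrated as follows. This sketch assumes a 2-dimensional base space, an so(2)-valued connection given by two scalar strengths `(a_x, a_y)`, and a loop discretized into straight segments; all helper names are hypothetical, and the expectation over loops in the formula is replaced by one fixed square.

```python
import math

def rotation(theta):
    """exp(theta * J) for the so(2) generator J = [[0, -1], [1, 0]]."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(a, b):
    return [[sum(a[r][l] * b[l][c] for l in range(2)) for c in range(2)]
            for r in range(2)]

def holonomy(loop, connection):
    """Ordered product of per-segment transports around a closed loop.

    loop: list of (x, y) points; the last point connects back to the first.
    connection: callable p -> (a_x, a_y), meaning A_x = a_x * J, A_y = a_y * J.
    """
    hol = [[1.0, 0.0], [0.0, 1.0]]
    for p_j, p_i in zip(loop, loop[1:] + loop[:1]):
        a_x, a_y = connection(p_j)
        angle = -(a_x * (p_i[0] - p_j[0]) + a_y * (p_i[1] - p_j[1]))
        hol = matmul(rotation(angle), hol)  # later segments act on the left
    return hol

def holonomy_loss(hol):
    """Squared Frobenius norm of Hol - I for one loop."""
    return sum((hol[r][c] - (1.0 if r == c else 0.0)) ** 2
               for r in range(2) for c in range(2))

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
# Constant connection: flat, so the loop returns to the identity.
flat_loss = holonomy_loss(holonomy(square, lambda p: (0.3, 0.7)))
# A_y grows with x: constant curvature 0.5, so the loop picks up a rotation.
curved_loss = holonomy_loss(holonomy(square, lambda p: (0.0, 0.5 * p[0])))
```

The flat case gives a loss of zero up to rounding, while the curved case gives a rotation by the enclosed curvature and a strictly positive loss.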
### 5. Curvature Regularization
Encourages flat reasoning spaces, where the result of transport does not depend on the order of the steps:
```text
L_curv = E[||F(x)||²_F]  where F = dω + ω∧ω
```

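In components, F = dω + ω∧ω reads F_{μν} = ∂_μ A_ν − ∂_ν A_μ + [A_μ, A_ν]. For a connection whose components do not depend on position, the derivative terms vanish and the curvature is just the commutator. The sketch below computes the penalty in that special case only; the helper names and the example matrices are illustrative.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[r][l] * b[l][c] for l in range(n)) for c in range(n)]
            for r in range(n)]

def curvature(a_mu, a_nu):
    """F_{mu nu} = [A_mu, A_nu], valid for constant connection components."""
    ab, ba = matmul(a_mu, a_nu), matmul(a_nu, a_mu)
    n = len(ab)
    return [[ab[r][c] - ba[r][c] for c in range(n)] for r in range(n)]

def frobenius_sq(m):
    return sum(x * x for row in m for x in row)

# Two non-commuting gl(2) components: transport order matters.
A_x = [[0.0, 1.0], [0.0, 0.0]]
A_y = [[0.0, 0.0], [1.0, 0.0]]
curved = frobenius_sq(curvature(A_x, A_y))

# Commuting components: the connection is flat and the penalty is zero.
flat = frobenius_sq(curvature(A_x, A_x))
```

A position-dependent connection would also need the derivative terms, for example from finite differences or automatic differentiation.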
## Installation

```bash
pip install torch
```

## Usage

### Basic
```python
import torch

from lht import LieHolonomyTransformer, LHTConfig

# Create model
config = LHTConfig(
    vocab_size=32000,
    d_model=512,
    d_fiber=64,
    n_heads=8,
    n_layers=6,
    lie_algebra_rank=8,
)
model = LieHolonomyTransformer(config)

# Placeholder batch of token ids, shape (batch, seq_len); replace with real data.
# Reusing the inputs as labels assumes the model shifts them internally.
tokens = torch.randint(0, config.vocab_size, (2, 16))
labels = tokens.clone()

# Forward pass
output = model(
    input_ids=tokens,
    labels=labels,
    return_geometric_losses=True,
)

# Individual losses
lm_loss = output['lm_loss']
holonomy_loss = output['holonomy_loss']
curvature_loss = output['curvature_loss']

# Combined training objective
total_loss = model.get_total_loss(output)
```

### Training with Geometric Loss Annealing
```python
import torch

from lht import LHTTrainer

# `model` and `config` come from the Basic example; AdamW with default
# hyperparameters is only a placeholder optimizer.
optimizer = torch.optim.AdamW(model.parameters())
trainer = LHTTrainer(model, optimizer, config)

# `dataloader` is any iterable of batches in the format `train_step` expects.
for batch in dataloader:
    metrics = trainer.train_step(batch)
    # Early training: curvature loss emphasized → flat representations
    # Mid training:   holonomy loss emphasized → consistency
    # Late training:  waypoint loss emphasized → discrete structure
```

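The three training phases described in the comments above could be driven by a schedule like the one below. This is a hypothetical sketch: the even three-way split of training, the `boost` factor, and the function name are invented for illustration, and the schedule actually used by `LHTTrainer` may differ. The base weights are the defaults from the Configuration table.

```python
def annealed_weights(step, total_steps, boost=2.0):
    """Return loss weights with one geometric loss emphasized per phase.

    Hypothetical schedule: the first third of training emphasizes curvature,
    the middle third holonomy, and the final third the waypoint loss.
    """
    base = {"curvature": 0.01, "holonomy": 0.1, "waypoint": 0.05}
    progress = step / max(total_steps, 1)
    if progress < 1 / 3:
        emphasis = "curvature"
    elif progress < 2 / 3:
        emphasis = "holonomy"
    else:
        emphasis = "waypoint"
    return {name: weight * (boost if name == emphasis else 1.0)
            for name, weight in base.items()}

early = annealed_weights(0, 300)
mid = annealed_weights(150, 300)
late = annealed_weights(299, 300)
```

A smooth ramp between phases would avoid abrupt changes in the objective; the hard switch is used here only to keep the example short.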
### Waypoint Detection
```python
from lht import WaypointDetector

detector = WaypointDetector(config, n_waypoints=32)
# `representations` are hidden states taken from the model.
waypoint_ids, stability = detector(representations)
```

## Configuration

| Parameter | Description | Default |
|-----------|-------------|---------|
| `d_model` | Proposition manifold dimension | 512 |
| `d_fiber` | Fiber (gauge) dimension | 64 |
| `lie_algebra_rank` | k for GL(k,ℝ) structure group | 8 |
| `lambda_holonomy` | Weight for holonomy loss | 0.1 |
| `lambda_curvature` | Weight for curvature loss | 0.01 |
| `lambda_waypoint` | Weight for waypoint stability | 0.05 |

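How the three `lambda_*` weights enter the objective is not spelled out in this README. One plausible reading is a weighted sum with the language-modeling loss, sketched below. The function is an assumption about what `get_total_loss` computes, not its actual code, and the `waypoint_loss` key is hypothetical; the other keys match the Basic usage example.

```python
def total_loss(losses, lambda_holonomy=0.1, lambda_curvature=0.01,
               lambda_waypoint=0.05):
    """Weighted sum of the LM loss and the geometric losses (assumed form)."""
    return (losses["lm_loss"]
            + lambda_holonomy * losses["holonomy_loss"]
            + lambda_curvature * losses["curvature_loss"]
            # "waypoint_loss" is a hypothetical key; treated as 0 if absent.
            + lambda_waypoint * losses.get("waypoint_loss", 0.0))

example = total_loss(
    {"lm_loss": 2.0, "holonomy_loss": 1.0, "curvature_loss": 10.0}
)
```

With the default weights, a holonomy loss of 1.0 and a curvature loss of 10.0 each contribute 0.1 to the total, which gives a sense of the relative scales the defaults assume.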
## Theoretical Predictions

The framework makes testable predictions:

1. **Chain-of-thought benefit correlates with curvature** - High-curvature domains (causal reasoning) benefit more from CoT than low-curvature domains (arithmetic)

2. **Waypoints emerge spontaneously** - Training with holonomy loss should cause discrete symbol-like structures to form at flat loci

3. **Holonomy predicts errors** - Incorrect reasoning paths should have higher holonomy magnitude

4. **Compositional generalization improves** - Holonomy constraints force consistent composition

## File Structure

```
lie_holonomy_transformer/
├── lht.py              # Core implementation
├── train.py            # Training script
├── README.md           # This file
└── experiments/        # Benchmark code (TODO)
```

## References

- "Beyond Holonomy: Lie-Algebraic Symbol Emergence..." (the paper)
- Cohen et al. (2019). Gauge Equivariant Convolutional Networks
- Weiler & Cesa (2019). General E(2)-Equivariant Steerable CNNs
- The Univalent Foundations Program (2013). Homotopy Type Theory

## License

MIT