pedromoreira22 committed on
Commit
09e2cfe
·
verified ·
1 Parent(s): efb67d5

Add comprehensive README with research overview and preliminary results

Files changed (1)
  1. README.md +158 -0
README.md ADDED
@@ -0,0 +1,158 @@
1
+ # EML Trainability Study: Can We Turn Theoretical Universality Into Practical Training?
2
+
3
+ ## Overview
4
+
5
+ This repository contains an empirical study of whether the **EML operator** `eml(x,y) = exp(x) - ln(y)` from [arXiv:2603.21852](https://arxiv.org/abs/2603.21852) can be made practically trainable for **symbolic regression** via gradient descent.
6
+
7
+ ### The Theoretical Discovery
8
+ The EML paper proved that **every elementary mathematical function** — addition, multiplication, trigonometry, logarithms, π, e, etc. — can be generated from just one binary operator and the constant 1:
9
+
10
+ ```
11
+ eml(x, y) = exp(x) − ln(y)
12
+ ```
13
+
14
+ This is analogous to how the NAND gate generates all Boolean logic. The grammar is trivially simple: `S → 1 | eml(S, S)`.
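
To give a feel for how quickly this grammar grows, here is a small sketch that counts and enumerates the expression trees generated by `S → 1 | eml(S, S)` up to a given depth. The helper names `count_trees` and `enumerate_trees` are illustrative and do not come from the repository.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_trees(max_depth: int) -> int:
    """Number of syntactically distinct trees of depth <= max_depth."""
    if max_depth == 0:
        return 1  # only the leaf "1"
    smaller = count_trees(max_depth - 1)
    # Either the leaf "1", or eml(a, b) with a and b of depth <= max_depth - 1.
    return 1 + smaller * smaller


def enumerate_trees(max_depth: int) -> list[str]:
    """List every tree of depth <= max_depth as a string."""
    if max_depth == 0:
        return ["1"]
    smaller = enumerate_trees(max_depth - 1)
    return ["1"] + [f"eml({a}, {b})" for a in smaller for b in smaller]


for depth in range(5):
    print(depth, count_trees(depth))
```

The count follows the recurrence `T(d) = 1 + T(d-1)²`, so the discrete search space grows doubly exponentially with depth even before input variables are added as leaves.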
15
+
16
+ ### The Practical Problem
17
+ While the operator is mathematically universal, naive implementations break down numerically. Stacking exponentials 3-4 levels deep in floating-point arithmetic makes intermediate values **overflow to infinity** or **underflow to zero**. The paper itself reports the following recovery rates:
18
+ - **Depth 1-2**: 100% recovery from random initialization
19
+ - **Depth 3-4**: ~25% recovery
20
+ - **Depth 5+**: <1% recovery
21
+ - **Depth 6**: 0% in 448 attempts
22
+
23
+ Yet when initialized **near the correct solution**, recovery is 100% even at depth 5-6. The basins of attraction exist, but random initialization almost never lands in them.
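
The overflow and underflow described above can be reproduced in a few lines of plain Python. This is a minimal illustration in float64, not the experiment code. The `eml` helper below returns `inf` instead of raising on overflow, which is a choice made for this sketch.

```python
import math


def eml(x: float, y: float) -> float:
    """eml(x, y) = exp(x) - ln(y); returns inf instead of raising on overflow."""
    try:
        ex = math.exp(x)
    except OverflowError:
        ex = math.inf
    return ex - math.log(y)


# Tower of exponentials: t_0 = 1, t_{k+1} = eml(t_k, 1) = exp(t_k).
tower = [1.0]
for _ in range(4):
    tower.append(eml(tower[-1], 1.0))
print(tower)  # e, e^e, exp(e^e), then inf

# The mirror image: exp of a large negative value underflows to exactly 0.0.
collapsed = math.exp(-tower[3])
print(collapsed)
```

Starting from the constant 1, the third nested exponential is already in the millions and the fourth exceeds the float64 range.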
24
+
25
+ ## Research Questions
26
+
27
+ 1. **Which numerical stability techniques** most improve deep EML tree training?
28
+ 2. **What is the maximum recoverable tree depth** with enhanced methods?
29
+ 3. **Can EML-based SR recover real physics equations** (Feynman benchmark)?
30
+
31
+ ## Methods
32
+
33
+ ### Stability Techniques Tested
34
+
35
+ | Method | Description | Source |
36
+ |--------|-------------|--------|
37
+ | **Soft routing** | Standard softmax input selection (baseline) | EML paper §4.3 |
38
+ | **Gumbel-hard** | Straight-through Gumbel-softmax — hard selection in forward, soft gradients in backward | Jang et al. 2017 |
39
+ | **Bounded** | `tanh(output/R) * R` normalization after each node | Inspired by NALU (Trask et al. 2018) |
40
+ | **Combined** | Saturating linear: `x / (1 + \|x\|/R)` + Gumbel-hard routing | Novel combination |
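
The two bounding functions in the table can be compared directly. The sketch below implements both formulas as written, in scalar form. The function names and the choice `R = 10` are illustrative assumptions.

```python
import math


def bound_tanh(x: float, R: float) -> float:
    """Bounded method: R * tanh(x / R), which stays within [-R, R]."""
    return R * math.tanh(x / R)


def bound_saturating(x: float, R: float) -> float:
    """Saturating-linear method: x / (1 + |x| / R), which approaches +/-R."""
    return x / (1.0 + abs(x) / R)


R = 10.0  # illustrative bound, not a value taken from the experiments
for x in (0.01, 1.0, 10.0, 1e6):
    print(x, bound_tanh(x, R), bound_saturating(x, R))
```

Both functions are close to the identity near zero and saturate at `R`. The saturating form approaches the bound more slowly and keeps a polynomially decaying gradient, whereas the gradient of `tanh` decays exponentially. Note that `bound_saturating` yields `nan` for an infinite input, so it has to be applied before values overflow.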
41
+
42
+ ### Key Innovations
43
+
44
+ 1. **Hard routing prevents intermediate explosion**: Soft routing creates weighted mixtures of {1, x, f} that can produce arbitrary intermediate values. Hard selection ensures only one input is chosen per EML node, preventing the "exp of a mixture" problem.
45
+
46
+ 2. **Multi-loss training**: MSE + correlation loss (captures function shape regardless of scale) + entropy regularization (encourages discrete routing decisions).
47
+
48
+ 3. **Temperature annealing**: Start with high temperature (smooth, exploratory) and anneal to near-zero (hard, discrete) over training.
49
+
50
+ 4. **Multi-restart search**: Since basins are narrow, we run 20-30 random initializations per configuration and report both the best run and the success rate.
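
The loss terms and the annealing schedule from items 2 and 3 can be sketched as follows. This is a framework-free illustration and makes several assumptions: the weights `w_corr` and `w_ent`, the start and end temperatures, and the exponential shape of the schedule are all hypothetical. The actual experiment may combine the terms differently.

```python
import math


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def correlation_loss(pred, target, eps=1e-12):
    """1 - Pearson correlation: near 0 when shapes match, regardless of scale."""
    n = len(pred)
    mp, mt = sum(pred) / n, sum(target) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, target))
    vp = sum((p - mp) ** 2 for p in pred)
    vt = sum((t - mt) ** 2 for t in target)
    return 1.0 - cov / math.sqrt(vp * vt + eps)


def entropy(probs, eps=1e-12):
    """Shannon entropy of one routing distribution; ~0 for a one-hot choice."""
    return -sum(p * math.log(p + eps) for p in probs)


def temperature(step, total_steps, t_start=2.0, t_end=0.05):
    """Exponential anneal from t_start to t_end (hypothetical values)."""
    frac = step / max(1, total_steps - 1)
    return t_start * (t_end / t_start) ** frac


def total_loss(pred, target, routing_probs, w_corr=0.1, w_ent=0.01):
    """MSE + weighted correlation loss + weighted mean routing entropy."""
    ent = sum(entropy(p) for p in routing_probs) / len(routing_probs)
    return mse(pred, target) + w_corr * correlation_loss(pred, target) + w_ent * ent


# A prediction with the right shape but the wrong scale:
print(mse([1, 2, 3], [10, 20, 30]), correlation_loss([1, 2, 3], [10, 20, 30]))
```

In the usage line, the MSE is large while the correlation loss is essentially zero, which is the sense in which the correlation term "captures function shape regardless of scale".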
51
+
52
+ ### Architecture: The Master Formula
53
+
54
+ Following the paper's §4.3, we implement the EML master formula as a full binary tree:
55
+ - **Leaf nodes** select from `{1, x₁, ..., xₖ}` (constant and input variables)
56
+ - **Internal nodes** select from `{1, x₁, ..., xₖ, f_left, f_right}` (also including child outputs)
57
+ - Each selection is parameterized by learnable logits passed through Gumbel-softmax
58
+ - Output affine transform `a * eml(left, right) + b` per node
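
One possible reading of this architecture is sketched below with soft routing only and no autograd. It is not the code in `code/eml_experiment.py`. The assumptions are: each internal node holds two selectors (one per `eml` argument), leaves hold one selector, and `safe_eml` clamps its inputs with arbitrary thresholds. The class and function names are hypothetical.

```python
import math
import random


def softmax(logits, temp):
    """Temperature-scaled softmax; a lower temp gives sharper weights."""
    scaled = [l / temp for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def safe_eml(a, b):
    """eml(a, b) = exp(a) - ln(b) with illustrative clamps (an assumption)."""
    a = min(a, 50.0)    # avoid overflow in exp
    b = max(b, 1e-12)   # keep the log argument positive
    return math.exp(a) - math.log(b)


class Node:
    """A node of a full binary EML tree with softmax-routed inputs."""

    def __init__(self, depth, n_vars, rng):
        self.children = [] if depth == 0 else [Node(depth - 1, n_vars, rng) for _ in range(2)]
        n_choices = 1 + n_vars + len(self.children)   # {1, x_1..x_k} plus child outputs
        n_selectors = 1 if depth == 0 else 2          # assumed: one per eml argument
        self.logits = [[rng.gauss(0.0, 1.0) for _ in range(n_choices)]
                       for _ in range(n_selectors)]
        self.a, self.b = 1.0, 0.0                     # output affine transform

    def _select(self, k, candidates, temp):
        weights = softmax(self.logits[k], temp)
        return sum(w * c for w, c in zip(weights, candidates))

    def forward(self, xs, temp=1.0):
        candidates = [1.0] + list(xs)
        if not self.children:                         # leaf: pick from {1, x_1..x_k}
            return self._select(0, candidates, temp)
        candidates += [child.forward(xs, temp) for child in self.children]
        left = self._select(0, candidates, temp)
        right = self._select(1, candidates, temp)
        return self.a * safe_eml(left, right) + self.b


# Hand-set logits so a depth-1 tree routes (x, 1) into eml, i.e. computes exp(x).
tree = Node(depth=1, n_vars=1, rng=random.Random(0))
tree.logits = [[-20.0, 20.0, -20.0, -20.0], [20.0, -20.0, -20.0, -20.0]]
print(tree.forward([0.5]), math.exp(0.5))
```

A Gumbel-hard variant would add Gumbel noise to the logits and replace the soft weights with a one-hot vector in the forward pass, while still using the soft weights for gradients. That part needs an autograd framework and is omitted here.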
59
+
60
+ Total parameters: `O(5 × 2ⁿ)` for depth n (as stated in the paper).
61
+
62
+ ## Experimental Design
63
+
64
+ ### Phase 1: Known EML Identities
65
+ Test recovery of functions with known EML decompositions:
66
+
67
+ | Function | EML Depth | EML Expression |
68
+ |----------|-----------|----------------|
69
+ | `exp(x)` | 1 | `eml(x, 1)` |
70
+ | `e` (constant) | 1 | `eml(1, 1)` |
71
+ | `ln(x)` | 3 | `eml(1, eml(eml(1,x), 1))` |
72
+ | `-x` | 2 | Via composition |
73
+ | `1/x` | 3 | Via composition |
74
+ | `x + y` | 4 | Via exp/ln identities |
75
+ | `x × y` | 4+ | Via exp/ln identities |
76
+ | `x²` | 4 | `exp(2·ln(x))` |
77
+ | `√x` | 4 | `exp(0.5·ln(x))` |
78
+ | `sin(x)` | 5+ | Requires complex intermediates |
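
The three explicit decompositions in the table can be checked numerically. The sketch below evaluates them in plain float64 for a few positive inputs. The wrapper function names are illustrative.

```python
import math


def eml(x: float, y: float) -> float:
    return math.exp(x) - math.log(y)


def exp_via_eml(x):
    return eml(x, 1.0)                          # exp(x) - ln(1) = exp(x)


def e_via_eml():
    return eml(1.0, 1.0)                        # exp(1) - ln(1) = e


def ln_via_eml(x):
    # eml(1, x)       = e - ln(x)
    # eml(e - ln x, 1) = exp(e - ln x) = e^e / x
    # eml(1, e^e / x)  = e - (e - ln x) = ln(x)
    return eml(1.0, eml(eml(1.0, x), 1.0))


for x in (0.5, 1.0, 2.0, 10.0):
    assert math.isclose(exp_via_eml(x), math.exp(x))
    assert math.isclose(ln_via_eml(x), math.log(x), rel_tol=1e-9, abs_tol=1e-9)
assert math.isclose(e_via_eml(), math.e)
print("identities hold on the sampled points")
```

The `ln(x)` decomposition passes through the intermediate value `e^e / x`, which grows without bound as `x` approaches zero. Even this depth-3 identity therefore relies on large intermediates cancelling exactly.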
79
+
80
+ ### Phase 2: Feynman Physics Equations
81
+ A curated set of physics equations from the [SRSD-Feynman benchmark](https://arxiv.org/abs/2206.10540):
82
+ - Gaussian distribution: `exp(-θ²/2)/√(2π)`
83
+ - Euclidean distance: `√((x₂-x₁)² + (y₂-y₁)²)`
84
+ - Inverse square law: `F = q₁q₂/(4πε₀r²)`
85
+ - Relativistic mass: `m₀/√(1-v²/c²)`
86
+ - Harmonic oscillator: `E = ½kx²`
87
+ - And more...
88
+
89
+ ### Phase 3: Depth Scaling Analysis
90
+ Systematic measurement of recovery rate vs. depth using EML-native targets.
91
+
92
+ ## Key Literature References
93
+
94
+ | Topic | Paper | Key Insight |
95
+ |-------|-------|-------------|
96
+ | EML operator | [2603.21852](https://arxiv.org/abs/2603.21852) | Universal primitive for elementary functions |
97
+ | Gumbel-softmax | Jang et al. 2017 | Differentiable discrete selection |
98
+ | NALU | [1808.00508](https://arxiv.org/abs/1808.00508) | Stable exp-log arithmetic cells |
99
+ | NAU | [2001.05016](https://arxiv.org/abs/2001.05016) | Fixing NALU's gradient issues |
100
+ | Gradient clipping | [1211.5063](https://arxiv.org/abs/1211.5063) | Controlling exploding gradients |
101
+ | BFloat16 training | [2010.06192](https://arxiv.org/abs/2010.06192) | Kahan summation for precision |
102
+ | AutoNumerics-Zero | [2312.08472](https://arxiv.org/abs/2312.08472) | Range reduction for transcendentals |
103
+ | Numerical stability | [2501.04697](https://arxiv.org/abs/2501.04697) | Grokking at the edge of stability |
104
+ | Tropical geometry | [2505.17190](https://arxiv.org/abs/2505.17190) | Max-plus limit of log-sum-exp |
105
+ | AI Feynman | Udrescu & Tegmark 2020 | Physics equations benchmark |
106
+ | SRSD | [2206.10540](https://arxiv.org/abs/2206.10540) | Feynman benchmark with proper data |
107
+ | PySR | Cranmer 2023 | Evolutionary symbolic regression |
108
+ | TPSR | [2303.06833](https://arxiv.org/abs/2303.06833) | Transformer + MCTS for SR |
109
+
110
+ ## Preliminary Results (CPU validation)
111
+
112
+ From our CPU sandbox testing:
113
+
114
+ | Function | Depth | Best R² | Method | Notes |
115
+ |----------|-------|---------|--------|-------|
116
+ | `exp(x)` | 1 | **0.9999** | Gumbel-hard | ✅ Trivially recovered |
117
+ | `e` (const) | 1 | **0.9999** | Gumbel-hard | ✅ Correct: `eml(1,1)` |
118
+ | `ln(x)` | 3 | -0.08 | All methods | ❌ All 10 restarts fail |
119
+ | `x²` | 4 | TBD | - | Awaiting GPU results |
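
For reading the R² column, a minimal sketch of the conventional coefficient of determination is given below. It assumes the standard definition `1 - SS_res / SS_tot`. The experiment code may compute the metric differently.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0.0:
        return float("nan")  # undefined when the target is constant
    return 1.0 - ss_res / ss_tot


print(r_squared([1, 2, 3], [1, 2, 3]))   # perfect fit
print(r_squared([1, 2, 3], [2, 2, 2]))   # predicting the mean
print(r_squared([1, 2, 3], [3, 2, 1]))   # worse than the mean
```

Under this definition a negative value, such as the -0.08 reported for `ln(x)`, means the fit is slightly worse than always predicting the mean of the target. The definition breaks down for a constant target such as `e`, where this sketch returns `nan`. How the value reported for that row was computed is not stated in this README.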
120
+
121
+ ### Key Observation
122
+ **The depth-3 barrier is real and severe.** Even with hard routing (Gumbel-softmax), bounded normalization, curriculum learning, and multi-loss training, recovering `ln(x)` from random initialization fails consistently. This is broadly consistent with the low success rate (~25%) the paper reports at depth 3-4 and suggests that:
123
+
124
+ 1. The loss landscape at depth 3+ has **exponentially many local minima** relative to the one correct basin
125
+ 2. Better optimization (second-order methods, population-based search) may help
126
+ 3. **Informed initialization** (starting near known decompositions) is likely required for practical use
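
A quick calculation helps relate restart counts to per-restart success rates. The sketch below treats restarts as independent trials, which is a simplifying assumption, and uses the recovery rates quoted from the paper purely as example inputs.

```python
import math


def prob_all_fail(p_success: float, n_restarts: int) -> float:
    """Probability that n independent restarts all fail."""
    return (1.0 - p_success) ** n_restarts


def restarts_needed(p_success: float, confidence: float = 0.95) -> int:
    """Smallest n such that at least one success occurs with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_success))


print(prob_all_fail(0.25, 10))   # chance of 10 straight failures at a 25% success rate
print(restarts_needed(0.25))     # restarts for a 95% chance of one success at 25%
print(restarts_needed(0.01))     # the same at a 1% success rate
```

At a 25% per-restart success rate, ten consecutive failures would occur only about 5.6% of the time, so ten restarts give limited information about the exact rate. At a 1% rate, roughly 300 restarts are needed for a 95% chance of a single success, which is the practical argument for informed initialization.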
127
+
128
+ ## GPU Experiment Status
129
+
130
+ 🔄 **Running**: Full experiment on T4 GPU with 3 phases and 4 stability methods.
131
+ Job: `69e7837acd8c002f31e00d75`
132
+
133
+ Results will be uploaded to the `results/` folder upon completion.
134
+
135
+ ## How to Reproduce
136
+
137
+ ```bash
138
+ # Install dependencies
139
+ pip install torch numpy huggingface_hub
140
+
141
+ # Run the full experiment
142
+ python code/eml_experiment.py
143
+ ```
144
+
145
+ ## Citation
146
+
147
+ If you use this work, please cite the original EML paper:
148
+ ```bibtex
149
+ @article{eml2026,
150
+ title={All elementary functions from a single operator},
151
+ author={...},
152
+ journal={arXiv preprint arXiv:2603.21852},
153
+ year={2026}
154
+ }
155
+ ```
156
+
157
+ ## License
158
+ MIT