pedromoreira22/eml-trainability-study

pedromoreira22 committed on Apr 21

Commit 09e2cfe (verified) · 1 Parent(s): efb67d5

Add comprehensive README with research overview and preliminary results

Files changed (1)

README.md +158 -0

README.md ADDED

	@@ -0,0 +1,158 @@

+# EML Trainability Study: Can We Turn Theoretical Universality Into Practical Training?
+## Overview
+This repository contains an empirical study of whether the **EML operator** `eml(x,y) = exp(x) - ln(y)` from [arXiv:2603.21852](https://arxiv.org/abs/2603.21852) can be made practically trainable for **symbolic regression** via gradient descent.
+### The Theoretical Discovery
+The EML paper proves that **every elementary mathematical function** (addition, multiplication, trigonometric functions, logarithms, and so on), along with constants such as π and e, can be generated from just one binary operator and the constant 1:
+```
+eml(x, y) = exp(x) − ln(y)
+```
+This is analogous to how the NAND gate generates all Boolean logic. The grammar is trivially simple: `S → 1 | eml(S, S)`.
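As a minimal sketch of the grammar `S → 1 | eml(S, S)`, the snippet below represents a term as either the integer `1` or a nested tuple `("eml", left, right)`, evaluates it, and counts how many distinct terms exist up to a given depth. The tuple encoding and the helper names `evaluate` and `count_terms` are illustrative assumptions, not taken from the paper or from this repository's code.

```python
import math

def eml(x, y):
    """The EML operator: exp(x) - ln(y). Requires y > 0."""
    return math.exp(x) - math.log(y)

def evaluate(term):
    """Evaluate a closed term of the grammar S -> 1 | eml(S, S)."""
    if term == 1:
        return 1.0
    tag, left, right = term
    assert tag == "eml"
    return eml(evaluate(left), evaluate(right))

def count_terms(max_depth):
    """Number of distinct terms with depth <= max_depth.

    A term is either the leaf 1 or eml(a, b) with a and b one level
    shallower, so T(0) = 1 and T(n) = 1 + T(n - 1) ** 2.
    """
    total = 1
    for _ in range(max_depth):
        total = 1 + total * total
    return total

e_term = ("eml", 1, 1)  # eml(1, 1) = exp(1) - ln(1) = e
print(evaluate(e_term))
print([count_terms(d) for d in range(5)])  # [1, 2, 5, 26, 677]
```

The count grows doubly exponentially with depth, which is one way to see why blind search over deep trees becomes hard quickly.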
+### The Practical Problem
+While mathematically universal, the construction is numerically fragile in practice. Stacking exponentials 3-4 levels deep in floating-point arithmetic causes intermediate values to **overflow to infinity** or **underflow to zero**. The paper itself reports:
+- **Depth 1-2**: 100% recovery from random initialization
+- **Depth 3-4**: ~25% recovery
+- **Depth 5+**: <1% recovery
+- **Depth 6**: 0% in 448 attempts
+Yet paradoxically, when initialized **near the correct solution**, recovery is 100% even at depth 5-6. The basins of attraction exist, but they are so narrow that random initialization almost never lands in one.
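To make the overflow concrete, here is a small sketch (standard library only) that repeatedly applies `exp` starting from 1.0 and reports the nesting level at which double-precision arithmetic gives up. The helper name `levels_until_overflow` is hypothetical. It isolates the `exp` half of the operator and is not the study's code.

```python
import math

def levels_until_overflow(x0=1.0, max_levels=10):
    """Apply exp repeatedly; return the first nesting level that overflows."""
    x = x0
    for level in range(1, max_levels + 1):
        try:
            x = math.exp(x)
        except OverflowError:
            # math.exp raises once the result exceeds the float64 range.
            return level
    return None

print(levels_until_overflow(1.0))  # 4

# The mirror-image failure: a large negative argument underflows to zero.
tower = math.exp(math.exp(math.exp(1.0)))  # roughly 3.8e6
print(math.exp(-tower))  # 0.0
```

Starting from 1.0, three nested exponentials are still representable (about 3.8 million), and the fourth overflows, which matches the "3-4 levels" figure above.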
+## Research Questions
+1. **Which numerical stability techniques** most improve deep EML tree training?
+2. **What is the maximum recoverable tree depth** with enhanced methods?
+3. **Can EML-based SR recover real physics equations** (Feynman benchmark)?
+## Methods
+### Stability Techniques Tested
+| Method | Description | Source |
+|--------|-------------|--------|
+| **Soft routing** | Standard softmax input selection (baseline) | EML paper §4.3 |
+| **Gumbel-hard** | Straight-through Gumbel-softmax — hard selection in forward, soft gradients in backward | Jang et al. 2017 |
+| **Bounded** | `tanh(output/R) * R` normalization after each node | Inspired by NALU (Trask et al. 2018) |
+| **Combined** | Saturating linear: `x / (1 + \|x\|/R)` + Gumbel-hard routing | Novel combination |
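The two bounding functions in the table can be sketched in a few lines. This is an illustration of the formulas only: the function names and the default `R = 10.0` are assumptions, since the README does not state which value of `R` the experiments use.

```python
import math

def tanh_bound(value, R=10.0):
    """Bounded normalization: tanh(value / R) * R, which stays within [-R, R]."""
    return R * math.tanh(value / R)

def saturating_linear(value, R=10.0):
    """Saturating linear: value / (1 + |value| / R), which stays within (-R, R)."""
    return value / (1.0 + abs(value) / R)

# Both are close to the identity near zero and flatten out near +/- R.
for v in (0.1, 10.0, 1e6):
    print(v, tanh_bound(v), saturating_linear(v))
```

One difference worth noting: the derivative of the saturating-linear form is `1 / (1 + |x|/R)²`, which decays polynomially, while the derivative of the `tanh` form decays exponentially, so the former keeps somewhat larger gradients for large inputs.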
+### Key Innovations
+1. **Hard routing prevents intermediate explosion**: Soft routing creates weighted mixtures of {1, x, f} that can produce arbitrary intermediate values. Hard selection ensures only one input is chosen per EML node, preventing the "exp of a mixture" problem.
+2. **Multi-loss training**: MSE + correlation loss (captures function shape regardless of scale) + entropy regularization (encourages discrete routing decisions).
+3. **Temperature annealing**: Start with high temperature (smooth, exploratory) and anneal to near-zero (hard, discrete) over training.
+4. **Multi-restart search**: Since basins are narrow, we run 20-30 random initializations per configuration and report both the best result and the success rate.
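The multi-loss objective and the annealing schedule from the list above might look roughly like this in plain Python. The loss weights (`w_corr`, `w_ent`), the exponential schedule, and the start and end temperatures are all placeholder assumptions, because the README does not specify them.

```python
import math

def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def pearson(pred, target):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(target)
    mp, mt = sum(pred) / n, sum(target) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, target))
    vp = sum((p - mp) ** 2 for p in pred)
    vt = sum((t - mt) ** 2 for t in target)
    denom = math.sqrt(vp * vt)
    return cov / denom if denom > 0 else 0.0

def entropy(probs):
    """Shannon entropy of one routing distribution (zero for a one-hot choice)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def multi_loss(pred, target, routing_probs, w_corr=0.1, w_ent=0.01):
    """MSE + (1 - correlation) + routing entropy; weights are hypothetical."""
    corr_term = 1.0 - pearson(pred, target)           # shape match, scale-free
    ent_term = sum(entropy(p) for p in routing_probs)  # pushes toward discrete routing
    return mse(pred, target) + w_corr * corr_term + w_ent * ent_term

def temperature(step, total_steps, t_start=2.0, t_end=0.05):
    """Exponential anneal from t_start (exploratory) down to t_end (near-hard)."""
    frac = step / max(total_steps - 1, 1)
    return t_start * (t_end / t_start) ** frac
```

For example, a prediction that is exactly twice the target has a large MSE but a correlation term of zero, which is the sense in which the correlation loss captures shape regardless of scale.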
+### Architecture: The Master Formula
+Following the paper's §4.3, we implement the EML master formula as a full binary tree:
+- **Leaf nodes** select from `{1, x₁, ..., xₖ}` (constant and input variables)
+- **Internal nodes** select from `{1, x₁, ..., xₖ, f_left, f_right}` (also including child outputs)
+- Each selection is parameterized by learnable logits passed through Gumbel-softmax
+- Output affine transform `a * eml(left, right) + b` per node
+Total parameters: `O(5 × 2ⁿ)` for depth n (as stated in the paper).
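The following is a forward-pass-only sketch of one bottom-level node under one possible reading of the description above: each of the two `eml` arguments selects from `{1, x}` through Gumbel-softmax weights, and the result passes through the affine transform. The names `gumbel_softmax`, `select`, and `bottom_node` are hypothetical, and the straight-through backward pass is omitted because it needs an autograd framework (PyTorch provides `torch.nn.functional.gumbel_softmax` with a `hard` flag for that purpose; whether the study's code uses it is not stated here).

```python
import math
import random

def eml(x, y):
    return math.exp(x) - math.log(y)

def gumbel_softmax(logits, temperature, rng, hard=False):
    """Sample selection weights; with hard=True return a one-hot vector."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)           # keep log(u) finite
        gumbel = -math.log(-math.log(u))       # Gumbel(0, 1) sample
        noisy.append((logit + gumbel) / temperature)
    top = max(noisy)
    exps = [math.exp(v - top) for v in noisy]  # shift for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    if hard:
        k = probs.index(max(probs))
        return [1.0 if i == k else 0.0 for i in range(len(probs))]
    return probs

def select(candidates, weights):
    """Weighted mixture of candidates (a single candidate when weights are one-hot)."""
    return sum(w * c for w, c in zip(weights, candidates))

def bottom_node(x, left_logits, right_logits, a, b, temperature, rng, hard):
    """a * eml(left, right) + b, with both arguments routed from {1, x}; assumes x > 0."""
    candidates = [1.0, x]
    left = select(candidates, gumbel_softmax(left_logits, temperature, rng, hard))
    right = select(candidates, gumbel_softmax(right_logits, temperature, rng, hard))
    return a * eml(left, right) + b

rng = random.Random(0)
# Logits that strongly prefer x on the left and 1 on the right give eml(x, 1) = exp(x).
print(bottom_node(0.5, [-50.0, 50.0], [50.0, -50.0], 1.0, 0.0, 1.0, rng, hard=True))
```

With soft weights, `left` becomes a blend such as `0.3 * 1 + 0.7 * x` before the exponential is applied, which is the "exp of a mixture" situation that hard routing avoids.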
+## Experimental Design
+### Phase 1: Known EML Identities
+Test recovery of functions with known EML decompositions:
+| Function | EML Depth | EML Expression |
+|----------|-----------|----------------|
+| `exp(x)` | 1 | `eml(x, 1)` |
+| `e` (constant) | 1 | `eml(1, 1)` |
+| `ln(x)` | 3 | `eml(1, eml(eml(1,x), 1))` |
+| `-x` | 2 | Via composition |
+| `1/x` | 3 | Via composition |
+| `x + y` | 4 | Via exp/ln identities |
+| `x × y` | 4+ | Via exp/ln identities |
+| `x²` | 4 | `exp(2·ln(x))` |
+| `√x` | 4 | `exp(0.5·ln(x))` |
+| `sin(x)` | 5+ | Requires complex intermediates |
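The three explicit EML expressions in the table can be checked numerically. For `ln(x)` the steps are: `eml(1, x) = e − ln(x)`, then `eml(e − ln(x), 1) = exp(e − ln(x)) = eᵉ/x`, and finally `eml(1, eᵉ/x) = e − (e − ln(x)) = ln(x)`. The wrapper names below are illustrative only.

```python
import math

def eml(x, y):
    return math.exp(x) - math.log(y)

def exp_via_eml(x):
    return eml(x, 1.0)                          # depth 1

def e_via_eml():
    return eml(1.0, 1.0)                        # depth 1

def ln_via_eml(x):
    return eml(1.0, eml(eml(1.0, x), 1.0))      # depth 3, requires x > 0

for x in (0.5, 1.0, 2.0, 5.0):
    assert math.isclose(exp_via_eml(x), math.exp(x), rel_tol=1e-12)
    assert math.isclose(ln_via_eml(x), math.log(x), rel_tol=1e-9, abs_tol=1e-9)
assert math.isclose(e_via_eml(), math.e, rel_tol=1e-12)
print("identities hold on the sampled points")
```

The tolerance for `ln(x)` is looser because the final step subtracts two nearly equal numbers when `x` is close to 1.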
+### Phase 2: Feynman Physics Equations
+A curated set of physics equations from the [SRSD-Feynman benchmark](https://arxiv.org/abs/2206.10540):
+- Gaussian distribution: `exp(-θ²/2)/√(2π)`
+- Euclidean distance: `√((x₂-x₁)² + (y₂-y₁)²)`
+- Inverse square law: `F = q₁q₂/(4πε₀r²)`
+- Relativistic mass: `m₀/√(1-v²/c²)`
+- Harmonic oscillator: `E = ½kx²`
+- And more...
+### Phase 3: Depth Scaling Analysis
+Systematic measurement of recovery rate vs. depth using EML-native targets.
+## Key Literature References
+| Topic | Paper | Key Insight |
+|-------|-------|-------------|
+| EML operator | [2603.21852](https://arxiv.org/abs/2603.21852) | Universal primitive for elementary functions |
+| Gumbel-softmax | Jang et al. 2017 | Differentiable discrete selection |
+| NALU | [1808.00508](https://arxiv.org/abs/1808.00508) | Stable exp-log arithmetic cells |
+| NAU | [2001.05016](https://arxiv.org/abs/2001.05016) | Fixing NALU's gradient issues |
+| Gradient clipping | [1211.5063](https://arxiv.org/abs/1211.5063) | Controlling exploding gradients |
+| BFloat16 training | [2010.06192](https://arxiv.org/abs/2010.06192) | Kahan summation for precision |
+| AutoNumerics-Zero | [2312.08472](https://arxiv.org/abs/2312.08472) | Range reduction for transcendentals |
+| Numerical stability | [2501.04697](https://arxiv.org/abs/2501.04697) | Grokking at the edge of stability |
+| Tropical geometry | [2505.17190](https://arxiv.org/abs/2505.17190) | Max-plus limit of log-sum-exp |
+| AI Feynman | Udrescu & Tegmark 2020 | Physics equations benchmark |
+| SRSD | [2206.10540](https://arxiv.org/abs/2206.10540) | Feynman benchmark with proper data |
+| PySR | Cranmer 2023 | Evolutionary symbolic regression |
+| TPSR | [2303.06833](https://arxiv.org/abs/2303.06833) | Transformer + MCTS for SR |
+## Preliminary Results (CPU validation)
+From our CPU sandbox testing:
+| Function | Depth | Best R² | Method | Notes |
+|----------|-------|---------|--------|-------|
+| `exp(x)` | 1 | **0.9999** | Gumbel-hard | ✅ Trivially recovered |
+| `e` (const) | 1 | **0.9999** | Gumbel-hard | ✅ Correct: `eml(1,1)` |
+| `ln(x)` | 3 | -0.08 | All methods | ❌ All 10 restarts fail |
+| `x²` | 4 | TBD | - | Awaiting GPU results |
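For reading the table, the sketch below computes R² under the standard coefficient-of-determination definition, `1 − SS_res / SS_tot`. That the study uses exactly this definition is an assumption, and the function name is illustrative. It shows why a value such as -0.08 is possible: R² drops below zero whenever the fit is worse than predicting the target's mean.

```python
import math

def r_squared(pred, target):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(target) / len(target)
    ss_res = sum((t - p) ** 2 for p, t in zip(pred, target))
    ss_tot = sum((t - mean_t) ** 2 for t in target)
    if ss_tot == 0.0:
        # Constant target: the ratio is undefined under this definition.
        return math.nan
    return 1.0 - ss_res / ss_tot

target = [1.0, 2.0, 3.0]
print(r_squared([1.0, 2.0, 3.0], target))  # perfect fit
print(r_squared([2.0, 2.0, 2.0], target))  # mean predictor
print(r_squared([3.0, 2.0, 1.0], target))  # worse than the mean
```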
+### Key Observation
+**The depth-3 barrier is real and severe.** Even with hard routing (Gumbel-softmax), bounded normalization, curriculum learning, and multi-loss training, recovering `ln(x)` from random initialization failed in all 10 restarts. This is consistent with the sharp drop the paper reports at depth 3-4 (~25% success) and suggests that:
+1. The loss landscape at depth 3+ has **exponentially many local minima** relative to the one correct basin
+2. Better optimization (second-order methods, population-based search) may help
+3. **Informed initialization** (starting near known decompositions) is likely required for practical use
+## GPU Experiment Status
+🔄 **Running**: Full experiment on T4 GPU with 3 phases and 4 stability methods.
+Job: `69e7837acd8c002f31e00d75`
+Results will be uploaded to the `results/` folder upon completion.
+## How to Reproduce
+```bash
+# Install dependencies
+pip install torch numpy huggingface_hub
+# Run the full experiment
+python code/eml_experiment.py
+```
+## Citation
+If you use this work, please cite the original EML paper:
+```
+@article{eml2026,
+  title={All elementary functions from a single operator},
+  author={...},
+  journal={arXiv preprint arXiv:2603.21852},
+  year={2026}
+}
+```
+## License
+MIT