| # EML Trainability Study: Can We Turn Theoretical Universality Into Practical Training? |
|
|
| ## Overview |
|
|
| This repository contains an empirical study of whether the **EML operator** `eml(x,y) = exp(x) - ln(y)` from [arXiv:2603.21852](https://arxiv.org/abs/2603.21852) can be made practically trainable for **symbolic regression** via gradient descent. |
|
|
| ### The Theoretical Discovery |
| The EML paper proved that **every elementary function and constant** (addition, multiplication, trigonometric functions, logarithms, π, e, and so on) can be generated from a single binary operator and the constant 1: |
|
|
| ``` |
| eml(x, y) = exp(x) − ln(y) |
| ``` |
|
|
| This is analogous to how the NAND gate generates all Boolean logic. The grammar is trivially simple: `S → 1 | eml(S, S)`. |
|
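A minimal Python sketch of the operator and its grammar, assuming real-valued float64 arithmetic. The helper `count_trees` is a hypothetical name introduced here for illustration; it counts the syntactically distinct expressions that `S → 1 | eml(S, S)` generates up to a given nesting depth, which gives a feel for how quickly the search space grows.

```python
import math

def eml(x: float, y: float) -> float:
    """EML operator as defined above: exp(x) - ln(y), for y > 0."""
    return math.exp(x) - math.log(y)

def count_trees(max_depth: int) -> int:
    """Number of expressions from S -> 1 | eml(S, S) with nesting depth <= max_depth."""
    count = 1  # depth 0: only the constant 1
    for _ in range(max_depth):
        # Either the constant 1, or eml(left, right) with both children one level shallower.
        count = 1 + count * count
    return count

print(eml(1.0, 1.0))                       # exp(1) - ln(1), i.e. Euler's number e
print([count_trees(d) for d in range(5)])  # [1, 2, 5, 26, 677]
```

The count covers constant-only expressions; adding input variables as extra leaves makes the growth faster still.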
|
| ### The Practical Problem |
| While mathematically universal, the construction is numerically fragile. In floating-point arithmetic, stacking exponentials 3-4 levels deep makes intermediate values **overflow to infinity** or **underflow to zero**. The paper itself reports: |
| - **Depth 1-2**: 100% recovery from random initialization |
| - **Depth 3-4**: ~25% recovery |
| - **Depth 5+**: <1% recovery |
| - **Depth 6**: 0% in 448 attempts |
|
|
| Yet when initialized **near the correct solution**, recovery is 100% even at depth 5-6. The basins of attraction exist, but random initialization almost never lands in them. |
|
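The overflow described above is easy to reproduce. The sketch below repeatedly applies `v -> eml(v, 1) = exp(v)` starting from 1 in float64; `safe_exp` and `nested_exp` are hypothetical helper names used only for this demonstration, not functions from the repository.

```python
import math

def safe_exp(v: float) -> float:
    """exp(v) that returns inf instead of raising OverflowError (demo convenience)."""
    try:
        return math.exp(v)
    except OverflowError:
        return math.inf

def nested_exp(x: float, depth: int) -> float:
    """Apply v -> eml(v, 1) = exp(v) `depth` times, starting from x."""
    v = x
    for _ in range(depth):
        v = safe_exp(v)
    return v

for depth in range(1, 5):
    # Roughly 2.7, 15.2, 3.8e6, then inf at depth 4.
    print(depth, nested_exp(1.0, depth))

# The opposite failure: exp of a large negative value underflows to exactly 0.0,
# after which ln(0) is undefined.
print(math.exp(-nested_exp(1.0, 3)))  # 0.0
```

Starting from an input as small as 1, the fourth nested exponential already exceeds the float64 range.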
|
| ## Research Questions |
|
|
| 1. **Which numerical stability techniques** most improve deep EML tree training? |
| 2. **What is the maximum recoverable tree depth** with enhanced methods? |
| 3. **Can EML-based SR recover real physics equations** (Feynman benchmark)? |
|
|
| ## Methods |
|
|
| ### Stability Techniques Tested |
|
|
| | Method | Description | Source | |
| |--------|-------------|--------| |
| | **Soft routing** | Standard softmax input selection (baseline) | EML paper §4.3 | |
| | **Gumbel-hard** | Straight-through Gumbel-softmax — hard selection in forward, soft gradients in backward | Jang et al. 2017 | |
| | **Bounded** | `tanh(output/R) * R` normalization after each node | Inspired by NALU (Trask et al. 2018) | |
| | **Combined** | Saturating linear: `x / (1 + abs(x)/R)` + Gumbel-hard routing | Novel combination | |
|
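The two bounding functions from the table can be compared directly. This is a sketch under the assumption that the bound is applied to a single scalar node output; the value `R = 10.0` is illustrative and not taken from the experiments.

```python
import math

def tanh_bound(v: float, R: float) -> float:
    """'Bounded' row: tanh(v / R) * R, which maps any real v into [-R, R]."""
    return math.tanh(v / R) * R

def saturating_linear(v: float, R: float) -> float:
    """'Combined' row: v / (1 + |v| / R), close to the identity near 0, approaching +/-R."""
    return v / (1.0 + abs(v) / R)

R = 10.0  # illustrative bound only
for v in (0.1, 5.0, 1e3, 1e9):
    print(v, tanh_bound(v, R), saturating_linear(v, R))
```

One practical difference: if a node output has already overflowed to `inf`, `tanh_bound` still returns `R`, whereas the rational form evaluates `inf / inf` and returns `nan`, so it would need to be applied before overflow occurs.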
|
| ### Key Innovations |
|
|
| 1. **Hard routing prevents intermediate explosion**: Soft routing creates weighted mixtures of {1, x, f} that can produce arbitrary intermediate values. Hard selection ensures only one input is chosen per EML node, preventing the "exp of a mixture" problem. |
|
|
| 2. **Multi-loss training**: MSE + correlation loss (captures function shape regardless of scale) + entropy regularization (encourages discrete routing decisions). |
|
|
| 3. **Temperature annealing**: Start with high temperature (smooth, exploratory) and anneal to near-zero (hard, discrete) over training. |
|
|
| 4. **Multi-restart search**: Since basins are narrow, we run 20-30 random initializations per configuration and report the best run and the success rate. |
|
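The multi-loss objective and temperature schedule from the list above can be sketched in plain Python. The schedule shape (exponential), its endpoints, and the loss weights `w_corr` and `w_ent` are assumptions made for illustration; the source does not state the values used in the experiments.

```python
import math

def temperature(step, total_steps, tau_start=5.0, tau_end=0.05):
    """Exponential annealing from tau_start to tau_end (shape and endpoints assumed)."""
    frac = step / max(total_steps - 1, 1)
    return tau_start * (tau_end / tau_start) ** frac

def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def correlation_loss(pred, target, eps=1e-12):
    """1 - Pearson correlation: near zero when pred matches the shape of target at any positive scale."""
    mp = sum(pred) / len(pred)
    mt = sum(target) / len(target)
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, target))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_t = sum((t - mt) ** 2 for t in target)
    return 1.0 - cov / (math.sqrt(var_p * var_t) + eps)

def routing_entropy(probs, eps=1e-12):
    """Shannon entropy of one routing distribution; about zero for a one-hot choice."""
    return -sum(p * math.log(p + eps) for p in probs)

def total_loss(pred, target, routing_probs, w_corr=0.1, w_ent=0.01):
    """MSE + correlation term + entropy regularizer; the weights are placeholders."""
    entropy = sum(routing_entropy(p) for p in routing_probs)
    return mse(pred, target) + w_corr * correlation_loss(pred, target) + w_ent * entropy

print(temperature(0, 100), temperature(99, 100))
print(correlation_loss([2.0, 4.0, 6.0], [1.0, 2.0, 3.0]))  # close to 0: same shape, different scale
```

The correlation term stays small whenever the prediction has the right shape, which is why it can guide training even while the scale is still wrong and the MSE is large.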
|
| ### Architecture: The Master Formula |
|
|
| Following the paper's §4.3, we implement the EML master formula as a full binary tree: |
| - **Leaf nodes** select from `{1, x₁, ..., xₖ}` (constant and input variables) |
| - **Internal nodes** select from `{1, x₁, ..., xₖ, f_left, f_right}` (also including child outputs) |
| - Each selection is parameterized by learnable logits passed through Gumbel-softmax |
| - Output affine transform `a * eml(left, right) + b` per node |
|
|
| Total parameters: roughly `5 × 2ⁿ`, i.e. `O(2ⁿ)`, for depth n (as stated in the paper). |
|
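A forward-pass sketch of one internal node may help make the routing concrete. It assumes each node has one selector per `eml` argument and a single scalar input `x`; those choices, and the names `gumbel_softmax`, `route`, and `eml_node`, are illustrative assumptions rather than the repository's implementation. Plain Python has no autodiff, so only the forward behaviour of straight-through selection is shown.

```python
import math
import random

def eml(x: float, y: float) -> float:
    return math.exp(x) - math.log(y)

def gumbel_softmax(logits, tau, rng, hard=False):
    """Sample routing weights; hard=True returns the one-hot argmax of the soft sample."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)  # keep u > 0 so log(u) is defined
        noisy.append((logit - math.log(-math.log(u))) / tau)
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]  # shift by the max for stability
    total = sum(exps)
    soft = [e / total for e in exps]
    if not hard:
        return soft
    winner = soft.index(max(soft))
    return [1.0 if i == winner else 0.0 for i in range(len(soft))]

def route(candidates, weights):
    """Weighted mixture; zero-weight entries are skipped so a hard selection
    never multiplies 0 by an overflowed (inf) candidate."""
    return sum(w * c for w, c in zip(weights, candidates) if w != 0.0)

def eml_node(x, f_left, f_right, left_logits, right_logits, a, b, tau, rng, hard=True):
    """One internal node: each argument selects from {1, x, f_left, f_right}."""
    candidates = [1.0, x, f_left, f_right]
    left = route(candidates, gumbel_softmax(left_logits, tau, rng, hard))
    right = route(candidates, gumbel_softmax(right_logits, tau, rng, hard))
    return a * eml(left, right) + b

rng = random.Random(0)
select_x = [-50.0, 50.0, -50.0, -50.0]    # strongly prefers candidate x
select_one = [50.0, -50.0, -50.0, -50.0]  # strongly prefers the constant 1
out = eml_node(0.5, 2.0, 3.0, select_x, select_one, a=1.0, b=0.0, tau=0.5, rng=rng)
print(out, math.exp(0.5))  # the node computes eml(x, 1) = exp(x)
```

With logits this lopsided the Gumbel noise cannot change the winner, so the node reduces to `eml(x, 1)`. In training, the logits start near uniform and the selection only sharpens as the temperature falls.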
|
| ## Experimental Design |
|
|
| ### Phase 1: Known EML Identities |
| Test recovery of functions with known EML decompositions: |
|
|
| | Function | EML Depth | EML Expression | |
| |----------|-----------|----------------| |
| | `exp(x)` | 1 | `eml(x, 1)` | |
| | `e` (constant) | 1 | `eml(1, 1)` | |
| | `ln(x)` | 3 | `eml(1, eml(eml(1,x), 1))` | |
| | `-x` | 2 | Via composition | |
| | `1/x` | 3 | Via composition | |
| | `x + y` | 4 | Via exp/ln identities | |
| | `x × y` | 4+ | Via exp/ln identities | |
| | `x²` | 4 | `exp(2·ln(x))` | |
| | `√x` | 4 | `exp(0.5·ln(x))` | |
| | `sin(x)` | 5+ | Requires complex intermediates | |
|
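The three explicit decompositions in the table can be checked numerically. The sketch below assumes real inputs with `x > 0`; the `*_via_eml` function names are introduced here for illustration.

```python
import math

def eml(x: float, y: float) -> float:
    return math.exp(x) - math.log(y)

def exp_via_eml(x: float) -> float:
    return eml(x, 1.0)             # exp(x) - ln(1) = exp(x)

def e_via_eml() -> float:
    return eml(1.0, 1.0)           # exp(1) - ln(1) = e

def ln_via_eml(x: float) -> float:
    inner = eml(1.0, x)            # e - ln(x)
    middle = eml(inner, 1.0)       # exp(e - ln(x)) = e**e / x
    return eml(1.0, middle)        # e - ln(e**e / x) = ln(x)

for x in (0.5, 1.0, 2.0, 10.0):
    print(x, ln_via_eml(x), math.log(x))
```

The depth-3 form of `ln(x)` passes through the intermediate value `e**e / x` (about 15.15 / x), so even this short identity depends on a large intermediate cancelling exactly.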
|
| ### Phase 2: Feynman Physics Equations |
| A curated set of physics equations from the [SRSD-Feynman benchmark](https://arxiv.org/abs/2206.10540): |
| - Gaussian distribution: `exp(-θ²/2)/√(2π)` |
| - Euclidean distance: `√((x₂-x₁)² + (y₂-y₁)²)` |
| - Inverse square law: `F = q₁q₂/(4πε₀r²)` |
| - Relativistic mass: `m₀/√(1-v²/c²)` |
| - Harmonic oscillator: `E = ½kx²` |
| - And more... |
|
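For reference, the listed equations can be written as plain Python callables and sampled to build regression data. This is a sketch only: the argument order, the sampling ranges, and the `sample` helper are assumptions made here, and the SRSD benchmark defines its own sampling distributions that are not reproduced.

```python
import math
import random

EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m

def gaussian(theta):
    return math.exp(-theta ** 2 / 2) / math.sqrt(2 * math.pi)

def euclidean_distance(x1, x2, y1, y2):
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)

def coulomb_force(q1, q2, r):
    return q1 * q2 / (4 * math.pi * EPS0 * r ** 2)

def relativistic_mass(m0, v, c):
    return m0 / math.sqrt(1 - v ** 2 / c ** 2)

def oscillator_energy(k, x):
    return 0.5 * k * x ** 2

def sample(fn, ranges, n, seed=0):
    """Draw n uniform input points from the (low, high) ranges and evaluate fn on each."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        inputs = [rng.uniform(lo, hi) for lo, hi in ranges]
        rows.append((inputs, fn(*inputs)))
    return rows

# Illustrative ranges only: m0 in [1, 5], v in [0, 0.9], c fixed at 1.
for inputs, output in sample(relativistic_mass, [(1.0, 5.0), (0.0, 0.9), (1.0, 1.0)], n=3):
    print(inputs, output)
```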
|
| ### Phase 3: Depth Scaling Analysis |
| Systematic measurement of recovery rate vs. depth using EML-native targets. |
|
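One way to obtain EML-native targets of a fixed depth is to sample random full trees over the grammar and keep those that stay finite on the sample points. The sketch below is an assumption about how such targets could be generated, not a description of the repository's generator; trees are nested tuples with leaves `"1"` and `"x"`.

```python
import math
import random

def random_tree(depth, rng):
    """Full binary EML tree of the given depth with leaves drawn from {1, x}."""
    if depth == 0:
        return rng.choice(("1", "x"))
    return ("eml", random_tree(depth - 1, rng), random_tree(depth - 1, rng))

def evaluate(tree, x):
    """Evaluate a tree at x; returns None if the value leaves the real float64 range."""
    if tree == "1":
        return 1.0
    if tree == "x":
        return x
    _, left, right = tree
    lv, rv = evaluate(left, x), evaluate(right, x)
    if lv is None or rv is None or rv <= 0.0:
        return None  # ln is undefined for non-positive arguments
    try:
        value = math.exp(lv) - math.log(rv)
    except OverflowError:
        return None
    return value if math.isfinite(value) else None

def usable_target(depth, xs, rng, max_tries=1000):
    """Sample trees until one evaluates to a finite value at every x in xs."""
    for _ in range(max_tries):
        tree = random_tree(depth, rng)
        ys = [evaluate(tree, x) for x in xs]
        if all(y is not None for y in ys):
            return tree, ys
    return None

rng = random.Random(0)
print(usable_target(2, [0.5, 1.0, 2.0], rng))
```

Because the target is itself an EML tree of known depth, a failed recovery cannot be blamed on the target being inexpressible at that depth.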
|
| ## Key Literature References |
|
|
| | Topic | Paper | Key Insight | |
| |-------|-------|-------------| |
| | EML operator | [2603.21852](https://arxiv.org/abs/2603.21852) | Universal primitive for elementary functions | |
| | Gumbel-softmax | Jang et al. 2017 | Differentiable discrete selection | |
| | NALU | [1808.00508](https://arxiv.org/abs/1808.00508) | Stable exp-log arithmetic cells | |
| | NAU | [2001.05016](https://arxiv.org/abs/2001.05016) | Fixing NALU's gradient issues | |
| | Gradient clipping | [1211.5063](https://arxiv.org/abs/1211.5063) | Controlling exploding gradients | |
| | BFloat16 training | [2010.06192](https://arxiv.org/abs/2010.06192) | Kahan summation for precision | |
| | AutoNumerics-Zero | [2312.08472](https://arxiv.org/abs/2312.08472) | Range reduction for transcendentals | |
| | Numerical stability | [2501.04697](https://arxiv.org/abs/2501.04697) | Grokking at the edge of stability | |
| | Tropical geometry | [2505.17190](https://arxiv.org/abs/2505.17190) | Max-plus limit of log-sum-exp | |
| | AI Feynman | Udrescu & Tegmark 2020 | Physics equations benchmark | |
| | SRSD | [2206.10540](https://arxiv.org/abs/2206.10540) | Feynman benchmark with proper data | |
| | PySR | Cranmer 2023 | Evolutionary symbolic regression | |
| | TPSR | [2303.06833](https://arxiv.org/abs/2303.06833) | Transformer + MCTS for SR | |
|
|
| ## Preliminary Results (CPU validation) |
|
|
| From our CPU sandbox testing: |
|
|
| | Function | Depth | Best R² | Method | Notes | |
| |----------|-------|---------|--------|-------| |
| | `exp(x)` | 1 | **0.9999** | Gumbel-hard | ✅ Trivially recovered | |
| | `e` (const) | 1 | **0.9999** | Gumbel-hard | ✅ Correct: `eml(1,1)` | |
| | `ln(x)` | 3 | -0.08 | All methods | ❌ All 10 restarts fail | |
| | `x²` | 4 | TBD | - | Awaiting GPU results | |
|
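For reading the R² column, a minimal sketch of the standard coefficient of determination is given below. A negative value such as -0.08 means the fit is worse than predicting the mean of the target. How the repository's scoring handles a constant target such as `e`, where the total sum of squares is zero, is not stated here; this sketch returns `nan` in that case as an explicit assumption.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0.0:
        return float("nan")  # constant target: the ratio is undefined (assumed handling)
    return 1.0 - ss_res / ss_tot

y = [0.0, 1.0, 2.0, 3.0]
print(r_squared(y, y))            # 1.0: perfect fit
print(r_squared(y, [1.5] * 4))    # 0.0: same as predicting the mean
print(r_squared(y, [3.0] * 4))    # negative: worse than predicting the mean
```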
|
| ### Key Observation |
| **The depth-3 barrier appears real and severe.** Even with hard routing (Gumbel-softmax), bounded normalization, curriculum learning, and multi-loss training, `ln(x)` was not recovered from random initialization in any of the 10 restarts. This is in line with, and somewhat worse than, the paper's ~25% success rate at depth 3-4, and suggests that: |
|
|
| 1. The loss landscape at depth 3+ may have **exponentially many local minima** relative to the one correct basin |
| 2. Better optimization (second-order methods, population-based search) may help |
| 3. **Informed initialization** (starting near known decompositions) is likely required for practical use |
|
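A short calculation puts the restart counts in context, assuming each restart succeeds independently with the same probability. The function names are hypothetical, and the success rates plugged in are the paper's figures quoted earlier in this README.

```python
import math

def prob_all_fail(p_success: float, restarts: int) -> float:
    """Chance that every one of `restarts` independent attempts fails."""
    return (1.0 - p_success) ** restarts

def restarts_needed(p_success: float, confidence: float = 0.95) -> int:
    """Smallest number of restarts giving at least one success with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_success))

print(prob_all_fail(0.25, 10))        # about 0.056
print(restarts_needed(0.25))          # 11
print(restarts_needed(0.01))          # 299
print(1.0 - prob_all_fail(0.01, 30))  # about 0.26
```

Under this assumption, ten straight failures would happen about 6% of the time at a true 25% success rate, so the CPU result is not strong evidence of a lower rate on its own. At a 1% rate, 30 restarts would find a solution only about a quarter of the time.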
|
| ## GPU Experiment Status |
|
|
| 🔄 **Running**: Full experiment on T4 GPU with 3 phases and 4 stability methods. |
| Job: `69e7837acd8c002f31e00d75` |
|
|
| Results will be uploaded to the `results/` folder upon completion. |
|
|
| ## How to Reproduce |
|
|
| ```bash |
| # Install dependencies |
| pip install torch numpy huggingface_hub |
| |
| # Run the full experiment |
| python code/eml_experiment.py |
| ``` |
|
|
| ## Citation |
|
|
| If you use this work, please cite the original EML paper: |
| ``` |
| @article{eml2026, |
| title={All elementary functions from a single operator}, |
| author={...}, |
| journal={arXiv preprint arXiv:2603.21852}, |
| year={2026} |
| } |
| ``` |
|
|
| ## License |
| MIT |
|
|