# EML Trainability Study: Can We Turn Theoretical Universality Into Practical Training?

## Overview

This repository contains an empirical study of whether the **EML operator** `eml(x,y) = exp(x) - ln(y)` from [arXiv:2603.21852](https://arxiv.org/abs/2603.21852) can be made practically trainable for **symbolic regression** via gradient descent.

### The Theoretical Discovery

The EML paper proved that **every elementary mathematical function** (addition, multiplication, trigonometry, logarithms, π, e, etc.) can be generated from just one binary operator and the constant 1:

```
eml(x, y) = exp(x) − ln(y)
```

This is analogous to how the NAND gate generates all Boolean logic. The grammar is trivially simple: `S → 1 | eml(S, S)`.

### The Practical Problem

While mathematically universal, the construction is numerically fragile in code. Stacking exponentials 3-4 levels deep in floating-point arithmetic causes values to **overflow to infinity** or **underflow to zero**. The paper itself reports:

- **Depth 1-2**: 100% recovery from random initialization
- **Depth 3-4**: ~25% recovery
- **Depth 5+**: <1% recovery
- **Depth 6**: 0% in 448 attempts

Yet paradoxically, when initialized **near the correct solution**, recovery is 100% even at depth 5-6. The basins of attraction exist, but random initialization almost never lands in them.

## Research Questions

1. **Which numerical stability techniques** most improve deep EML tree training?
2. **What is the maximum recoverable tree depth** with enhanced methods?
3. **Can EML-based SR recover real physics equations** (Feynman benchmark)?

## Methods

### Stability Techniques Tested

| Method | Description | Source |
|--------|-------------|--------|
| **Soft routing** | Standard softmax input selection (baseline) | EML paper §4.3 |
| **Gumbel-hard** | Straight-through Gumbel-softmax: hard selection in the forward pass, soft gradients in the backward pass | Jang et al. 2017 |
| **Bounded** | `tanh(output/R) * R` normalization after each node | Inspired by NALU (Trask et al. 2018) |
| **Combined** | Saturating linear: `x / (1 + \|x\|/R)` + Gumbel-hard routing | Novel combination |

### Key Innovations

1. **Hard routing prevents intermediate explosion**: Soft routing creates weighted mixtures of {1, x, f} that can produce arbitrary intermediate values. Hard selection ensures only one input is chosen per EML node, preventing the "exp of a mixture" problem.
2. **Multi-loss training**: MSE + correlation loss (captures function shape regardless of scale) + entropy regularization (encourages discrete routing decisions).
3. **Temperature annealing**: Start with high temperature (smooth, exploratory) and anneal to near-zero (hard, discrete) over training.
4. **Multi-restart search**: Since basins are narrow, we run 20-30 random initializations per configuration and report the best result and the success rate.

### Architecture: The Master Formula

Following the paper's §4.3, we implement the EML master formula as a full binary tree:

- **Leaf nodes** select from `{1, x₁, ..., xₖ}` (constant and input variables)
- **Internal nodes** select from `{1, x₁, ..., xₖ, f_left, f_right}` (also including child outputs)
- Each selection is parameterized by learnable logits passed through Gumbel-softmax
- Each node applies an output affine transform `a * eml(left, right) + b`

Total parameters: `O(5 × 2ⁿ)` for depth n (as stated in the paper).

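
The effect of the two squashing functions and of hard versus soft routing can be illustrated with a small standard-library sketch. It is not the code in `code/eml_experiment.py`. The bound `R = 10.0`, the helper names (`bounded_tanh`, `saturating_linear`, `soft_route`, `hard_route`, `stacked_exp`) and the example logits are assumptions chosen for illustration, and `hard_route` shows only the forward-pass argmax, without Gumbel noise or the straight-through gradient.

```python
import math

R = 10.0  # assumed bound; the value used in the study is not stated here


def eml(x: float, y: float) -> float:
    """The EML operator exp(x) - ln(y); requires y > 0."""
    return math.exp(x) - math.log(y)


def bounded_tanh(v: float, r: float = R) -> float:
    """'Bounded' method: tanh(v / r) * r keeps the output within [-r, r]."""
    return math.tanh(v / r) * r


def saturating_linear(v: float, r: float = R) -> float:
    """'Combined' method squashing: v / (1 + |v| / r), magnitude below r."""
    return v / (1.0 + abs(v) / r)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_route(logits, candidates):
    """Soft routing: softmax-weighted mixture of the candidate inputs."""
    return sum(w * c for w, c in zip(softmax(logits), candidates))


def hard_route(logits, candidates):
    """Hard routing, forward pass only: pick the single argmax candidate."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return candidates[best]


def stacked_exp(x: float, depth: int, squash=None) -> float:
    """Apply eml(v, 1) = exp(v) `depth` times, optionally squashing each output."""
    v = x
    for _ in range(depth):
        v = eml(v, 1.0)
        if squash is not None:
            v = squash(v)
    return v


# Three unbounded nested exponentials of x = 2 already exceed the float range.
try:
    stacked_exp(2.0, 3)
except OverflowError as err:
    print("unbounded depth 3:", err)

# With either squashing function, the same stack stays within the bound R.
print("bounded tanh, depth 6:", stacked_exp(2.0, 6, bounded_tanh))
print("saturating linear, depth 6:", stacked_exp(2.0, 6, saturating_linear))

# Candidates {1, x, f} with x = 2 and a large child output f = 50.
candidates = [1.0, 2.0, 50.0]
logits = [0.0, 2.0, 1.0]  # hypothetical routing logits that favour x
print("soft mixture:", soft_route(logits, candidates))
print("hard choice:", hard_route(logits, candidates))
```

Even though the logits favour `x`, the soft mixture is pulled well above 2 by the large child output, so the next node exponentiates a value that is none of its candidates. The hard choice passes `x` through unchanged.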
## Experimental Design

### Phase 1: Known EML Identities

Test recovery of functions with known EML decompositions:

| Function | EML Depth | EML Expression |
|----------|-----------|----------------|
| `exp(x)` | 1 | `eml(x, 1)` |
| `e` (constant) | 1 | `eml(1, 1)` |
| `ln(x)` | 3 | `eml(1, eml(eml(1,x), 1))` |
| `-x` | 2 | Via composition |
| `1/x` | 3 | Via composition |
| `x + y` | 4 | Via exp/ln identities |
| `x × y` | 4+ | Via exp/ln identities |
| `x²` | 4 | `exp(2·ln(x))` |
| `√x` | 4 | `exp(0.5·ln(x))` |
| `sin(x)` | 5+ | Requires complex intermediates |

### Phase 2: Feynman Physics Equations

A curated set of physics equations from the [SRSD-Feynman benchmark](https://arxiv.org/abs/2206.10540):

- Gaussian distribution: `exp(-θ²/2)/√(2π)`
- Euclidean distance: `√((x₂-x₁)² + (y₂-y₁)²)`
- Inverse square law: `F = q₁q₂/(4πε₀r²)`
- Relativistic mass: `m₀/√(1-v²/c²)`
- Harmonic oscillator: `E = ½kx²`
- And more...

### Phase 3: Depth Scaling Analysis

Systematic measurement of recovery rate vs. depth using EML-native targets.

## Key Literature References

| Topic | Paper | Key Insight |
|-------|-------|-------------|
| EML operator | [2603.21852](https://arxiv.org/abs/2603.21852) | Universal primitive for elementary functions |
| Gumbel-softmax | Jang et al. 2017 | Differentiable discrete selection |
| NALU | [1808.00508](https://arxiv.org/abs/1808.00508) | Stable exp-log arithmetic cells |
| NAU | [2001.05016](https://arxiv.org/abs/2001.05016) | Fixing NALU's gradient issues |
| Gradient clipping | [1211.5063](https://arxiv.org/abs/1211.5063) | Controlling exploding gradients |
| BFloat16 training | [2010.06192](https://arxiv.org/abs/2010.06192) | Kahan summation for precision |
| AutoNumerics-Zero | [2312.08472](https://arxiv.org/abs/2312.08472) | Range reduction for transcendentals |
| Numerical stability | [2501.04697](https://arxiv.org/abs/2501.04697) | Grokking at the edge of stability |
| Tropical geometry | [2505.17190](https://arxiv.org/abs/2505.17190) | Max-plus limit of log-sum-exp |
| AI Feynman | Udrescu & Tegmark 2020 | Physics equations benchmark |
| SRSD | [2206.10540](https://arxiv.org/abs/2206.10540) | Feynman benchmark with proper data |
| PySR | Cranmer 2023 | Evolutionary symbolic regression |
| TPSR | [2303.06833](https://arxiv.org/abs/2303.06833) | Transformer + MCTS for SR |

## Preliminary Results (CPU validation)

From our CPU sandbox testing:

| Function | Depth | Best R² | Method | Notes |
|----------|-------|---------|--------|-------|
| `exp(x)` | 1 | **0.9999** | Gumbel-hard | ✅ Trivially recovered |
| `e` (const) | 1 | **0.9999** | Gumbel-hard | ✅ Correct: `eml(1,1)` |
| `ln(x)` | 3 | -0.08 | All methods | ❌ All 10 restarts fail |
| `x²` | 4 | TBD | - | Awaiting GPU results |

### Key Observation

**The depth-3 barrier is real and severe.** Even with hard routing (Gumbel-softmax), bounded normalization, curriculum learning, and multi-loss training, recovering `ln(x)` from random initialization fails consistently. This is consistent with the low (~25%) success rate the paper reports at depth 3-4, and suggests that:

1. The loss landscape at depth 3+ has **exponentially many local minima** relative to the one correct basin
2. Better optimization (second-order methods, population-based search) may help
3. **Informed initialization** (starting near known decompositions) is likely required for practical use

## GPU Experiment Status

🔄 **Running**: Full experiment on T4 GPU with 3 phases and 4 stability methods.

Job: `69e7837acd8c002f31e00d75`

Results will be uploaded to the `results/` folder upon completion.

## How to Reproduce

```bash
# Install dependencies
pip install torch numpy huggingface_hub

# Run the full experiment
python code/eml_experiment.py
```

## Citation

If you use this work, please cite the original EML paper:

```
@article{eml2026,
  title={All elementary functions from a single operator},
  author={...},
  journal={arXiv preprint arXiv:2603.21852},
  year={2026}
}
```

## License

MIT