muooon committed on
Commit f6d2c5c · verified · 1 Parent(s): 8a80d17

Upload 2 files

Files changed (2)
  1. emo-v36-paper(ENG).txt +21 -71
  2. emo-v36-paper(JPN).txt +2 -2
emo-v36-paper(ENG).txt CHANGED
@@ -1,76 +1,44 @@
- Theoretical Foundations of Autonomous Optimization in EmoNAVI v3.6 and EmoSENS v3.7
-
- — Improving the Regret Bound via High-Order Moment Approximation and Dynamic Distance Estimation —
-
- This study demonstrates the theoretical unification of emo-based optimizers from EmoNAVI (v3.6) to EmoSens (v3.7) and establishes a general framework for autonomous optimization built on Multi-EMA, emoDrive, and emoPulse.


 1. Introduction

- In deep learning optimization, dynamic adjustment of the learning rate is the most critical factor determining convergence performance. Conventional optimizers such as Adam and AMSGrad use the first and second moments of the gradient, but their ability to directly estimate the steepness (curvature) of the local loss landscape or the distance D to the optimal solution has been limited.

- This paper demonstrates that the "emotion scalar σ_t" and the "emoDrive" mechanism introduced in EmoNAVI v3.6 function mathematically as an approximation of high-order moments and as an online implementation of D-adaptation (and of COCOB theory) (Defazio & Mishchenko, 2023). We prove that these mechanisms achieve both extremely low hyperparameter sensitivity and robust convergence.

- 2. Mathematical Redefinition and High-Order Moment Approximation

- 2.1 Proxy Metric Generation via Multi-EMA

- EmoNAVI maintains three stages of exponential moving averages (short, medium, long):
  EMA_{short,t} = (1 − α_s) · EMA_{short,t−1} + α_s · L_t

- Taking the difference ΔEMA = EMA_long − EMA_short between EMAs with different smoothing coefficients α corresponds to approximating higher-order derivatives of the loss function L along the time axis.

- Approximation of the 3rd and 4th moments: ΔEMA captures the rate of change of gradient fluctuations (changes in curvature).

- Historization of the 5th moment: The emotion scalar σ_t = tanh(ΔEMA / scale) is a statistic that non-linearly compresses this high-order information into the range [−1, 1]. Including it recursively in the update equation lets the long-term "smoothness" of the landscape be reflected in the parameter updates.


- 3. Dynamic Distance Estimation (D-adaptation) via emoDrive

- 3.1 Online Approximation of the D-Estimate

- D-adaptation algorithms estimate the distance D from the initial point to the optimum and make the learning rate proportional to D. In EmoNAVI, emoDrive fulfills the role of this D.

- Acceleration Zone (high confidence): In regions where σ_t is stable, the system judges the current search direction to be correct (i.e., lying on a straight path toward the optimal solution w*) and boosts the effective step size by up to 8x or more. This is equivalent to exponentially increasing the estimated distance D̂.

- Suppression Zone (low confidence): During abrupt changes where |σ_t| > 0.75, updates are suppressed on the order of O(1 − |σ_t|). This acts as a safety mechanism against sudden spikes in the local Lipschitz constant L_t, corresponding to "resetting the betting amount after a losing streak" in COCOB (Orabona & Tommasi, 2017).

- Note on high-order moments: 3rd: skewness; 4th: kurtosis; 5th: temporal "fluctuation of fluctuations."
  ※ Higher-order moments are formed not by a single step but by "temporal integration."

- 3.2 Mathematical Definition of Autonomous Control Using emoPulse and Sign Normalization
-
- emoPulse: Generating Dynamic Learning Rates from the Time-Series SNR
- The emoPulse mechanism introduced in EmoNAVI v3.7 and later (common to Sens, Airy, and Cats) replaces the conventional fixed hyperparameter η₀ (the base learning rate) with a dynamic scalar computed from time-series statistics of the loss. This is a purer integration of "regret minimization" with "estimation of the remaining distance to the optimal solution" from online learning.
-
- 1. Estimating statistical noise and displacement distance. emoPulse is generated from the following two internal variables:
- Noise estimate (N_t): the EMA of the absolute value of the emotional trust signal, quantifying the "hesitation" (high-frequency oscillation) in the current gradient direction.
- N_t = (1 − β_noise) · N_{t−1} + β_noise · |trust_t|
-
- Distance estimate (d_t): the EMA of the positive confidence component, estimating the "effective progress" (low-frequency trend) toward the optimal solution w*.
- d_t = (1 − β_dist) · d_{t−1} + β_dist · max(trust_t, 0)
-
- Here, β_noise and β_dist are smoothing coefficients that determine the observation window along the time axis.
-
- 2. Definition of emoPulse. The effective learning rate η_t is defined by normalizing the estimated distance d_t by the noise N_t and clipping the result using the hyperparameter emoScope (field of view):
- emoPulse_t = min(d_t / N_t, upper_bound)
-
- (※ In implementations the denominator is stabilized as N_t + ϵ, with an upper limit such as 1×10⁻³.)
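As a minimal sketch of how the two estimators combine into a dynamic learning rate, the loop below follows the definitions above. The synthetic trust signal, the β values, and the 1×10⁻³ cap are illustrative assumptions, not the released defaults:

```python
def emopulse_step(trust, N, d, beta_noise=0.1, beta_dist=0.05,
                  upper_bound=1e-3, eps=1e-12):
    """Advance the emoPulse statistics by one step (illustrative constants)."""
    # Noise estimate N_t: EMA of |trust| -- the high-frequency "hesitation".
    N = (1 - beta_noise) * N + beta_noise * abs(trust)
    # Distance estimate d_t: EMA of the positive part -- effective progress.
    d = (1 - beta_dist) * d + beta_dist * max(trust, 0.0)
    # SNR-style learning rate, clipped from above.
    pulse = min(d / (N + eps), upper_bound)
    return N, d, pulse

# Discovery phase: consistent positive trust drives the pulse up to the cap.
N, d = 0.0, 0.0
for _ in range(200):
    N, d, pulse = emopulse_step(0.9, N, d)
assert pulse == 1e-3

# Convergence phase: trust turns noisy and mostly negative, so N stays high
# while d decays, and the effective learning rate shrinks automatically.
for t in range(400):
    N, d, pulse = emopulse_step(-0.9 if t % 2 == 0 else 0.0005, N, d)
assert 0.0 < pulse < 1e-3
```

In the first run emoPulse saturates at the emoScope-style cap; in the noisy run the ratio d_t/N_t lands well below it, reproducing the scheduler-free decay the text describes.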
-
- 3. Theoretical Significance: Adaptive Control Based on the Signal-to-Noise Ratio (SNR)
- This mechanism autonomously selects the following behaviors in each phase of learning:
-
- Discovery phase: While the gradient points in a consistent direction, d_t grows and N_t stays suppressed, so emoPulse increases and promotes rapid convergence.
-
- Convergence phase: As the gradient begins to oscillate near the solution, N_t rises sharply while d_t decreases. emoPulse therefore shrinks automatically, achieving a smooth landing without conventional learning-rate decay.
-
- 4. Unification of Intent: Sign-Based Normalization
- The stride length η_t determined by emoPulse is ultimately applied to the parameter update in the following form. Specifically, EmoSens employs "sign encoding," which applies the sign function to both the first and second moments:
- Δw_t = −emoPulse_t · σ_{drive,t} · sign(m_t) / sign(√v_t + ϵ)
-
- The intent of this model is to abandon reliance on precise numerical values (floating-point magnitudes) and to extract only the consistency of the gradient's "direction" (its intent). Sign normalization enables updates that are independent of the gradient's absolute value, theoretically eliminating the impact of quantization noise in low-precision environments.
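A toy rendering of this sign-encoded update is given below. The `sign` helper and the sample moment values are illustrative, and `sigma_drive` stands in for the emoDrive coefficient:

```python
import math

def sign(x):
    """Sign function returning -1, 0, or +1."""
    return (x > 0) - (x < 0)

def sign_update(m, v, pulse, sigma_drive, eps=1e-8):
    """Sign-encoded step: only the direction of each moment survives.

    sign(sqrt(v) + eps) is +1 for any v >= 0; the term is kept here to
    mirror the Adam-style m / (sqrt(v) + eps) preconditioner it replaces.
    """
    return -pulse * sigma_drive * sign(m) / sign(math.sqrt(v) + eps)

# The step magnitude is independent of the gradient's absolute value ...
big  = sign_update(m=3.7e+4, v=2.0, pulse=1e-3, sigma_drive=0.5)
tiny = sign_update(m=1.2e-6, v=9.0, pulse=1e-3, sigma_drive=0.5)
assert big == tiny == -5e-4
# ... and only the direction (the "intent") flips the sign of the step.
assert sign_update(m=-1.0, v=1.0, pulse=1e-3, sigma_drive=0.5) == 5e-4
```

Because every factor except the sign of m_t is a bounded scalar, the update survives aggressive quantization of the moments, which is the point of the encoding.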
-

 4. Convergence Proof and Regret Analysis

@@ -93,21 +61,10 @@ Theoretical Foundations of Autonomous Optimization in EmoNAVI v3.6, EmoSENS v3

 Definition: "Emotion" in EmoNAVI is a high-order moment-based dynamic gating mechanism that transforms the statistical reliability of gradients into non-linear weights.

- 4.3 Adaptive Upper Bound on Regret in Second-Generation Emo Systems
-
- With the introduction of emoPulse, the regret bound above is updated to contain the dynamic emoPulse_t rather than the static η₀:
- R(T) ≤ O( E[ Σ_{t=1}^{T} emoPulse_t · ‖g_t‖² · (1 − |σ_t|)² ] )
-
- In this bound, since emoPulse_t is proportional to 1/N_t, the accumulation of regret is automatically suppressed in unstable (high-noise) regions, while η_t grows in stable regions to accelerate convergence. This evolves AdaBound's dynamic clipping into autonomous pulses driven by time-series statistics, and provides mathematical justification for "true learning-rate freedom": adaptability inversely proportional to terrain complexity (noise), together with independence from the initial learning-rate setting.
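Numerically, the per-step term of this bound shows the double damping at work. The figures below are made up purely for illustration:

```python
def regret_term(pulse, grad_norm, sigma):
    """Per-step contribution emoPulse_t * ||g_t||^2 * (1 - |sigma_t|)^2."""
    return pulse * grad_norm**2 * (1 - abs(sigma))**2

# Stable region: a large pulse, small |sigma| -- the term is bounded by the cap.
calm = regret_term(pulse=1e-3, grad_norm=1.0, sigma=0.1)
# Unstable region: pulse ~ d_t/N_t shrinks as the noise N_t grows, and
# |sigma| > 0.75 damps the term again through the (1 - |sigma|)^2 factor.
rough = regret_term(pulse=1e-4, grad_norm=1.0, sigma=0.9)
assert rough < calm / 100
```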
-

 5. Conclusion

- EmoNAVI v3.6, through the intuitive metaphor of the emotion scalar, realizes "landscape perception via high-order moments" and "adaptive step control via D-adaptation" within a single loop. This analysis demonstrates that EmoNAVI is not merely a collection of heuristics, but a theoretically sound next-generation optimizer that integrates the frontiers of online learning theory (COCOB / D-adaptation).
-
- EmoSens v3.7 integrates the "judgment" provided by the emotion scalar with "order" through sign encoding and "pulse" through emoPulse. This analysis shows that the method is not merely an empirical rule, but a theoretical framework that tightly integrates higher-order moment approximation with SNR-based adaptive control.
-
- This research shows that the emo-based optimizers, progressing from EmoNAVI to EmoSENS, can be positioned as a unified autonomous optimization framework centered on Multi-EMA, emoDrive, and emoPulse.


  Acknowledgements
@@ -128,15 +85,6 @@ Supplementary Material (1): Modifications to Update Equations

 EmoLynx (Lion-type): Decoupled weight-decay for improved stability.

- Emo-Style Second Generation (common to EmoSens / EmoAiry / EmoCats)
- The emoPulse mechanism shifts learning-rate specification from "speed" to the concept of "scope," monitors the SNR of the loss landscape, and enables fully automatic convergence without a manual scheduler.
-
- EmoSens (Adam-type): Overcomes divergence in low-precision environments through "dual sign encoding," which also applies the sign function to the inverse of the second moment.
-
- EmoAiry (Adafactor-type): Extends sign encoding to one-dimensional vectors (bias, LayerNorm, etc.) in addition to matrix factorization.
-
- EmoCats (Lion variant): Completely discards the second moment and separates weight decay from the update rule.
-

 Supplementary Material (2): Formal Proof of emoDrive Boundedness

@@ -187,12 +135,14 @@ Supplementary Material (2): Formal Proof of emoDrive Boundedness
 0 < B_{\mathrm{low}} \leq 0.25.

 4. Conclusion

 From the above evaluations, we have proven that emoDrive satisfies the following boundedness condition in all regions:
 0 < (1 - |\sigma_{\max}|) \leq \mathrm{emoDrive} \leq 6.6.
 (Even when |\sigma_t| approaches 1, implementation details such as eps ensure that a small positive value is maintained.)
 The existence of this bounded multiplicative coefficient provides the mathematical foundation that allows EmoNAVI to retain the Adam-type convergence rate O(1/T) while achieving constant-factor acceleration.

 5. Summary

 EmoNAVI encapsulates three forms of "intelligence" within a single update loop:

 Observational Intelligence (Multi-EMA): Captures the "undulations" of the loss landscape within a temporal spread, rather than at a single point.
+ Paper: Theoretical Basis for Autonomous Optimization in EmoNAVI v3.6

+ — Improving the Regret Bound via Higher-Order Moment Approximation and Dynamic Distance Estimation —



 1. Introduction

+ In deep learning optimization, dynamic adjustment of the learning rate is the most critical factor determining convergence performance.
+
+ While conventional methods such as Adam and AMSGrad utilize the first and second moments of gradients, their ability to directly estimate the steepness (curvature) of the local loss landscape or the distance D to the optimal solution has been limited.

+ This paper demonstrates that the "emotion scalar σ_t" and emoDrive mechanism introduced in EmoNAVI v3.6 function mathematically as an approximation of higher-order moments and as an online implementation of D-adaptation (and COCOB theory) (Defazio & Mishchenko, 2023), achieving both extremely low hyperparameter sensitivity and robust convergence.


+ 2. Mathematical Redefinition of the Implementation and Higher-Order Moment Approximation

+ 2.1 Generating Proxy Indicators Using Multi-EMA

+ EmoNAVI maintains a three-tier exponential moving average (short, medium, long):
  EMA_{short,t} = (1 − α_s) · EMA_{short,t−1} + α_s · L_t

+ Here, taking the difference ΔEMA = EMA_long − EMA_short between EMAs with different smoothing coefficients α corresponds to approximating higher-order derivatives of the loss function L over time.

+ Approximation of the third and fourth moments: ΔEMA captures the rate of change of the gradient (change in curvature).

+ Fifth-order moment history: The emotion scalar σ_t = tanh(ΔEMA / scale) is a statistic that nonlinearly compresses this higher-order information into the range [−1, 1]. Incorporating it recursively into the update formula lets the long-term "smoothness" of the terrain be reflected in the parameter update.
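The three-tier EMA and the emotion scalar can be sketched as follows; the α values and `scale` are placeholders for illustration, not the released defaults:

```python
import math

def update_multi_ema(loss, ema, alphas=(0.3, 0.1, 0.02)):
    """Advance the (short, medium, long) EMAs of the loss by one step."""
    return [(1 - a) * e + a * loss for e, a in zip(ema, alphas)]

def emotion_scalar(ema, scale=1.0):
    """sigma_t = tanh(dEMA / scale), with dEMA = EMA_long - EMA_short."""
    return math.tanh((ema[2] - ema[0]) / scale)

# On a steadily falling loss the short EMA tracks the recent (low) values
# faster than the long EMA, so dEMA > 0: the terrain is judged "improving".
ema = [1.0, 1.0, 1.0]
for step in range(50):
    ema = update_multi_ema(1.0 - 0.02 * step, ema)
sigma = emotion_scalar(ema)
assert 0.0 < sigma < 1.0
```

The tanh keeps σ_t inside (−1, 1) regardless of the raw ΔEMA magnitude, which is what makes the downstream gating bounded.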


+ 3. Dynamic Distance Estimation via emoDrive (D-adaptation)

+ 3.1 Online Approximation of the D-Estimate

+ D-adaptation algorithms estimate the optimal distance D from the initial point and scale the learning rate proportionally to D. In EmoNAVI, emoDrive fulfills the role of this D.

+ Acceleration Zone (high confidence): In regions where σ_t is stable, the current search direction is deemed correct (lying on the straight path toward the optimal solution w*), and the effective step size is boosted to 8 times its original value or more. This operation is equivalent to exponentially increasing the estimated distance D̂.

+ Suppression Zone (low confidence): During abrupt changes where |σ_t| > 0.75, updates are suppressed on the order of O(1 − |σ_t|). This serves as a safety mechanism against sudden increases in the local Lipschitz constant L_t, equivalent to the "reset of the betting amount after a losing streak" in COCOB (Orabona & Tommasi, 2017).

+ The higher-order moments referred to here are: 3rd: skewness; 4th: kurtosis; 5th: the "variation of variation" along the time axis.
  ※ Higher-order moments are formed not by a single step but by "temporal integration."
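One possible shape for this gate is sketched below. The 8x boost and the 0.75 threshold follow the text above, but the piecewise form and the linear interpolation are assumptions for illustration, not the released implementation:

```python
def emo_drive(sigma, boost=8.0, threshold=0.75):
    """Illustrative emoDrive gate over the emotion scalar sigma in [-1, 1]."""
    s = abs(sigma)
    if s > threshold:
        # Suppression zone: throttle updates on the order of (1 - |sigma|).
        return 1.0 - s
    # Acceleration zone: ramp the boost up as sigma stabilizes toward 0.
    return 1.0 + (boost - 1.0) * (1.0 - s / threshold)

# Calm terrain receives the full boost ...
assert emo_drive(0.0) == 8.0
# ... an abrupt change (|sigma| > 0.75) throttles the step below 1x ...
assert 0.0 < emo_drive(0.9) < 1.0
# ... and the coefficient stays bounded and strictly positive throughout.
assert all(0.0 < emo_drive(s / 100) <= 8.0 for s in range(-99, 100))
```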

 4. Convergence Proof and Regret Analysis


 Definition: "Emotion" in EmoNAVI is a high-order moment-based dynamic gating mechanism that transforms the statistical reliability of gradients into non-linear weights.

 5. Conclusion

+ EmoNAVI v3.6 achieves "terrain mapping via higher-order moments" and "adaptive step control via D-adaptation" within a single loop through the intuitive metaphor of an emotion scalar. This analysis demonstrates that EmoNAVI is not merely a collection of empirical rules, but a theoretically consistent next-generation optimizer that tightly integrates the cutting edge of online learning theory (COCOB / D-adaptation).



 Acknowledgements


 EmoLynx (Lion-type): Decoupled weight-decay for improved stability.


 Supplementary Material (2): Formal Proof of emoDrive Boundedness

 0 < B_{\mathrm{low}} \leq 0.25.

 4. Conclusion
+
 From the above evaluations, we have proven that emoDrive satisfies the following boundedness condition in all regions:
 0 < (1 - |\sigma_{\max}|) \leq \mathrm{emoDrive} \leq 6.6.
 (Even when |\sigma_t| approaches 1, implementation details such as eps ensure that a small positive value is maintained.)
 The existence of this bounded multiplicative coefficient provides the mathematical foundation that allows EmoNAVI to retain the Adam-type convergence rate O(1/T) while achieving constant-factor acceleration.

 5. Summary
+
 EmoNAVI encapsulates three forms of "intelligence" within a single update loop:

 Observational Intelligence (Multi-EMA): Captures the "undulations" of the loss landscape within a temporal spread, rather than at a single point.
emo-v36-paper(JPN).txt CHANGED
@@ -5,7 +5,7 @@

 1. Introduction

- In deep learning optimization, dynamic adjustment of the learning rate is the most important issue in determining convergence performance. Conventional Adam and AMSGrad use the first and second moments of the gradient, but their ability to directly estimate the steepness (curvature) of the local loss terrain or the distance D to the optimal solution has been limited. This paper proves that the "emotion scalar σ_t" and "emoDrive" mechanisms introduced in EmoNAVI v3.6 function mathematically as an approximation of higher-order moments and an online implementation of D-adaptation (and COCOB theory) (Defazio & Mishchenko, 2023), achieving both extremely low hyperparameter sensitivity and robust convergence.


 2. Mathematical Redefinition of the Implementation and Higher-Order Moment Approximation
@@ -18,7 +18,6 @@
 Here, taking the difference ΔEMA = EMA_long − EMA_short between EMAs with different smoothing coefficients α corresponds to approximating higher-order derivatives of the loss function L along the time axis.

 Approximation of the 3rd and 4th moments: ΔEMA captures the rate of change of the gradient (change in curvature).
-
 Historization of the 5th moment: The emotion scalar σ_t = tanh(ΔEMA / scale) is a statistic that nonlinearly compresses this higher-order information into the range [−1, 1]; by incorporating it recursively into the update formula, the long-term "smoothness" of the terrain is reflected in the parameter update.


@@ -32,6 +31,7 @@
 Suppression zone (low confidence): During abrupt changes where |σ_t| > 0.75, updates are suppressed on the order of O(1 − |σ_t|). This is a safety mechanism against sudden increases in the local Lipschitz constant L_t, corresponding to the "reset of the betting amount after a losing streak" in COCOB (Orabona & Tommasi, 2017).

 The higher-order moments referred to here are: 3rd: skewness; 4th: kurtosis; 5th: the "variation of variation" along the time axis.


 4. Convergence Proof and Regret Analysis
 

 1. Introduction

+ In deep learning optimization, dynamic adjustment of the learning rate is the most important issue in determining convergence performance. Conventional Adam and AMSGrad use the first and second moments of the gradient, but their ability to directly estimate the steepness (curvature) of the local loss terrain or the distance D to the optimal solution has been limited. This paper proves that the "emotion scalar σ_t" and "emoDrive" mechanisms introduced in EmoNAVI v3.6 function mathematically as an approximation of higher-order moments and an online implementation of D-adaptation (and COCOB theory) (Defazio & Mishchenko, 2023), achieving both extremely low hyperparameter sensitivity and robust convergence.


 2. Mathematical Redefinition of the Implementation and Higher-Order Moment Approximation
 
 Here, taking the difference ΔEMA = EMA_long − EMA_short between EMAs with different smoothing coefficients α corresponds to approximating higher-order derivatives of the loss function L along the time axis.

 Approximation of the 3rd and 4th moments: ΔEMA captures the rate of change of the gradient (change in curvature).

 Historization of the 5th moment: The emotion scalar σ_t = tanh(ΔEMA / scale) is a statistic that nonlinearly compresses this higher-order information into the range [−1, 1]; by incorporating it recursively into the update formula, the long-term "smoothness" of the terrain is reflected in the parameter update.


 Suppression zone (low confidence): During abrupt changes where |σ_t| > 0.75, updates are suppressed on the order of O(1 − |σ_t|). This is a safety mechanism against sudden increases in the local Lipschitz constant L_t, corresponding to the "reset of the betting amount after a losing streak" in COCOB (Orabona & Tommasi, 2017).

 The higher-order moments referred to here are: 3rd: skewness; 4th: kurtosis; 5th: the "variation of variation" along the time axis.
+ ※ Higher-order moments are formed not by a single step but by "temporal integration."


 4. Convergence Proof and Regret Analysis