emo-paper(ENG).txt
A Convergence Analysis of EmoNAVI: A Mathematical Guarantee for an Emotion-Driven Optimizer

Abstract
This paper mathematically proves that the update rule of EmoNAVI (Emotionally-Navigated Optimizer) maintains stable convergence, even in non-convex optimization problems. By leveraging the theory of COCOB (Competitive Online Convex Optimization with Bounds), we show that EmoNAVI's learning-rate adjustment mechanism guarantees the boundedness of its update steps and an upper bound on its regret. In particular, we demonstrate that the emotional scalar is stochastically bounded and that its dynamic behavior brings stability to the optimization process. This proof establishes that EmoNAVI is not merely a heuristic but an optimization algorithm with a robust theoretical foundation.
1. Introduction
Optimization algorithms play a central role in training deep learning models. While existing optimizers such as Adam and SGD have achieved great success across a variety of tasks, their performance depends heavily on hyperparameter settings. EmoNAVI is a novel approach that models the "emotion" of the training process and dynamically adjusts the learning rate in response to its fluctuations, aiming for more robust learning while reducing the burden of hyperparameter tuning. This paper supports the effectiveness of this approach with a rigorous mathematical proof.
2. Problem Setup and Assumptions

2.1 Optimization Objective
This study considers the minimization problem for a loss function f: ℝ^d → ℝ of the following form:

    min_{w ∈ ℝ^d} f(w)

Here, w represents the model's weight parameters.
2.2 Basic Assumptions
This proof makes the following standard assumptions:

L-smoothness: The loss function f is L-smooth:

    f(w′) ≤ f(w) + ∇f(w)ᵀ(w′ − w) + (L/2)‖w′ − w‖²

Bounded Gradients: The gradient ∇f(w) is bounded:

    ‖∇f(w)‖ ≤ G  for all w

Finite Initial Distance: The distance from the initial point w₁ to the optimal solution w* is finite:

    D = ‖w₁ − w*‖ < ∞
2.3 EmoNAVI's Update Rule
EmoNAVI's update rule adds an emotional-scalar-based learning-rate adjustment to an Adam-type momentum structure:

    w_{t+1} = w_t − η₀ (1 − |σ_t|) · m_t / (√v_t + ε)

Here, m_t is the first moment, v_t is the second moment, and g_t = ∇f(w_t) is the gradient:

    m_t = β₁ m_{t−1} + (1 − β₁) g_t

    v_t = β₂ v_{t−1} + (1 − β₂) g_t²

Emotional Scalar: σ_t = tanh(α(EMA_short − EMA_long))
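As a concrete illustration, the update rule above can be sketched in a few lines of NumPy. This is a minimal sketch, not the published implementation: the EMA decay rates (gamma_s, gamma_l), the initialization values, and the use of the raw loss value to drive the two EMAs are assumptions for illustration only.

```python
import numpy as np

def emonavi_step(w, m, v, ema_s, ema_l, grad, loss,
                 lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8,
                 gamma_s=0.3, gamma_l=0.01, alpha=1.0):
    """One EmoNAVI-style update step (sketch; argument names illustrative)."""
    # Adam-type moment estimates (Section 2.3)
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Short and long EMAs of the loss; their gap drives the emotional scalar
    ema_s = (1 - gamma_s) * ema_s + gamma_s * loss
    ema_l = (1 - gamma_l) * ema_l + gamma_l * loss
    sigma = np.tanh(alpha * (ema_s - ema_l))
    # The scalar shrinks the effective learning rate, bounding each step
    w = w - lr * (1 - abs(sigma)) * m / (np.sqrt(v) + eps)
    return w, m, v, ema_s, ema_l, sigma

# Demo: minimize f(w) = w^2 / 2 in one dimension, so grad = w
w, m, v, ema_s, ema_l = 1.0, 0.0, 0.0, 0.5, 0.5
for _ in range(1000):
    w, m, v, ema_s, ema_l, sigma = emonavi_step(
        w, m, v, ema_s, ema_l, grad=w, loss=0.5 * w * w)
```

On this toy quadratic the iterate settles near the minimum while the scalar stays strictly inside (−1, 1), so the update coefficient never vanishes.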
3. Auxiliary Lemmas: EmoNAVI's Stability

3.1 Lemma 1: Boundedness of Moments
Lemma
In an Adam-type momentum structure, if the gradient is bounded, the first moment m_t and the second moment v_t satisfy:

    ‖m_t‖ ≤ G,    v_t ≤ G²

Proof
Using induction and the triangle inequality, ‖m_t‖ ≤ β₁‖m_{t−1}‖ + (1 − β₁)‖g_t‖ yields ‖m_t‖ ≤ G. The boundedness of v_t is shown in the same way. ∎

Note (Moment Stability):
Since m_t and v_t are exponential moving averages, the moments stabilize when gradient fluctuations are small, leading to smoother update directions.
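The induction step in the proof can be written out in full. Assuming ‖m_{t−1}‖ ≤ G:

```latex
\|m_t\| \;\le\; \beta_1 \|m_{t-1}\| + (1-\beta_1)\|g_t\|
        \;\le\; \beta_1 G + (1-\beta_1) G \;=\; G
```

Likewise, v_t is a convex combination of v_{t−1} ≤ G² and g_t² ≤ G², so v_t ≤ G².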
3.2 Lemma 2: Boundedness of Update Steps
Lemma
The update steps of EmoNAVI are bounded as follows:

    ‖w_{t+1} − w_t‖ ≤ η₀ · (G/ε) · (1 − |σ_t|)

Proof
Taking the norm of the update rule and applying Lemma 1 completes the proof. This result shows that the update steps are suppressed by the emotional scalar. ∎
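The bound can be checked numerically under the bounded-gradient assumption of Section 2.2 (the constants below are illustrative). Note that the bound is conservative: it replaces √v_t + ε by ε in the denominator.

```python
import math
import random

# Numerical check of the step bound for bounded gradients.
random.seed(1)
eta0, eps, beta1, beta2, G = 0.01, 1e-4, 0.9, 0.999, 1.0
m = v = 0.0
for t in range(1000):
    g = random.uniform(-G, G)                  # gradient bounded by G
    m = beta1 * m + (1 - beta1) * g            # |m_t| <= G (Lemma 1)
    v = beta2 * v + (1 - beta2) * g * g        # v_t <= G^2 (Lemma 1)
    sigma = math.tanh(random.uniform(-2, 2))   # any emotional scalar
    step = eta0 * (1 - abs(sigma)) * m / (math.sqrt(v) + eps)
    # Lemma 2: the step never exceeds eta0 * (G / eps) * (1 - |sigma|)
    assert abs(step) <= eta0 * (G / eps) * (1 - abs(sigma)) + 1e-12
```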
3.3 Lemma 3: Smoothness and Boundedness of the Emotional Scalar
Lemma
The scalar σ_t = tanh(α d_t) is smooth and bounded, satisfying:

    |σ_t| ≤ tanh(αG|γ_s − γ_l|) < 1

Proof
From the definition of the EMA difference d_t and the boundedness of the gradient, |d_t| is finite. By the properties of the tanh function, |σ_t| is always strictly less than 1. This guarantees that the update coefficient (1 − |σ_t|) never becomes zero, so learning never stalls. ∎
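A quick numerical illustration of Lemma 3, with assumed values for α, G, and the decay rates. For EMA differences within the stated range, the scalar respects the bound and the update coefficient stays strictly positive.

```python
import math

# Illustration of Lemma 3: sigma = tanh(alpha * d) stays below the bound
# tanh(alpha * G * |gamma_s - gamma_l|) < 1 for bounded EMA differences d.
alpha, G = 2.0, 5.0
gamma_s, gamma_l = 0.3, 0.01
d_max = G * abs(gamma_s - gamma_l)
bound = math.tanh(alpha * d_max)
for d in (-d_max, -0.5 * d_max, 0.0, 0.5 * d_max, d_max):
    sigma = math.tanh(alpha * d)
    assert abs(sigma) <= bound + 1e-12   # Lemma 3's bound holds
    assert 1.0 - abs(sigma) > 0.0        # update coefficient never zero
```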
4. Proof of Convergence

4.1 Theorem 1: Regret Bound (Convex Functions)
Theorem
If f is a convex and L-smooth function, EmoNAVI's regret is bounded as follows:

    Regret_T = Σ_{t=1}^T [f(w_t) − f(w*)] ≤ D²/(2η₀) + (η₀G²/2) · Σ_{t=1}^T (1 − |σ_t|)²/(√v_t + ε)

Proof
We invoke the regret-bound proof for Adam by Kingma & Ba (2015). Their proof is based on the following fundamental inequality:

    Regret_T = Σ_{t=1}^T [f(w_t) − f(w*)] ≤ (1/(2η)) · Σ_{t=1}^T [‖w_t − w*‖² − ‖w_{t+1} − w*‖²] + (η/2) · Σ_{t=1}^T ‖∇f(w_t)‖²/(√v_t + ε)

In EmoNAVI, the learning rate changes dynamically as η_t = η₀(1 − |σ_t|), so the regret term at each step is dynamically adjusted. Telescoping the first sum and inserting η_t gives:

    Regret_T ≤ (1/(2η₀)) ‖w₁ − w*‖² + (η₀/2) · Σ_{t=1}^T ‖∇f(w_t)‖² (1 − |σ_t|)²/(√v_t + ε)

Substituting the initial distance D = ‖w₁ − w*‖ and the gradient bound ‖∇f(w_t)‖ ≤ G yields the final regret bound:

    Regret_T ≤ D²/(2η₀) + (η₀G²/2) · Σ_{t=1}^T (1 − |σ_t|)²/(√v_t + ε)

The factor (1 − |σ_t|)² shows how the emotional scalar modulates each step's contribution to the regret: when |σ_t| is large, the effective learning rate shrinks and the corresponding regret term is damped. ∎
4.2 Theorem 2: Expected Convergence for Non-Convex Functions
Theorem
For a non-convex function f, EmoNAVI exhibits the following expected convergence property:

    (1/T) · Σ_{t=1}^T E[‖∇f(w_t)‖²] ≤ O(1/√T)

Note:
EmoNAVI's v_t has the same structure as Adam's, and if needed a running maximum of the second moment (v̂_t = max(v₁, …, v_t), as in AMSGrad) can be maintained so that the non-convex convergence proof of Reddi et al. (2018) applies directly. Moreover, the dynamic suppression of the learning rate by the emotional scalar provides a stabilizing effect similar to such explicit moment corrections. Convergence is therefore guaranteed in expectation. ∎
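A sketch of the AMSGrad-style running maximum the note refers to (constants illustrative). Keeping v̂_t = max(v₁, …, v_t) makes the per-coordinate effective learning rate non-increasing, which is the monotonicity property the Reddi et al. (2018) analysis relies on.

```python
import random

# Demonstrate that the AMSGrad running maximum is monotone non-decreasing.
random.seed(2)
beta2 = 0.999
v = v_hat = 0.0
history = []
for _ in range(500):
    g = random.gauss(0.0, 1.0)
    v = beta2 * v + (1 - beta2) * g * g   # Adam-style second moment
    v_hat = max(v_hat, v)                 # AMSGrad correction
    history.append(v_hat)
# v_hat never decreases, even though v itself fluctuates with the gradient
assert all(b >= a for a, b in zip(history, history[1:]))
```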
5. Conclusion
The mathematical proofs presented in this paper show that EmoNAVI elevates the intuitive concept of an emotional scalar into a robust optimization mechanism supported by the following properties:

Stability of Update Steps: As shown in Lemma 2, parameter updates are always bounded, suppressing the risk of divergence.

Guaranteed Convergence: By Theorem 1, the regret is controlled through the emotional scalar, which modulates each step's contribution to the bound.

Applicability to Non-Convex Functions: Theorem 2 guarantees that EmoNAVI remains effective for the non-convex functions that dominate deep learning.

These results indicate that EmoNAVI is not merely an experimental attempt but a next-generation optimizer with a strong theoretical foundation. Future work will extend this theory further, evaluate performance on large-scale real-world datasets, and explore applicability to different tasks.
6. EmoNAVI's Evolution and Design Philosophy

EmoNAVI (First Generation): Introduction of a Shadow

Concept: To cope with rapid loss fluctuations (emotional "arousal"), a past "shadow" copy of the parameters was mixed into the current parameters to stabilize updates. This was an explicit safety mechanism that operated only under specific conditions.

Feature: Required maintaining the shadow's history and condition-specific logic in the implementation.

EmoSens (Second Generation): Replacement with a Cube-Root Filter

Concept: The purpose of the shadow feature (suppressing excessive updates) was taken over by a cube-root filter applied to the gradient at each step, with dynamic thresholds used to suppress noise and control updates.

Feature: Eliminated the need for shadow parameter history, but introduced new computational costs: cube-root calculations and masking for each element of the gradient.

EmoNAVI (v3.0): Final Consolidation through Temporal Accumulation

Concept: Further analysis revealed that the effects of the shadow and the cube-root filter could be replicated solely through the emotional scalar's dynamic learning-rate control. This rests on two key insights:

Temporal Accumulation: The EMA (exponential moving average) difference that underlies the emotional scalar already contains the history of past loss values, and that history implicitly holds "higher-order moment" information such as noise and trends.

Implicit Filtering: Dynamically adjusting the learning rate with this scalar produces an automatic noise-suppression effect over time, without any explicit filtering step.

Feature: The shadow and filters became unnecessary, significantly simplifying the code and reducing computational and VRAM overhead.
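The "Temporal Accumulation" claim above can be seen in a few lines: the short/long EMA difference of a noisy loss series tracks the underlying trend while averaging out the noise. The decay rates below are assumptions for illustration, not the published defaults.

```python
import random

def ema_pair_diffs(losses, gamma_s=0.3, gamma_l=0.01):
    """Difference of short and long EMAs of the loss (decay rates assumed)."""
    s = l = losses[0]
    diffs = []
    for x in losses:
        s = (1 - gamma_s) * s + gamma_s * x   # short EMA: tracks quickly
        l = (1 - gamma_l) * l + gamma_l * x   # long EMA: lags behind
        diffs.append(s - l)
    return diffs

random.seed(0)
# A steadily decreasing loss with noise: the short EMA runs below the
# lagging long EMA, so the difference turns clearly negative -- the signal
# the emotional scalar uses to detect that the loss is improving.
losses = [1.0 - 0.004 * t + random.gauss(0.0, 0.01) for t in range(200)]
diffs = ema_pair_diffs(losses)
```

Despite per-step noise larger than the per-step trend, the accumulated EMA difference cleanly separates trend from noise without any explicit filter.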
7. A New Era of Learning Led by EmoNAVI
EmoNAVI is a self-contained, autonomous optimizer that enables a new paradigm of learning, including non-linear schedulers and asynchronous operation. It not only removes the need for hyperparameter tuning but also lets training be started and stopped anytime, anywhere. In distributed learning it removes the need for central control and node-to-node coordination, so parallel, serial, or mixed configurations can be combined freely. Tight coordination across identical hardware becomes a thing of the past; EmoNAVI allows flexible combinations of different hardware. Layered and additional training can be performed at will, and the same dataset can even be trained concurrently with different learning rates. When the learning process has settled sufficiently, an automatic stop signal can be issued, enabling autonomous termination. This theory achieves an "autonomy" that transcends scale, time, space, and distance, for both large-scale and small-scale learning.

(EmoNavi, Fact, Linx, Clan, Zeal, Neco, EmoSens, and Airy are currently available.)
(Default setting: use_shadow=False; the shadow is not used by default but can be enabled when needed.)
Acknowledgements
I extend my deepest gratitude to the researchers behind the many optimizers that preceded EmoNAVI; their passion and knowledge made the conception and realization of this proof possible. This paper mathematically explains the already published EmoNAVI. I believe that EmoNAVI, including its derivatives, can contribute to the advancement of AI, and I hope that, building on this paper, we can collectively create even more advanced optimizers. I conclude with anticipation and gratitude toward the future researchers who will bring new insights and ideas. Thank you very much.
References

Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Reddi, S. J., Kale, S., & Kumar, S. (2018). On the convergence of Adam and beyond. arXiv preprint arXiv:1904.09237.

Orabona, F., & Tommasi, T. (2017). Training deep networks without learning rates through coin betting (COCOB). arXiv preprint arXiv:1705.07430.