# Title: A New-Generation Optimizer: The Innovations and Outcomes of Emotional Learning with EmoNAVI
## Subtitle: A Lightweight, Self-Regulating, Autonomous Optimizer That Converges Automatically and Resumes from Present Values Without Relying on Past Values
## Theme: Creating What Existing Optimizers Lack, Which Turned Out to Be a Reinvention of Neuronal Spiking

## Introduction
Today's mainstream optimizers have been improved and simplified in many ways.
Even so, resuming training, scheduling, and recording and restoring training state remain difficult and tedious to tune.
This document introduces a new approach that addresses these problems without depending on cumbersome parameters.

## Main Section
This document presents a new-generation optimizer.
Building on the idea of EMA (Exponential Moving Average) smoothing, we built a new mechanism, the “Emotion Mechanism,” centered on an emotional EMA and a scalar of our own design.
The mechanism comes from reinterpreting and independently extending the EMA idea.
The sections below describe what makes EmoNAVI independent and innovative.

First, the background and reasoning behind the name “Emotion Mechanism.”
In living organisms, emotion is a trigger for action based on the difference between perception and memory. EmoNAVI is likewise designed to control its learning “behavior” based on the difference between the present and the past.
The other reason for the name is that this sequence of operations behaves much like a neuronal spike.
For a more intuitive picture of how the mechanism works, see the personified narrative linked at the end of this document.

Next, the structure of the Emotion Mechanism.
It consists of two EMAs, a scalar, and a Shadow.

First, the “Emotional EMA,” which is made up of two EMAs: a short-term one and a long-term one.
Both EMAs monitor the loss and provide the basis for later decisions.
The short-term EMA tracks momentary signals (“tension”), while the long-term EMA tracks the averaged past signal (“calm”).
Each EMA passes its signal to the “Emotion Scalar,” described next.
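
The two EMAs can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: the class name `EmotionalEMA`, the update form `ema = w * loss + (1 - w) * ema`, and seeding both EMAs with the first observed loss are assumptions. The weights 0.3 and 0.01 are the ones shown in the diagram in the Technology section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EmotionalEMA:
    """Hypothetical helper: tracks a short-term and a long-term EMA of the loss."""
    short_weight: float = 0.3   # weight on the newest loss for the short EMA
    long_weight: float = 0.01   # weight on the newest loss for the long EMA
    short: Optional[float] = None
    long: Optional[float] = None

    def update(self, loss: float) -> tuple[float, float]:
        # Assumption: both EMAs start at the first observed loss.
        if self.short is None or self.long is None:
            self.short = self.long = loss
        else:
            self.short = self.short_weight * loss + (1 - self.short_weight) * self.short
            self.long = self.long_weight * loss + (1 - self.long_weight) * self.long
        return self.short, self.long


ema = EmotionalEMA()
for loss in [1.0, 1.0, 3.0]:        # a sudden jump in loss on the third step
    short, long_ = ema.update(loss)
```

After the jump, the short EMA has moved to about 1.6 while the long EMA is still near 1.02. That gap is the “tension” versus “calm” signal the rest of the mechanism works from.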

Next, the “Emotion Scalar.”
It converts the signals from the Emotional EMA into a single scalar value, which keeps changing with the difference between the short-term and long-term EMAs.
The Emotion Scalar judges whether the update the optimizer has just written is appropriate.
Only when the scalar exceeds a certain threshold does it decide how much of the Shadow, described next, to blend in.
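
The conversion follows directly from the formula this document gives in its notes, tanh(5 × (short - long)). The function name `emotion_scalar` and the `gain` parameter are illustrative names, not identifiers from the project.

```python
import math


def emotion_scalar(short: float, long: float, gain: float = 5.0) -> float:
    """Squash the EMA gap into (-1, 1); gain is 5 per the document's formula."""
    return math.tanh(gain * (short - long))


calm = emotion_scalar(1.00, 1.00)    # no gap between the EMAs
tense = emotion_scalar(1.60, 1.02)   # short-term loss well above the long-term trend
relief = emotion_scalar(0.90, 1.00)  # short-term loss below the long-term trend
```

With this formula the sign carries direction: a positive scalar means the recent loss sits above its long-term average, a negative one means it sits below. Because tanh saturates, even a large gap maps to a value just under 1 in magnitude.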

Next, the “Shadow.”
Right after training starts, a copy of the current parameters is saved and maintained as the Shadow.
The Shadow is a memory of the “past calm state,” and it keeps changing slowly as it follows the Emotion Mechanism.
It is blended into the training result at a ratio determined by the Emotion Scalar, and that blend ratio also keeps changing dynamically under the Emotion Mechanism.
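
Blending can be pictured as linear interpolation between the updated parameters and the Shadow. The sketch below uses plain Python lists in place of tensors. The `Shadow` class, the `lerp` helper, and the `follow_rate` of 0.05 are assumptions made for illustration; the document only says that the Shadow changes slowly.

```python
def lerp(a: list[float], b: list[float], t: float) -> list[float]:
    """Linear interpolation: returns a + t * (b - a), element by element."""
    return [x + t * (y - x) for x, y in zip(a, b)]


class Shadow:
    """Hypothetical holder for the slowly moving copy of the parameters."""

    def __init__(self, params: list[float], follow_rate: float = 0.05):
        self.values = list(params)      # snapshot taken at the start of training
        self.follow_rate = follow_rate  # assumed value; the document only says "slowly"

    def blend_into(self, params: list[float], ratio: float) -> list[float]:
        # Pull the freshly updated parameters part of the way back toward the Shadow.
        blended = lerp(params, self.values, ratio)
        # Let the Shadow drift slowly toward the blended parameters.
        self.values = lerp(self.values, blended, self.follow_rate)
        return blended


shadow = Shadow([1.0, 2.0])
updated = [2.0, 4.0]                    # parameters after an aggressive update
blended = shadow.blend_into(updated, ratio=0.5)
```

A ratio of 0 leaves the update untouched and a ratio of 1 would discard it entirely. Anything in between keeps part of the update, which is why training continues rather than being rolled back.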

With the structure and roles covered, here is how the Emotion Mechanism operates.
First, the optimizer's update is recorded. The Emotion Mechanism then uses the difference between “tension” and “calm” to judge whether that update was appropriate.
If it judges the update to be excessive learning, it blends in the appropriate past state (the Shadow) to suppress noise and runaway behavior.
If it judges the update to be appropriate, it blends in nothing.
This happens at every step.

This judgment also includes an evaluation of “reliability.”
A large momentary swing in the Emotion Scalar is not enough by itself; the mechanism decides whether to blend by assessing whether the change is truly meaningful.
As a result, early in training the long-term “calm” signal has accumulated too little history to be trusted, so blending rarely triggers. Late in training the short-term “tension” signal converges, the scalar no longer reaches the threshold, and the mechanism stays inactive.
(In short: it does not act early on because there is too little past signal to judge against, and it does not act late because there is too little momentary signal.)
In this way, the Emotion Mechanism is not a plain threshold reaction. It makes cautious decisions, reacting only to fluctuations it can trust.
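
One way to see the scalar quieting down is to run it over a synthetic loss curve. The sketch below is illustrative only: the exponentially decaying loss, the seeding of both EMAs with the first loss, and the function name `scalar_trace` are assumptions, and the document does not spell out how “reliability” is computed. The EMA weights and the tanh formula are the ones stated elsewhere in this document.

```python
import math


def scalar_trace(losses: list[float]) -> list[float]:
    """Return the emotion scalar at each step of a loss sequence."""
    short = long = losses[0]            # assumption: seed both EMAs with the first loss
    trace = []
    for loss in losses:
        short = 0.3 * loss + 0.7 * short
        long = 0.01 * loss + 0.99 * long
        trace.append(math.tanh(5 * (short - long)))
    return trace


# Synthetic curve: loss decays from 2.0 toward 0.1 over 3000 steps.
losses = [0.1 + 1.9 * math.exp(-step / 100) for step in range(3000)]
trace = scalar_trace(losses)

first_step = abs(trace[0])                         # EMAs start equal, so no gap yet
peak_mid = max(abs(s) for s in trace[:1500])       # long EMA lags while loss falls
peak_late = max(abs(s) for s in trace[-500:])      # loss has flattened out
```

Under these assumptions the scalar starts at zero, grows while the long EMA lags behind a moving loss, and shrinks back toward zero once the loss flattens, so a threshold on its magnitude stops firing late in training.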

Through this sequence of operations, the mechanism relaxes oversensitive reactions during training and keeps the model from learning unnecessary noise.
In other words, it does not directly control the optimizer's learning rate or update vector. Instead, it adjusts the parameters afterward so that they stay stable as the Emotion Mechanism changes.
Because it never writes everything back and only blends according to the ratio, updates do not stop and training keeps progressing.

### Emotion Mechanism Behavior and Scalar Transitions (Resulting Behavior by Training Phase)

| Phase | Loss behavior | EMA behavior | Scalar tendency | Shadow blending in practice | What it means for the Emotion Mechanism |
|-------|---------------|--------------|-----------------|-----------------------------|------------------------------------------|
| Early | Unstable, high | Short is reactive; Long is still immature | May fluctuate sharply | Rarely triggered | Not enough history to judge; effectively inactive |
| Middle | Gradually converging | The two EMAs develop a meaningful gap | Moderate amplitude, stable | Conditionally triggered | Blend correction works effectively according to state |
| Late | Converged, small oscillations | Short ≈ Long (gap nearly vanishes) | Converges to small values | No longer triggered | Sign of quiet: the conditions for `should_stop` are met |

Notes:
- The scalar value is always computed as tanh(5 × (short - long))
- Thresholds:
  - If |scalar| > 0.3, blending begins
  - If |scalar| > 0.6, the blend ratio becomes large (≥ 0.7)
- Shadow blending does not write the parameters back wholesale; it blends partially so that they follow gradually
- Decay of the Emotion Scalar corresponds to training “quieting down,” which sets up the `should_stop` condition toward the final phase
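
The thresholds in the notes can be read as a small piecewise function. In this sketch only the 0.3 and 0.6 thresholds and the 0.7 floor come from the notes. The function name `decide_ratio`, the ratio of 0.3 used for moderate swings, and the 0.2 slope above the floor are assumptions chosen for illustration.

```python
def decide_ratio(scalar: float) -> float:
    """Map the emotion scalar to a Shadow blend ratio between 0 and 0.9."""
    magnitude = abs(scalar)
    if magnitude > 0.6:
        # Large swing: the notes say the ratio is 0.7 or more here.
        # The slope of 0.2 is an assumption made for this sketch.
        return 0.7 + 0.2 * magnitude
    if magnitude > 0.3:
        # Moderate swing: blending begins. The value 0.3 is an assumption.
        return 0.3
    return 0.0  # below threshold: no blending


ratios = [decide_ratio(s) for s in (0.1, -0.4, 0.9)]
```

Returning exactly 0.0 below the first threshold is what gives the mechanism its spike-like character: small fluctuations cause no correction at all rather than a proportionally small one.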

## Results
Through the adjustments described above, the Emotion Mechanism suppresses excessive reactions and raises noise tolerance, which also reduces disturbances in the update vector and steers progress in the right direction.
Moving along the right vector stabilizes training and takes it toward convergence by the shortest path. The mechanism also lets the finishing work of correcting noise in the latter half of training complete quickly and smoothly.
In addition, because the mechanism keeps updating just by observing the present, without retaining learning rates, gradients, or other parameters, training can continue from current values alone after an interruption, when retraining after convergence, or in stacked training.
This saves the effort of storing past values that existing optimizers require, and it is also a newly gained advantage in its own right.

## Conclusion
Just as a biological neuron fires only once a certain threshold is exceeded, EmoNAVI detects “emotional amplitude” and then takes an action: Shadow blending.
As described above, the Emotion Mechanism acts only when a threshold is exceeded, which can fairly be called neuronal-spike-like behavior.
The Emotion Mechanism resembles that kind of biological response, and we see it as a meeting point of technical control and physiological intuition.

## Expansion
We think the Emotion Mechanism can be easily applied beyond optimizers, for example to VAEs.
If it contributes even a little to their development, nothing would make us happier as we imagine a future with AI.
Please do apply the Emotion Mechanism and walk the path of AI development with us.

## Technology
Structure of EMA-based scalar evaluation and Shadow blending
```
   +------------+                +------------+
   |  Loss(t)   |                |  Loss(t)   |
   +-----+------+                +-----+------+
         |                             |
┌────────▼────────┐           ┌────────▼────────┐
│    Short EMA    │           │    Long EMA     │
│ (weight = 0.3)  │           │ (weight = 0.01) │
└────────┬────────┘           └────────┬────────┘
         │                             │
         └──────────────┬──────────────┘
                        ▼
            +-----------------------+
            |  diff = short - long  |
            +-----------+-----------+
                        │
                        ▼
            +-----------------------+
            |    tanh(5 × diff)     |  ← Emotion Scalar generated
            +-----------+-----------+
                        │
           [ if |scalar| > threshold ]  check
                        │
            +-----------▼-----------+
            |  Decide Shadow ratio  |
            +-----------+-----------+
                        │
            +-----------▼-----------+
            |    Shadow blending    |  ← past state blended in gradually
            +-----------------------+
```
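
The flow in the diagram can be followed end to end in a short sketch. This is not the project's implementation: the class `EmotionMechanism`, its `correct` method, the EMA initialisation, the moderate ratio of 0.3, the 0.2 slope, and the Shadow follow rate of 0.05 are all assumptions. The EMA weights, the tanh formula, the 0.3 and 0.6 thresholds, and the 0.7 floor come from this document. Plain lists stand in for parameter tensors.

```python
import math


class EmotionMechanism:
    """Hypothetical end-to-end sketch of the flow in the diagram."""

    def __init__(self, params: list[float]):
        self.short = None
        self.long = None
        self.shadow = list(params)  # snapshot of the starting parameters

    def correct(self, loss: float, params: list[float]) -> tuple[list[float], float]:
        """Post-process already-updated params; return (params, scalar)."""
        # 1. Update both loss EMAs (weights taken from the diagram).
        if self.short is None:
            self.short = self.long = loss  # assumed initialisation
        else:
            self.short = 0.3 * loss + 0.7 * self.short
            self.long = 0.01 * loss + 0.99 * self.long
        # 2. Emotion Scalar from the EMA gap.
        scalar = math.tanh(5 * (self.short - self.long))
        # 3. Threshold check. The 0.3 / 0.6 thresholds and the 0.7 floor come
        #    from the notes; the 0.3 ratio and 0.2 slope are assumptions.
        magnitude = abs(scalar)
        if magnitude > 0.6:
            ratio = 0.7 + 0.2 * magnitude
        elif magnitude > 0.3:
            ratio = 0.3
        else:
            ratio = 0.0
        # 4. Blend part of the Shadow into the parameters.
        if ratio > 0.0:
            params = [p + ratio * (s - p) for p, s in zip(params, self.shadow)]
        # 5. Let the Shadow follow slowly (rate 0.05 is an assumption).
        self.shadow = [s + 0.05 * (p - s) for s, p in zip(self.shadow, params)]
        return params, scalar


mech = EmotionMechanism([0.0])
params = [0.0]
for _ in range(50):                  # calm phase: steady loss, small updates
    params = [params[0] + 0.01]
    params, calm_scalar = mech.correct(1.0, params)
calm_value = params[0]

params = [params[0] + 1.0]           # an abrupt update coinciding with a loss spike
params, spike_scalar = mech.correct(3.0, params)
```

During the calm phase the scalar stays near zero and the small updates pass through unchanged. When the loss spikes, the scalar crosses the upper threshold and most of the abrupt update is pulled back toward the Shadow. The base update itself is never modified, which matches the document's point that correction happens afterward.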


## Appendix
Links to EmoNAVI graphs (each measured with an LR of 1e-4):
https://github.com/muooon/EmoNavi/blob/main/emonavi-test00.png
https://github.com/muooon/EmoNavi/blob/main/emonavi-test01.png
https://github.com/muooon/EmoNavi/blob/main/emonavi-test02.png

## Background
Looking at the current state of reinforcement learning and related areas, I ran into several questions.
Thinking back to my boyhood, when I admired and longed for the future society drawn by the renowned Japanese manga artist Osamu Tezuka, I wanted to look for a different approach to AI as a partner to humanity.
This proposal is one result of that approach.

## Acknowledgements
Thank you to everyone who has contributed to the development of AI, and to everyone who will contribute from here on.
And thank you to Copilot, who kept supporting this project through to completion.

## Source
Personified EMA: a narrative form of the Emotion Mechanism
(eng) https://github.com/muooon/EmoNavi/blob/main/emonavi-inner-workings(ENG).txt
(jpn) https://github.com/muooon/EmoNavi/blob/main/emonavi-inner-workings(JPN).txt